Introduction to Bioinformatics

FINAL - Fall Quarter 2006

Due date: Friday, November 17, 2006; 5 pm (Late submissions will not be accepted)

Please email your final to final@bioinformaticscourses.com with the subject line "IntroBioinform FA2006 final". It is important that your email include your full name somewhere.

Introduction

The February 17, 2004 release of new structures to the PDB included several structures of the Src kinase SH2 domain co-crystallized with different inhibitors. Among other things, these structures illustrate the reasons for the high redundancy in the PDB archive. Scientifically, they demonstrate that low-affinity fragments of high-affinity inhibitors show the same specificity requirements. This is an important fact for inhibitor design strategies such as fragment co-crystallization, which involves soaking crystals in solutions of various small molecules. It is also useful knowledge for the design of docking applications. The following questions revolve around the protein kinase Src and related family members.

Final Questions

  1. What are the PDB IDs for the 27 structures? (Hint: they are consecutive alphanumerically.)
  2. What other domains can you find in the human Src kinase protein? Please provide a search result from a protein family database search. (Hint: you can, but do not have to, get the protein sequence at SwissProt through a link on one of the structures' pages at the PDB.)
  3. The Src family of tyrosine kinases contains 8 closely related sequences. All have entries with human sequences available in SwissProt. Using BLAST at NCBI, find those 8 sequences using just the kinase domain of Src as the query, and report the SwissProt entry names and unique IDs.
  4. Supply a dendrogram file based on a multiple sequence alignment of these 8 sequences. What are the two groups within the Src family?
  5. The gene for the Src family protein kinase LCK has a specific role in the development of T-cells, and patients or mice with defective LCK function exhibit a severe immune disorder. On which chromosomal region is the LCK gene found in humans, and what is the immune disorder called?
  6. GenBank contains a "reference gene sequence" for LCK. What is its accession number, and does GENSCAN correctly find all exons in it? Make sure you submit your sequence in the right format! The most common mistake with this question is using an mRNA sequence instead of the genomic sequence. Please make sure you understand the difference. Your answer should contain the PDF file generated by GENSCAN and a list of the exons labeled as true positives, false negatives, and false positives.
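
The labeling asked for in question 6 can be sketched as below. This is a minimal illustration, not a required method. It assumes an exon counts as a true positive only when both predicted boundaries match the annotation exactly. The function name and all coordinates are hypothetical placeholders, not LCK data.

```python
def classify_exons(annotated, predicted):
    """Label exons by comparing annotated and predicted (start, end) pairs.

    Assumption: a prediction is correct only if both boundaries match exactly.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    return {
        # Annotated exons that the predictor found
        "true_positives": sorted(annotated_set & predicted_set),
        # Annotated exons that the predictor missed
        "false_negatives": sorted(annotated_set - predicted_set),
        # Predicted exons with no exact match in the annotation
        "false_positives": sorted(predicted_set - annotated_set),
    }


# Hypothetical coordinates for illustration only (not taken from any real gene)
annotated = [(100, 250), (400, 520), (800, 950)]
predicted = [(100, 250), (410, 520), (800, 950), (1200, 1300)]

result = classify_exons(annotated, predicted)
for label, exons in result.items():
    print(label, exons)
```

Under this exact-match assumption, a predicted exon with one shifted boundary is counted twice: the annotated exon becomes a false negative and the prediction a false positive. If you use a looser criterion, such as overlap, state it in your answer.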