Contents

  1. Introduction to Protein Analysis
  2. Obtain a sequence of interest.
  3. Identify ORFs and translate them into protein
  4. Identify Similar Proteins from the Databases
  5. Align your sequence vs similar sequences and look for Gene Families
  6. Determine the putative function of your protein
  7. Determine the putative structure of your protein
  8. Protein Structure Visualization Tools
  9. Other Interesting Things You Can Do With Proteins

VII. Determine the putative structure of your protein

  1. Fold and secondary structure prediction
    1. PredictProtein Server
      http://www.embl-heidelberg.de/Services/sander/predictprotein/
      http://dodo.cpmc.columbia.edu/predictprotein/ (USA mirror)
      • PredictProtein runs a series of programs, most of them based on neural networks, to predict different aspects of protein structure (Figure 1)
      • By default, the programs are run sequentially in the following order:
        • BLAST (fast database search against SWISS-PROT)
        • MaxHom (multiple sequence alignment of the similar sequences identified by BLAST)
        • PROSITE (scan for functional motifs); reported only if a hit is found
        • SEG (detection of composition-biased regions); reported only if more than 10 low-complexity residues are found
        • ProDom (scan for the putative domain structure of your protein); reported only if a hit is found
        • COILS (prediction of coiled-coil regions); reported only if a hit is found
        • PHDsec (prediction of secondary structure)
        • PHDacc (prediction of solvent accessibility)
        • PHDhtm (prediction of transmembrane helices and their topology); reported only if a hit is found
          • NOTE: By default, the threshold for calling a membrane helix is rather restrictive. This has two consequences:
            • Almost no false positives (proteins predicted to contain membrane helices that actually contain none)
            • Some membrane proteins may be missed (false negatives)
      • Results are returned by email or FTP
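The SEG reporting rule above (report only if more than 10 low-complexity residues are found) can be illustrated with a minimal sketch. It assumes a simplified criterion: a residue is flagged if it lies in any 12-residue window whose Shannon entropy is below 2.2 bits. SEG itself uses a more elaborate trigger-and-extend procedure, and the window size, threshold, and function names here are illustrative assumptions, not PredictProtein's settings.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> list[bool]:
    """Flag every residue covered by a window whose entropy falls below threshold.

    Hypothetical simplification of SEG; window and threshold are assumptions.
    """
    seq = seq.upper()
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def should_report(seq: str, min_residues: int = 10, **kwargs) -> bool:
    """Mimic the reporting rule: more than min_residues low-complexity residues."""
    return sum(low_complexity_mask(seq, **kwargs)) > min_residues
```

A poly-glutamine stretch such as "Q" * 20 is flagged in full, while a sequence of 20 distinct residues has no window below the threshold and is not reported.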
    2. Meta PP
      http://dodo.cpmc.columbia.edu/predictprotein/submit_meta.html
      • You submit your sequence once (by pasting it into a web form) and receive results from more than 10 different programs by email. The following services are available:
      • Miscellaneous services
        • SignalP - prediction of presence and location of signal peptide cleavage sites in amino acid sequences from different organisms
        • NetOglyc - prediction of mucin type GalNAc O-glycosylation sites in mammalian proteins.
        • NetPicoRNA - prediction of cleavage sites of picornaviral proteases.
        • ChloroP - prediction of whether a protein contains an N-terminal chloroplast transit peptide (cTP), and of probable cleavage sites of the transit peptide.
      • Secondary structure prediction
        • JPRED - consensus method for protein secondary structure prediction.
      • Transmembrane helices
        • TMHMM - prediction of the location of transmembrane helices and their topology.
        • TOPPRED - prediction of location and orientation of transmembrane helices.
        • DAS - prediction of location of transmembrane helices.
      • Threading (remote homology search)
        • FRSVR - prediction-based threading, also incorporating purely sequence-based database searches.
        • SAMT98 - hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences.
      • Homology modeling
        • SWISS-MODEL - prediction of 3D structure by homology modeling (automated server).
        • CPHmodels - prediction of 3D structure by homology modeling, drawing on a collection of purpose-built methods and databases.
  2. Transmembrane region prediction
    1. TMHMM
      http://www.cbs.dtu.dk/services/TMHMM-1.0/
    2. TOPPRED
      http://www.biokemi.su.se/~server/toppred2/
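TMHMM uses a hidden Markov model, so the sketch below is not the algorithm of either server. It only illustrates the basic idea of scanning for long hydrophobic stretches, using the Kyte-Doolittle hydropathy scale averaged over a 19-residue window with a cutoff of 1.6 (values commonly quoted from Kyte and Doolittle's original work; treat them and the function names as assumptions). Raising the threshold trades missed helices for fewer false positives, the same trade-off noted for PHDhtm.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; unknown residues score 0 (an assumption)."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return (start, end_exclusive) residue spans covered by above-threshold windows."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i  # first window of a hydrophobic run
        elif value < threshold and start is not None:
            # last passing window was i - 1; it covers residues up to i - 1 + window - 1
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments
```

For a 25-leucine stretch flanked by lysines, one candidate segment is reported that spans the leucines plus a few flanking residues; with a stricter threshold of 4.0 the same stretch is missed.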
  3. Coiled coil region prediction
    1. MultiCoil
      http://gaiberg.wi.mit.edu/cgi-bin/multicoil.pl
      The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. The method is based on the PairCoil algorithm.
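MultiCoil and PairCoil score pairwise residue correlations; the toy sketch below only checks the underlying heptad pattern (positions a through g, with hydrophobic residues expected at a and d). The hydrophobic residue set, the 28-residue (four-heptad) window, the 0.75 cutoff, and the function names are illustrative assumptions, and the sketch cannot distinguish dimers from trimers.

```python
HYDROPHOBIC = set("LIVMF")  # assumed core residues for positions a and d


def heptad_score(window: str, register: int) -> float:
    """Fraction of 'a' and 'd' positions holding hydrophobic residues.

    register is the index in window that is treated as heptad position 'a'.
    """
    core = [i for i in range(len(window)) if (i - register) % 7 in (0, 3)]
    hits = sum(1 for i in core if window[i] in HYDROPHOBIC)
    return hits / len(core)


def best_register(window: str) -> tuple[int, float]:
    """Try all seven heptad registers and keep the best-scoring one."""
    scores = [heptad_score(window, r) for r in range(7)]
    best = max(range(7), key=lambda r: scores[r])
    return best, scores[best]


def scan_coiled_coil(seq: str, window: int = 28, min_score: float = 0.75) -> list[tuple[int, int, float]]:
    """Return (window_start, register, score) for windows passing min_score."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        register, score = best_register(seq[start:start + window])
        if score >= min_score:
            hits.append((start, register, score))
    return hits
```

Four repeats of an idealized heptad "LEEIKKE" (Leu at a, Ile at d) score 1.0 in register 0, while a poly-glutamate sequence produces no hits.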
  4. Tertiary structure prediction by homology modeling
    1. Swiss-Model - Automated Protein Modeling Server
      http://www.expasy.ch/swissmod/SWISS-MODEL.html
      • A free service that generates a PDB coordinate file (a 3D model) for your protein sequence of interest
      • Methods of operation
        • Finds all proteins of known structure that are similar to your protein
        • Selects as templates those matches with > 25% identity and a projected model size of > 20 residues. This also allows domains to be detected and modeled separately.
        • Creates input files
        • Generates model
        • Performs energy minimization
        • Returns results to you as an email attachment
      • Accessing Swiss-Model
        • Use the 'First approach' mode
        • Fill in required information (name, email, title)
        • Provide sequence or accession number (SWISSPROT only?)
        • You may alter the P(N) or E value
        • You may provide your own templates to use for alignment purposes
        • Result options
          • Normal - PDB model coordinates and log file
          • Swiss PdbViewer - PDB model coordinates and log file as a Swiss PdbViewer project file
          • Short - PDB coordinates only
        • Fold Recognition
          • You have the option of having your sequence forwarded to another server for fold prediction
      • Viewing results
        • Swiss PdbViewer
        • RasMol
        • MAGE
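The Swiss-Model template criteria listed above (> 25% identity, projected model size > 20 residues) amount to a simple filter over database hits. This sketch assumes each hit is given as a pairwise alignment with '-' for gaps, computes identity over aligned (ungapped) positions, and uses the number of aligned query residues as a stand-in for model size. The `TemplateHit` class and these definitions are hypothetical; identity can be normalized in several ways, and Swiss-Model's own computation may differ.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    hit_id: str        # hypothetical identifier of a known-structure hit
    query_aln: str     # aligned query segment, '-' marks gaps
    template_aln: str  # aligned template segment, '-' marks gaps


def aligned_pairs(query_aln: str, template_aln: str) -> list[tuple[str, str]]:
    """Positions where neither sequence has a gap."""
    return [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]


def percent_identity(query_aln: str, template_aln: str) -> float:
    """Identical residues as a percentage of ungapped aligned positions (assumed definition)."""
    pairs = aligned_pairs(query_aln, template_aln)
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def model_size(query_aln: str, template_aln: str) -> int:
    """Query residues that would receive coordinates from this template (assumed proxy)."""
    return len(aligned_pairs(query_aln, template_aln))


def select_templates(hits: list[TemplateHit], min_identity: float = 25.0, min_size: int = 20) -> list[TemplateHit]:
    """Keep hits with > min_identity % identity and > min_size residues."""
    return [
        h for h in hits
        if percent_identity(h.query_aln, h.template_aln) > min_identity
        and model_size(h.query_aln, h.template_aln) > min_size
    ]
```

A 30-residue hit at 100% identity passes; a 30-residue hit at about 17% identity, or a 15-residue hit at 100% identity, is rejected.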
    2. ModBase - Database of comparative protein structure models
      http://pipe.rockefeller.edu/modbase/
      • Currently, the database contains approximately 15,000 reliable models for substantial segments of approximately 4,000 proteins in the genomes of Saccharomyces cerevisiae, Mycoplasma genitalium, Methanococcus jannaschii, Caenorhabditis elegans, and Escherichia coli.
      • The database also contains the alignments on which the models were based and model evaluations.
      • The database is searchable by keyword or by sequence alignment
      • The models are derived by ModPipe, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER.
  5. Align your protein structure vs other protein structures
    1. DALI Server - Automated Protein Structure Alignment
      http://www.ebi.ac.uk/dali/
      • Compares protein structures in 3D
      • Submit coordinate files
        • Must include at least the CA coordinates
      • A multiple alignment of structural neighbors is emailed back to you
        • Includes a 3D structural overlay
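DALI (distance-matrix alignment) compares structures through their intramolecular C-alpha distance matrices, which is why CA coordinates are the minimum input. The sketch below extracts CA coordinates from the fixed-column ATOM records of a PDB file and builds such a matrix; the alignment search itself is not shown. It assumes the standard PDB column layout, keeps only the first alternate location, and skips HETATM records (where "CA" usually denotes a calcium ion). The function names are hypothetical.

```python
import math


def ca_coordinates(pdb_text: str) -> list[tuple[float, float, float]]:
    """Collect C-alpha coordinates from ATOM records of a PDB-format string."""
    coords = []
    for line in pdb_text.splitlines():
        # ATOM records only: a HETATM "CA" is usually calcium, not a C-alpha
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()  # columns 13-16
        alt_loc = line[16:17]            # column 17
        if atom_name == "CA" and alt_loc in ("", " ", "A"):
            x = float(line[30:38])       # columns 31-38
            y = float(line[38:46])       # columns 39-46
            z = float(line[46:54])       # columns 47-54
            coords.append((x, y, z))
    return coords


def distance_matrix(coords: list[tuple[float, float, float]]) -> list[list[float]]:
    """Pairwise Euclidean distances between all C-alpha atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

For two CA atoms at (0, 0, 0) and (3, 4, 0), the off-diagonal entries of the matrix are 5.0.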