Contents
- Introduction to Protein Analysis
- Obtain a sequence of interest.
- Identify ORFs and translate into protein
- Identify Similar Proteins from the Databases
- Align your sequence vs similar sequences and look for Gene Families
- Determine the putative function of your protein
- Determine the putative structure of your protein
- Protein Structure Visualization Tools
- Other Interesting Things You Can Do With Proteins
I. Introduction to Protein Analysis
Many kinds of protein analysis can be carried out with tools on the World Wide Web. The work usually follows an orderly progression: identify an open reading frame, obtain its translation, then search that translation for motifs and structure. Here we follow that progression of protein discovery using the tools available on the Internet.
II. Obtain a sequence of interest.
If you do not already have a sequence of interest, you can find one through our old friend BLAST at NCBI, or through a protein sequence search at Entrez.
- BLAST:
http://ncbi.nlm.nih.gov/
You should already be familiar with the basic function of BLAST. You can identify proteins of interest by searching a nucleotide sequence against GenBank using BLASTX or TBLASTX. BLASTX translates your query in all six reading frames and compares it against a protein database; TBLASTX compares the translated query against a translated nucleotide database. Either way, the hits are identical or similar to the translation product of your gene of interest. These sequences can then be copied and used as queries for further studies.
- Entrez
http://www.ncbi.nlm.nih.gov/Entrez/
The Entrez interface lets you search for a protein sequence or a 3D molecular structure by name, ID number, author name, and so on, rather than by a specific sequence. This is a quick entry point for people who want to investigate known proteins or structures.
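The same kind of text query can also be expressed as a URL. The sketch below is a hedged illustration, not part of the tutorial's own workflow: it assumes NCBI's E-utilities ESearch endpoint and its `db`, `term`, and `retmax` parameters, and the helper name `build_protein_search_url` is hypothetical. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the tutorial itself).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_protein_search_url(term, retmax=20):
    """Return an ESearch URL that looks up `term` in the Entrez protein database."""
    params = {"db": "protein", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Search by title word and organism instead of by sequence.
url = build_protein_search_url("hemoglobin[Title] AND Homo sapiens[Organism]")
print(url)
```

Opening the printed URL in a browser should return a list of matching record IDs, which can then be looked up through the Entrez web interface.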
III. Identify ORFs and translate into protein
If you already have a DNA sequence of interest, you need to be able to translate it so that you can perform further analysis. Many tools are available in addition to the ones listed here.
- Predictive methods using nucleotide sequences
- Simple translation - no introns
Example
- ExPASy Translation Tool (at Swiss Institute of Bioinformatics)
http://www.expasy.ch/tools/dna.html
- NCBI ORF Finder
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
- Complex methods - predict promoters, splice sites, and translation initiation and termination sites
Example
- GENSCAN
http://genes.mit.edu/GENSCAN.html
- Predicts complete gene structures, including exons, introns, promoter and poly-adenylation signals, in genomic sequences.
- Allows for partial genes as well as complete genes and for the occurrence of multiple genes in a single sequence, on either or both DNA strands.
- The program is based on a probabilistic model of gene structure/compositional properties and does not make use of protein sequence homology information.
- The text output of the program is a list of one or more (or possibly zero) predicted genes together with the corresponding peptide sequences. The graphical output is a diagram of the locations of the predicted exons.
- Sequences up to 200 kilobases (kb) in length may be submitted to the web server.
- BCM SearchLauncher
http://dot.imgen.bcm.tmc.edu:9331/
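To make the "simple translation - no introns" case concrete, here is a minimal Python sketch of six-frame translation and ORF detection. It is an illustration only and does not reproduce how the ExPASy Translation Tool or NCBI ORF Finder work internally. It assumes the standard genetic code, defines an ORF as an ATG-to-stop stretch, and ignores introns entirely, so it is no substitute for a gene predictor such as GENSCAN. The function names and the `min_aa` threshold are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation tables are needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_aa=10):
    """Return (strand, frame, protein) for ATG-to-stop ORFs in all six frames."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            # Drop the last segment: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start >= 0 and len(segment) - start >= min_aa:
                    orfs.append((strand, offset + 1, segment[start:]))
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_aa=3))
```

Frames are reported as 1 to 3 on each strand. Only the first ATG in each stop-delimited segment is used, which is a simplification: real ORF finders usually offer alternative start codons and genetic codes.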
- Predictive methods using protein sequences
- Protein identity based on composition
Example
- AACompIdent
http://www.expasy.ch/tools/aacomp/
- Allows the identification of a protein from its amino acid composition
- Searches the SWISS-PROT and/or TrEMBL databases for proteins whose amino acid compositions are closest to the amino acid composition given.
- AACompSim
http://www.expasy.ch/tools/aacsim/
- Allows the comparison of the amino acid composition of a SWISS-PROT entry with all other SWISS-PROT entries so as to find the proteins whose amino acid compositions are closest to that of the selected entry
- TagIdent
http://www.expasy.ch/tools/tagident.html
- Creates a list of proteins close to a given pI and Mw
- Allows the identification of proteins by matching a short sequence tag of up to 6 amino acids against proteins in the SWISS-PROT/TrEMBL databases close to a given pI and Mw
- Allows the identification of proteins by their mass, if this mass has been determined by mass spectrometric techniques for one or more species and with an optional keyword
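The idea behind composition-based identification can be sketched in a few lines of Python. This is a hedged illustration, not the scoring used by AACompIdent or AACompSim: it assumes composition is expressed as percentages of the 20 standard amino acids and uses a plain sum of squared differences as the distance. The toy database and all function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(protein):
    """Return the percentage of each standard amino acid in `protein`."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_distance(comp_a, comp_b):
    """Sum of squared percentage differences (an assumed, simple metric)."""
    return sum((comp_a[aa] - comp_b[aa]) ** 2 for aa in AMINO_ACIDS)


def rank_by_composition(query, database):
    """Return (distance, name) pairs, closest composition first."""
    query_comp = composition(query)
    return sorted(
        (composition_distance(query_comp, composition(seq)), name)
        for name, seq in database.items()
    )


# Toy database standing in for SWISS-PROT/TrEMBL entries.
toy_db = {"entry_a": "AAGG", "entry_b": "WWWW", "entry_c": "GAGA"}
ranking = rank_by_composition("AGAG", toy_db)
print(ranking)
```

Note that composition ignores residue order, so `entry_a` and `entry_c` are indistinguishable here; this is why the tools combine composition with other constraints such as pI, Mw, or a sequence tag.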
- Physical properties based on sequence
- Compute pI/Mw
http://www.expasy.ch/tools/pi_tool.html
- Allows the computation of the theoretical pI (isoelectric point) and Mw (molecular weight) for a list of SWISS-PROT and/or TrEMBL entries or for a user-entered sequence
- PeptideMass
http://www.expasy.ch/tools/peptide-mass.html
- Cleaves one or more protein sequences from the SWISS-PROT and/or TrEMBL databases or a user-entered protein sequence with a chosen enzyme
- Computes the masses of the generated peptides
- Also returns theoretical isoelectric point and mass values for the proteins of interest
- If desired, PeptideMass can return the mass of peptides known to carry posttranslational modifications, and can highlight peptides whose masses may be affected by database conflicts, isoforms or splicing variants.
- SAPS
http://www.isrec.isb-sib.ch/software/SAPS_form.html
- Evaluates by statistical criteria a wide variety of protein sequence properties
- The output is organized in the following sections:
- File name
- Sequence printout
- Compositional analysis
- Charge distributional analysis (charge clusters; high scoring (un)charged segments; charge runs and patterns)
- Distribution of other amino acid types (high scoring hydrophobic and transmembrane segments; cysteine spacings)
- Repetitive structures (in the amino acid alphabet and in an 11-letter reduced alphabet)
- Multiplets (counts, spacings, and clusters in the amino acid and charge alphabets)
- Periodicity analysis
- Spacing analysis
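The molecular weight and peptide mass calculations described for Compute pI/Mw and PeptideMass above can be approximated as follows. This is a sketch under stated assumptions, not the tools' actual implementation: the residue masses are approximate average values that should be checked against an authoritative table, trypsin is modeled with the common rule "cleave after K or R unless the next residue is P", and modifications, missed cleavages, and pI are not handled. Function names are hypothetical.

```python
# Approximate average residue masses in daltons (assumed values; verify
# against an authoritative table before relying on them).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water is added per free peptide chain


def average_mass(peptide):
    """Average mass of an unmodified peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    protein = protein.upper()
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


peptides = tryptic_digest("MKRPAAKGG")
for pep in peptides:
    print(pep, round(average_mass(pep), 2))
```

Each cleavage adds one water, so the peptide masses sum to the intact protein's mass plus one water per cut. That is a quick sanity check on any digest.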