HIV Protease (4HVP)


Benton Wong 

Introduction to Structural Bioinformatics 

Spring 2003

       





I.    General Information

The proteins of the human immunodeficiency virus (HIV) are produced as long polyprotein chains during viral replication.  HIV protease is the enzyme that cleaves peptide bonds at specific points on these polyproteins, releasing the smaller, active protein components that will become a mature virus. The HIV protease is not present in mammalian cells.  It is unusual in that it can cleave between a phenylalanine or tyrosine and a proline, which no human enzyme is capable of doing.

The HIV-1 protease is an enzyme with two symmetrical subunits: identical polypeptides of 99 amino acids, each chain having an N-terminal Pro, a C-terminal Phe, 1 alpha helix, and 9 beta strands. The active site is located where the two subunits meet. The active site cleft is "covered" by flaps that extend from Met-46 to Lys-55, as shown in Figure 2. The HIV-1 protease is an aspartic acid protease; aspartate 25 plays a key role in binding the substrate.
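
The chain-level facts above can be checked against the one-letter sequence listed in Section V. The following is a minimal Python sketch, not part of the original analysis; the names CHAIN and residues are illustrative, and X stands for the two non-standard residues exactly as they appear in that listing.

```python
# One-letter sequence of a single 4HVP chain, copied from Section V.
# 'X' marks the non-standard residues as written in that listing.
CHAIN = (
    "PQITLWQRPL" "VTIRIGGQLK" "EALLDTGADD" "TVLEEMNLPG" "KWKPKMIGGI"
    "GGFIKVRQYD" "QIPVEIXGHK" "AIGTVLVGPT" "PVNIIGRNLL" "TQIGXTLNF"
)

def residues(seq, start, end):
    """Return residues start..end (1-based, inclusive), as numbered in the text."""
    return seq[start - 1:end]

print(len(CHAIN))                 # chain length
print(CHAIN[0], CHAIN[-1])        # N-terminal and C-terminal residues
print(residues(CHAIN, 46, 55))    # flap region, Met-46 to Lys-55
print(residues(CHAIN, 25, 25))    # aspartate 25
```

Residue numbers in the text are 1-based, so the helper subtracts one before slicing.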

Blocking the action of this enzyme prevents the polyprotein from being cleaved, so that only immature, noninfective virions are produced. 4HVP shows how an inhibitor can have this effect on the enzyme.  4HVP is a synthetic HIV-1 protease complexed with a substrate-based inhibitor (a ligand).  The asymmetric inhibitor lies in a single orientation and makes extensive interactions with both subunits of the homodimeric protein. Compared to the unliganded enzyme, the liganded enzyme underwent substantial changes, especially in the region corresponding to the flaps, where backbone movements as large as 7 Å were observed.  Inhibition of this kind is a potential basis for AIDS therapeutics.

II.    Images
   





   

Figure 1. (a-b) RASMOL-generated images of HIV protease (4HVP) from two different angles. Domain 4HVP:A is shown in red, domain 4HVP:B in blue, turns in white, helices in orange, and the ligand in green.




Figure 2. The flaps (red) that cover the active site.
                                                                                                                       

 














Figure 3. The location of the aspartate 25 (red) in relation to the protein.



Figure 4. The location of the active sites (green) with respect to the ligand (yellow) and residues that interact with the ligand (red).

















Figure 5. Ligand: ACE-THR-ILE-NLE-NLE-GLN-ARG-NH2



Figure 6. Residues (red) on chains 4HVP:A and 4HVP:B that interact with the ligand.

• Images of the domains in 4HVP


Figure 7. (a) 4HVP:A from CATH; (b) 4HVP:A generated with RASMOL. The red area is a helix, the blue areas are turns, and the grey areas are sheets.



Figure 8. (a) 4HVP:B from CATH; (b) 4HVP:B generated with RASMOL. The red area is a helix, the blue areas are turns, and the grey areas are sheets.

III.    Structural Classifications


• Structural Classification of Proteins (SCOP) - SCOP Classifications for 4HVP

The SCOP database is based on evolutionary relationships established by visual comparison and inspection of automated structure alignments.  Proteins in this database are classified and placed into groups within the SCOP hierarchy to reflect both structural and evolutionary relatedness.   In the SCOP hierarchy, the principal levels are: Class (based on secondary structure element composition), Fold (based on common core structures), Superfamily (members share common structure and function), and Family (members share a clear common evolutionary origin).

Class: All beta proteins
Fold: Acid proteases; barrel, closed; n=6, S=10, complex topology
Superfamily: Acid proteases
Family: Retroviral protease (retropepsin); dimer of identical mono-domain chains, each containing (6,10) barrel

• CATH - CATH Classifications for 4HVP (2.40.70.10)

CATH is a hierarchical domain classification of protein structures which groups proteins on four levels: Class (C), Architecture (A), Topology (T), and Homologous superfamily (H).  For Class, proteins are grouped by their secondary structure composition and packing.  Architecture describes the orientation of the secondary structures regardless of connectivity.  Topology groups structures based on both the orientation and the connectivity of their secondary structures.  Homologous superfamily groups proteins based on structure, sequence, or function.

Class (C-level): Mainly Beta
Architecture (A-level): Barrel
Topology (Fold Family): Cathepsin D, subunit A, domain 1
Homologous Superfamily (H-level): Acid Proteases
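
The CATH code quoted for 4HVP (2.40.70.10) encodes the four levels described above, one dot-separated field per level. As a rough illustration, the following Python sketch splits such a code into its cumulative nodes; the function name cath_lineage is hypothetical and does not come from any CATH tool.

```python
# Level names as used in the CATH description in this report.
LEVELS = ["Class", "Architecture", "Topology", "Homologous superfamily"]

def cath_lineage(code):
    """Split a CATH code such as '2.40.70.10' into its cumulative levels."""
    parts = code.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} dot-separated fields, got {code!r}")
    # Each level is identified by its own field plus all fields above it.
    return [(level, ".".join(parts[:i + 1])) for i, level in enumerate(LEVELS)]

for level, node in cath_lineage("2.40.70.10"):
    print(f"{level}: {node}")
```

Two domains that share the first three fields but differ in the fourth would, under this scheme, have the same topology but belong to different homologous superfamilies.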


*Within the same family and superfamily in both of these classification systems, function is conserved to some degree, as indicated by the functional annotations. Close structural neighbors from these families and superfamilies tend to be enzymes of some kind, and in some cases certain key functional residues are conserved in structural neighbors (see the Clustal W multiple sequence alignment below).

*There are analogues for the HIV-1 protease; for more information, please refer to:
M Baca & S Kent. Protein backbone engineering through total chemical synthesis: new insight into the mechanism of HIV-1 protease catalysis. Tetrahedron 56 (2000) 9503-9513.




IV. Experimental Details
          
Resolution [Å]: 2.30
R-Factor: 0.176
Space Group: P 21 21 21

Unit Cell Dimensions:
dim [Å]: a 51.70, b 59.20, c 62.45
angles [°]: alpha 90.00, beta 90.00, gamma 90.00
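
The unit cell volume follows directly from these parameters. The sketch below uses the general triclinic volume formula, which reduces to a × b × c when all three angles are 90°, as they are here; the function name cell_volume is illustrative only.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms (general triclinic formula)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

# Cell edges reported for 4HVP, in angstroms; all angles are 90 degrees.
volume = cell_volume(51.70, 59.20, 62.45)
print(f"{volume:.0f} cubic angstroms")
```

For this cell the result is about 191,137 cubic angstroms.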

Experimental Notes:

Experimental Method - X-Ray Diffraction

Structural Refinement - Initial refinement was performed with the program X-PLOR.  Final refinement was performed with the program PROLSQ.  There were 7813 reflections in the resolution range 10 to 2.25 Å.

Reference:

Miller M, Schneider J, Sathyanarayana BK, Toth MV, Marshall GR, Clawson L, Selk L, Kent SB, Wlodawer A. Structure of complex of synthetic HIV-1 protease with a substrate-based inhibitor at 2.3 A resolution. Science 1989 Dec 1;246(4934):1149-52.


V. Secondary Structure Assignments

The DSSP method is commonly used for secondary structure calculations.  Two sources are listed below, the Sequence Details and the PDB report, along with the abbreviations each uses for the secondary structure codes.

Secondary structure as reported in the Sequence Details (red):
S = Bend, E = Extended, Beta Strand, T = Turn, H = Helix


Secondary structure as reported in the PDB report (green): S = Strand, T = Turn, H = Helix, blank = coil
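
Because the two sources use different letters for the same elements (for example, S means bend in one and strand in the other), it can help to translate one notation into the other before comparing them. The following Python sketch is one way to do that. The mapping of a bend to a blank is an assumption, since the report does not state how bends appear in the PDB report, and the example string is hypothetical rather than taken from 4HVP.

```python
# Sequence Details code -> PDB report code, following the two legends above.
# Mapping a bend ('S') to a blank (coil) is an assumption for illustration.
TO_PDB_REPORT = {"E": "S", "T": "T", "H": "H", "S": " ", " ": " "}

def to_pdb_report(codes):
    """Translate a Sequence Details string into PDB report notation."""
    return "".join(TO_PDB_REPORT[c] for c in codes)

example = " EEEE TT HHHH S"   # hypothetical fragment, not taken from 4HVP
print(repr(to_pdb_report(example)))
```

After translation, positions where the two assignments still differ reflect genuine disagreement between the sources rather than a difference in notation.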

4HVP:A
1 PQITLWQRPL VTIRIGGQLK EALLDTGADD TVLEEMNLPG KWKPKMIGGI
Sequence Details SSS E EEEE SS EE EEEE TT SS EE S SS EEEEEE S
PDB Report TTT  S SSSS TT SS SSS TT TT T TT SSSSSSST

51 GGFIKVRQYD QIPVEIXGHK AIGTVLVGPT PVNIIGRNLL TQIGXTLNF
Sequence Details S EEEEEEE SEEEEETTEE EEE EEEES SS EE HHHH HHHT

PDB Report TSSSSSSS T T TT TT HHHH HHH ????





4HVP:B

1 PQITLWQRPL VTIRIGGQLK EALLDTGADD TVLEEMNLPG KWKPKMIGGI
Sequence Details SSS E EEEEETTEEE EEEE TT SS EEES S EEEEEEET
PDB Report TTT S SSSSSTTSSS SSS TT TT T T SSSSSSST

51 GGFIKVRQYD QIPVEIXGHK AIGTVLVGPT PVNIIGRNLL TQIGXTLNF
Sequence Details S EEEEEEE SEEEEETTEE EEE EEEES SS EE HHHH HHHT
PDB Report TSSSSSSS T T TT HHHH TT ????





VI. Structural Neighbors

A. Structural Neighbors of 4HVP
CATH
  1fivA0   Hydrolase (Acid Proteinase)
  1idaA0   Hydrolase (Acid Proteinase)
  2rspA0   Hydrolase (Aspartyl Proteinase)
  3psg01   Hydrolase (Acid Proteinase Zymogen)
  1eagA1   Hydrolase (Aspartic Proteinase)
  1fmb00   Hydrolase (Acid Proteinase)
  1empE2   Hydrolase (Acid Proteinase)
  1mpp02   Hydrolase (Acid Proteinase)
  2apr02   Hydrolase (Aspartic Protease)
  3psg02   Hydrolase (Acid Proteinase Zymogen)
  1eagA2   Hydrolase (Aspartic Protease)
  1smeA2   Aspartyl Protease
  1qdmA2   Hydrolase
  1pfzA2   Aspartic Protease Zymogen
  1fknA1   Hydrolase
  1fknA2   Hydrolase
  1lywA0   Aspartic Protease
  1lyaB0   Lysosomal Aspartic Protease

FSSP
  1fmb     eiav protease Mutant
  2rspB    Rous sarcoma virus protease (RSV PR)
  1smrA    Renin complex with the inhibitor ch-66
  1eagA    aspartic proteinase
  2er7E    Endothia aspartic proteinase
  1fknA    memapsin 2 fragment (beta-secretase) peptide inhibitor
  2rmpA    mucoropepsin
  1lywA    cathepsin d
  1e32A    p97 fragment
  1cz4A    vcp-like atpase fragment
  1aw8B    l-aspartate-alpha-decarboxylase biological_unit
  1qcsA    n-ethylmaleimide sensitive factor (nsf-n) fragment
  1cr5A    sec18p (residues 22 - 210) fragment
  2napA    periplasmic nitrate reductase biological_unit
  1kgfA    formate dehydrogenase, nitrate-inducible, major subunit
  1dmr     dmso reductase biological_unit

VAST
  1kzkB    Je-2147-Hiv Protease Complex
  1idaA    Human Immunodeficiency Virus Type 2 (Hiv-2) Protease Complexed With The Inhibitor Bila 1906 Containing The Hydroxyethylamine Dipeptide Isostere
  1b11A    Structure Of Feline Immunodeficiency Virus Protease Complexed With Tl-3-093
  1fmb     Eiav Protease Complexed With The Inhibitor Hby-793
  1baiA    Structural Basis For Specificity Of Retroviral Proteases
  2rpmA    Rmp-Pepstatin A Complex
  1m4hA    Crystal Structure Of Beta-Secretase Complexed With Inhibitor Om00-3
  1gvu     Endothiapepsin Complex With H189
  1j71     Structure Of The Extracellular Aspartic Proteinase From Candida Tropicalis Yeast
  1lyw     Cathepsin D At Ph 7.5
  1rnf     X-Ray Crystal Structure Of Unliganded Human Ribonuclease 4
  1rbd     Ribonuclease S (E.C.3.1.27.5) Mutant With Met 13 Replaced By Alpha-Amino-Normal-Butyric Acid (M13aba)
  1l1nB    Poliovirus 3c Proteinase
  1dylA    9 Angstrom Resolution Cryo-Em Reconstruction Structure Of Semliki Forest Virus (Sfv) And Fitting Of The Capsid Protein Structure In The Em Density
  1c2dA    Recruiting Zinc To Mediate Potent, Specific Inhibition Of Serine Proteas
  1gio     Nmr Solution Structure Of Bovine Angiogenin

CE (see note 1)
  1f7a     Pol Polyprotein
  1b6j     Retropepsin
  1b6n     Retropepsin
  1b6l     Retropepsin
  1b6o     Retropepsin
  1ohr     Aspartylprotease Chain
  1b6m     Retropepsin
  1b6p     Retropepsin
  1b6k     Retropepsin
  3thl     Protease
  1g2k     Protease Retropepsin
  1fg8     Protease Retropepsin
  1ffo     Protease Retropepsin
  1fej     Protease Retropepsin
  1daz     Peptide Inhibitor
  1fgc     Protease Retropepsin




NOTE: The structural neighbors in bold are cross-referenced in at least one of the other databases.

1.    The CE list of structural neighbors was generated using the criteria Z-Score > 4.5, RMSD < 2.0 Å, and sequence identity > 70.00%, and excluding HIV protease sequences. A structural alignment of these proteins is shown below in Figure 9.
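
The selection criteria in note 1 amount to a simple filter over a list of hits. The Python sketch below applies the same thresholds to a handful of made-up records; the Neighbor class, its field names, and all values are hypothetical and are not CE output.

```python
from dataclasses import dataclass

@dataclass
class Neighbor:
    label: str              # placeholder label, not a real PDB identifier
    z_score: float
    rmsd: float             # angstroms
    identity: float         # percent sequence identity
    is_hiv_protease: bool

def passes(n, min_z=4.5, max_rmsd=2.0, min_identity=70.0):
    """Apply the thresholds quoted in note 1 and drop HIV protease entries."""
    return (n.z_score > min_z
            and n.rmsd < max_rmsd
            and n.identity > min_identity
            and not n.is_hiv_protease)

# Hypothetical hits for illustration only; these values are not CE output.
hits = [
    Neighbor("hit1", 6.1, 1.2, 85.0, False),   # passes every criterion
    Neighbor("hit2", 6.5, 0.9, 99.0, True),    # excluded: HIV protease
    Neighbor("hit3", 4.2, 1.5, 80.0, False),   # excluded: Z-score too low
    Neighbor("hit4", 5.0, 2.6, 75.0, False),   # excluded: RMSD too high
]

selected = [n.label for n in hits if passes(n)]
print(selected)
```

Keeping the thresholds as keyword arguments makes it easy to see how the list changes when a criterion is relaxed.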




Figure 9.  Structural alignment of several structural neighbors of 4HVP found using CE.

B. Multiple Sequence Alignment of Select Structural Neighbors via Clustal W

• Alignment Scores
  Alignment Scores of 4HVP vs. Select Structural Neighbors

          1EAG   1FMB   1FKN   1LYW
  4HVP      21     30     12      4
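
Assuming these scores are pairwise percent identities, which is how Clustal W generally reports pairwise alignment scores, the sketch below shows how such a score can be computed from two aligned sequences and how the neighbors rank by score. The function name percent_identity and the toy aligned pair are hypothetical; only the four scores come from this report.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

# Scores of 4HVP against each neighbor, as reported in this section.
scores = {"1EAG": 21, "1FMB": 30, "1FKN": 12, "1LYW": 4}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)

# Toy aligned pair (hypothetical, not real protease sequences).
print(round(percent_identity("LDTGA-DD", "FDTGSSNL"), 1))
```

By this ranking 1FMB is the closest of the four neighbors in sequence and 1LYW the most distant.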






Figure 10. Multiple sequence alignment output for some of the above structures, generated by Clustal W. Comparing 4HVP to the other structural neighbors, the conserved residues in 4HVP are Leu5, Asp25 (plays a key role in binding a substrate), Thr26, and Gly27. Residues in 4HVP that have strong structural conservation are Ile5, Ala22, Leu23, Leu24, Val32 (interacts with the ligand), Leu32, Leu33, Met36, Asn37, Leu38, Ile62, Val76, Val82 (interacts with the ligand), Leu89, and Leu90 (helix).
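
The four conserved residues named in the caption can be located in the chain sequence from Section V. The sketch below is illustrative only; the names CHAIN, CONSERVED, and residue_at are hypothetical helpers, not part of any alignment tool.

```python
# One-letter sequence of a single 4HVP chain, copied from Section V.
CHAIN = (
    "PQITLWQRPL" "VTIRIGGQLK" "EALLDTGADD" "TVLEEMNLPG" "KWKPKMIGGI"
    "GGFIKVRQYD" "QIPVEIXGHK" "AIGTVLVGPT" "PVNIIGRNLL" "TQIGXTLNF"
)

# Conserved residues named in the Figure 10 caption: position -> one-letter code.
CONSERVED = {5: "L", 25: "D", 26: "T", 27: "G"}

def residue_at(seq, position):
    """Return the residue at a 1-based position."""
    return seq[position - 1]

for position, expected in CONSERVED.items():
    print(position, residue_at(CHAIN, position), expected)

# 1-based start of the Asp-Thr-Gly triad in the chain.
motif_start = CHAIN.find("DTG") + 1
print(motif_start)
```

The Asp-Thr-Gly triad starts at position 25, in agreement with the numbering used for Asp25, Thr26, and Gly27.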


C. Comparison Between the Structural and Sequence Relationships

As the multiple sequence alignment shows, structure is more conserved than sequence.  From an evolutionary perspective, key residues such as Asp25 (an important active site residue) do not change, so that function is conserved.  It is not surprising that this residue is present in structural neighbors from the same category.  However, the majority of the aligned positions are conserved positions of structural importance.  As a result of evolution, the residues at these positions do not necessarily match.