Molecular modelling of peptides and protein-ligand complexes using knowledge-based potentials

 
Principal Investigator: Debasisa Mohanty

PhD Students
Gitanjali Yadav
Mohd Zeeshan Ansari
Pankaj Kamra
Narendra Kumar

Collaborator
Rajesh S Gokhale

The main theme of the research project is to understand the structural principles that govern the binding of various ligands to proteins and the folding of peptides and proteins into stable conformations, and to use these principles to develop computational approaches for predicting the structures of peptides, proteins and protein-ligand complexes. The objective is to investigate whether knowledge-based methods can be used to predict (1) the substrate specificity of proteins involved in the biosynthesis of polyketides and peptide antibiotics, (2) the bound conformation of peptides in MHC-peptide complexes, and the ranking of peptides by their binding free energy, and (3) the structures of short peptides and the folds of proteins of unknown function in various genomes.

A. Substrate specificity of polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs)

Polyketide synthases (PKSs)

The robustness of our computational protocol for identifying PKS domains and predicting their starter and extender unit specificity has been demonstrated by the analysis of various type I iterative clusters, as well as additional modular PKS clusters whose sequences became available after the approach was developed. The sequences of modular and iterative PKSs have also been compared to identify patterns that can be used to predict whether a PKS cluster is modular or iterative. Distinguishing modular from iterative PKSs, and predicting the number of iterative condensations carried out by a PKS module, are essential for identifying the biosynthetic products of uncharacterized PKS clusters. This comparative analysis has revealed that modular and iterative PKSs differ in the length of the linkers in the minimal module. There are also distinct differences in the domain sequences, which cause modular and iterative PKSs to cluster separately in a multiple alignment. To quantify these differences, separate profile hidden Markov models (HMMs) have been built for the KS domains of modular and iterative PKSs using a training set of sequences. Benchmarking these profile HMMs on a test set of 106 iterative and 260 modular domains indicates that the iterative or modular character of a PKS protein can be predicted with an accuracy of 97%. The HMM analysis also identified 11 positions with distinctly different residue occurrence patterns in modular and iterative KS domains. Analysis of the structural models of modular and iterative KS domains revealed that the majority of these 11 positions lie in the vicinity of the putative active site. These results indicate that the differences between modular and iterative KS domains may originate from differences in the catalytic pockets of these enzymes.
The structural models of different iterative KS domains are being analyzed in detail to investigate a possible correlation between active-site geometry and the number of iterations.
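
As a rough illustration of how two class-specific profiles can label a KS domain and pick out discriminating positions, the sketch below replaces the profile HMMs with a simpler position-specific profile (match columns only, no insert or delete states). It assumes the query is already aligned to the training alignment, scores it by a log-likelihood ratio, and ranks columns by Jensen-Shannon divergence between the two residue distributions. The function names, the pseudocount and the six-residue sequences are hypothetical toy choices, not real KS data and not the HMMs used in this work.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-column residue probabilities with additive pseudocounts (gaps ignored).
    A simplified stand-in for a profile HMM: match columns only."""
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile

def log_likelihood(profile, aligned_query):
    """Sum of log-probabilities of the query residues; gap columns are skipped."""
    return sum(math.log(profile[i][aa])
               for i, aa in enumerate(aligned_query) if aa in AMINO_ACIDS)

def classify(aligned_query, modular_profile, iterative_profile):
    """Log-likelihood ratio: positive favours modular, negative favours iterative."""
    llr = (log_likelihood(modular_profile, aligned_query)
           - log_likelihood(iterative_profile, aligned_query))
    return ("modular" if llr > 0 else "iterative"), llr

def js_divergence(p, q):
    """Jensen-Shannon divergence between two residue distributions (natural log)."""
    m = {aa: 0.5 * (p[aa] + q[aa]) for aa in AMINO_ACIDS}
    def kl_to_m(dist):
        return sum(dist[aa] * math.log(dist[aa] / m[aa]) for aa in AMINO_ACIDS)
    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

def discriminating_positions(profile_a, profile_b, top_n):
    """Alignment columns whose residue distributions differ most between classes."""
    scored = [(js_divergence(a, b), i)
              for i, (a, b) in enumerate(zip(profile_a, profile_b))]
    return [i for _, i in sorted(scored, reverse=True)[:top_n]]

# Invented, pre-aligned toy fragments (not real KS sequences).
modular_train = ["ACDEFG", "ACDEFA", "ACDEYG"]
iterative_train = ["ACKEWG", "ACKEWA", "SCKEWG"]
modular_profile = build_profile(modular_train)
iterative_profile = build_profile(iterative_train)

print(classify("ACDEFG", modular_profile, iterative_profile)[0])  # modular
print(classify("SCKEWA", modular_profile, iterative_profile)[0])  # iterative
print(discriminating_positions(modular_profile, iterative_profile, 2))  # [2, 4]
```

A full profile HMM (for example one built with HMMER) would also model insertions and deletions and align the query itself; the fixed-column profile is used here only to keep the scoring logic visible.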

Chalcone Synthase (CHS)

Putative active site residues have been identified for all CHS-like proteins with known substrates using a comparative modelling approach. The active-site geometry in these homology models has been analyzed in detail to find the residues that control the selection of starter units and the cyclization of the polyketide chain. However, these modelling studies could not explain how certain bacterial CHS-like proteins make products with long aliphatic chains. Since M. tuberculosis KASIII has a fold similar to that of CHS and is known to accept myristoyl CoA, the CHS-like proteins from M. tuberculosis were modelled using the M. tuberculosis KASIII structure as the template. Analysis of the substrate binding sites in this new model was able to explain the experimentally observed substrate specificity of the pks18 protein from M. tuberculosis.

Nonribosomal peptide synthetases (NRPSs)

A novel computational method has been developed for identifying NRPS domains in a query sequence and predicting their catalytic activity or substrate specificity. The method uses a knowledge-based approach. Sequences of a diverse set of C (condensation), A (adenylation), T (thiolation), M (methylation) and TE (thioesterase) domains have been obtained from threading analysis of a training set of 22 experimentally characterized NRPS and hybrid NRPS/PKS clusters. These curated sequences have been catalogued in NRPSDB, a database of nonribosomal peptide synthetases. To identify a given NRPS domain in a polypeptide sequence, all sequences of that domain available in NRPSDB are pairwise aligned with the query sequence. From the set of overlapping alignments, the alignment with the lowest E-value and a length above a predefined cut-off is chosen as the best match to the query. The method can also discriminate between condensation (C), cyclization (Cy) and epimerization (E) domains using the Cy and E domains in the training set: the query domain is aligned with the various C, Cy and E domains in NRPSDB, and the type of its closest match determines whether it is a condensation, cyclization or epimerization domain. A Cy domain prediction is further confirmed by searching for the conserved DxxxxD motif. This computational protocol, SEARCHNRPS, also identifies the putative active site residues of A and C domains in the query from their alignment with structural templates. By comparing these putative active site residues with the active sites of A domains of known specificity in NRPSDB, SEARCHNRPS attempts to predict the specificity of the A domain in the query. The high prediction accuracy of SEARCHNRPS has been demonstrated by benchmarking on a test set of NRPS clusters. Sequence analysis of condensation domains from the training and test sets has also provided insight into the substrate specificity of this domain.
These domains form two major clusters according to the chirality of the donor peptidyl group. Within each of these two groups, further clustering is observed depending on the chirality of the acceptor substrate.
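
The domain-assignment steps described above can be sketched roughly as follows. The sketch assumes the pairwise alignments against the reference sequences have already been computed (for example with a BLAST-like tool) and reduced to a list of hits. It then applies the length cut-off and lowest-E-value rule, labels a C-like domain by its closest match with a DxxxxD check for Cy calls, and predicts A-domain specificity as the nearest neighbour by positional identity of active-site residues. The `Hit` record, all function names, the cut-off value, the identity measure and every sequence and active-site string are hypothetical illustrations, not SEARCHNRPS code or NRPSDB data.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    """One pairwise alignment of the query against a reference domain sequence."""
    subject_id: str    # hypothetical reference identifier
    domain_type: str   # e.g. "C", "Cy", "E", "A", "T", "M", "TE"
    evalue: float
    align_length: int  # residues covered by the alignment

def best_hit(hits, min_length):
    """Lowest-E-value alignment among those at least min_length residues long."""
    eligible = [h for h in hits if h.align_length >= min_length]
    return min(eligible, key=lambda h: h.evalue) if eligible else None

CY_MOTIF = re.compile(r"D.{4}D")  # DxxxxD: Asp, any four residues, Asp

def classify_condensation_like(hits, query_seq, min_length):
    """Call a C-like domain as C, Cy or E from its closest reference match.
    Returns (label, confirmed); a Cy call counts as confirmed only if the
    query also contains a DxxxxD motif."""
    hit = best_hit([h for h in hits if h.domain_type in {"C", "Cy", "E"}], min_length)
    if hit is None:
        return None, False
    confirmed = hit.domain_type != "Cy" or CY_MOTIF.search(query_seq) is not None
    return hit.domain_type, confirmed

def predict_a_specificity(query_code, known):
    """Nearest-neighbour call: the substrate of the reference A domain whose
    active-site residues are identical to the query's at the most positions."""
    def identity(a, b):
        return sum(x == y for x, y in zip(a, b))
    code, substrate = max(known, key=lambda item: identity(query_code, item[0]))
    return substrate, identity(query_code, code)

# Invented example data (not NRPSDB entries).
hits = [
    Hit("ref_C1", "C", 1e-40, 280),
    Hit("ref_E1", "E", 1e-55, 150),   # best E-value but below the length cut-off
    Hit("ref_Cy1", "Cy", 1e-30, 300),
]
print(best_hit(hits, min_length=200).subject_id)  # ref_C1

cy_hits = [Hit("ref_Cy2", "Cy", 1e-60, 310), Hit("ref_C2", "C", 1e-20, 300)]
print(classify_condensation_like(cy_hits, "MKDLAGTDQ", 200))  # ('Cy', True)
print(classify_condensation_like(cy_hits, "MKKLAGTKQ", 200))  # ('Cy', False)

known_a = [("DVWHVSLVDK", "substrate_1"), ("DAFWIGGTFK", "substrate_2")]
print(predict_a_specificity("DVWHISLVDK", known_a))  # ('substrate_1', 9)
```

Returning an explicit `confirmed` flag, rather than silently dropping an unconfirmed Cy call, is a design choice of this sketch; it leaves the final decision to the user.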

Acyl CoA synthetases

Acyl CoA synthetases belong to a superfamily of acyl adenylate/thioester-forming enzymes that share a conserved AMP-binding domain. All members of this superfamily catalyze the formation of an adenylate intermediate with the carboxyl group of their substrate. This intermediate is then esterified with CoA (coumarate CoA ligases, acetyl CoA ligases, fatty acyl CoA synthetases) or with enzyme-bound 4'-phosphopantetheine (peptide synthetases), or oxidized by molecular oxygen (luciferases). Many of these proteins occur as domains in PKS/NRPS clusters or as ORFs adjacent to PKS and NRPS clusters, and play a crucial role in the biosynthesis of PKS/NRPS products. Because of the high sequence divergence among members of the superfamily, predicting their functional subfamily and substrate specificity has been difficult. However, the crystal structures available for several members of the ACS family have established that they adopt a similar three-dimensional structure despite very low sequence identity. Using these crystal structures as templates, putative active site residues have been identified for all known members of the superfamily. The correlation between active site residue patterns and known functional specificity has been analyzed in detail to derive a specificity code for each functional subfamily. This information can be used for in silico prediction of the functional subfamily and substrate specificity of uncharacterized ACS-like proteins. Molecular modelling of various substrates in the putative active sites of these ACS proteins is also being carried out to understand the structural basis of their diverse substrate specificity.
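
One simple way to turn active-site residue patterns into subfamily specificity codes and use them for prediction is sketched below. It assumes the active-site residues of each protein have already been extracted from structure-based alignments into equal-length strings. It takes the most frequent residue at each position as a subfamily's code and assigns a query to the best-matching code. This consensus-and-match scheme, the function names, the `min_fraction` cut-off and the toy data are assumptions for illustration, not the procedure actually used to derive the ACS specificity codes.

```python
from collections import Counter

def derive_code(active_sites):
    """Most frequent residue at each aligned active-site position.
    Ties go to the residue seen first (Counter.most_common ordering)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*active_sites))

def derive_codes(training):
    """training maps a subfamily name to equal-length active-site residue strings."""
    return {family: derive_code(sites) for family, sites in training.items()}

def predict_subfamily(query_sites, codes, min_fraction=0.6):
    """Assign the subfamily whose code matches the query at the most positions.
    min_fraction is a hypothetical cut-off below which no assignment is made."""
    def fraction(code):
        return sum(x == y for x, y in zip(query_sites, code)) / len(query_sites)
    best = max(codes, key=lambda family: fraction(codes[family]))
    score = fraction(codes[best])
    return (best if score >= min_fraction else None), score

# Invented toy active-site strings (five positions each), not real ACS data.
training = {
    "subfamily_A": ["GHSLG", "GHSLA", "GHTLG"],
    "subfamily_B": ["YDWTA", "YDWSA", "FDWTA"],
}
codes = derive_codes(training)
print(codes)                              # {'subfamily_A': 'GHSLG', 'subfamily_B': 'YDWTA'}
print(predict_subfamily("GHSLA", codes))  # ('subfamily_A', 0.8)
print(predict_subfamily("KKKKK", codes))  # (None, 0.0)
```

The cut-off keeps a distant, uncharacterized protein from being forced into the nearest subfamily; in practice, position-specific weights or substitution scores could replace plain identity.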

B. MHC-peptide interactions

The structure-based approach for identifying MHC-binding peptides is being used to analyze complexes between class II MHC molecules and their peptide ligands.

Publications

Original peer-reviewed articles

1. *Yadav G, Gokhale RS and Mohanty D (2003) SEARCHPKS: A program for detection and analysis of polyketide synthase domains. Nucl Acids Res 31:3654-3658 (*listed as in press last year; since published).

2. Ansari MZ, Yadav G, Gokhale RS and Mohanty D (2004) NRPS-PKS: A knowledge-based resource for analysis of NRPS/PKS megasynthases. Nucl Acids Res (in press).

3. Saxena P, Yadav G, Mohanty D and Gokhale RS (2003) A new family of type III polyketide synthases in Mycobacterium tuberculosis. J Biol Chem 278:44780-44790.

4. Trivedi OA, Arora P, Sridharan V, Tickoo R, Mohanty D and Gokhale RS (2004) Enzymic activation and transfer of fatty acids as acyl-adenylates in mycobacteria. Nature (in press).