Molecular modelling of peptides and protein-ligand complexes using knowledge-based potentials

 
Principal Investigator: Debasisa Mohanty

Project Associates / Assistants
Vishal Jain

Ph D Students
Gitanjali Yadav
Mohd Zeeshan Ansari
Pankaj Kamra

Collaborators
Rajesh S Gokhale

The main theme of the research project is to understand the structural principles that govern the binding of various ligands to proteins and the folding of peptides/proteins into stable conformations, and to use these principles to develop computational approaches for structure prediction of peptides/proteins and protein-ligand complexes. The specific objective of the project is to investigate whether knowledge-based potentials, i.e. scoring functions obtained from analysis of structural features in databases of known protein structures, can be used for predicting (1) the substrate specificity of proteins involved in the biosynthesis of polyketides and peptide antibiotics, (2) the bound conformation of peptides in MHC-peptide complexes and the ranking of peptides according to their binding free energy, and (3) structures of short peptides and folds of proteins of unknown function in various genomes.

A.    Substrate specificity of polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs)

Modular polyketide synthases (PKSs)

Since the analysis of the active site residues in various AT domains indicated a strong correlation between the pattern of active site residues and substrate specificity, molecular modelling studies of the substrates in the active sites of AT domains were carried out to understand the stereochemical basis for the observed correlation. Modelling of a malonate group in the active site of the AT domain indicated that, except for Phe-200, all residues in contact with the substrate are conserved in both malonate and methylmalonate specific AT domains. The other residues that show a high degree of substrate specific variation are relatively far from the substrate, while Phe-200 is in contact with the methylene group of the malonate moiety. This indicates that, even though both positions 200 and 93 show strong correlation with substrate specificity, the residue at position 200 would control substrate specificity to a larger extent than the residue at position 93. A change from malonate to methylmalonate requires replacing one of the methylene protons with a methyl group, in either the R or the S configuration. Addition of a methyl group in the R configuration results in unfavourable steric clashes between this methyl carbon and the Cβ of Phe-200/Ser-200 as well as His-201, which is a conserved residue in all AT domains. This explains the structural basis of the chiral selectivity of acyltransferase domains for 2-S methylmalonyl-CoA over 2-R methylmalonyl-CoA. On the other hand, the steric incompatibility of Phe-200 with methylmalonate in the S configuration explains the observation that all AT domains that have Phe-200 accept only malonate. In contrast, all the methylmalonyl-CoA accepting AT domains have a Phe-to-Ser substitution at position 200 (F200S), where the bulky Phe changes to a relatively small Ser residue, thus avoiding a potential steric clash by making additional space for the methyl group in the active site cavity.
These results are in qualitative agreement with the experimental results from site-directed mutagenesis studies reported in the literature. They also indicate that, using a molecular modelling approach, it is possible to design AT domains specific for substrates other than malonate and methylmalonate.
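The residue-200 rule described above can be written down as a small lookup. The following is only an illustrative sketch: the function name and the "unknown" fallback are assumptions made here, and the sketch deliberately ignores position 93 and every other residue that the actual prediction protocol may take into account.

```python
def predict_at_substrate(residue_200: str) -> str:
    """Guess the extender unit of an AT domain from the residue at position 200.

    Encodes only the observation reported in the text: Phe-200 domains accept
    malonate, Ser-200 domains accept methylmalonate. Anything else is left
    undecided instead of being guessed.
    """
    rules = {
        "F": "malonate",        # bulky Phe leaves no room for the extra methyl group
        "S": "methylmalonate",  # small Ser makes space for the methyl group
    }
    return rules.get(residue_200.upper(), "unknown")


# Hypothetical usage with one-letter residue codes.
for residue in ("F", "S", "Y"):
    print(residue, "->", predict_at_substrate(residue))
```

Returning "unknown" for residues other than Phe and Ser keeps the sketch from implying a specificity that the modelling results do not address.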

The computational protocol for prediction of domain organization and substrate specificity of modular PKS clusters was used to identify potential modular PKS clusters in several microbial genomes. The program successfully detected all the annotated PKS clusters and predicted their domain organization as well as the substrate specificities of the various AT domains. It may be noted that the domain organization and substrate specificities predicted for uncharacterized PKS clusters are bona fide blind predictions by our computational approach. Even though the results from 19 characterized PKS clusters indicate that the prediction accuracy of our computational method is high, these predictions can be tested only once experimental results on these PKS clusters become available. An intriguing observation in the Bacillus subtilis genome was the identification of PKS-like clusters with an unusual domain organization. These modules lacked the core acyltransferase domain, which is an essential part of a minimal polyketide module. No putative AT domain could be identified in these amino acid stretches even after detailed sequence analyses. The lack of AT domains in the B. subtilis PKSX cluster has been reported earlier. However, the present systematic search for potential PKS domains in the Bacillus subtilis genome indicated that AT domains were present separately, as independent ORFs, adjacent to the proteins containing these unusual modules. Such a domain architecture has features of both type I and type II PKSs. It would be interesting to explore whether this arrangement offers a new dimension to the metabolic diversity of these complex enzyme systems and how these extra-modular AT domains may be exercising substrate selection.

Chalcone Synthase (CHS)

Docking of various substrates in the active site of the structural models of different plant and bacterial CHS-like proteins indicated that there is a correlation between the active site cavity volume and the final product of a given CHS. This could explain the structural basis of the specificity for resveratrol, benzylacetone, 2-pyrone and THN. However, based on these modelling studies, it is not possible to explain how certain bacterial CHSs make products with long aliphatic chains. Since the M. tuberculosis KAS III has a fold similar to CHS and is known to accept myristoyl-CoA, comparative modelling work is in progress to explore whether structural models of bacterial CHSs have substrate binding sites similar to that of M. tuberculosis KAS III.

Nonribosomal peptide synthetases (NRPSs)

Nonribosomal peptides are synthesized in many bacteria and fungi by large multifunctional proteins called nonribosomal peptide synthetases (NRPSs). NRPSs have an organization of domains and modules similar to modular PKSs and synthesize nonribosomal peptides using assembly-line enzymology. A unique feature of the NRPS system is the ability to synthesize peptides incorporating proteinogenic as well as non-proteinogenic amino acids, and the products include many important antibiotics, immunosuppressants, veterinary agents and agrochemicals. The availability of sequence information for a large number of NRPS clusters with experimentally characterized products presents an opportunity to develop knowledge-based in silico methods for understanding the role of individual domains and inter-domain interactions in controlling the substrate specificity of various NRPS proteins.

Since correct identification of the various catalytic domains present in an NRPS protein is essential for understanding the sequence to product relationship in the NRPS system, a systematic analysis of domain organization has been carried out for 40 NRPS clusters. Due to the relatively large sequence variability in the protein families corresponding to various NRPS domains, domain identification could not be carried out using pairwise BLAST and single templates. However, the availability of representative structural folds for the majority of NRPS domains permitted correct identification of domain boundaries by threading methods. Using the domain boundaries predicted by threading, the sequences of various types of inter-domain linker regions were extracted. It was interesting to note that, unlike in the PKS system, the inter-domain linkers in the NRPS proteins are very short stretches of 10 to 20 amino acids. These short linker lengths are likely to impose a spatial constraint on the relative orientation of the domains in a module. An attempt is being made to obtain structural models for NRPS modules using linker length as a constraint in the docking simulations of various domains in a module. Analysis of inter-domain linker sequences has also indicated that there is a six-residue overlap between the C-terminus of the condensation (C) domain and the N-terminus of the adenylation (A) domain. This can be attributed to the fact that the structural templates for C and A domains used in threading correspond to a stand-alone C domain and an A domain located at the extreme N-terminus, so the exact boundary between these two domains cannot be determined unambiguously. However, this overlapping region has conserved sequence features and a similar helical structure in both template structures. Work is in progress to investigate whether these features can be used to obtain a structural model of a complete NRPS module.
Since adenylation domains are responsible for the selection of amino acids during biosynthesis of NRPS products, structure-based analysis of the active site residues of various A domains with known specificity is being carried out, and an attempt is being made to understand the structural basis of the observed correlation between the substrate and the active site residue pattern.
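The linker extraction step described above (taking domain boundaries from threading and collecting the sequence between consecutive domains, or the shared residues where two predicted domains overlap) can be sketched as follows. This is a minimal illustration, not the code used in the project: the Domain type, the function name, the toy sequence, the boundary values and the third domain labelled "T" are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def inter_domain_regions(sequence: str, domains: List[Domain]) -> List[Tuple[str, str, str, str]]:
    """Return (left, right, kind, residues) for each pair of consecutive domains.

    kind is "linker" when residues lie between the two domains and "overlap"
    when the predicted boundaries share residues. Directly adjacent domains
    produce no entry.
    """
    ordered = sorted(domains, key=lambda d: d.start)
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        if right.start > left.end + 1:
            # Residues strictly between the two domains form the linker.
            segment = sequence[left.end:right.start - 1]
            regions.append((left.name, right.name, "linker", segment))
        elif right.start <= left.end:
            # Residues claimed by both predicted domains.
            segment = sequence[right.start - 1:left.end]
            regions.append((left.name, right.name, "overlap", segment))
    return regions


# Toy 60-residue sequence and made-up boundaries, for illustration only.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 3
toy_domains = [Domain("C", 1, 20), Domain("A", 15, 40), Domain("T", 53, 60)]
for region in inter_domain_regions(toy_sequence, toy_domains):
    print(region)
```

In this toy input the C and A boundaries share six residues and the A and T domains are separated by a 12-residue linker, mirroring the two situations described in the text.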

Acyl CoA synthetases

Structural modelling of acyl-CoA synthetase-like proteins has also been carried out to identify the amino acids responsible for making these proteins specific for fatty acyl substrates, despite the fact that these proteins are likely to adopt a structure similar to the adenylation domains of NRPSs. Apart from structural modelling, an alternative approach based on phylogenetic analysis is being pursued to identify putative active site residues from sequence alone.

B.    MHC-peptide interactions

The multi-scale approach for prediction of MHC binding peptides has been tested on known class I MHC binders listed in the MHCPEP database. Using the information given in MHCPEP, the sequences of the respective antigens have been extracted from SWISSPROT and threaded on the peptide backbone in the structural templates of various MHC alleles. Analysis of this data set, consisting of approximately 3500 peptides for 8 different alleles, indicates that known class I MHC binding peptides have a high rank in most cases. The high-ranking peptides are being analyzed using docking methods to see whether MHC binders can be ranked according to their binding affinity.
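The ranking step described above can be pictured as sliding a fixed-length window along the antigen sequence, scoring each window, and sorting. The sketch below shows only that bookkeeping. The scoring function is an arbitrary toy placeholder standing in for the knowledge-based potential evaluated on the MHC structural template; the window length of 9, the function names and the example sequence are likewise assumptions made for illustration.

```python
from typing import Callable, List


def overlapping_peptides(antigen: str, length: int = 9) -> List[str]:
    """All contiguous peptides of the given length from the antigen sequence."""
    return [antigen[i:i + length] for i in range(len(antigen) - length + 1)]


def rank_peptides(antigen: str, score_fn: Callable[[str], float], length: int = 9) -> List[str]:
    """Peptides sorted from best (lowest score) to worst."""
    return sorted(overlapping_peptides(antigen, length), key=score_fn)


def toy_score(peptide: str) -> float:
    """Placeholder score, NOT a real potential.

    Gives a bonus of -1 for an aliphatic residue at the second position and
    another -1 for one at the last position, purely to make the ranking
    non-trivial.
    """
    favoured = set("LMIV")
    return -float(peptide[1] in favoured) - float(peptide[-1] in favoured)


# Made-up antigen containing one peptide that the toy score favours.
toy_antigen = "GGGG" + "ALGGGGGGV" + "GGGG"
ranked = rank_peptides(toy_antigen, toy_score)
print(ranked[0], "is ranked first out of", len(ranked), "peptides")
```

With a real potential in place of toy_score, the position of a known binder in the sorted list gives the kind of rank that the analysis above reports.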

C.    Structure prediction of peptides/proteins

Automated computational tools have been developed for large-scale threading of ORFs from genomes and for further analysis of the threading results. Using these tools, fold prediction has been carried out for approximately 1500 ORFs of unknown function from M. tuberculosis. For 250 of these 1500 ORFs, it was possible to assign one of the known structural folds with high statistical confidence. Based on the assigned fold, these proteins have been classified as putative oxidoreductases, hydrolases, transferases, isomerases, lyases and ligases. Further analysis involving PSI-BLAST, active site residue patterns and gene neighbours is in progress to assign specific functions to these proteins.
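The post-processing of threading results described above amounts to keeping, for each ORF, the best-scoring fold only when its score passes a confidence threshold (250 of about 1500 ORFs, roughly one in six, passed in the work reported here). A minimal sketch of that filter is given below. The score scale, the cutoff value, the ORF identifiers and the fold names are all hypothetical, since the text does not specify the statistic or threshold that was used.

```python
from typing import Dict, List, Tuple


def assign_folds(hits: Dict[str, List[Tuple[str, float]]], cutoff: float) -> Dict[str, str]:
    """Map each ORF to its best-scoring fold if that score reaches the cutoff.

    hits maps an ORF identifier to (fold name, confidence score) pairs, where
    a higher score is assumed to mean a more reliable assignment.
    """
    assigned = {}
    for orf, candidates in hits.items():
        if not candidates:
            continue  # threading returned nothing for this ORF
        fold, score = max(candidates, key=lambda c: c[1])
        if score >= cutoff:
            assigned[orf] = fold
    return assigned


# Made-up threading output for three ORFs.
example_hits = {
    "orf_a": [("fold_1", 7.2), ("fold_2", 3.1)],
    "orf_b": [("fold_3", 2.4)],
    "orf_c": [],
}
confident = assign_folds(example_hits, cutoff=5.0)
coverage = len(confident) / len(example_hits)
print(confident, f"coverage = {coverage:.2f}")
```

ORFs that fail the cutoff are simply left unassigned, which matches the conservative reporting of only high-confidence fold assignments.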

Publications

Original peer-reviewed articles

1.     Yadav G, Gokhale RS and Mohanty D (2003) Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases. J Mol Biol 328:335-363.

2.     Yadav G, Gokhale RS and Mohanty D (2003) SEARCHPKS: A program for detection and analysis of polyketide synthase domains. Nucl Acids Res (in press).