Molecular modelling of peptides and protein-ligand complexes using knowledge-based potentials

Principal Investigator: Debasisa Mohanty

The main theme of the research project is to understand the structural principles that govern the binding of various ligands to proteins and the folding of peptides and proteins into stable conformations, and to use these principles to develop computational approaches for predicting the structures of peptides, proteins and protein-ligand complexes. The specific objective of the project is to investigate whether knowledge-based potentials, i.e. scoring functions derived from the analysis of structural features in databases of known protein structures, can be used to predict (1) the substrate specificity of proteins involved in the biosynthesis of polyketides and peptide antibiotics, (2) the bound conformations of peptides in MHC-peptide complexes and the ranking of peptides by binding free energy, and (3) the structures of short peptides and the folds of proteins of unknown function in various genomes.

A. Substrate specificity of polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs)

Modular polyketide synthases (PKSs)

Since analysis of the active-site residues in various acyltransferase (AT) domains indicated a strong correlation between the pattern of active-site residues and substrate specificity, molecular modelling studies of the substrates in the active sites of AT domains were carried out to understand the stereochemical basis for this correlation. Modelling of a malonate group in the active site of an AT domain indicated that, except for Phe-200, all residues in contact with the substrate are conserved in both malonate- and methylmalonate-specific AT domains. The other residues that show a high degree of substrate-specific variation lie relatively far from the substrate, whereas Phe-200 is in contact with the methylene group of the malonate moiety.
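The two-step reasoning above (find the residues in contact with the modelled substrate, then ask which of those contacts vary between malonate- and methylmalonate-specific AT domains) can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's modelling protocol: the 4.0 Å contact cutoff, the function names `contact_positions` and `variable_contacts`, and all coordinates are hypothetical; only the residue identities at positions 200 and 201 follow the text.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumed contact cutoff in angstroms; the project's actual criterion is not stated.
CONTACT_CUTOFF = 4.0


def contact_positions(
    residue_atoms: Dict[int, List[Coord]],
    substrate_atoms: Sequence[Coord],
    cutoff: float = CONTACT_CUTOFF,
) -> List[int]:
    """Residue numbers with at least one atom within `cutoff` of any substrate atom."""
    return sorted(
        pos
        for pos, atoms in residue_atoms.items()
        if any(math.dist(a, s) <= cutoff for a in atoms for s in substrate_atoms)
    )


def variable_contacts(
    contacts: List[int], residues_by_group: Dict[str, Dict[int, str]]
) -> List[int]:
    """Contact positions whose residue type differs between substrate-specificity groups."""
    return [
        pos
        for pos in contacts
        if len({group[pos] for group in residues_by_group.values()}) > 1
    ]


# Toy model: coordinates are illustrative only, not taken from a real AT structure.
atoms = {
    93: [(9.0, 0.0, 0.0)],   # far from the substrate
    200: [(1.0, 0.0, 0.0)],  # close to the malonate methylene group
    201: [(0.0, 3.0, 0.0)],
}
substrate = [(0.0, 0.0, 0.0)]
contacts = contact_positions(atoms, substrate)
groups = {
    "malonate": {200: "F", 201: "H"},
    "methylmalonate": {200: "S", 201: "H"},
}
print(contacts)                            # [200, 201]
print(variable_contacts(contacts, groups))  # [200]
```

In this toy setup only position 200 passes both filters, which mirrors the Phe-200/Ser-200 argument; with real models the coordinates would come from the modelled AT-substrate complex.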
This contact pattern indicates that, although positions 200 and 93 both correlate strongly with substrate specificity, the residue at position 200 is likely to control specificity to a greater extent than the residue at position 93. Changing the substrate from malonate to methylmalonate requires replacing one of the two methylene protons with a methyl group, in either the R or the S configuration. A methyl group in the R configuration results in unfavourable steric clashes between the methyl carbon and the Cβ of Phe-200/Ser-200, as well as His-201, a residue conserved in all AT domains. This explains the structural basis of the chiral selectivity of acyltransferase domains for (2S)-methylmalonyl-CoA over (2R)-methylmalonyl-CoA. On the other hand, the steric incompatibility of Phe-200 with methylmalonate in the S configuration explains the observation that all AT domains that have Phe-200 accept only malonate. In contrast, all methylmalonyl-CoA-accepting AT domains carry an F200S substitution, in which the bulky Phe at position 200 is replaced by the relatively small Ser; this avoids the potential steric clash by making additional space for the methyl group in the active-site cavity. These results are in qualitative agreement with site-directed mutagenesis studies reported in the literature. They also indicate that, using a molecular modelling approach, it is possible to design AT domains specific for substrates other than malonate and methylmalonate.

The computational protocol for predicting the domain organization and substrate specificity of modular PKS clusters was used to identify potential modular PKS clusters in several microbial genomes. The program successfully detected all annotated PKS clusters and predicted their domain organization as well as the substrate specificities of their AT domains. Note that the domain organizations and substrate specificities predicted for uncharacterized PKS clusters are bona fide blind predictions. Although results on 19 characterized PKS clusters indicate that the prediction accuracy of the method is high, these predictions can be tested only once experimental results on these PKS clusters become available.

An intriguing observation in the Bacillus subtilis genome was the identification of PKS-like clusters with an unusual domain organization: their modules lacked the core acyltransferase domain, which is an essential part of a minimal polyketide module. No putative AT domain could be identified in these amino acid stretches even after detailed sequence analyses, and the lack of AT domains in the B. subtilis PKSX cluster has been reported earlier. However, the present systematic search for potential PKS domains in the B. subtilis genome indicated that AT domains are present separately, as independent ORFs, adjacent to the proteins containing these unusual modules. Such a domain architecture has features of both type I and type II PKSs. It would be interesting to explore whether this arrangement adds a new dimension to the metabolic diversity of these complex enzyme systems, and how these extra-modular AT domains might exercise substrate selection.

Chalcone synthase (CHS)

Docking of various substrates into the active sites of structural models of different plant and bacterial CHS-like proteins indicated a correlation between the volume of the active-site cavity and the final product of a given CHS. This could explain the structural basis of specificity for resveratrol, benzylacetone, 2-pyrone and THN (1,3,6,8-tetrahydroxynaphthalene). However, these modelling studies cannot explain how certain bacterial CHSs make products with long aliphatic chains. Since M. tuberculosis KAS III has a fold similar to that of CHS and is known to accept myristoyl-CoA, comparative modelling is in progress to explore whether structural models of bacterial CHSs have substrate-binding sites similar to that of M. tuberculosis KAS III.

Nonribosomal peptide synthetases (NRPSs)

Nonribosomal peptides are synthesized in many bacteria and fungi by large multifunctional proteins called nonribosomal peptide synthetases (NRPSs). NRPSs have a domain and module organization similar to that of modular PKSs and synthesize nonribosomal peptides by assembly-line enzymology. A unique feature of the NRPS system is its ability to incorporate non-proteinogenic as well as proteinogenic amino acids, and its products include many important antibiotics, immunosuppressants, veterinary agents and agrochemicals.
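To make the assembly-line idea concrete, the toy Python model below lets module order alone fix the order of monomers in the product. It is a hypothetical sketch: the `Module` class, the `assemble` function and the three-module line are illustrative, and thiolation, epimerization and chain release are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """Toy NRPS module, reduced to the monomer its A domain activates."""
    name: str
    substrate: str             # amino acid selected by the adenylation (A) domain
    has_c_domain: bool = True  # condensation (C) domain; usually absent from the initiation module


def assemble(modules: List[Module]) -> List[str]:
    """Build the product N- to C-terminus, one monomer per module, in module order."""
    chain: List[str] = []
    for i, module in enumerate(modules):
        if i > 0 and not module.has_c_domain:
            # Without a C domain this module cannot condense its monomer onto the chain.
            raise ValueError(f"module {module.name} lacks a C domain")
        chain.append(module.substrate)
    return chain


# Hypothetical three-module line; Orn (ornithine) is a non-proteinogenic amino acid.
line = [Module("M1", "Phe", has_c_domain=False), Module("M2", "Pro"), Module("M3", "Orn")]
print("-".join(assemble(line)))  # Phe-Pro-Orn
```

Because the product is read directly off the module order, correctly identified domains and A-domain specificities are what a sequence-to-product prediction would need.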
The availability of sequence information for a large number of NRPS clusters with experimentally characterized products presents an opportunity to develop knowledge-based in silico methods for understanding the roles of individual domains and inter-domain interactions in controlling the substrate specificity of NRPS proteins.

Since correct identification of the catalytic domains present in an NRPS protein is essential for understanding the sequence-to-product relationship in the NRPS system, a systematic analysis of domain organization was carried out for 40 NRPS clusters. Because of the relatively large sequence variability within the protein families corresponding to the various NRPS domains, domains could not be identified using pairwise BLAST searches against single templates. However, since representative structural folds are available for the majority of NRPS domains, domain boundaries could be correctly identified by threading methods.
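The report does not state how threading hits were judged significant. One common approach, assumed here, is to express a hit's score as a Z-score relative to the scores of shuffled versions of the query and to accept only hits above a threshold. In this Python sketch `toy_score` is a hypothetical stand-in (ungapped positional identity, not a real threading potential), higher scores are better, and the threshold of 3.0 is illustrative.

```python
import random
import statistics
from typing import Callable, Dict, List

ScoreFn = Callable[[str, str], float]


def threading_z_score(query: str, template: str, score: ScoreFn,
                      n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the native score against scores of composition-preserving shuffles."""
    rng = random.Random(seed)
    native = score(query, template)
    decoys = []
    for _ in range(n_shuffles):
        residues = list(query)
        rng.shuffle(residues)  # keeps amino-acid composition, destroys sequence order
        decoys.append(score("".join(residues), template))
    sd = statistics.pstdev(decoys)
    if sd == 0:
        return 0.0  # shuffling cannot change the score, so it carries no signal
    return (native - statistics.mean(decoys)) / sd


def significant_templates(query: str, templates: Dict[str, str], score: ScoreFn,
                          threshold: float = 3.0) -> List[str]:
    """Names of templates whose Z-score reaches the (assumed) significance threshold."""
    return [name for name, tpl in templates.items()
            if threading_z_score(query, tpl, score) >= threshold]


def toy_score(query: str, template: str) -> float:
    """Hypothetical stand-in: ungapped positional identity, not a real threading potential."""
    return float(sum(q == t for q, t in zip(query, template)))


seq = "ACDEFGHIKLMNPQRSTVWY"
templates = {"fold_1": seq, "fold_2": "G" * 20}
print(significant_templates(seq, templates, toy_score))  # ['fold_1']
```

Normalizing against shuffled decoys makes scores comparable across templates of different sizes and compositions, which is what a fixed acceptance threshold needs.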
Using the domain boundaries predicted by threading, the sequences of the various types of inter-domain linker regions were extracted. Interestingly, unlike in the PKS system, the inter-domain linkers in NRPS proteins are very short, only 10 to 20 amino acids long. Such short linkers are likely to impose a spatial constraint on the relative orientation of the domains in a module.
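Extracting linkers from predicted domain boundaries, and turning a linker's length into a distance constraint, can be sketched as follows. The domain boundaries in the example are hypothetical, and the constraint assumes roughly 3.8 Å per Cα-Cα step of a fully extended chain as the upper bound on how far a linker can reach; a negative linker length flags overlapping domain boundaries.

```python
from typing import List, Tuple

# Approximate C-alpha to C-alpha spacing in a fully extended chain, in angstroms.
CA_CA_STEP = 3.8


def linker_lengths(domains: List[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """(label, length) for each gap between consecutive domains.

    `domains` holds (name, start, end) as 1-based inclusive residue numbers,
    sorted by start. A negative length means the predicted boundaries overlap.
    """
    return [
        (f"{n1}-{n2}", s2 - e1 - 1)
        for (n1, _, e1), (n2, s2, _) in zip(domains, domains[1:])
    ]


def max_span(linker_length: int) -> float:
    """Upper bound (angstroms) on the C-alpha distance between the flanking domain termini.

    n linker residues give n + 1 C-alpha to C-alpha steps between the last
    residue of one domain and the first residue of the next.
    """
    if linker_length < 0:
        raise ValueError("overlapping boundaries have no defined linker")
    return CA_CA_STEP * (linker_length + 1)


def pose_is_feasible(termini_distance: float, linker_length: int) -> bool:
    """Reject docked domain poses whose termini are farther apart than the linker can reach."""
    return termini_distance <= max_span(linker_length)


# Hypothetical boundaries for a C-A-T module (not real threading output).
module = [("C", 1, 430), ("A", 445, 950), ("T", 962, 1030)]
print(linker_lengths(module))                            # [('C-A', 14), ('A-T', 11)]
print(round(max_span(14), 1))                            # 57.0
print(pose_is_feasible(70.0, 14))                        # False
print(linker_lengths([("C", 1, 430), ("A", 425, 950)]))  # [('C-A', -6)]
```

The last call shows how a six-residue boundary overlap, like the C/A overlap noted in this report, would surface in such an analysis.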
Attempts are being made to obtain structural models of NRPS modules by using linker length as a constraint in docking simulations of the domains in a module.

Analysis of the inter-domain linker sequences also indicated a six-residue overlap between the C-terminus of the condensation (C) domain and the N-terminus of the adenylation (A) domain. This can be attributed to the fact that the structural templates for the C and A domains used in threading correspond to a stand-alone C domain and to an A domain from the extreme N-terminus of its NRPS, so the exact boundary between these two domains cannot be determined unambiguously. However, this overlapping region has conserved sequence features and a similar helical structure in both template structures. Work is in progress to investigate whether these features can be used to obtain a structural model of a complete NRPS module.

Since adenylation domains are responsible for selecting amino acids during the biosynthesis of NRPS products, a structure-based analysis of the active-site residues of various A domains with known specificity is being carried out to understand the structural basis of the observed correlation between substrate and active-site residue pattern.

Acyl-CoA synthetases

Structural modelling of acyl-CoA synthetase-like proteins has also been carried out to identify the amino acids responsible for making these proteins specific for fatty acyl substrates, even though these proteins are likely to adopt a structure similar to that of the NRPS adenylation domains. Apart from structural modelling, an alternative approach based on phylogenetic analysis is being pursued to identify putative active-site residues from sequence alone.

B. MHC-peptide interactions

The multi-scale approach for predicting MHC-binding peptides has been tested on known class I MHC binders listed in the MHCPEP database. Using the information given in MHCPEP, the sequences of the corresponding antigens were extracted from SWISS-PROT and threaded onto the peptide backbone in the structural templates of various MHC alleles. Analysis of this data set of approximately 3500 peptides for 8 different alleles indicates that known class I MHC-binding peptides are ranked highly in most cases.
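The ranking test can be sketched as: slide a window of the binder's length along the antigen sequence, score every overlapping peptide threaded onto the allele's template, and report where the known binder falls. In this Python sketch `toy_score` is a hypothetical placeholder (a toy two-position preference with lower meaning better), not the multi-scale potential used in the project, and the antigen and binder sequences are made up.

```python
from typing import Callable, List, Tuple


def overlapping_peptides(antigen: str, length: int) -> List[str]:
    """Every contiguous peptide of `length` residues in the antigen (one per threading window)."""
    return [antigen[i : i + length] for i in range(len(antigen) - length + 1)]


def rank_of_binder(
    antigen: str, binder: str, score_peptide: Callable[[str], float]
) -> Tuple[int, int]:
    """(rank, total) of the known binder among all windows; rank 1 is the lowest (best) score."""
    peptides = overlapping_peptides(antigen, len(binder))
    if binder not in peptides:
        raise ValueError("binder does not occur in the antigen sequence")
    ranked = sorted(peptides, key=score_peptide)  # stable sort: ties keep sequence order
    return ranked.index(binder) + 1, len(peptides)


def toy_score(peptide: str) -> float:
    """Hypothetical energy-like score (lower is better): favours L at P2 and V at the C-terminus."""
    return -float(peptide[1] == "L") - float(peptide[-1] == "V")


# Made-up antigen containing a made-up 9-mer "binder".
antigen = "GGKLAAAAAAVGGG"
print(rank_of_binder(antigen, "KLAAAAAAV", toy_score))  # (1, 6)
```

Repeating this over every binder-antigen pair for an allele gives the distribution of binder ranks that a statement such as "ranked highly in most cases" summarizes.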
The high-ranking peptides are being analyzed using docking methods to see whether MHC binders can be ranked by their binding affinity.

C. Structure prediction of peptides/proteins

Automated computational tools have been developed for large-scale threading of ORFs from genomes and for further analysis of the threading results. Using these tools, fold prediction has been carried out for approximately 1500 M. tuberculosis ORFs of unknown function. For 250 of these 1500 ORFs, one of the known structural folds could be assigned with high statistical confidence. Based on the assigned fold, these proteins have been classified as putative oxidoreductases, hydrolases, transferases, isomerases, lyases and ligases. Further analyses involving PSI-BLAST, active-site residue patterns and gene neighbours are in progress to assign specific functions to these proteins.

Publications

Original peer-reviewed articles

1. Yadav G, Gokhale RS and Mohanty D (2003) Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases. J Mol Biol 328:335-363.
2. Yadav G, Gokhale RS and Mohanty D (2003) SEARCHPKS: A program for detection and analysis of polyketide synthase domains. Nucl Acids Res (in press).