Molecular modelling of peptides and protein-ligand complexes using knowledge-based potentials

Principal Investigator: Debasisa Mohanty
PhD Students
Collaborator

The main theme of the research project is to understand the structural principles
that govern binding of various ligands to proteins and folding of
peptides/proteins to stable conformations, and use these structural principles
for developing computational approaches for structure prediction of
peptides/proteins and protein-ligand complexes. The objective is to
investigate whether knowledge-based methods can be used for predicting
(1) the substrate specificity of proteins involved in biosynthesis of polyketides
and peptide antibiotics, (2) the bound conformation of peptides in MHC-peptide
complexes and the ranking of peptides by their binding free energy, and (3)
structures for short peptides and folds for proteins of unknown function in
various genomes.

A. Substrate specificity of polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs)

Polyketide synthases (PKSs)

The robustness of our computational protocol for identification of PKS domains and
prediction of their starter and extender specificity has been demonstrated in
the analysis of various type I iterative clusters and also additional modular
PKS clusters whose sequences were available after the development of our
computational approach. Comparative analysis of the sequences of modular and
iterative PKSs has been carried out to identify patterns which can be used to
predict whether a PKS cluster is modular or iterative. Distinguishing
modular from iterative PKSs and predicting the number of iterative
condensations carried out by a PKS module are essential for identifying
biosynthetic products of uncharacterized PKS clusters. This comparative
analysis has revealed that modular and iterative PKSs differ in terms of
linker length in the minimal module. There are also distinct differences in
sequences of domains that lead to separate clustering of modular and iterative
PKSs in a multiple alignment. In order to quantify these differences, separate
profile Hidden Markov Models (HMMs) have been built for KS domains of modular
and iterative PKSs using a training set of sequences. Benchmarking of these
profile HMMs on a test set of 106 iterative and 260 modular domains indicates
that the iterative or modular character of a PKS protein can be predicted
with an accuracy of 97%. The HMM analysis also led to the identification of 11
positions, which have distinctly different residue occurrence patterns in the
sequences of the modular and iterative KS domains. Analysis of the structural
models of modular and iterative KS domains revealed that the majority of these 11
positions occur in the vicinity of the putative active site. These results
indicate that differences between modular and iterative KS domains may
originate from differences in the catalytic pockets of these enzymes. The
structural models of different iterative KS domains are being analyzed in
detail to investigate a possible correlation between active site geometry and
the number of iterations.

Chalcone Synthase (CHS)

Putative
active site residues have been identified for all CHS like proteins with known
substrates using a comparative modelling approach. Active site geometry in these
homology models has been analyzed in detail to find residues which
control selection of starter units and cyclization of the polyketide chain.
However, based on these modelling studies, it was not possible to explain how
certain bacterial CHSs make products with long aliphatic chains. Since the M. tuberculosis
KASIII has a fold similar to CHS and is known to accept myristoyl CoA, these
CHS like proteins from M. tuberculosis were modelled using the M. tuberculosis
KASIII structure as template. Analysis of the substrate binding sites in this
new model could explain the experimentally observed substrate specificity of
the pks18 protein from M. tuberculosis.

Nonribosomal peptide synthetases (NRPSs)

A
novel computational method has been developed for correct identification of
NRPS domains in a query sequence and prediction of their catalytic activity or
substrate specificity. This method uses a knowledge-based approach. Sequences
of a diverse set of C, A, T, M and TE domains have been obtained from
threading analysis of a training set of 22 experimentally characterized NRPS
and hybrid NRPS/PKS clusters. These curated sequences of various NRPS domains
have been catalogued in NRPSDB, a database of nonribosomal peptide synthetases.
For identification of a given NRPS domain in a polypeptide sequence, all the
available sequences of that domain in NRPSDB are aligned pairwise with the
query sequence. From the set of overlapping alignments, the alignment with the
lowest E-value and a length above a predefined cut-off is chosen as the best
match with the query. This computational method can also discriminate between
condensation (C), cyclization (Cy) and epimerization (E) domains using
information from Cy and E domains in the training set. This is achieved by
aligning the query domain with the various C, Cy and E domains present in
NRPSDB and identifying whether the closest match to the query domain is a
condensation, cyclization or epimerization domain. Prediction of a Cy domain is
further confirmed by searching for the conserved DxxxxD motif. This
computational protocol, SEARCHNRPS, also identifies the putative active site
residues of A and C domains present in the query from their alignment with
structural templates. By comparing these putative active site residues with
the active sites of A domains of known specificity in NRPSDB, SEARCHNRPS
attempts to predict the specificity of the A domain in the query. The high
prediction accuracy of SEARCHNRPS has been demonstrated by benchmarking on a
test set of NRPS clusters. Sequence analysis of condensation domains from the
training and test sets has yielded information about the substrate
specificity of this domain. These domains form two major clusters according to the
chirality of the donor peptidyl group. Within each of these two groups,
further clustering is also observed depending upon the chirality of the
acceptor peptidyl group.

Acyl CoA synthetases

Acyl CoA synthetases belong to a superfamily of acyl adenylate/thioester forming
enzymes, which share a conserved AMP binding domain. All members of this
superfamily catalyze the formation of an adenylate intermediate with the
carboxyl moiety of the respective substrate, which is then esterified with
either CoA (coumarate CoA ligases, acetyl CoA ligases, fatty acyl CoA
synthetases) or enzyme bound 4'-phosphopantetheine (peptide synthetases) or
oxidized by molecular oxygen (luciferases). Many of these proteins are present
as domains in PKS/NRPS clusters or as ORFs adjacent to PKS and NRPS clusters
and play a crucial role in biosynthesis of PKS/NRPS products. Because of the
high degree of sequence divergence between members of the superfamily,
prediction of their functional subfamily and substrate specificity has been a
difficult task. However, the available crystal structures of several
members of the ACS family have established that they adopt a similar
three-dimensional structure despite very low sequence identity. Using the available
crystal structures as templates, putative active site residues have been
identified for all known members of this superfamily. The correlation between
active site residue pattern and known functional specificity has been analyzed
in detail to derive a specificity code for the various functional subfamilies. This
information can be used for in silico prediction of the functional
subfamily and substrate specificity of uncharacterized ACS like proteins.
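
The idea of a specificity code can be illustrated with a minimal Python sketch. It is not the implementation used in this project. It assumes that a query sequence has already been aligned to a structural template, that the template residue numbers lining the active site are known, and that subfamilies are compared by a simple count of identical residues. The function names, the toy alignment, the positions and the subfamily codes are all hypothetical and were invented for illustration.

```python
from typing import Dict, List, Tuple


def residues_at_template_positions(
    template_aln: str, query_aln: str, positions: List[int]
) -> str:
    """Read the query residues aligned to the given 1-based template residue numbers.

    Both strings are rows of the same pairwise alignment ('-' marks a gap).
    A position where the query has a gap, or that lies beyond the template,
    is reported as '-'.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    picked: Dict[int, str] = {}
    template_index = 0
    for t_char, q_char in zip(template_aln, query_aln):
        if t_char == "-":
            continue  # insertion in the query: no template residue in this column
        template_index += 1
        if template_index in wanted:
            picked[template_index] = q_char
    return "".join(picked.get(p, "-") for p in positions)


def count_identities(code_a: str, code_b: str) -> int:
    """Count positions where two codes carry the same residue (gaps never match)."""
    return sum(1 for a, b in zip(code_a, code_b) if a == b and a != "-")


def predict_subfamily(code: str, known_codes: Dict[str, str]) -> Tuple[str, int]:
    """Return the subfamily whose code shares the most residues with the query code."""
    best = max(known_codes, key=lambda name: count_identities(code, known_codes[name]))
    return best, count_identities(code, known_codes[best])


# Toy data, invented for illustration only (not real ACS sequences or codes).
TEMPLATE_ALN = "MKT-AGLVW"
QUERY_ALN = "MRTQSG-VF"
ACTIVE_SITE_POSITIONS = [2, 4, 6, 8]  # hypothetical template residue numbers
KNOWN_CODES = {"toy_subfamily_1": "RSAF", "toy_subfamily_2": "KGLW"}

if __name__ == "__main__":
    query_code = residues_at_template_positions(
        TEMPLATE_ALN, QUERY_ALN, ACTIVE_SITE_POSITIONS
    )
    print(query_code, predict_subfamily(query_code, KNOWN_CODES))
```

An identity count is the simplest possible comparison. A substitution matrix or position-specific weights could replace `count_identities` without changing the rest of the sketch.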
Molecular modelling of various substrates in the putative active sites of
these ACS proteins is also being carried out to understand the structural
basis of diverse substrate specificity.

B. MHC-peptide interactions

The structure-based approach for identification of MHC binding peptides is being
used to analyze complexes between class II MHC molecules and their peptide
ligands.

Publications

Original peer-reviewed articles

1. *Yadav G, Gokhale RS and Mohanty D (2003) SEARCHPKS: A program for detection and analysis of polyketide synthase domains. Nucl Acids Res 31:3654-3658 (*in press last year, since published).
2. Ansari MZ, Yadav G, Gokhale RS and Mohanty D (2004) NRPS-PKS: A knowledge-based resource for analysis of NRPS/PKS megasynthases. Nucl Acids Res (in press).
3. Saxena P, Yadav G, Mohanty D and Gokhale RS (2003) A new family of type III polyketide synthases in Mycobacterium tuberculosis. J Biol Chem 278:44780-44790.
4. Trivedi OA, Arora P, Sridharan V, Tickoo R, Mohanty D and Gokhale RS (2004) Enzymic activation and transfer of fatty acids as acyl-adenylates in mycobacteria. Nature (in press).