Molecular modelling of peptides and protein-ligand complexes using knowledge-based potentials
Principal Investigator: Debasisa Mohanty, PhD

The main theme of the research project is to understand the structural principles
that govern binding of various ligands to proteins and folding of
peptides/proteins to stable conformations, and use these structural principles
for developing computational approaches for structure prediction of
peptides/proteins and protein-ligand complexes. The specific objective of the
project is to investigate whether knowledge-based potentials, i.e. scoring
functions obtained from analysis of structural features in databases of known
protein structures, can be used for predicting (a) the substrate specificity of
proteins involved in biosynthesis of polyketides and peptide antibiotics, (b)
the bound conformation of peptides in MHC-peptide complexes and the ranking of
peptides according to their binding free energy, and (c) structures of short
peptides and folds of proteins of unknown function in various genomes.

A. Substrate specificity of proteins involved in polyketide biosynthesis

Our computational approach was successful in detecting the correlation between the
substrate specificity of CHS-like proteins and their active site residues, but
no such correlation could be found in the case of KS domains of modular
polyketide synthases (PKS). The key difference between CHS and the KS domains of
modular PKS is that CHS is a monofunctional enzyme which carries out polyketide
synthesis by taking different starter and extender units, whereas KS domains are
part of multi-enzyme complexes which contain several other domains for
activities such as phosphopantetheine binding (ACP), acyl transfer (AT),
ketoreduction (KR), dehydration (DH) and enoylreduction (ER). Thus a
systematic analysis involving all the domains of modular PKS was necessary for
understanding their substrate specificity.

Analysis of various domains present in modular PKS

We have carried out a detailed computational analysis of the amino acid sequences
of various modular PKS with known substrates to decipher the relationship
between their amino acid sequence and substrate specificity. Since the
polyketide product synthesized by a given gene cluster is determined by the
number of modules present in the cluster and the type of domains in each of the
modules, the first step in the computational prediction of the polyketide
product is the correct identification of the domains present in a given ORF.
The results of our sequence analysis indicated that all domains except ACP
have enough sequence homology to be detected by pairwise
comparison with a reference template sequence of the corresponding domain.
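
The template-based detection described above can be sketched as follows. This
is only a minimal illustration and not the protocol used in the project: it
assumes a plain Smith-Waterman local alignment with simple match/mismatch/gap
scores (a real implementation would normally use an amino acid substitution
matrix), a hypothetical acceptance threshold `min_fraction`, and made-up toy
template fragments. The function and variable names are illustrative only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def identify_domains(query, templates, min_fraction=0.6):
    """Return {domain: best score} for domains detected in the query.

    templates maps a domain name to a list of reference sequences. A domain
    is reported when the local alignment score against any one of its
    templates reaches min_fraction of that template's self-alignment score
    (an assumed, arbitrary cutoff).
    """
    hits = {}
    for domain, refs in templates.items():
        for ref in refs:
            score = local_alignment_score(query, ref)
            if score >= min_fraction * local_alignment_score(ref, ref):
                hits[domain] = max(hits.get(domain, 0), score)
    return hits


# Toy, made-up template fragments (NOT real PKS domain sequences).
TEMPLATES = {
    "KS": ["GPAVTVDTACSSSLVA"],
    "AT": ["GHSLGEYAALVAAG"],
    "ACP": ["LGLDSLMAVEL", "FGVDSITSLQL"],  # a domain may have several templates
}

# Toy query containing a near-copy of the KS fragment and one ACP fragment.
query = "MKT" + "GPAVTIDTACSSSLVA" + "QQQ" + "FGVDSITSLQL" + "END"
print(identify_domains(query, TEMPLATES))
```

Because each domain maps to a list of templates, a domain that is too variable
to be captured by one reference can simply be given additional references
without changing the search logic.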
However, the ACP domains show a high degree of sequence variability and hence
multiple reference templates are required for their identification by pairwise
sequence alignment. Our domain identification protocol has been tested on all
the modular PKS with known substrates and the results have also been compared
with the CDD server from NCBI, which is widely used for identification of
protein domains. While CDD gives an ambiguous prediction for reductive domains
because it cannot distinguish between ER and KR domains, our program is able to
correctly predict ER and KR domains. In some cases, we have also identified
ACP domains which are not predicted by CDD, but the presence of such ACP
domains in the sequence is consistent with the experimentally characterised
polyketide product.

The next step in the computational prediction of the polyketide product is to
determine the specificity of the AT domains for various types of extender
units. The active site residues in various AT domains have been identified by
multiple sequence alignment and pairwise alignment of the sequence of each AT
domain with the crystal structure of the acyl transferase from E. coli
fatty acid synthase. Comparison of the active site sequences using
evolutionary dendrograms indicates that there are distinct patterns of active
site residues characterizing specificity for malonate, methylmalonate and
other unusual substrates. Docking of various types of substrates on the
homology models of AT domains is being carried out to understand the
structural basis of their specificity, which would presumably permit more
accurate prediction.

Based on the results of our computational analysis of modular PKS sequences, we
have developed web-enabled software for prediction of PKS domains in a given
protein sequence. The program pictorially depicts the various domains and
inter-domain linker regions, with clickable links to their sequences in FASTA
format. Using this program, we have developed a searchable database for modular
PKS in collaboration with Dr Gokhale’s group. This database gives the domain
organization of each modular PKS and the chemical structure of the polyketide
product. It also permits search of domains in terms of their specificity for
various extender units or level of sequence similarity/divergence from a
selected domain. This database is not only useful for our knowledge-based
approach, which involves regular addition of new sequence data and analysis
using different training and test sets, but will also be a valuable tool for
the choice of domains in the rational design of novel polyketides.

Substrate specificity of chalcone synthase (CHS)

Using our computational approach, active site residues have been identified for
all CHS-like proteins with known substrates. The active site geometry in the
homology models of these proteins has been analyzed in detail to identify the
residues which control selection of starter units and cyclization of the
polyketide chain.
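
As an illustration of how active site residues might be read off once
sequences have been aligned to a template of known structure, the sketch below
maps template residue numbers to alignment columns, extracts an "active site
signature" for each aligned sequence, and groups the signatures by known
substrate. It assumes a precomputed gapped alignment; all names, sequences,
positions and substrate labels are hypothetical toy values, not data from the
project.

```python
from collections import defaultdict


def template_columns(aligned_template, positions):
    """Map 1-based residue numbers of the ungapped template to alignment columns."""
    columns = {}
    residue_number = 0
    for col, char in enumerate(aligned_template):
        if char != "-":
            residue_number += 1
            if residue_number in positions:
                columns[residue_number] = col
    return [columns[p] for p in positions]


def active_site_signature(aligned_seq, columns):
    """Concatenate the residues found at the given alignment columns."""
    return "".join(aligned_seq[c] for c in columns)


def group_by_substrate(alignment, substrates, columns):
    """Collect the set of active site signatures seen for each known substrate."""
    groups = defaultdict(set)
    for name, seq in alignment.items():
        groups[substrates[name]].add(active_site_signature(seq, columns))
    return dict(groups)


# Toy alignment (made-up sequences); "-" marks a gap.
aligned_template = "MA-CTGHL"
alignment = {
    "enzA": "MAQCTGHL",
    "enzB": "MA-CSGHL",
    "enzC": "MA-STGNL",
}
substrates = {"enzA": "substrate_1", "enzB": "substrate_1", "enzC": "substrate_2"}

# Hypothetical active site positions in template numbering.
cols = template_columns(aligned_template, [3, 6])
signatures = group_by_substrate(alignment, substrates, cols)
print(signatures)
```

If sequences with the same substrate share a signature while those with other
substrates differ, that is the kind of correlation between active site residues
and specificity discussed in the text.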
Recent crystallographic and biochemical studies on CHS mutants have
demonstrated that one can indeed produce altered polyketide products by
mutating these residues, and that the mutant CHS structures have very low RMSD
(less than 1.0 Å) from the native CHS. These experimental studies lend further
support to our assumption that the substrate specificity of CHS-like proteins
can be predicted by our knowledge-based approach. Cavity volumes have been
computed for structural models of various CHS-like proteins and compared with
the sizes of the polyketide products synthesized by these proteins and the
number of condensation steps involved in their synthesis. An attempt is being
made to investigate whether, given a PKS sequence, one can predict the number of
condensation steps it catalyzes and the cyclization pattern of the polyketide
intermediate.

Computational analysis of acyl CoA synthetase-like proteins from the M. tb genome

In the M. tb genome, 36 genes have been annotated as acyl CoA synthetase-like
proteins. It has been postulated that these enzymes are involved in the synthesis
of fatty acyl CoA from fatty acid and CoA in the presence of ATP. However, the
exact substrate specificity of these enzymes is not known. Since many of
these proteins are located adjacent to PKS or NRPS clusters in M. tb
and are believed to be involved in loading of starter units onto the PKS
cluster, identification of their substrate specificity is crucial for in
silico prediction of polyketide products. Our computational analysis
indicates that these sequences are likely to adopt an AMP-binding fold similar
to the adenylation domains of NRPS proteins. Detailed analysis involving the
entire sequence and active site residues indicates that 12 of these proteins
have many features similar to adenylation domains of NRPS proteins, which show
specificity for amino acid-like substrates rather than medium or longer chain
fatty acids. The rest of the acyl CoA synthetase-like proteins show features
similar to known fatty acyl CoA ligase-like proteins, which take fatty acid-like
substrates. Handling of such different types of substrates by proteins
having the same structural fold could possibly be achieved by the presence of two
different binding sites on the structure, adjacent to the conserved AMP-binding
site. Docking of different types of substrates is being carried out to
understand the substrate specificity of these proteins.

B. MHC-peptide interactions

The computational protocol developed for prediction of class I MHC binding
peptides could predict the sequence of the peptide and its bound conformation
with reasonable accuracy for 19 different MHC-peptide complexes available in
PDB. However, for further validation, the approach had to be tested on a much
larger data set. Therefore, a detailed analysis of various class I MHC binding
peptide sequences available in the MHCPEP database is being carried out. The
purpose of this analysis is to investigate whether allele-specific contacts
inferred from the MHCPEP database can be predicted by a combination of a
residue-based statistical potential and a rotamer library, or whether
allele-specific amino acid pairing frequencies have to be incorporated in our
protocol to achieve optimal prediction. An attempt is also being made to select
a few high-scoring peptides based on our knowledge-based modelling approach and
carry out detailed molecular dynamics or Monte Carlo simulations to address the
issue of flexibility of the MHC-peptide complex.

C. Structure prediction of peptides/proteins

Apart from structure prediction of peptides, the knowledge-based method has also
been used for structure prediction of proteins. The M. tb genome
contains a large number of protein sequences which do not show any detectable
sequence similarity with any known protein, and they have therefore been
categorized as proteins with unknown functions. We have used threading methods
to predict whether these sequences adopt one of the known structural folds, even
in the absence of any detectable sequence similarity. It was found that for many
of these unknown proteins, it is possible to assign a fold with high statistical
confidence. We have also checked whether these sequences contain the
conserved catalytic or active site residues which are required for a function
compatible with the assigned structural fold. The presence of such conserved
residues further supports the reliability of the fold prediction method.
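
The two checks described in this section can be illustrated with a small
sketch. The text does not state how statistical confidence was computed, so the
example assumes one common convention: a Z-score of each fold's threading score
against the scores of all other folds in the library, with higher scores taken
to mean a better fit and an arbitrary cutoff `z_cutoff`. The second function
checks whether catalytic residues of the template are conserved in the query
along a given query-template alignment. All names, scores, sequences and
residue positions are hypothetical.

```python
from statistics import mean, stdev


def fold_z_scores(scores):
    """Z-score of each fold's score relative to the scores of all other folds.

    Needs at least three folds so that a standard deviation can be computed.
    """
    z = {}
    for fold, score in scores.items():
        others = [v for f, v in scores.items() if f != fold]
        z[fold] = (score - mean(others)) / stdev(others)
    return z


def assign_fold(scores, z_cutoff=3.0):
    """Return (fold, z) for the top fold, or (None, z) if below the cutoff."""
    z = fold_z_scores(scores)
    best = max(z, key=z.get)
    return (best, z[best]) if z[best] >= z_cutoff else (None, z[best])


def catalytic_residues_conserved(aligned_query, aligned_template, catalytic):
    """Check that the query carries the expected residue at every catalytic site.

    catalytic maps 1-based residue numbers of the ungapped template to the
    expected amino acid; "-" marks a gap in either aligned sequence.
    """
    matched = {}
    number = 0
    for q, t in zip(aligned_query, aligned_template):
        if t != "-":
            number += 1
            if number in catalytic:
                matched[number] = (q == catalytic[number])
    return all(matched.get(n, False) for n in catalytic)


# Toy threading scores of one query against a small fold library (made up).
scores = {"fold_A": 42.0, "fold_B": 10.0, "fold_C": 12.0,
          "fold_D": 8.0, "fold_E": 10.0}
fold, z = assign_fold(scores)
print(fold, round(z, 1))

# Toy alignment of the query to the template of the assigned fold (made up).
conserved = catalytic_residues_conserved("MRACDQH", "MK-CDEH", {3: "C", 6: "H"})
print(conserved)
```

A fold assignment that passes both the score-based cutoff and the catalytic
residue check would be considered more reliable than one supported by the score
alone, which is the reasoning given in the paragraph above.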