Introduction
Molecular simulations have come to play a central role in structural biology
and biophysics; they are beginning to be used in cell and systems biology as
well. Simulations help us develop our intuition about the behavior of models
which link biological structures to function. Protein folding, molecular
recognition and ligand binding, biological machines used for transport and
signaling, these are some of the research areas that have been greatly
enriched by computational approaches based on molecular simulations. The
design of effective potentials for modeling solvation effects implicitly, the
development of new sampling methods based on replica exchange molecular
dynamics, and the exploration of rare events using network models are themes
in the Levy group which run through our current research highlighted below.
New multi-scale models, effective potentials, and sampling methods for molecular simulations
Implicit Solvent Models
Water plays a fundamental role in virtually all biological processes.
The accurate modeling of hydration thermodynamics is therefore
essential for studying protein conformational equilibria, aggregation,
and binding. Explicit solvent models provide the most detailed
description of hydration phenomena, but they have inherent limitations
that motivate the search for other ways to represent solvation.
Implicit solvent models, which are based on the statistical mechanics
concept of the solvent potential of mean force, are very useful
alternatives to explicit solvation for modeling protein folding and
binding.
We have developed an implicit solvent effective potential (AGBNP)
that is suitable for molecular dynamics simulations and high-
resolution modeling. It is based on a novel implementation of the
pairwise descreening Generalized Born model for the electrostatic
component and a nonpolar hydration free energy estimator. The model
is fully analytical with first derivatives and is computationally
efficient and has been incorporated into the IMPACT molecular
simulation program.
Gallicchio, E., K. Paris, and R.M. Levy
(2009). The AGBNP2 Implicit Solvent Model. J. Chem. Theory and
Comput., 5, 2544-2564. DOI: 10.1021/ct900234u. PMCID:
PMC2857935.
Replica Exchange Molecular Dynamics
One of the key challenges in the computer simulation of proteins at
the atomic level is the sampling of conformational space. The
efficiency of many common sampling protocols such as Monte-Carlo (MC)
and Molecular Dynamics (MD) is limited by the need to cross high free-
energy barriers and rugged energy landscapes. In the Replica Exchange
(RE) algorithm many coupled simulations are run in parallel; the
coupling is achieved by exchanging either thermodynamic or
Hamiltonian parameters. We are exploring novel implementations of
replica exchange to accelerate the convergence of protein folding
simulations, and the calculation of protein-ligand binding free
energies. In the latter example, the REMD coupling parameter is the
protein-ligand interaction energy.
Network Models, Rare Events, and Transition Paths
We can think of the data generated by replica exchange simulations of
protein folding and protein-ligand binding as generating a trace
through phase space. We can organize the data using graph theoretic
ideas, where the nodes are conformations and the edges represent
possible jumps between closely related nodes. We use the magic of
histogram re-weighting to assign relative probabilities to the jumps
(edges) which connect the nodes. Furthermore, we can construct
transition paths on the graphs which correspond to rare transitions
between stable states, and then use the tools of transition path
theory to analyze the transition path ensemble, in order to determine
the number of truly different important paths and their fluxes. This
is a form of multi-scale modeling where the underlying data is
derived from atomic simulations, while the transition paths are
constructed ex post facto.
Gallicchio, E., M. Andrec, A.K. Felts, and R.M.
Levy (2005). Temperature Weighted Histogram Analysis Method, Replica
Exchange, and Transition Paths. J. Phys. Chem. B, 109, 6722-6731.
DOI: 10.1021/jp045294f.
Gallicchio, E., R.M. Levy, and M. Parashar
(2008). Asynchronous Replica Exchange for Molecular Simulations. J.
Comput. Chem., 29, 788-794. DOI: 10.1002/jcc.20839. PMCID:
PMC2977925.
Zheng, W., M. Andrec, E. Gallicchio, and R.M.
Levy (2007). Simulating replica exchange simulations of protein
folding with a kinetic network model. Proc. Natl. Acad. Sci. USA,
104, 15340-15345. DOI: 10.1073/pnas.0704418104. PMCID: PMC2000486.
Zheng, W., M. Andrec, E. Gallicchio, and R.M.
Levy 2008. Simple Continuous and Discrete Models for Simulating
Replica Exchange Simulations of Protein Folding. J. Phys. Chem. B,
112, 6083-6093. DOI: 10.1021/jp076377+. PMCID: PMC2978075.
Protein folding and misfolding: fundamental physics, health-related applications
Kinetic Network Models and Protein Folding
Protein folding is a fundamental problem in modern molecular
biophysics and is an example of a slow process occurring via rare
events in a high-dimensional configurational space. For
this reason, it is difficult for an all-atom simulation to obtain
meaningful information on the kinetics and pathways of such
processes. A number of strategies for addressing this problem have
been proposed over the years that involve focusing on the
important slow processes while neglecting the less interesting
rapid kinetics by simplification of the state space or reduction
of dimensionality. Generalized ensemble methods such as replica
exchange molecular dynamics (REMD) have been developed that
enhance the ability to obtain accurate canonical populations in
complex systems by increasing sampling efficiency. However, since
REMD involves temperature swaps between MD trajectories, it is not
straightforward to obtain kinetic information from such
simulations. We have chosen to make use of a kinetic network model
in which the nodes correspond to discrete molecular conformations
from REMD simulation trajectories (rather than macrostates), and
the edges are derived from an ansatz based on structural
similarity. Since the network is a discretized representation of
the system and does not require additional energy and force
evaluations, there is a considerable gain in efficiency, allowing
us to study much slower kinetic processes than would be accessible
using conventional MD. Since the network topology can be
constructed based on virtually all degrees of freedom, this allows
for multiple pathways and complex transition states. Application
of the kinetic network model to study the folding of mini-proteins
such as Trp-Cage, and to the conformational reorganization of the
HIV-1 Protease receptor pocket, provides us with detailed
information about the rich variety of transition pathways both
their structural characteristics and their kinetics.
Andrec, M., A.K. Felts, E. Gallicchio, and R.M.
Levy (2005). Protein Folding Pathways from Replica Exchange
Simulations and a Kinetic Network Model. Proc. Natl. Acad. Sci.
USA, 102, 6801-6806. DOI: 10.1073/pnas.0408970102. PMCID:
PMC1100763
Zheng, W., M. Andrec, E. Gallicchio, and R.M.
Levy (2009). Recovering Kinetics from a Simplified Protein Folding
Model Using Replica Exchange Simulations: A Kinetic Network and Effective
Stochastic Dynamics. J. Phys. Chem. B, 113, 11702-11709. DOI:
10.1021/jp900445t. PMCID: PMC2975981.
Health-related applications: α synuclein and Parkinson's disease
Unlike globular proteins with stable native structures, partially
folded proteins, and natively unfolded proteins are best characterized
as conformational ensembles of rapidly interconverting structures.
Structural characterization of these dynamic systems is critical to
understanding the basis for protein misfolding that results in protein
aggregation and disease. In collaboration with Jean Baum's research
group we are integrating experiments and computational models to
develop a molecular picture of the steps involved in the aggregation of
a synuclein, an intrinsically disordered protein that appears in
aggregated forms in the brains of patients with Parkinson's disease.
The conversion from the monomeric disordered state to the highly
ordered aggregate fibril form is both complex and not well understood.
Aggregation rates are sensitive to changes in amino acid sequence and
environmental conditions. We are focusing on the early steps in the
aggregation process.
Weinstock, D.S., C. Narayanan, J. Baum, and R.M.
Levy (2008). Correlation Between 13C-alpha Chemical Shifts and Helix
Content of Peptide Ensembles. Protein Science, 17, 950-954. DOI:
10.1110/ps.073365408. PMCID: PMC2327285.
Wu, K-P, D.S. Weinstock, C. Narayanan, R.M. Levy,
and J. Baum (2009). Structural reorganization of a-synuclein at low
pH observed by NMR and REMD simulations. J. Mol. Biol., 391,
784-796. DOI: 10.1016/j.jmb.2009.06.063. PMCID: PMC2766395.
The energy landscape for ligand binding to protein receptors and drug design
Understanding how to accurately predict binding free energies of small molecule
ligands to protein receptor targets is a very active area of research in our
lab. The statistical mechanics of binding can be formulated in many alternative
ways; we are pursuing approaches particularly well suited to modern cluster
computing environments with highly parallelized computation.
Gallicchio E., and R.M. Levy (2012). Prediction
of SAMPL3 Host-Guest Affinities withthe Binding Energy Distribution Analysis
Method (BEDAM). J Comp Aided Mol Design, 26, 505-516. DOI:
10.1007/s10822-012-9552-3. PMCID: PMC3383899.
Statistical mechanics and the free energy of binding
A new method has been developed for estimating protein-ligand affinities with
implicit solvation called the Binding Energy Distribution Analysis Method
(BEDAM) which is a statistical mechanics formulation for the free energy of
binding based on the analysis of the distribution of binding energies between
the ligand and the protein receptor obtained in a fictitious ensemble in which
the ligand resides in the binding pocket without interacting with the receptor.
This framework is developed based on implicit solvation, Hamiltonian replica
exchange parallel molecular dynamics sampling, and reweighting techniques. The
formulation takes into account ligand-receptor interactions, multiple binding
modes, and conformational entropy losses and is currently being applied to a
series of important medicinal targets.
Receptor reorganization in protein-ligand binding: HIV-1 Protease and
Reverse Transcriptase
We are simultaneously investigating computational approaches to
calculate receptor strain energies that are a component of binding free
energies. The overall affinity of a ligand for a receptor can be expressed as a
balance between the strength of the interactions of the ligand for a particular
binding-competent conformation of the receptor and the probability of
occurrence of that conformation in the absence of the ligand. For receptors
that do not experience much conformational change upon binding, the affinity is
primarily based on the interaction between the ligand and receptor. However,
for flexible receptors such as HIV-1 reverse transcriptase (RT), HIV-1 protease
and a number of kinases, the conformational changes in the receptor may require
inclusion of receptor reorganization or strain energy to properly model the
binding of ligands to that protein. We are exploring the use of bioinformatic
techniques and simulations to model the receptor strain energy for flexible
receptors.
We are also working with collaborators at Rutgers (Eddy Arnold group) and the
University of Pittsburgh School of Medicine (Mike Parniak group) to identify
lead compounds that will inhibit the ribonuclease H activity of HIV-1 RT.
Structure based design of inhibitors for the RNase H function of RT has lagged
behind inhibitor design for the polymerase function of RT due to limited
structural information about the binding modes of inhibitors to the RNase H
active site. Our current strategy attempts to leverage experimental information
generated by our collaborators which has identified active compounds in large
libraries by comparing this information with the predictions of enrichment for
the same compounds when they are docked to many putative receptor sites on the
RNase H domain.
Paris, K.A., O. Haq, A.K. Felts, K. Das, E.
Arnold, and R.M. Levy (2009). Conformational Landscape of the Human
Immunodeficiency Virus Type 1 Reverse Transcriptase Non-Nucleoside
Inhibitor Binding Pocket: Lessons for Inhibitor Design from a Cluster
Analysis of Many Crystal Structures. J. Med. Chem., 52,
6413-6420. DOI: 10.1021/jm900854h. PMCID: PMC3182518.
Lapelosa, M., G. Ferstandig Arnold, E. Gallicchio,
E. Arnold, and R.M Levy (2010). Antigenic Characteristics of
Rhinovirus Chimeras Designed in silico for Enhanced Presentation of HIV-1
gp41 Epitopes. J. Mol. Biol., 397, 752-766. DOI:
10.1016/j.jmb.2010.01.064. PMCID: PMC2940250.
Lapelosa, M., E. Gallicchio, G. Ferstandig Arnold,
E. Arnold and R.M. Levy (2009). In silico Vaccine Design Based on
Molecular Simulations of Rhinovirus Chimeras Presenting HIV-1 gp41
Epitopes. J. Mol. Biol., 385, 675-691. DOI:
10.1016/j.jmb.2008.10.089. PMCID: PMC2649764.
Protein stability and the evolution of drug resistance: bioinformatics and biophysics
Mutational patterns in HIV-1 Protease
Proteins evolve through random mutagenesis and their evolutionary selection is
constrained by structural, functional and environmental limitations.
Thermodynamic stability is by far the most important structural factor, as most
proteins need to be folded in order to function. The stability range of
proteins is narrow, and is estimated experimentally to be ~ 10 kcal/mol or
less. Proteins operate "on a knife's edge," whereby a single highly deleterious
mutation could potentially lead to an unfolded protein. Drug resistance is
acquired through mutations. Primary mutations are typically destabilizing, and
must be accompanied by or correlated with compensatory stabilizing mutations.
We are developing statistical methods to model observed mutational patterns
within proteins evolving in response to drug pressure. The system we are
currently working on is HIV-1 protease for which we have access to over 40,000
publicly available amino acid sequences, isolated from patients undergoing
chemotherapy. The mutational patterns in HIV protease are highly non-trivial
and involve primary mutations that confer drug resistance, compensatory
stabilizing effects, and viral evolvability. Our aim is to rationalize the
mutational patterns using coarse grained biophysical models and to thereby
relate them to the structure and energetics of the enzyme.
Computational mutagenesis methods, ranging from statistical, empirical, and
physical approaches, have been useful for understanding and predicting protein
stabilities. Most existing methods are limited because they are unable to
reproduce correctly the magnitude of the free energy change (ddG), the
difference in free energy of unfolding between wild-type and mutant proteins (>
1.0 kcal/mol deviation from experimental ddG values). In order to overcome this
problem, we are developing a computational approach that uses sampling combined
with a linear interaction energy (LIE) method to predict the changes in the
free energy of the native state induced by mutations. Initial tests of the
method have shown an unsigned error between calculated and experimental values
of <1 kcal/mol. Further development of the LIE functional form in combination
with sampling techniques is an active, promising project which will allow for
more accurate prediction of mutational effects on protein stability which are
important to the understanding of drug-resistant mutations that are formed in
many protein targets including HIV-1 protease.