Ron Levy group

Exploring
molecular
landscapes

Introduction

Molecular simulations have come to play a central role in structural biology and biophysics; they are beginning to be used in cell and systems biology as well. Simulations help us develop our intuition about the behavior of models which link biological structures to function. Protein folding, molecular recognition and ligand binding, biological machines used for transport and signaling, these are some of the research areas that have been greatly enriched by computational approaches based on molecular simulations. The design of effective potentials for modeling solvation effects implicitly, the development of new sampling methods based on replica exchange molecular dynamics, and the exploration of rare events using network models are themes in the Levy group which run through our current research highlighted below.

New multi-scale models, effective potentials, and sampling methods for molecular simulations

Implicit Solvent Models

Water plays a fundamental role in virtually all biological processes. The accurate modeling of hydration thermodynamics is therefore essential for studying protein conformational equilibria, aggregation, and binding. Explicit solvent models provide the most detailed description of hydration phenomena, but they have inherent limitations that motivate the search for other ways to represent solvation. Implicit solvent models, which are based on the statistical mechanics concept of the solvent potential of mean force, are very useful alternatives to explicit solvation for modeling protein folding and binding.

We have developed an implicit solvent effective potential (AGBNP) that is suitable for molecular dynamics simulations and high- resolution modeling. It is based on a novel implementation of the pairwise descreening Generalized Born model for the electrostatic component and a nonpolar hydration free energy estimator. The model is fully analytical with first derivatives and is computationally efficient and has been incorporated into the IMPACT molecular simulation program.

Gallicchio, E., K. Paris, and R.M. Levy (2009). The AGBNP2 Implicit Solvent Model. J. Chem. Theory and Comput., 5, 2544-2564. DOI: 10.1021/ct900234u. PMCID: PMC2857935.

Replica Exchange Molecular Dynamics

One of the key challenges in the computer simulation of proteins at the atomic level is the sampling of conformational space. The efficiency of many common sampling protocols such as Monte-Carlo (MC) and Molecular Dynamics (MD) is limited by the need to cross high free- energy barriers and rugged energy landscapes. In the Replica Exchange (RE) algorithm many coupled simulations are run in parallel; the coupling is achieved by exchanging either thermodynamic or Hamiltonian parameters. We are exploring novel implementations of replica exchange to accelerate the convergence of protein folding simulations, and the calculation of protein-ligand binding free energies. In the latter example, the REMD coupling parameter is the protein-ligand interaction energy.

Network Models, Rare Events, and Transition Paths

We can think of the data generated by replica exchange simulations of protein folding and protein-ligand binding as generating a trace through phase space. We can organize the data using graph theoretic ideas, where the nodes are conformations and the edges represent possible jumps between closely related nodes. We use the magic of histogram re-weighting to assign relative probabilities to the jumps (edges) which connect the nodes. Furthermore, we can construct transition paths on the graphs which correspond to rare transitions between stable states, and then use the tools of transition path theory to analyze the transition path ensemble, in order to determine the number of truly different important paths and their fluxes. This is a form of multi-scale modeling where the underlying data is derived from atomic simulations, while the transition paths are constructed ex post facto.

Gallicchio, E., M. Andrec, A.K. Felts, and R.M. Levy (2005). Temperature Weighted Histogram Analysis Method, Replica Exchange, and Transition Paths. J. Phys. Chem. B, 109, 6722-6731. DOI: 10.1021/jp045294f.
Gallicchio, E., R.M. Levy, and M. Parashar (2008). Asynchronous Replica Exchange for Molecular Simulations. J. Comput. Chem., 29, 788-794. DOI: 10.1002/jcc.20839. PMCID: PMC2977925.
Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy (2007). Simulating replica exchange simulations of protein folding with a kinetic network model. Proc. Natl. Acad. Sci. USA, 104, 15340-15345. DOI: 10.1073/pnas.0704418104. PMCID: PMC2000486.
Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy 2008. Simple Continuous and Discrete Models for Simulating Replica Exchange Simulations of Protein Folding. J. Phys. Chem. B, 112, 6083-6093. DOI: 10.1021/jp076377+. PMCID: PMC2978075.

Protein folding and misfolding: fundamental physics, health-related applications

Kinetic Network Models and Protein Folding

Protein folding is a fundamental problem in modern molecular biophysics and is an example of a slow process occurring via rare events in a high-dimensional configurational space. For this reason, it is difficult for an all-atom simulation to obtain meaningful information on the kinetics and pathways of such processes. A number of strategies for addressing this problem have been proposed over the years that involve focusing on the important slow processes while neglecting the less interesting rapid kinetics by simplification of the state space or reduction of dimensionality. Generalized ensemble methods such as replica exchange molecular dynamics (REMD) have been developed that enhance the ability to obtain accurate canonical populations in complex systems by increasing sampling efficiency. However, since REMD involves temperature swaps between MD trajectories, it is not straightforward to obtain kinetic information from such simulations. We have chosen to make use of a kinetic network model in which the nodes correspond to discrete molecular conformations from REMD simulation trajectories (rather than macrostates), and the edges are derived from an ansatz based on structural similarity. Since the network is a discretized representation of the system and does not require additional energy and force evaluations, there is a considerable gain in efficiency, allowing us to study much slower kinetic processes than would be accessible using conventional MD. Since the network topology can be constructed based on virtually all degrees of freedom, this allows for multiple pathways and complex transition states. Application of the kinetic network model to study the folding of mini-proteins such as Trp-Cage, and to the conformational reorganization of the HIV-1 Protease receptor pocket, provides us with detailed information about the rich variety of transition pathways both their structural characteristics and their kinetics.

Andrec, M., A.K. Felts, E. Gallicchio, and R.M. Levy (2005). Protein Folding Pathways from Replica Exchange Simulations and a Kinetic Network Model. Proc. Natl. Acad. Sci. USA, 102, 6801-6806. DOI: 10.1073/pnas.0408970102. PMCID: PMC1100763
Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy (2009). Recovering Kinetics from a Simplified Protein Folding Model Using Replica Exchange Simulations: A Kinetic Network and Effective Stochastic Dynamics. J. Phys. Chem. B, 113, 11702-11709. DOI: 10.1021/jp900445t. PMCID: PMC2975981.

Health-related applications: α synuclein and Parkinson's disease

Unlike globular proteins with stable native structures, partially folded proteins, and natively unfolded proteins are best characterized as conformational ensembles of rapidly interconverting structures. Structural characterization of these dynamic systems is critical to understanding the basis for protein misfolding that results in protein aggregation and disease. In collaboration with Jean Baum's research group we are integrating experiments and computational models to develop a molecular picture of the steps involved in the aggregation of a synuclein, an intrinsically disordered protein that appears in aggregated forms in the brains of patients with Parkinson's disease. The conversion from the monomeric disordered state to the highly ordered aggregate fibril form is both complex and not well understood. Aggregation rates are sensitive to changes in amino acid sequence and environmental conditions. We are focusing on the early steps in the aggregation process.

Weinstock, D.S., C. Narayanan, J. Baum, and R.M. Levy (2008). Correlation Between 13C-alpha Chemical Shifts and Helix Content of Peptide Ensembles. Protein Science, 17, 950-954. DOI: 10.1110/ps.073365408. PMCID: PMC2327285.
Wu, K-P, D.S. Weinstock, C. Narayanan, R.M. Levy, and J. Baum (2009). Structural reorganization of a-synuclein at low pH observed by NMR and REMD simulations. J. Mol. Biol., 391, 784-796. DOI: 10.1016/j.jmb.2009.06.063. PMCID: PMC2766395.

The energy landscape for ligand binding to protein receptors and drug design

Understanding how to accurately predict binding free energies of small molecule ligands to protein receptor targets is a very active area of research in our lab. The statistical mechanics of binding can be formulated in many alternative ways; we are pursuing approaches particularly well suited to modern cluster computing environments with highly parallelized computation.

Gallicchio E., and R.M. Levy (2012). Prediction of SAMPL3 Host-Guest Affinities withthe Binding Energy Distribution Analysis Method (BEDAM). J Comp Aided Mol Design, 26, 505-516. DOI: 10.1007/s10822-012-9552-3. PMCID: PMC3383899.

Statistical mechanics and the free energy of binding

A new method has been developed for estimating protein-ligand affinities with implicit solvation called the Binding Energy Distribution Analysis Method (BEDAM) which is a statistical mechanics formulation for the free energy of binding based on the analysis of the distribution of binding energies between the ligand and the protein receptor obtained in a fictitious ensemble in which the ligand resides in the binding pocket without interacting with the receptor. This framework is developed based on implicit solvation, Hamiltonian replica exchange parallel molecular dynamics sampling, and reweighting techniques. The formulation takes into account ligand-receptor interactions, multiple binding modes, and conformational entropy losses and is currently being applied to a series of important medicinal targets.

Receptor reorganization in protein-ligand binding: HIV-1 Protease and Reverse Transcriptase

We are simultaneously investigating computational approaches to calculate receptor strain energies that are a component of binding free energies. The overall affinity of a ligand for a receptor can be expressed as a balance between the strength of the interactions of the ligand for a particular binding-competent conformation of the receptor and the probability of occurrence of that conformation in the absence of the ligand. For receptors that do not experience much conformational change upon binding, the affinity is primarily based on the interaction between the ligand and receptor. However, for flexible receptors such as HIV-1 reverse transcriptase (RT), HIV-1 protease and a number of kinases, the conformational changes in the receptor may require inclusion of receptor reorganization or strain energy to properly model the binding of ligands to that protein. We are exploring the use of bioinformatic techniques and simulations to model the receptor strain energy for flexible receptors.

We are also working with collaborators at Rutgers (Eddy Arnold group) and the University of Pittsburgh School of Medicine (Mike Parniak group) to identify lead compounds that will inhibit the ribonuclease H activity of HIV-1 RT. Structure based design of inhibitors for the RNase H function of RT has lagged behind inhibitor design for the polymerase function of RT due to limited structural information about the binding modes of inhibitors to the RNase H active site. Our current strategy attempts to leverage experimental information generated by our collaborators which has identified active compounds in large libraries by comparing this information with the predictions of enrichment for the same compounds when they are docked to many putative receptor sites on the RNase H domain.

Paris, K.A., O. Haq, A.K. Felts, K. Das, E. Arnold, and R.M. Levy (2009). Conformational Landscape of the Human Immunodeficiency Virus Type 1 Reverse Transcriptase Non-Nucleoside Inhibitor Binding Pocket: Lessons for Inhibitor Design from a Cluster Analysis of Many Crystal Structures. J. Med. Chem., 52, 6413-6420. DOI: 10.1021/jm900854h. PMCID: PMC3182518.
Lapelosa, M., G. Ferstandig Arnold, E. Gallicchio, E. Arnold, and R.M Levy (2010). Antigenic Characteristics of Rhinovirus Chimeras Designed in silico for Enhanced Presentation of HIV-1 gp41 Epitopes. J. Mol. Biol., 397, 752-766. DOI: 10.1016/j.jmb.2010.01.064. PMCID: PMC2940250.
Lapelosa, M., E. Gallicchio, G. Ferstandig Arnold, E. Arnold and R.M. Levy (2009). In silico Vaccine Design Based on Molecular Simulations of Rhinovirus Chimeras Presenting HIV-1 gp41 Epitopes. J. Mol. Biol., 385, 675-691. DOI: 10.1016/j.jmb.2008.10.089. PMCID: PMC2649764.

Protein stability and the evolution of drug resistance: bioinformatics and biophysics

Mutational patterns in HIV-1 Protease

Proteins evolve through random mutagenesis and their evolutionary selection is constrained by structural, functional and environmental limitations. Thermodynamic stability is by far the most important structural factor, as most proteins need to be folded in order to function. The stability range of proteins is narrow, and is estimated experimentally to be ~ 10 kcal/mol or less. Proteins operate "on a knife's edge," whereby a single highly deleterious mutation could potentially lead to an unfolded protein. Drug resistance is acquired through mutations. Primary mutations are typically destabilizing, and must be accompanied by or correlated with compensatory stabilizing mutations. We are developing statistical methods to model observed mutational patterns within proteins evolving in response to drug pressure. The system we are currently working on is HIV-1 protease for which we have access to over 40,000 publicly available amino acid sequences, isolated from patients undergoing chemotherapy. The mutational patterns in HIV protease are highly non-trivial and involve primary mutations that confer drug resistance, compensatory stabilizing effects, and viral evolvability. Our aim is to rationalize the mutational patterns using coarse grained biophysical models and to thereby relate them to the structure and energetics of the enzyme.

Computational mutagenesis methods, ranging from statistical, empirical, and physical approaches, have been useful for understanding and predicting protein stabilities. Most existing methods are limited because they are unable to reproduce correctly the magnitude of the free energy change (ddG), the difference in free energy of unfolding between wild-type and mutant proteins (> 1.0 kcal/mol deviation from experimental ddG values). In order to overcome this problem, we are developing a computational approach that uses sampling combined with a linear interaction energy (LIE) method to predict the changes in the free energy of the native state induced by mutations. Initial tests of the method have shown an unsigned error between calculated and experimental values of <1 kcal/mol. Further development of the LIE functional form in combination with sampling techniques is an active, promising project which will allow for more accurate prediction of mutational effects on protein stability which are important to the understanding of drug-resistant mutations that are formed in many protein targets including HIV-1 protease.