# UWHAM and SWHAM

## Table of Contents

## 1 Introduction

### 1.1 UWHAM

The unbinned weighted histogram analysis method (UWHAM) is an algorithm for multi-state free energy estimation and thermodynamic reweighting developed in the Levy Group by Emilio Gallicchio in collaboration with Zhiqiang Tan of Rutgers University. UWHAM is a binless extension of traditional WHAM.

Suppose \(M\) parallel (independent or coupled) simulations in the canonical ensemble are run at \(M\) \(\lambda\)-states. Let \(X_{i}^{(\alpha)}\) be the \(i^{th}\) microstate observed at the \(\alpha^{th}\) \(\lambda\)-state. The probability of observing \(X_{i}^{(\alpha)}\) at the \(\gamma^{th}\) \(\lambda\)-state is

\begin{equation} P_{\gamma}(X_{i}^{(\alpha)})\sim\frac{q_{\gamma}(\{x\}_{\alpha i})}{Z_{\gamma}} =\frac{\exp\{-\beta_{\gamma}E_{\gamma}(\{x\}_{\alpha i})\}}{Z_{\gamma}}\,, \end{equation}where \(q_{\gamma}(\{x\}_{\alpha i})=\exp\{-\beta_{\gamma}E_{\gamma}(\{x\}_{\alpha i})\}\) is the unnormalized probability of observing \(X_{i}^{(\alpha)}\) during a simulation in the canonical ensemble at the \(\gamma^{th}\) \(\lambda\)-state; \(\{x\}_{\alpha i}\) is the coordinates of the microstate \(X_{i}^{(\alpha)}\); \(\beta_{\gamma}\) is the inverse temperature of the \(\gamma^{th}\) \(\lambda\)-state; \(E_{\gamma}(\{x\}_{\alpha i})\) is the potential energy of the microstate \(X_{i}^{(\alpha)}\) at the \(\gamma^{th}\) \(\lambda\)-state; and \(Z_{\gamma}\) is the partition function of the \(\gamma^{th}\) \(\lambda\)-state. The UWHAM equations are

\begin{align} \hat{Z}_{\alpha} & =\sum_{i=1}^{N}q_{\alpha}(u_{i})\hat{\Omega}(u_{i})\nonumber \\ \hat{\Omega}(u_{i}) & =\frac{1}{\sum_{\kappa=1}^{M}N_{\kappa}\hat{Z}_{\kappa}^{-1}q_{\kappa}(u_{i})}\,,\label{eq:uwham} \end{align}where \(N=\sum_{\alpha=1}^{M}N_{\alpha}\) is the total number of observations; \(u_{i}\) is the reduced (energy) coordinate of the \(i^{th}\) observation and \(\Omega(u_{i})\) is its density of state.

### 1.2 SWHAM

There is a computational bottleneck in scaling up UWHAM. Suppose there are \(M\) \(\lambda\)-states and totally \(N\) observations to analyze. At minimum,numerical solution of the UWHAM equations requires evaluating M unnormalized density functions for each observation. The total number of function evaluations is of order \(nM^2\), where \(n=N/M\) is the average sample size per \(\lambda\)-state. These unnormalized density values need to be either computed during every iteration of the numerical solution or precomputed and stored in memory. Such a high computational cost presents a serious limitation on the use of UWHAM for large-scale simulations. To remove the computational bottleneck, Levy group and Zhiqiang Tan developed stochastic solver of UWHAM equations by using the protocols of Generalized-ensemble algorithms.

#### 1.2.1 RE-SWHAM

RE-SWHAM is an algorithm we developed to solve the UWHAM equations stochastically by applying the replica exchange (RE) protocol to resample the raw data generated by multi-state simulations. Beforehand, the observations observed at each \(\lambda\)-state are collected to serve as the database for that \(\lambda\)-state. Then RE-SWHAM analyses are run by cycles like RE simulations. Each cycle consists of a ``move'' procedure and an ``exchange'' procedure. During the move procedure of RE-SWAHM, an observation in the database of a \(\lambda\)-state is randomly chosen with equal probability to associate with the replica at that \(\lambda\)-state. This is analogous to the situation that the MD simulation in the move procedure of explicit RE is long so that there is no correlations between the initial and final structures of the MD simulation. During the exchange procedure of RE-SWHAM, we randomly choose two replicas and attempt to swap them based on the Metropolis criterion. If the swap is accepted, in addition to swapping the replicas, the observation associated with the replica is also swapped to the database of the other \(\lambda\)-state. Usually this step is repeated multiple time to approach the infinite swapping limit for the best sampling efficiency. At the end of the exchange procedure, the observation associated with replica at each \(\lambda\)-state is recorded as the output of that \(\lambda\)-state like RE simulations. The output of RE-SWHAM at each \(\lambda\)-state is a sample of all observations according to their UWHAM weights.

#### 1.2.2 ST-SWHAM

ST-SWHAM is an algorithm we developed to solve the UWHAM equations stochastically by applying the serial tempering (ST) protocol to resample the raw data generated by multi-state simulations. Like the RE-SWHAM analysis, the observations observed at the each \(\lambda\)-state are collected to serve as the database for that \(\lambda\)-state beforehand. Then ST simulations are run by cycles, and each cycle consists of a ``move'' procedure and a ``jump'' procedure. During the move procedure of ST-SWAHM, an observation in the database of a \(\lambda\)-state is randomly chosen with equal probability to associate with the walker at that \(\lambda\)-state. During the jump procedure, the walker jumps to the \(\alpha^{th}\) \(\lambda\)-state according to probability

\begin{equation} p(\alpha|u_{i};\zeta,\pi^{0}) =\frac{\pi_{\alpha}^{0}e^{\zeta_{\alpha}}q_{\alpha}(u_{i})}{\sum_{\kappa=1}^{M}\pi_{\kappa}^{0}e^{\zeta_{\kappa}}q_{\kappa}(u_{i})}\,, \label{eq:ST_jump} \end{equation}where \(\zeta_{\kappa}=-\ln Z_{\kappa}\) is the unitless free energy of the \(\kappa^{th}\) \(\lambda\)-states; \(\pi_{\kappa}\) is the observed proportion of the \(\kappa^{th}\) \(\lambda\)-state during the ST-SWHAM analysis; and \(\pi_{\kappa}^{0}=N_{\kappa}/N\) is the proportion of the raw data generated at the \(\kappa^{th}\) \(\lambda\)-state during the multi-state simulations.The values of \(\zeta_{\kappa}\) are adjusted during the analysis of ST-SWHAM until the observed proportion of the replica being at the \(\kappa^{th}\) \(\lambda\)-state \(\pi_{\kappa}\) agrees with \(\pi_{\kappa}^{0}\). It can be shown that \(\zeta_{\kappa}\) is the UWHAM estimate of free energy of the \(\kappa^{th}\) \(\lambda\)-state when every pair of \(\pi_{\kappa}\) and \(\pi_{\kappa}^{0}\) agree with each other.

The jump of the replica following Eq.(\ref{eq:ST_jump}) was referred to as the global jump proposal in Ref(3) because the replica can reach any \(\lambda\)-state of the system by one jump. According to Eq.(\ref{eq:ST_jump}), every jump of the replica requires calculations of \(M\) exponential functions, where \(M\) is the total number of \(\lambda\)-states. When the total number of states is large, ST-SWHAM analyses using the global jump proposal also cost long time to converge. In our software package, we implemented a much faster approximation solver of UWHAM — ST-SWHAM using a local jump proposal. This algorithm was referred to as local WHAM in Ref(3) because the replica can only be at the \(\lambda\)-states that are the local neighbors of the initial \(\lambda\)-state at the end of the jump procedure if the number of jumps per cycle is finite. Suppose the replica that associates with the observation \(u_{i}\) is at the \(\gamma^{th}\) \(\lambda\)-state before jumping. The procedure of performing one jump in local WHAM is as follows:

- select a trial \(\lambda\)-state with uniform probabilities from the nearest neighbors of the \(\gamma^{th}\) \(\lambda\)-state, suppose the chosen \(\lambda\)-state is the \(\alpha^{th}\) \(\lambda\)-state
accept the \(\alpha^{th}\) \(\lambda\)-state as the new \(\lambda\)-state to jump to according to the Metropolis probability

\begin{equation} \min\left\{1,\frac{\Gamma(\alpha,\gamma)}{\Gamma(\gamma,\alpha)}\frac{p(\alpha|u_{i};\zeta,\pi^{0})}{p(\gamma|u_{i};\zeta,\pi^{0})}\right\}\,,\label{eq:ST-SWHAM_Metropolis} \end{equation}where \(p(\alpha|u_{i};\zeta,\pi^{0})\) and \(p(\gamma|u_{i};\zeta,\pi^{0})\) are defined by Eq.(\ref{eq:ST_jump}); and \(\Gamma(\gamma,\alpha)\) is the probability of choosing the \(\alpha^{th}\) \(\lambda\)-state as the trial \(\lambda\)-state when the replica is at the \(\gamma^{th}\) \(\lambda\)-state originally. Namely, \(\Gamma(\gamma,\alpha)=1/n_{\gamma}\), where \(n_{\gamma}\) is the total number of the nearest neighbors of the \(\gamma^{th}\) \(\lambda\)-state if the \(\alpha^{th}\) \(\lambda\)-state is one of the nearest neighbors of the \(\gamma^{th}\) \(\lambda\)-state; \(\Gamma(\gamma,\alpha)=0\) otherwise.

As can be seen, the replica can only be at the original \(\lambda\)-state or one of its nearest neighbors after one jump. However, the replica can diffuse further away from the original \(\lambda\)-state by repeating this one jump procedure multiple times. When the number of jumps per cycle increases to infinite, the results of local WHAM converges to the results of ST-SWHAM that uses the global jump proposal. Therefore, the jump of the replica following Eq.(\ref{eq:ST_jump}) is the infinite jump limit in serial tempering simulations, which is an analogous of the infinite swapping limit in replica exchange simulations.

### 1.3 Stratified RE-SWHAM

When applying UWHAM and its stochastic solvers, the base assumption is that the simulations at each λ-state are “approximately” equilibrated. However, this assumption might not always hold. To handle such situations, we developed an analysis tool called Stratified-UWHAM to compute free energy and expectations from a multi-state ensemble when the simulations at a subset of λ-states are far from being equilibrated.

The Stratified-UWHAM algorithm is based on coarse-graining. Firstly, we coarse-grained the whole phase space into macrostates, which correspond to energy basins separated by free energy barriers. Then all the \(\lambda\)-states in the system are divided into two groups \((S_{1}\:,S_{2})\). For the \(\lambda\)-states in the first group \(S_{1}\), the simulations are approximately equilibrated among all the macrostates. Namely, all the macrostates are well connected at each \(\lambda\)-state in group \(S_{1}\). For the \(\lambda\)-states in the second group \(S_{2}\), the simulations are only locally equilibrated. Namely, the coarse-grained set of macrostates forms a disconnected network of microstate clusters at each \(\lambda\)-state in group \(S_{2}\). We assume that the set of observations \(\{X_{i}^{(\alpha)}:i=1,2,\cdots N_{\alpha}\}\) observed at the \(\lambda\)-states in the \(S_{1}\) group is independent drawn from distribution

\begin{equation} P(X_{i}^{(\alpha)})\sim\frac{q_{\alpha}(\{x\}_{\alpha i})}{Z_{\alpha}}\,\,\,\textrm{for}\,\alpha\in S_{1}\,. \label{eq:dist_S1} \end{equation}And the set of observations \(\{X_{i}^{(\alpha)}:X_{i}^{(\alpha)}\in R_{k},i=1,2,\cdots N_{\alpha}\}\) observed at the \(\lambda$\)-states in the \(S_{2}\) group is independent drawn from distribution

\begin{equation} P(X_{i}^{(\alpha)}|(X_{i}^{(\alpha)}\in R_{k}))\sim\frac{q_{\alpha k}(\{x\}_{\alpha i})}{Z_{\alpha k}}\,\,\,\textrm{for}\,\alpha\in S_{2}\,, \label{eq:dist_S2} \end{equation}where \(q_{\alpha k}(\{x\}_{\alpha i})=q_{\alpha}(\{x\}_{\alpha i})\delta(\{x\}_{\alpha i}\in R_{k})\). Here \(\delta(\{x\}_{\alpha i})\in A\) is the indicator function for a macrostate cluster \(A\); \(Z_{\alpha}\) and \(Z_{\alpha k}\) are the partition functions. The likelihood function of the simulation data of this model is

\begin{equation} \prod_{\alpha\in S_{2}}\prod_{k}\prod_{i:X_{i}^{(\alpha)}\in R_{k}}\left[\frac{1}{Z_{\alpha k}}q_{\alpha}(u_{\alpha i}) \Omega(u_{\alpha i})\right]\times\prod_{\alpha\in S_{1}}\prod_{i=1}^{N_{\alpha}} \left[\frac{1}{Z_{\alpha}}q_{\alpha}(u_{\alpha i})\Omega(u_{\alpha i})\right]\,, \label{eq:SUWHAM_likelihood} \end{equation}where \(u_{\alpha i}\) is the reduced (energy) coordinate of the microstate \(X_{i}^{(\alpha)}\).

The estimating equations from the maximization of Eq.(\ref{eq:SUWHAM_likelihood}) can be solved in the form of UWHAM equations Eq.(\ref{eq:uwham}) with an expanded set of \(\lambda\)-states. When solving the Stratified-UWHAM equations by solving the original UWHAM equations with an expanded set of \(\lambda\)-states, the total number of \(\lambda\)-states of the system increases from \(M_{1}+M_{2}\) to \(M_{1}+\sum_{\alpha=1}^{M_{2}}K_{\alpha}\), where \(K_{\alpha}\) is the total number of disconnected macrostate clusters at the \(\alpha^{th}\) \(\lambda\)-state. Apparently, it is more computationally expensive to solve the Stratified-UWHAM equations. We proposed a stochastic solver for the Stratified-UWHAM equations called Stratified RE-SWHAM, which is illustrated in the following figure. Stratified RE-SWHAM is a mutant of RE-WHAM. There are two differences between these two algorithms. First, when the observations observed at each \(\lambda\)-state are collected to serve as the database for that \(\lambda\)-state, each observation is tagged by the macrostate which it belongs to. The second difference is the Move procedure in the replica exchange cycle. For the \(\lambda\)-states in the \(S_{1}\) group, an observation in the database is randomly chosen with equal probability to associate with the replica at that \(\lambda\)-state. However, for the \(\lambda\)-states in the \(S_{2}\) group, an observation which is in the same connected macrostate cluster as the observation associated with the replica at that \(\lambda\)-state is chosen with equal probability. In Ref 4, we proved that the output os Stratified RE-SWHAM at a \(\lambda\)-state is the estimate of the equilibrium distribution of that \(\lambda\)-state according to the Stratified-UWHAM equations.

## 2 Obtaining UWHAM and SWHAM

The source code of UWHAM and SWHAM can be found under the following Github repository:

https://github.com/pittbin/UWHAM-and-SWHAM

## 3 Documentation and Tutorials

### 3.1 Manuals

### 3.2 Examples

The python scripts used in these tutorials were tested by "Python 3.6.3 :: Anaconda, Inc.".

- 01) Umbrella Sampling: Double Well Potential
- 02) Umbrella Sampling: Alanine Dipeptide
- 03) Temperature Replcia Exchange: Alanine Dipeptide
- 04) Free Energe Perturbation: Solvation Free Energy
- 05) Hamitonian Replica Exchange: BEDAM and Binding
- 06) Temperature and Hamiltonian Replica Exchange
- 07) Two Binding Modes of the \(\beta\)-cyclodextrin Heptanoate Complex

## 4 Getting help

Contact Dr. Bin Zhang through email (bin.w.zhang AT temple DOT edu).

## 5 references:

- Theory of binless multi-state free energy estimation with applications to protein-ligand binding (link)
- A Stochastic Solution to the Unbinned WHAM Equations (link)
- Locally Weighted Histogram Analysis and Stochastic Solution for Large-Scale Multi-State Free Energy Estimation (link)
- Stratified UWHAM and Its Stochastic Approximation for Multicanonical Simulations Which Are Far from Equilibrium (link)