Two Dimensional Umbrella Sampling

1 Introduction

This example shows how to apply UWHAM and ST-SWHAM to analyze the data generated by two dimensional Umbrella sampling and construct the free energy profile.

1.1 System

We construct the free energy profile (the Ramachandran plot) of an alanine dipeptide (AlaD) molecule in implicit solvent. The simulations were preformed by using the GROMACS 5.1.2 simulation package with the Amber99SB force field and the OBC GB implicit solvent model. To explore the free energy surface, we applied $$24 \times 24$$ parabolic potential restraint to perform Umbrella Sampling by using the PLUMED plugin. The biasing potentials are

$$U_u[i][j](\phi, \psi) = \frac{1}{2} k_{\phi}[i] (\phi - \phi_0[i])^2 + \frac{1}{2} k_{\psi}[j] (\psi - \psi_0[j])^2 \,, \label{eq:2Dbias}$$

where $$k_{\phi}[i] = k_{\psi}[j] = 200 kJ/(mol \times (rad)^2)$$ are the force constants of the restraint potentials. The minimum positions of the restraint potentials are defined by $$\phi_0[i] = (2\pi/24) \times (i-0.5)-\pi$$ and $$\psi_0[j] = (2\pi/24) \times (j-0.5)-\pi$$, where $$i$$ and $$j$$ range from $$1$$ to $$24$$. For each umbrella, we ran a $$10\ ns$$ independent simulation. The configuration of the AlaD molecule was recorded every $$2\ ps$$. Therefore, there are $$24 \times 24=576$$ states in this example. For each state, the independent simulation generated $$5000$$ data points.

2 Files

2.1 Data Files

• data.tgz: Unzipping this file generates a directory called "Data" which contains $$576$$ data files. Each data file contains three columns of data. The numbers in the first column are the times of data points when they are recorded. The data in the second column and the third columns are the time sequences of the $$\psi$$ and $$\phi$$ dihedral angles of AlaD respectively.
• info.dat: This file contains five columns of data. The first column shows the relative path of each data file. The second and the third columns show the minimums of each restraint potential of Umbrella Sampling — $$\phi_0[i]$$ and $$\psi_0[j]$$ respectively. The fourth and the fifth columns are the force constants of each restraint potential $$k_{\phi}[i]$$ and $$k_{\psi}[j]$$ respectively. Note that in this example all the $$k_{\phi}[i]$$ and $$k_{\psi}[j]$$ are the same, but they can be different if necessary.

2.2 Python Scripts

• prepare.py: Python script generates the input file for UWHAM.
• reweight.py: Python script gets rid of the bias of the output of UWHAM.
• 2DHistogram.py: Python script constructs a two dimensional histogram.

2.3 Others

• AlaD_us_uwham.gnu: input file for gnuplot to make pictures based on the UWHAM analysis.
• AlaD_us_swham.gnu: input file for gnuplot to make pictures based on the SWHAM analysis.

3 UWHAM Analysis

Please read the manual of UWHAM before practicing. Note this analysis requires to generate a 21 GB large input file first. It takes about 8 hours for UWHAM program to converge when using a single thread of an Intel(R) Xeon(R) CPU (E5-2660 v3 2.60GHz). The minimum requirement of memory to run this analysis is 64 GB.

1. The first step is to construct the biasing matrix and create the input file for UWHAM. As we explained previously, there are $$576$$ states in this example. Therefore, the input file contains $$576$$ data blocks. The first line of each block is a comment line starting with a "#" sign. After the comment line, there is a matrix contains $$5000$$ rows (corresponding to the $$5000$$ data points observed at this state) and $$576$$ columns (corresponding to the total number of states). The matrix element at the $$n^{th}$$ row and the $$m^{th}$$ column, where $$m=(i-1) \times 24 + j$$ ($$i$$ and $$j$$ are the $$\phi$$ and $$\psi$$ dihedral angel indexes), represents the bias potential for the $$n^{th}$$ observation at the $$m^{th}$$ state

$$U_b[n][m] = \frac{1}{2} k_{\phi}[i] (\phi[n] - \phi_0[i])^2 + \frac{1}{2} k_{\psi}[j] (\psi[n] - \psi_0[j])^2 \,. \label{eq:bias}$$

The input for UWHAM can be generated by running the python script:

./prepare.py -f info.dat > input4uwham.dat


Note in this example, the force constants are in units of $$kJ/(mol \times (rad)^2)$$.

2. Run UWHAM

/path of UWHAM/uwham.o -d input4uwham.dat -i 2 -t 300 -q 5 -m 1 -e 1.0e-10 -w 0


The explanations for options are as follows:

 -d input4uwham.dat read the file "input4uwham.dat" for the biasing matrix -t 300 all the simulations were run at $$300\ K$$ -i 2 the unit of input data is $$kJ/mol$$ -q 5 -m 1 run the self-consistent iteration 5 times, then switch to the Convex Newton-Raphson iteration -e 1.0e-10 stop after the relative changes the unitless free energy estimates are smaller than 1.0e-10 -w 0 write the weight of each data point at the $$\lambda$$-state at which it was observed into the "weight.data" file
3. Reweighting:
• The file "weight.data" contains a $$576,000 \times 1$$ matrix. Each row corresponding to one observations. For the $$k^{th}$$ observation $$u_k$$, which is the $$n^{th}$$ observation observed at the $$m^{th}$$ state (where $$k = (m-1) \times 5000 + n$$.), the unnormalized probability of observing it at the $$m^{th}$$ state,

$$w(u_k, m) = \Omega(u_k) \exp\{-\beta U_b[n][m]\} \,. \label{eq:weight}$$

is the number in the $$k^{th}$$ row of the "weight.data" file. Here $$U_b[n][m]$$ is the element at the $$n^{th}$$ row and $$m^{th}$$ column of the $$m^{th}$$ biasing matrix (block); $$\Omega(u_k)$$ is the effective density of state

$$\Omega(u_k) \propto \Omega_0(u_k) \exp\{-\beta U(u_k)\} \,, \label{eq:eDoS}$$

where $$\Omega_0(u_k)$$ is the (traditional) density of state, and $$U(u_k)$$ is the total potential energy of AlaD defined by the Amber99SB force field and the OBC GB implicit solvent model.

• To construct the two dimensional free energy profile, we need to calculate the probability of observing each data point when there is no bias applied to the system, namely, $$\Omega(u_k)$$ according to Eq.(\ref{eq:eDoS}). For the $$k^{th}$$ observation — the $$n^{th}$$ observation at the $$m$$ state — this probability is

$$p(u_k) \propto \Omega(u_k) = w(u_k, m)/\exp\{-\beta U_b[n][m]\} \,. \label{eq:reweighting}$$

This reweighting procedure is done by running

./reweight.py -w weights.data -f info.dat > uwham_results.dat

• The file "uwham_results.dat" contains the final results of the UWHAM analysis. The first and the second columns in the file "uwham_results.dat" are the $$\phi$$ and $$\psi$$ coordinates of each observation, and the third column is the unnormalized probability of observing this observation when there is no bias applied to the system.
4. Plot the free energy profile
• First we construct a two dimensional histogram based on the results of the UWHAM analysis by running the command line

./2DHistogram.py -f uwham_results.dat -i 1,2 -w uwham_results.dat -j 3 -n -3.1415926,-3.1415926 -x 3.1415926,3.1415926 -b 90,90 -e -o 1 > AlaD_us_uwham.hist


The explanations for options are:

 -f uwham_results.dat -i 1,2 read the first and the second columns of the file "uwham_results.dat" as the coordinates of data points -w uwham_results.dat -j 3 read the third column of the file "uwham_results.dat" as the weights for data points -n -3.1415926,-3.1415926 the minimum of each dimension is -3.1415926 -x 3.1415926,3.1415926 the maximum of each dimension is 3.1415926 -b 90,90 totally 90 bins in each dimension of the histogram -e calculate the free energy for each bin -o 1 change the unit of free energy estimates from $$k_B T$$ to $$kcal/mol$$

The output file "AlaD_us_uwham.hist" contains four columns of data. The first and the second columns are the $$phi$$ and $$psi$$ values which together decide the position of each bin; the third column is the probability density (height) of each bin; and the fourth column is the free energy estimate in unit of $$kcal/mol$$. Note this histogram has been normalized.

• You can plot the two dimensional free energy surface using the first, the second and the fourth columns of the file "AlaD_us_uwham.hist". This can be done by running gnuplot (>v5.0) and loading the file "AlaD_us_uwham.gnu". If nothing goes wrong, the resulting plot should look like this

4 ST-SWHAM Analysis

Next we apply ST-SWHAM to analyze the data.

1. First, run the following shell script to generate the input file for ST-SWHAM:

for i in {01..24}
do
for j in {01..24}
do
echo "# state" >> input4swham.dat
perl -wanl -e ' printf("%12g %12g\n", $F[1],$F[2]) ' Data/colvar-$i-$j.dat >> input4swham.dat
done
done


As can be seen, this script combines all the information of observations together, and separate them into $$576$$ blocks by using lines that begin with a "#" character. For each observation, two properties, the $$\phi$$ and $$psi$$ dihedral angles of the configuration, are recorded. We also need to change the "info.dat" a little bit to prepare an input file containing the information of all $$\lambda$$-states:

perl -wnal -e ' printf("%12g %12g %12g %12g 6.2831853 6.2831853\n", $F[3],$F[4], $F[1],$F[2]) ' info.dat > stateinfo.dat


Note the last two columns tell the SWHAM program that the period of the Ramachandran plot is $$2 \pi$$.

2. Run ST-SWHAM:

mv weights.data uwham_weights.data

/path of SWHAM/swham.o -d input4swham.dat -f stateinfo.dat -u 5 -t 300 -i 2 -q 30 -n 90 -g 30 -m -s 1-576 -w


The explanations for options are:

 -d input4swham.dat read the file "input4swham.dat" for the information of observations -f stateinfo.dat read the file "stateinfo.dat" for the information of states -u 5 use the type 5 potential function implemented in SWHAM program -i 2 the potential energies in the "input4swham.dat" are in units of $$kJ/mol$$ -t 300 all the simulations were run at $$300\ K$$ -g 30 -m run ST-SWHAM and print out the estimates of free energies every $$30,000,000$$ cycles -q 30 -n 90 -m equilibrate the system for $$30,000,000$$ cycles first ; then run ST-SWHAM for $$90,000,000$$ cycles -s 1-576 print out the free energy estimates for all $$\lambda$$-states -w write the weight of each data point at the $$\lambda$$-state at which it was observed into the "weight.data" file

Note that we renamed the "weights.data" generated by UWHAM so that it will not be overwritten by the "weights.data" file generated by ST-SWHAM.

3. Then we need to reweight the probability of each observation. This step is as the same as the one in the UWHAM analysis

./reweight.py -w weights.data -f info.dat -t 300.0 > swham_results.dat

4. Now we can construct a two dimensional histogram based on the results of the ST-SWHAM analysis:

./2DHistogram.py -f swham_results.dat -i 1,2 -w swham_results.dat -j 3 -n -3.1415926,-3.1415926 -x 3.1415926,3.1415926 -b 90,90 -e -o 1 > AlaD_us_swham.hist


You can plot the two dimensional free energy surface using the first, the second and the fourth columns of the file "AlaD_us_swham.hist". This can be done by running gnuplot (>v5.0) and loading file "AlaD_us_swham.gnu". The resulting plot should look like this

5. The previous ST-SWHAM analysis (step 2)) uses the global jump algorithm. It costs about two and half hours to finish when using a single thread of an Intel(R) Xeon(R) CPU (E5-2660 v3 2.60GHz). However, It costs much less time to run this analysis by using the local jump algorithm

/path of SWHAM/swham.o -d input4swham.dat -f stateinfo.dat -u 5 -t 300 -i 2 -q 30 -n 90 -g 30 -m -s 1-576 -w -x 12


Note only one option '-x 12' is added to the command line. The new option means to try 12 local jump attempts per ST cycle.

Created: 2019-09-18 Wed 12:25

Validate