Klauda Lab Simulation Wiki

Umbrella Sampling Method for PMF Created by Robert Allsopp

This is an implementation of umbrella sampling to generate a PMF in NAMD systems this example uses distanceZ for a Z dimensional reaction coordinate.

Downolad the files from here instructions.tar.gz , this link has updated files for different systems to run PMF pulling or setup windows in bash vs csh for example pmf_files.tar

Download these python files if to automate various things, don't need to individually restrain each lipid with a colvar though it is helpful to parallelize the jobs and setup the windows and they were updated here: key_files_for_reference.tar

This attached code works for Matlab 2021b to plot the exact histogram to see how good is the overlap is between the windows. dmps_1.tar

The best way to do membrane restraints it actually to use the harmonic restraints specified with the VMD .ref process, in this case the beta values of the “segname MEMB and name C2” are what need to be specified to be restrain

Good spring constants for smaller molecules such as ethanol are 1-1.5 and good values for peptides appear to be between 5-10.

Process Summary:

Setp 1: Implementing a moving Colvar (Collective Variables) to restrain the peptide, antibiotic… along the reaction coordinate. Umbrella sampling is required to move and hold the object of interest as it populates a path along the reaction coordinate that isn't favorable. This is then used to generate “windows” individual simulations at different points along the reaction coordinate, that are equilibrated and then run to collect data that is processed by WHAM.

Step 2: Implement a stationary Colvar that holds the peptide, antibiotic, or object of interest in place while it equilibrates for a 1-4ns

Step 3: Running simulation for official data collecting and calculating the PMF using WHAM. WHAM lookes at the deviation between the equilibrium coordinates of the spring and the spring constant to calculate the reaction coordinate.

These pulling simulations are short and could run in the debug mode change this in the .csh file.

   #!/bin/bash

   sbatch -A energybio dyn.csh

Step 1: Copy the files in the instructions file into the directory. dyn_Umbrella.csh is a modified version of the dyn.csh file that implements the moving colective variable restraint. The csh file does all the work with sed and echo commands to update the position of the restraints and pass the information down a line of files.

Before moving on be sure to recenter the trajectory so that most water is above the peptide or antibiotic, there still needs to be some or else the lipids will be too close to the PBC. The recenter_PMF folder is the exact same procedure as the one used for the EDP but the references to MEMB are replaced different selections and can be adjusted for different systems. Run this on the last dcd file in the trajectory of data.

Run this by updating the file that should be run, the paths if needed, and the toppar and toppar.str file and folder also update the cryst.str file, this is done by opening the dyn???.xsc.gz file gunzip it if needed to remove the .gz and copy the three numbers in it the first two numbers separated by a bunch of zeros should be the same and the third is the z dimension. Then run it with:

   The main thing to change is in the recenter3.inp scroll down to see: 
   coor stat sele segid MEMB .and. resid 1 : 60 .and. type C213 end
   This has selected the membrane and residues 1 to 60 either (the top) or the bottom 
   would be 61:120, and type C213. To control the position or change for a different 
   lipid type adjust the C213 and the resid 1 : 60. Download the step5_assembly.psf 
   and the last dyn150.dcd file into VMD and go to Graphics->Representations->Create Rep
   ->under atom selection select segname PROA, coloring method ResType, Drawing Method 
   NewCartoon, select segname MEMB. Then select in the main selection window mouse -> 
   query and click on different positions of the lipid chain The new name of the atom 
   that is chosen to center around is the in the row that corresponds to the 
   "name: H14R" for example in the black VMD window that looks like command prompt.

   ./recenter1.scr
   ./recenter2.scr
   ./recenter3.scr

Wait after each command until the previous one leaves the queue check with squeue -u $username

This will recenter around a certain carbon atom.

   It my need to be adjusted to a different leaflet (1-60 changed to 61-120 and it depends on the number of lipids in the system) or a 
   different carbon position if there isn't enough water then the peptide will not be able to fully solvate.

Then copy the file for instance dyn150-recenter.dcd to the mkpdb_2 folder to make a PDB and PSF of the file for restarting the program. To run this program it is necessary to open the getframe.inp file and check the paths and the names of the files that will be saved. Then type:

   ./getframe.scr

Now the system should be shifted so that much of the water in the system is above the peptide or antibiotic, and the PDB itself can be viewed in VMD.

Then open the restart.inp file and check that the setup is correct like the temperature and stuff.

Then in the restart.inp file check these lines, and go back to the mkpdb_2 folder and open the getframe.out file and scroll down to the bottom and copy the ?XTLA ?XTLB ?XTLC values and past them in the a, b, c values below

    cellBasisVector1     $a   0.0   0.0;        # vector to the next image
    cellBasisVector2     $d    $b   0.0;
    cellBasisVector3    0.0   0.0    $c;
    cellOrigin          0.0   0.0 $zcen;        # the *center* of the cell

    cellBasisVector1     65.0759254   0.0   0.0;        # vector to the next image
    cellBasisVector2     $d    65.0759254   0.0;
    cellBasisVector3    0.0   0.0    79.2041191;
    cellOrigin          0.0   0.0 $zcen;        # the *center* of the cell

The step5_input.colvar_pulling_restart.str file is setup so that there is a weak spring on all the important atoms. To change this for another system it is important to change the atom types and resid numbers to match ones used in the system. Select to atoms to that will be restrained by inputting the atom numbers that match in the .pdb or .crd file. I used a pure system initially so that I could identify the common difference in the numbering and generate the list in Excel and transpose and paste it into the .str file.

   main {
        atomNumbers 307	441	575	709	843	977...
        }

This selects all the N atoms at the top of the lipid. For the top and bottom respectively to measure the position for later use in the restart_after_recenter.traj file.

It is also good to check the approximate positioning of the peptide and membrane heavy atoms that were selected. I load into VMD and use the representations 1) segname PROA 2) segname MEMB and name N (or whatever the heavy atom is and set Drawing Method to Licorice) then click on the different representations in query mode and find the Z position in the other window and update the positions in the .str file.

Run the restart.csh file by typing:

    sbatch restart.csh

The membrane restraints are required to prevent the lipids from comming off when the peptide is pulled off, and the peptide can't be pulled across the Periodic Boundary Condition so more water is should be partitioned to one side than the other.

Check by downloading the restart.psf and the restart_after_recenter.dcd file. Load them into VMD and first the .psf then the .dcd, and under Graphics→ representations select create Rep change the selected Atoms in the first to be segname PROA or the name of the antibiotic in the PDB. A second selection of segname MEMB, and a third selection of water. This should make clear where the water is and make sure the system is correct before proceeding. Double click on the selection box to turn water on or off. Or other components on and off.

What needs to be changed now is the dyn-2.inp files the bottom part needs to match the example. I think the run line is redundant. Everything in the file should corespond to something in the existing file except for a couple of things at the end. If in doubt it is possible to check with the step7*inp file since that is the template of the dyn-2.inp. There are also some lines at the top of the step7*inp that are missing in the dyn-2.inp file that go to the dyn-1.inp so don't worry about them. Just eliminate the lines at the end and save over dyn-2.inp.

Next it is important to update and define the atoms that are going to be held by the restraint. There are a few options all of which have been implemented to show the different scenarios. Below is an introduction to different constraint syntax but I strongly recommend restraining each atom one at a time since otherwise the membrane will fluctuate too much and the lipids around the peptide will almost come off the membrane.

      psfSegID PROA
atomNameResidueRange CA 1-20

or if it is just one atom it is also possible to use this.

atomNumbers 302

Lastly there is a third option that is similar to the first option that uses VMD. This is primarily used for writing a constraints file to preserve the secondary structure, but cannot be used along with a colvar for umbrella sampling with WHAM because it biases the system. It might be possible to use with ABF though, but you'll have to look into using it, but something similar could also be used to do a colvar if needed. Set up constraints if needed on the object that is being pulled.

1. Open the pdb file in vmd

2. Then in vmd console:

set sel [atomselect top all]

$sel set beta 0

$sel set occupancy 0

set smd [atomselect top “protein and resid 1 and name CA”]

$smd set occupancy 1

$sel writepdb prot_smd.ref

$sel set occupancy 0

set fixed [atomselect top “protein and not resid 1 and backbone and noh”]

$fixed set beta 1

$sel writepdb prot_fixed.ref

This is helpful to generate a constraints file similar to how it is written in the first membrane restraints.

Basically this is how to tell namd which atoms to connect onto the spring.

   In practice it is best to use the step5_input.colvar_pulling_flatten_template.str file as a starting point. Make sure there are restrains 
   for each lipid in the system and rename then according to if they are in the upper or lower leaflet. Next pick a heavy atom in the head 
   group of the lipid. If the membrane is a pure single component membrane then each atom is spaced at regular intervals so manually update 
   he atom number for each entry. For a multicomponent membrane it is more work and it might not be the same atom each time, and if that is 
   the case then you've got to select them in the step5_input.colvar_pulling_restart.str file to get the average position as well.
   
   Then update the position of the centers of the individual atoms from the restart_after_recenter.traj file matching the average of the 
   Upper_full and Lower_full to the individual centers of each lipid in the upper and lower leaflet.

Adjust and calibrate the constraint constants as needed.

The second thing that needs to be updated are these two lines:

    set center = `echo "$center/4 -0.25" | bc -l`
    set targetcenter = `echo "$targetcenter/4 - 0" | bc -l`

These two lines define the starting and ending points specifically the -15.25 and the -15 lines are the starting center and the targetcenter. Then this value gets incremented by $center/4, (whatever is inside the center file divided by 4). The exact position is found in the restart_after_recenter.traj file under the peptide_pull column, and then subtract 0.25 from it for centers line and don't subtract anything from it for the target centers line since it gets increased by 0.25 in the first step. Targetcenter is the end point of the center of mass that needs to be defined before use. The center refers to the center of mass of the defined peptide, antibiotic … and is a user defined selection of atoms. This value can be generated from VMD but it is much easier to run the NAMD program with a low spring constant to get the output of the dyn.*traj file and use those values, use a weak spring constants and pick the first value. It can say something about the periodic cell has become too small or something along those lines but there should be at least the first step in the *traj file that has the starting positions of all the atoms that have colvars turned on for them.

The other variables to check are the values inside the files center and targetcenter they should exist and be set to 0 for the first run

Now it is time to look in more detail at the step5_input.colvar_pulling_flatten_template.str file. In this file the lines start by stating the frequency for the colective variables and the output. This can be adjusted but don't go below 50 becuase preformance suffers below about a frequency of 50. There are two components, the colvar and the harmonic. I needed to use the membrane restraints while the peptide is pulled. Otherwise the membrane will start drifting up with the peptide or antibiotic.

Scroll down some to see my colvar defined as peptide_pull the program ignores the stuff that has been hashtaged out of the input file. So the colvar section is about the same as before for the membrane section the difference is that the harmonic section uses targetCenter and targetNumSteps and targetNumStages. These implement a moving center so that the system has time to respond and does'nt give an error for being forced too fast.

Section out all the rest of the colvars so that there is only the peptide_pull and all the individual lipids selected for now. The other ones were used to restrain atoms on the ends of the peptide to flatten the orientation of the peptide as it leaves the membrane surface. So monitor the simulation for that moment and update the atom selections if needed for different systems. If there is a charged end termini then it is important to monitor that as well since it will likely come back down to the membrane surface, but there is litter to do about it.

Also make sure that all the references to the run number of steps are consistent between the dyn.csh and the dyn-2.inp and that the smit value is also consistent with the run value. Also it may be necessary to calibrate the spring, pulling speed numstages numsteps all of that, but for most applications it should be fine. It is a good idea to use steps of 0.25 A as is done in this example to have flexibility to make a finer grid when needed.

Everything is setup by now and should be ready to run the simulation. These are short runs that can be handled in the debug queue of deepthought2. Just calculate how far that the object should be moved and set the last.seqno accordingly to stop the simulation at the desired time.

Step 2: This step should be done on expanse or another super computer that is faster since these will be production runs.

This is the exact same as step1 except the lines in the step5_input.colvar_pulling_flatten_template.str, delete the lines that causes this to be moving constraint. Specifically the targetcenter line and the numstages numsteps lines.

To do this efficiently there is a protocol to help with setting everything up.

Run some test simulations. Do not use the pulled simulations since this was pulled, start from a pulled simulation point by gunzip the dyn301.vel dyn301.coor dyn301.xsc and copy them to dyn.vel dyn.coor dyn.xsc. Then run a short or long simulaiton to get a good idea of the spread of the simulation. Plot the histogram of the peptide or antibiotic position in one of the windows. This can be done by opening the dyn301.colvars.traj file for instance in excel from an existing opened excel window choosing space deliminated, open the file and delete all of the columns except for the peptide_pull column (this will be automated later on to work with WHAM). Plot the histogram distribution. For the peptide example it covered a spread of 0.75 A approximately so I spaced the windows 0.5 angstroms apart. There must be overlap between the simulation windows for the WHAM to work.

Next use the already decided upon points of for the start and end of the reaction coordinate and divide this into equally spaced grid that achieves overlap. I used 0.5 spacing with a spread of 0.75. Then these sets of files that have all the trajectory data will be copied to new folders and named window1_-14.5, window2_-14 window3_-13.5 … window15… window70… until the grid is completed. But before proceeding make sure all the files are correctly updated before hand.

For instance the last.seqno should be set to 2 steps ahead of where it is currently (we want to run one set of 2ns simulation to equilibrate the system and another simulation of clean data to collect for data analysis)

Update the dyn.csh there are a few options. I recommend just leaving it the same and running it how it is written. Just update this line to whatever the window z position is set to. There is a way to automate the process so that it is feasible to do 70 windows. To do this place the automation.sed script in the folder that can access all the windows. Then open this file in nano to view it. Just update the path to the windows where the dyn_production.csh file is located, add the /namd/, and replace this for all of the lines. Then adjust all of the final positions to hold the restraints. Lastly it is a good idea to switch the output name from dyn.csh to something else that is new for all the windows. Do not use dyn.csh unless you delete it from all the windows. Just update the dyn.scr to activate the new .csh file.

Lastly type chmod u+x automation.sed and then type ./automation.sed

This will correct this line of code to hold it stationary for the next 4 ns or so of production run simulation time.

set center = `echo “6.5” | bc -l`

Run this for two production runs as indicted above. The first time it is running for equilibration and the second run is the actual data that will go into WHAM. Lastly, I recommend adjusting the frequency in the step5*pulling.str file to maybe 50 since more data is good on these steps.

Step3) Collecting the data and inputing it into WHAM.

Outline of the last steps: Use an automated process to copy all the individual .traj files back with new names to deepthought2 in a new folder Use sed commands to trim the .traj files to something useable Lastly, use the wham command to calculate the PMF. Plot it by opening it as a space deliminated file Done

So to get started move the automated_copy file to a place where it can easily access all of the winodws folders. Then adjust the paths and the names of the files. All you have to do is type in the password for each copy instead of taking a more coplicated approach.

Next use the program “data_3” again it is necessary to chomd u+x data_3 and then update the names of the files what this does is erases all the columns except the ones that are needed for the WHAM calculation.

Then in the DT2 directory move the file INPUT_2 into the directory that has all of the .traj files. Open this file and see that it has the path to all the files. The lines at the top are the examples of how to run the program with the hashtag removed. The other things that need to be done is the first number after the path is the equilibrium position of the spring. The second number is the K constant. Be careful with the K constant because if the grid is turned on and the width is not 1 then the k value is rescaled.