Klauda Lab Simulation Wiki

Multi-lipid SA Average

Instruction to obtain surface area per lipid of multicomponent system
Original Authors: Xiaohong Zhuang and Viviana Monje
Updated Authors: Yalun Yu and Jeffery Klauda
Updated rtfpsf folder: Joshua Lucker

The script is in ZT1 path: /afs/shell.umd.edu/project/energybio/shared/jbklauda/scripts/area_mult_lipids
or using zip file: area_mult_lipids.gz

How to Edit and Run the Scripts:

Edit the scripts:

1. def_nlip.str:
(i) Update the resname of lipids. The variable names used for resnames with prefix rnc for sterol, rng for glycerol lipids, rns for sphingo lipids. If you do not have a lipid of a specific class then remove the rn* line. You should only have a list of lipids that exists for your system.

set rnc1 = SITO
set rnc2 = STIG
…
set rng1 = PLPC
…
set rns1 = NSM
…

(ii) Update number of types of each lipid group (in integers)
set ctyp = 2 ! number of types of sterols
set gtyp = 7 ! number of types of glycerolphos lipids
set styp = 0 ! number of types of sphingo lipids

2. area.csh: Update the path of qhull-2003.1, if needed. Current setup is relative to the areas directory.

3. area.c: #define NTOT 134 /* number of atoms being looked */

# (NTOT in area.c (# of atoms) is not NTOT in area*.inp (# of frames of each dcd file))

(NTOT=#chol+3*#lip since the code uses 1 atom to define each chol, and 3 atoms for each lipid to calculate SA/lipid. These are numbers per leaflet NOT bilayer)

For example with the hypocotyl plant membrane system, with 28 sterols and 72 lipids, (per leaflet), NTOT=28+3*72=244

Must compile this C code with gcc: gcc area.c -o area.exe. This will overwrite the old executable (area.exe)

4. area-a.inp & area-b.inp:

Update file paths: in rtfpsf.str, for step5_assembly.str, crystal_image.str, and dyn@N.dcd
TOPPAR Files: It is easiest if you copy the toppar.str file and its associated toppar directory from the CHARMM-GUI/CHARMM folder to the areas directory to use for these calculations.
Update the system size a,b,c based on dyn.xsc (the 2nd-4th non-zero values) in your dyn directory
Update the number of frames in each DCD: calc NTOT = 1000
Update atom numbers:

! *.csh (directory) (#chol + 3*#lipids + 1) (#chol + 3*#lipids) ! per leaflet
system "./area.csh tmp/@D/ 245 244"

5. lipid.dat:
“1” is an internal flag for the presence of CHOL, the others are number of molecules of each lipid per leaflet starting with #sterol). # of sterol must be in the first, and followed by # of glycerophosphate lipids, and finally by # of sphingo lipids each lipid, which should be based on the order of lipids you used in def_nlip.str.
If don't have any CHOL, put 0 first, followed by # of other lipids in the order described above.
For hypocotyl membrane, (with 28 sterols and 72 glycerol lipids (7 types)) the following numbers are used.:
1 28 12 13 9 5 22 7 4

6. top/bot (only needed when you want to get the distribution of Comp Area):

top file example

Data in File	Meaning of Data
8	Number of Lipids Including Cholesterol
atomic-area.dat	Output atomic area file
lipid.dat	As Described in Step 5
0 150 1	min max and step size for area binning of sterol (must include even if no sterols)
0 150 1	min max and step size for area binning of lipids
sterol-top.dat	output sterol file (must include even if no sterols)
lip-top.dat	output lipid file
1000 1000	Step block size typically use # of frames in DCD file

7. 1_test.scr: The pairing between # of dcd count and dcd# is shown in the following table

# dcd file count	dyn#.dcd
1	41
…	…
35	75

The first file means first dcd file count in the following line, which is usually 1.

As explained in the comments inside test.scr, each values/name following test.csh are the following

# *.csh (#chol+3*#lip) (#chol+3*#lip+1) (first dcd count) (last dcd count) (first dcd - first dcd count) (output dir) (CHARMM input) (# of dimensions always 2) (image atoms) 
# note "dcd count" refers to number of files being read NOT ACTUAL DCD NAME (next argument) 
# SAME number for "first dcd" in ALL lines
# "image atoms" = 9*(#chol+3*#lip)

sbatch -A energybio-hi test.csh 244 245 1 12 40 tmp area-a 2 2196
sbatch -A energybio-hi test.csh 244 245 13 24 40 tmp area-a 2 2196
sbatch -A energybio-hi test.csh 244 245 25 35 40 tmp area-a 2 2196
…

» 244 245 : (#chol+3*#lip) (#chol+3*#lip+1) per leaflet which is 28+72*3, 28+72*3+1

» 1 12 : Since each test.csh allows maximum 12 dcd file counts, for 35 dcd files used, 1-12, 13-24, and 25-35 are used in each of three lines.

» 40 : (first dcd - first dcd count), as the dyn41-75.dcd are used, the first dcd # is 41, and the first dcd count # is 1, therefore, 41-1=40 is used for all lines.

» tmp : the directory path to save output folders/files used for top leaflet, tmp/bot is for bottom leaflet.

» area-a: area-a is input file for top leaflet, area-b is for bottom leaflet

» 2: This is the dimension of the tessellation, which is always 2.

» 2196: “image atoms” = 9*(#chol+3*#lip)=9*(28+3*72)

8. 2_run-dist-avg.scr:
Update number of blocks (number of dcd files in this case)

# Update number of blocks, NO space, i.e. "N=30", not "N = 30"
N=30

9. calc_avg_vertical.py & calc_avg_final.py:
Update the number of each lipid component per leaflet in the same order as in lipid.dat. If you lack sterol, the first number in the Nlist MUST be zero due to the assumption in calc_avg_vertical.py.

# Update the number of each lipid component per leaflet in the same order as in area-a.inp
Nlist = [28,12,13,9,5,22,7,4]

10. rtfpsf.str:
The file names in the rtfpsf.str file will most likely need to be updated.

Run the scripts:
There are actually only two scripts to submit, the rest can be ignored unless you have many more result data to organize.

1. Run script by command: ./1_test.scr
After run 1_test.scr (which may take about 0.5 ~ 2 hours), tmp and output files for all the dcd file count (1-35 in the example) will be generated. In each dcd count folder, atomic-area.dat will be generated which will have (#data points per dcd) rows and (# of represented atoms) columns. For hypo system here, atomic-area.dat has 1000 rows and 244 columns.

2. Get the averages and standard errors by command: ./2_run-dist-avg.scr. You will see 2_avg_std_final.txt which contains the component area for each lipid type following the order in def_nlip.str.

Ignore the rest of this wiki for now…Yalun will updated as needed

(Yalun: Following part to be updated) As mentioned above, when submit run-get-areas.scr to run get-area.scr, the combine.csh and avg.csh (also ?dist.csh) are called. lipid.dat is called/used in top and bot, which is applied in dist.exe(dist.c) by command line dist.exe < top in tdist.csh

    After run run-get-areas.scr, two data files of the distribution of SA/lipid will be obtained, sterol-top.dat and sterol-bot.dat for sterols (with 2 columns, the SITO and STIG are lumped together), and lip-top.dat and lip-bot.dat have the distribution of all 7 lipids (with 8 columns since two sterols in hycotyol: SITO and STIG are treated as one lipid).
   Also, the SA/lipid data is shown in avg.dat, which contains the final averaged SA/lipid of a lipid followed by the standard deviation of the distribution (which may be slightly broad). For 8 lipids hypocotyl system, there are total 16 columns in avg.dat.The standard deviation is sqrt(variance).
    Simulations of multiple replicates (e.g. hypocotyl and root, each has 3 replicates) are performed, the REAL final average are taken by the average value of all the replicates. And the standard error is calculated by standard deviation of the **average SA/lipid** divided by square root of the replicate number. E.g. for hypocotyl system of 3 replicate, it is divided by sqrt(3).

Check that the system go to equilibrium by comparing the distribution data, and adjust the dyn file range when necessary. Also check the distribution, the edge is not 0, increase the area range for broader distribution.