Differences

This shows you the differences between two versions of the page.

--- comp-sa [2020/08/03 14:10] – edit
+++ comp-sa [2023/10/12 11:04] (current) – edit
@@ Line 3: / Line 3: @@
 ==== Multi-lipid SA Average ====
-Brief instruction to obtain surface area per lipid of **__multicomponent system__** \\
+Instructions to obtain the surface area per lipid of a **__multicomponent system__** \\
-Authors: Xiaohong Zhuang and Viviana Monje
+Original Authors: Xiaohong Zhuang and Viviana Monje\\
+Updated Authors: Yalun Yu and Jeffery Klauda\\
+Updated rtfpsf folder: Joshua Lucker
-Original instruction (Viviana Monje):
+The script is in ZT1 path: **/afs/shell.umd.edu/project/energybio/shared/jbklauda/scripts/area_mult_lipids** \\
-**NEED** (without any update)\\
-''crystal_image.str'' \\
-''area.csh''\\
-''bdist.csh''\\
-''tdist.csh''\\
-''dist.c'' (just backup)\\
-''dist.exe''\\
-''test.csh ''\\
-''calc_avg.scr''\\
-**UPDATE/MODIFY**
-^Code ^Do ^
-|area.c	| define NTOT 244 /* //number of atoms being looked per leaflet// */ \\ Re-compile the file using: ''gcc -O area.c -o area.exe''|
-|area-a.inp & area-b.inp | EndOfFile (EOF) – update argument values for //area.csh// line: \\ ''! *.csh (directory) (#chol + 3*#lipids + 1)(#chol + 3*#lipids)'' \\ system ''./area.csh tmp/@D/ 245 244'' \\ \\ Check //a,b,c// values (at least order of magnitude) \\ Check NTOT (#frames per DCD file) \\ Check location of DCD files|
-|rtfpsf.str | Check file names |
-|lipid.dat | 1 (#sterols) (#lipids) \\PER LEAFLET |
-|get-areas.scr (no longer needed) | For lipids: //file.csh (# files) (first file)// \\ For average: //file.csh (sterol fraction) (lipid fraction) out-file// (to be copy to a common directory)|
-|top & bot (only needed for comp area distribution) | Check values in last line – frames per DCD (usually 500 500 OR 1000 1000 – check dyn-2.inp) \\ \\ // # lipids in the system// \\ //atomic-area.dat (file with data from qhull)// \\ //lipid.dat (file with: 1 #chol #lipids, “1” is an internal flag for the presence of CHOL, the others are number of molecs PER LEAFLET starting with #sterol)// \\ //[min] [max] [step-size] (for cholesterol’s distribution)// \\ //[min] [max] [step-size] (for the other lipid(s)’ distribution)// \\ //Chol-dist-file.dat// \\ //Lipid-dist-file.dat// \\ //#frames per DCD #total frames looked at// |
-|avg.csh (no longer needed) | UPDATE path of common directory to store final files if needed \\ (second-to-last line, just update path) |
-|1_test.scr| see Xiao's instruction|
-|2_run-dist-avg.scr| see Xiao's instruction|
-|calc_avg_vertical.py| see Xiao's instruction|
-|calc_avg_final.py| see Xiao's instruction|
-(**This no longer true after the most recent updates, see Xiao's instruction instead.**) MAKE SURE the common directory to store all final (average) files exists AND has system.scr (update name of files that you want to combine/merge into table) AND all.csh
-RUN “test.scr” - Go INTO “tmp” directory and RUN “get-areas.scr”
-RUN areas.dat in the common dir. (SYSTEM – CHOLarea – CHOLerror – LIParea – LIPerror – SAsystem)
-\\
-**Additional Notes/Explanations (Xiaohong Zhuang)**
-\\
-The script is in DT2 path: **/lustre/jbklauda/scripts/xiao/lipids_analyses/area_mult_lipids** \\
 or using zip file: {{ area_mult_lipids.gz |}}
-Besides files mentioned above, you will see some additional files. But you don’t need to edit these files unless you have more than 5 types of sterols, or 10 types of glycerophosphate lipids, or 10 types of sphingo lipids, (or cardiolinpin which will be added later). They are:
-Image-top.str, image-bot.str, ctype.str, gtype.str, stype.str
 **How to Edit and Run the Scripts:**
@@ Line 54: / Line 16: @@
 **1. def_nlip.str:** \\
-(i) Update the resname of lipids. The variable names used for resnames with prefix rnc for sterol, rng for glycerol lipids, rns for sphingo lipids.
+(i) Update the resname of lipids. The variable names for resnames use the prefix rnc for sterols, rng for glycerol lipids, and rns for sphingo lipids. If your system lacks a lipid of a given class, remove the corresponding rn* line. The list should contain only the lipids that exist in your system.
 ''set rnc1 = SITO'' \\
@@ Line 71: / Line 33: @@
-**2. area.csh**: Update the path of qhull-2003.1
+**2. area.csh**: Update the path of qhull-2003.1, if needed. Current setup is relative to the areas directory.
 **3. area.c**:
@@ Line 77: / Line 39: @@
 # (NTOT in area.c (# of atoms) is not NTOT in area*.inp (# of frames of each dcd file))
-(#chol+3*#lip since use and count 1 atom from each chol, and 3 for each lipid to calculate SA/lipid)
-Eg. For hypocotyl system, with 28 sterols and 72 lipids, (per leaflet), NTOT=28+3*72=244
- \\
-**4. area-a.inp & area-b.inp**: \\
+(NTOT=#chol+3*#lip since the code uses 1 atom to define each chol, and 3 atoms for each lipid to calculate SA/lipid. These are numbers per leaflet NOT bilayer)
- Update file paths; Update the system size a,b,c based on dyn.xsc (the 2nd-4th non-zero values) in your dyn directory; Update atom numbers
+For example, for the hypocotyl plant membrane system with 28 sterols and 72 lipids (per leaflet), NTOT=28+3*72=244
  \\
+**Must compile this C code with gcc**: ''gcc area.c -o area.exe''. This will overwrite the old executable (area.exe)
+**4. area-a.inp & area-b.inp**:
+Update file paths: in rtfpsf.str, for step5_assembly.str, crystal_image.str, and dyn@N.dcd \\
+TOPPAR Files: It is easiest if you copy the toppar.str file and its associated toppar directory from the CHARMM-GUI/CHARMM folder to the areas directory to use for these calculations.\\
+Update the system size a,b,c based on dyn.xsc (the 2nd-4th non-zero values) in your dyn directory \\
+Update the number of frames in each DCD: ''calc NTOT = 1000'' \\
+Update atom numbers: \\
+<code>
+! *.csh (directory) (#chol + 3*#lipids + 1) (#chol + 3*#lipids) ! per leaflet
+system "./area.csh tmp/@D/ 245 244"
+</code>
 **5. lipid.dat**: \\
@@ Line 95: / Line 70: @@
 **6. top/bot (only needed when you want to get the distribution of Comp Area):**
-In the last line of top/bot files, even though it says [#frames per DCD]  [#total frames], I believe they are actually both [#frames per DCD]  [#frames per DCD], which are 1000 1000 for hypocotyl membrane.
+//top file example//
+^Data in File ^ Meaning of Data ^
+|8|Number of Lipids Including Cholesterol|
+|atomic-area.dat|Output atomic area file|
+|lipid.dat|As Described in Step 5|
+|0 150 1|min, max, and step size for area binning of sterols (must include even if no sterols)|
+|0 150 1|min, max, and step size for area binning of lipids|
+|sterol-top.dat|output sterol file (must include even if no sterols)|
+|lip-top.dat|output lipid file|
+|1000 1000|Block size; typically the # of frames in each DCD file|
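The layout in the table above can be sketched as a small Python helper that writes a top-style file, one table row per line. This is only an illustration under stated assumptions: the function name `write_dist_input` and its parameters are hypothetical and not part of the area_mult_lipids scripts, and it assumes the file holds exactly the eight rows shown, in that order.

```python
from pathlib import Path
import tempfile


def write_dist_input(path, n_lipid_types, sterol_out, lipid_out,
                     bins=(0, 150, 1), frames_per_dcd=1000):
    """Write a top/bot-style input file (assumed layout: one table row per line)."""
    lo, hi, step = bins
    lines = [
        str(n_lipid_types),                    # number of lipids including cholesterol
        "atomic-area.dat",                     # atomic area file
        "lipid.dat",                           # as described in step 5
        f"{lo} {hi} {step}",                   # sterol binning (needed even without sterols)
        f"{lo} {hi} {step}",                   # lipid binning
        sterol_out,                            # sterol output file (needed even without sterols)
        lipid_out,                             # lipid output file
        f"{frames_per_dcd} {frames_per_dcd}",  # block size, typically frames per DCD
    ]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Example: reproduce the top file from the table in a scratch directory.
with tempfile.TemporaryDirectory() as tmpdir:
    top_lines = write_dist_input(Path(tmpdir) / "top", 8,
                                 "sterol-top.dat", "lip-top.dat")
```

Compare the result against an existing top file before using it, since the exact format expected by dist.exe is not verified here.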
@@ Line 107: / Line 91: @@
 \\
 "First file" means the first dcd file count in the following line, which is usually 1. \\ \\
-As explained in the comments inside test.scr, each values/name following test.csh are the follwing \\ \\
+As explained in the comments inside test.scr, the values/names following test.csh are as follows \\ \\
-# *.csh (#chol+3*#lip) (#chol+3*#lip+1) (first dcd count) (last dcd count) (first dcd - first dcd count) (output dir) (CHARMM input) (#lipid types) (image atoms) \\
+<code>
-# note "dcd count" refers to number of files being read NOT ACTUAL DCD NAME (next argument) \\
+# *.csh (#chol+3*#lip) (#chol+3*#lip+1) (first dcd count) (last dcd count) (first dcd - first dcd count) (output dir) (CHARMM input) (# of dimensions always 2) (image atoms)
-# SAME number for "first dcd" in ALL lines \\
+# note "dcd count" refers to number of files being read NOT ACTUAL DCD NAME (next argument)
-# "image atoms" = 9*(#chol+3*#lip) \\ \\
+# SAME number for "first dcd" in ALL lines
+# "image atoms" = 9*(#chol+3*#lip)
+</code>
 ''sbatch -A energybio-hi test.csh 244 245 1 12 40 tmp area-a 2 2196''\\
 ''sbatch -A energybio-hi test.csh 244 245 13 24 40 tmp area-a 2 2196'' \\
@@ Line 117: / Line 103: @@
 … \\
-**244 245** : (#chol+3*#lip) (#chol+3*#lip+1) per leaflet which is 28+72*3, 28+72*3+1
+**>> 244 245** : (#chol+3*#lip) (#chol+3*#lip+1) per leaflet which is 28+72*3, 28+72*3+1
-**1 12** : Since each test.csh allows maximum 12 dcd file counts, for 35 dcd files used, 1-12, 13-24, and 25-35 are used in each of three lines.
+**>> 1 12** : Each test.csh call allows a maximum of 12 dcd file counts, so for the 35 dcd files used here, 1-12, 13-24, and 25-35 are used in the three lines.
-**40** : (first dcd - first dcd count), as the dyn41-75.dcd are used, the first dcd # is 41, and the first dcd count # is 1, therefore, 41-1=40 is used for all lines.
+**>> 40** : (first dcd - first dcd count), as the dyn41-75.dcd are used, the first dcd # is 41, and the first dcd count # is 1, therefore, 41-1=40 is used for all lines.
-**tmp** :  the directory path to save output folders/files used for top leaflet, tmp/bot is for bottom leaflet.
+**>> tmp** :  the directory path for saving output folders/files for the top leaflet; tmp/bot is for the bottom leaflet.
-**area-a**:  area-a is input file for top leaflet, area-b is for bottom leaflet
+**>> area-a**:  area-a is input file for top leaflet, area-b is for bottom leaflet
-**2**: The lipid types of 2 are used instead of 8. (1-atom represented choline/sterol, and other 3-atoms represented lipids, e.g. glycerol lipids, sphingo lipids)
+**>> 2**: This is the dimension of the tessellation, which is always 2.
-**2970**: "image atoms" = 9*(#chol+3*#lip)=9*(28+3*72)
+**>> 2196**: "image atoms" = 9*(#chol+3*#lip)=9*(28+3*72)
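The argument arithmetic described above can be sketched in Python as follows. This is a hedged illustration, not part of the distributed scripts: the helper name `build_test_csh_lines` and its parameters are hypothetical, the 12-file limit and the 9x image-atom factor are taken from the notes above, and the ''sbatch'' prefix is left out.

```python
def build_test_csh_lines(n_sterol, n_lipid, first_dcd, last_dcd,
                         out_dir="tmp", charmm_input="area-a", max_per_job=12):
    """Return the test.csh argument lines for one leaflet (counts are per leaflet)."""
    ntot = n_sterol + 3 * n_lipid        # 1 atom per sterol, 3 atoms per lipid
    n_files = last_dcd - first_dcd + 1   # number of dcd files to read
    offset = first_dcd - 1               # (first dcd) - (first dcd count)
    image_atoms = 9 * ntot               # "image atoms" = 9*(#chol+3*#lip)
    lines = []
    for start in range(1, n_files + 1, max_per_job):
        end = min(start + max_per_job - 1, n_files)
        lines.append(
            f"test.csh {ntot} {ntot + 1} {start} {end} {offset} "
            f"{out_dir} {charmm_input} 2 {image_atoms}"
        )
    return lines


# Hypocotyl example: 28 sterols, 72 lipids per leaflet, dyn41-75.dcd.
hypocotyl_lines = build_test_csh_lines(28, 72, 41, 75)
for line in hypocotyl_lines:
    print(line)
```

For the hypocotyl example this yields three lines covering dcd counts 1-12, 13-24, and 25-35, each with offset 40 and 2196 image atoms.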
 **8. 2_run-dist-avg.scr**: \\
+Update number of blocks (number of dcd files in this case)
 <code>
 # Update number of blocks, NO space, i.e. "N=30", not "N = 30"
-N=48
+N=30
 </code>
-**Run the scripts:** \\
+**9. calc_avg_vertical.py & calc_avg_final.py:** \\
-There are actually only two scripts to submit, the rest can be ignored unless you have many more result data to organize.
+Update the number of each lipid component per leaflet in the same order as in lipid.dat. If you lack sterol, the first number in the Nlist MUST be zero due to the assumption in calc_avg_vertical.py.
+<code>
+# Update the number of each lipid component per leaflet in the same order as in area-a.inp
+Nlist = [28,12,13,9,5,22,7,4]
+</code>
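As a quick sanity check, the Nlist entries can be compared with the NTOT value set in area.c. The sketch below assumes the first entry is the single sterol count (zero if there are no sterols) and every other entry is a lipid represented by 3 atoms; `expected_ntot` is a hypothetical helper, not a function from calc_avg_vertical.py.

```python
def expected_ntot(nlist):
    """NTOT implied by Nlist: 1 atom per sterol plus 3 atoms per other lipid."""
    n_sterol = nlist[0]        # assumed: first entry is the sterol count
    n_lipid = sum(nlist[1:])   # assumed: remaining entries are 3-atom lipids
    return n_sterol + 3 * n_lipid


Nlist = [28, 12, 13, 9, 5, 22, 7, 4]
ntot = expected_ntot(Nlist)
print(ntot)  # 244 for the hypocotyl example (28 + 3*72)
```

If the printed value differs from NTOT in area.c, either the Nlist or the NTOT definition is likely out of date.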
+**10. rtfpsf.str:** \\
+The file names in the rtfpsf.str file will most likely need to be updated.
-.	Run script by command: 1_test.scr
-After run test.scr (which may take about 15-20 minutes), tmp and output files for all the dcd file count (1-35) will be generated.  In each dcd count folder, atomic-area.dat will be generated which will have (#data points per dcd) rows and (# of represented atoms)  columns, for hypo system here, atomic-area.dat has 1000 rows and 244 columns.
+**Run the scripts:** \\
-. Go INTO “tmp” directory, update run-get-areas.scr file and run the script by command: \\ ''run-get-areas.scr'' \\
+There are actually only two scripts to submit; the rest can be ignored unless you have much more result data to organize.
-# ''get-areas.scr (#cols is # of lipid area distribution e.g # columns in lip-top.dat) (output file name) (last dir) (first dir)''
-''get-areas.scr 8 hybo-b1 35 1''\\
+.	Run script by command: ''./1_test.scr'' \\
+After running 1_test.scr (which may take about 0.5 to 2 hours), tmp and output files for all the dcd file counts (1-35 in the example) will be generated. In each dcd count folder, atomic-area.dat will be generated with (#data points per dcd) rows and (# of represented atoms) columns. For the hypo system here, atomic-area.dat has 1000 rows and 244 columns.
-**8**: # of lipid area distribution e.g # columns in lip-top.dat\\
+.	Get the averages and standard errors by command: ''./2_run-dist-avg.scr''. You will see 2_avg_std_final.txt which contains the component area for each lipid type following the order in def_nlip.str.
-**hybo-b1**: name of output file that the avg.dat will be copied and saved to the common directory which is done by the following command in avg.csh \\
-''cp avg.dat /lustre/xzhuang/soybean/all_analyses_summary/areas/$5.dat ''\\
-**35**: the last dir or the last dcd count # \\
-**1**: the last dir or the last dcd count # \\
+** Ignore the rest of this wiki for now... Yalun will update it as needed **
+**(Yalun: Following part to be updated)**
 As mentioned above, when you submit run-get-areas.scr to run get-area.scr, combine.csh and avg.csh (also ?dist.csh) are called.
 lipid.dat is read through the top and bot files, which are passed to dist.exe (dist.c) by the command line ''dist.exe < top'' in tdist.csh