
======Brief instructions for calculating lipid clusters======

Authors: Xiaohong Zhuang and Dr. Jeffery B. Klauda 

These instructions explain how to perform the clustering analysis of lipid bilayers that contain multiple lipid types.

Download Script: {{ ::clus-s.tar.gz |}}\\
Download Example: {{ :clus-e.tar.gz |}}

Copy the script folder from the download above into your own working directory.

If you are working on ZT1, use the files in the following directory instead: ''/afs/shell.umd.edu/project/energybio/shared/jbklauda/scripts/cluster_mult_lip/''

=====Cluster Prep=====

Use the CHARMM scripts to obtain the XY coordinates of the defined head-group atoms.

1.1 Update the paths of the PSF and CRD files in **rtfpsf.str**.

1.2 Update the paths of **step5_assembly.str**, **crystal_image.str** and the **DCD files**, as well as the **number of frames per DCD**, in **coord-a.inp** and **coord-b.inp**.

1.3 Update the lipid names, the head-group names, and the number of types in each of the three lipid groups in **def_nlip_a.str** and **def_nlip_b.str**.
<code commend>
! rnc for sterols; rng for glycerophospholipids; rns for sphingolipids
! No head group is needed for sterols; use hdg for glycerophospholipids and hds for sphingolipids
! hdg examples: PA, PC, PE, PG, PI, and PS; hds examples: SM and CER
! Example:
set rnc1 = CHL1

set rng1 = DPPC
set rng2 = DOPE

set hdg1 = PC
set hdg2 = PE

set rns1 = NSM
set hds1 = SM

! Update the number of types in each lipid group (integers)
set ctyp = 1     ! number of sterol types
set gtyp = 2     ! number of glycerophospholipid types
set styp = 1     ! number of sphingolipid types
</code>
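Because **ctyp**, **gtyp** and **styp** have to agree with the names defined above them, a quick consistency check can catch a mismatch before the CHARMM jobs are submitted. The Python sketch below is not part of the provided scripts; it assumes the stream file contains only simple ''set name = value'' (or ''set name value'') lines with optional ''!'' comments, as in the example, and that every ''rngN'' needs a matching ''hdgN'' and every ''rnsN'' a matching ''hdsN''. The function names are hypothetical.

<code python>
import re

# Hypothetical helper, not part of the provided scripts.
# Matches "set name = value" or "set name value" (the "=" is optional).
SET_LINE = re.compile(r"^\s*set\s+(\w+)\s*(?:=\s*)?(\S+)", re.IGNORECASE)


def parse_set_lines(text):
    """Return {variable: value} for every 'set' line, ignoring '!' comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0]  # drop CHARMM comments
        match = SET_LINE.match(line)
        if match:
            values[match.group(1).lower()] = match.group(2)
    return values


def check_lipid_groups(text):
    """List the names missing for the counted types; [] means consistent."""
    values = parse_set_lines(text)
    problems = []
    # (count variable, residue-name prefix, head-group prefix or None)
    groups = [("ctyp", "rnc", None), ("gtyp", "rng", "hdg"), ("styp", "rns", "hds")]
    for count_var, res_prefix, head_prefix in groups:
        n_types = int(values.get(count_var, 0))  # a missing count is treated as 0
        for i in range(1, n_types + 1):
            if f"{res_prefix}{i}" not in values:
                problems.append(f"{count_var}={n_types} but {res_prefix}{i} is not set")
            if head_prefix and f"{head_prefix}{i}" not in values:
                problems.append(f"{count_var}={n_types} but {head_prefix}{i} is not set")
    return problems


# The example definitions from step 1.3.
example = """
set rnc1 = CHL1
set rng1 = DPPC
set rng2 = DOPE
set hdg1 = PC
set hdg2 = PE
set rns1 = NSM
set hds1 = SM
set ctyp = 1     ! number of sterol types
set gtyp = 2     ! number of glycerophospholipid types
set styp = 1     ! number of sphingolipid types
"""
print(check_lipid_groups(example))  # []
</code>

To check a real file, pass its contents instead, e.g. ''check_lipid_groups(open("def_nlip_a.str").read())''. The check only looks for missing names; extra names beyond the counted types are not reported.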

1.4 Update the range of dyn files in **1a_coord_sbatch.scr**.
<code commend>
sbatch -A energybio 1a_coord.csh 1 12 102 top coord-a
…
</code>

1.5 Submit the jobs by running ''1a_coord_sbatch.scr''.

The **box-*.txt** and **xy-*.dat** files will be written to the top and bot folders.


=====Cluster Calc=====

After the jobs finish, use the Python scripts to obtain the cluster sizes and the numbers of clusters.

2.1 Update the number of dyns in **2a_fix_decimal.scr**, which fixes the number of digits in the data files.

''# Bash syntax: make sure there are no spaces around the equals sign in the following line.''\\
''N=48''

The **box-fix-*.txt** and **xy-fix-*.dat** files will be generated.

2.2 Update the cutoff distance in **dbscan.py**.

''Dcut = 5.5;''

Several trial runs may be needed to find a reasonable Dcut. Choose Dcut so that most values of Nlf in **tot-*.txt** are less than 75% of the total number of lipids per leaflet, and most values of Nlc in **num-*.txt** are greater than or equal to 4. You may discuss the choice with Dr. Klauda.
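To see how Dcut acts, the sketch below is a minimal, from-scratch DBSCAN (density-based spatial clustering of applications with noise) for one frame of one leaflet, with XY distances computed under the minimum-image convention of a rectangular periodic box. It is an illustration, not the provided **dbscan.py**: the value ''min_samples=2'', the periodic treatment and all function names are assumptions, so compare them with **dbscan.py** before trusting any numbers from it.

<code python>
import numpy as np


def pbc_distances(xy, box):
    """Pairwise XY distances using the minimum-image convention.

    xy  : (N, 2) array of head-group coordinates for one leaflet and frame
    box : (Lx, Ly) edge lengths of the periodic box
    """
    box = np.asarray(box, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    diff -= box * np.round(diff / box)  # shift each component into [-L/2, L/2]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dbscan(xy, box, dcut, min_samples=2):
    """Label each lipid with a cluster id (0, 1, ...) or -1 for noise.

    A lipid is a core point if at least `min_samples` lipids (itself
    included) lie within `dcut`; clusters grow outward from core points.
    min_samples=2 is an assumed value, not taken from dbscan.py.
    """
    dist = pbc_distances(np.asarray(xy, dtype=float), box)
    neighbors = [np.flatnonzero(row <= dcut) for row in dist]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(len(dist), -1)
    cluster_id = 0
    for start in range(len(dist)):
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster_id
        stack = [start]
        while stack:
            point = stack.pop()
            if not is_core[point]:
                continue  # border points join a cluster but do not extend it
            for nb in neighbors[point]:
                if labels[nb] == -1:
                    labels[nb] = cluster_id
                    stack.append(nb)
        cluster_id += 1
    return labels


def cluster_sizes(labels):
    """Number of lipids in each cluster, ignoring noise (-1)."""
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))
</code>

For example, ''labels = dbscan(xy, (Lx, Ly), dcut=5.5)'' followed by ''cluster_sizes(labels)'' gives the number of lipids in each cluster of that frame, so rerunning with a few Dcut values shows how the cluster sizes respond before the full job is submitted. If scikit-learn is installed, ''sklearn.cluster.DBSCAN(eps=dcut, min_samples=2, metric="precomputed")'' fitted on the same distance matrix is an optimized alternative.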

2.3 Submit the job with ''2b_submit_sbatch.scr''. Update the walltime request in **2b-run-dbscan.csh** if necessary. Also update Nf and Nlip; to save time, Nf should be 1/10 of the actual number of frames.

After the run, the **dbscan-*.txt**, **tot-*.txt**, and **num-*.txt** files will be generated.

2.4 After the job from step 2.3 finishes, run the following commands in order (or paste them into the terminal together so that they run one after another automatically).
 
''2c-run-count.scr''\\
''2d-run-fr-avg.scr''\\
''2e-comb.scr''\\
''2f-run-dyn-avg.scr''\\
''2g-run-bin-Nlc.scr''

The **count_*.txt**, **Nlc-fr-*.txt**, and **all-*.txt** files will be generated.
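If you prefer not to paste the step 2.4 commands by hand, a small driver can run the five scripts in order and stop at the first failure. This Python sketch is a convenience wrapper, not part of the provided package; it assumes the scripts are executable, sit in the current directory, and run to completion before returning (rather than only submitting a batch job). The names ''STEP_2_4'' and ''run_in_order'' are hypothetical.

<code python>
import subprocess

# Step 2.4 scripts, in the order listed above.
STEP_2_4 = [
    "2c-run-count.scr",
    "2d-run-fr-avg.scr",
    "2e-comb.scr",
    "2f-run-dyn-avg.scr",
    "2g-run-bin-Nlc.scr",
]


def run_in_order(commands):
    """Run each command (a list of arguments) in turn.

    subprocess.run(..., check=True) raises CalledProcessError on a
    non-zero exit status, so later scripts never run on bad input.
    """
    for cmd in commands:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
</code>

Usage: ''run_in_order([["./" + script] for script in STEP_2_4])''. Stopping at the first error matters here because each script presumably reads the output of the previous one.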

2.5 Then obtain the averages over the top and bottom leaflets with the command ''2h-tb-avg.scr''.

The final results are in the tb-avg folder. **final-Nlc-avg.txt** contains five rows: the average number of lipids per cluster (Nc); the fraction of lipids in clusters (Rc); the overall lipid composition ratio (Rn); the difference between the cluster fraction and the overall composition ratio (Rc-Rn); and, lastly, the relative difference between the two, (Rc-Rn)/Rn.
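The five quantities are easiest to read with a small worked example. The Python sketch below is one plausible way to compute them for a single leaflet from per-cluster counts of each lipid type; it is not taken from the provided scripts, which may average differently (for instance per frame, then per dyn, then over leaflets). The function name and input layout are assumptions.

<code python>
import numpy as np


def cluster_enrichment(cluster_counts, leaflet_counts):
    """Nc, Rc, Rn, Rc - Rn and (Rc - Rn) / Rn, one value per lipid type.

    cluster_counts : (n_clusters, n_types) lipids of each type in each cluster
    leaflet_counts : (n_types,) lipids of each type in the whole leaflet
    """
    cluster_counts = np.asarray(cluster_counts, dtype=float)
    leaflet_counts = np.asarray(leaflet_counts, dtype=float)
    nc = cluster_counts.mean(axis=0)                        # lipids of each type per cluster
    rc = cluster_counts.sum(axis=0) / cluster_counts.sum()  # composition inside clusters
    rn = leaflet_counts / leaflet_counts.sum()              # overall composition
    return nc, rc, rn, rc - rn, (rc - rn) / rn


nc, rc, rn, diff, rel = cluster_enrichment([[4, 4], [6, 2]], [30, 20])
</code>

With two clusters holding (4, 4) and (6, 2) lipids of two types in a leaflet of 30 + 20 lipids, Nc = (5, 3), Rc = (0.625, 0.375) and Rn = (0.6, 0.4), so (Rc-Rn)/Rn is about +0.042 for the first type and -0.0625 for the second. A positive relative difference means a lipid type is enriched in clusters compared with the leaflet as a whole; a negative one means it is depleted.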

=====Visualize=====

This section is optional; complete it only if you want to visualize a snapshot of the clusters.

3.1 Update the line numbers in **3a-get_plot_frame.scr**.
<code commend>
# Update the line numbers of the frame selected for plotting (usually the last frame) in the xy and dbscan data files.
# Nline1 refers to xy-*.dat: it is 450 (= 50*9) for the last frame when there are 50 lipids per leaflet.
# Nline2 refers to dbscan-*.txt.

set Nline1 = 450
set Nline2 = 42
</code>
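The example Nline1 = 450 (= 50*9) is consistent with **xy-*.dat** storing one line per lipid, frame after frame, so that frame k ends at line k × (lipids per leaflet). Assuming that layout (it is not confirmed by the scripts, and the dbscan file behind Nline2 may be laid out differently), the Python sketch below computes a frame's line range and reads just those lines. Both function names are hypothetical.

<code python>
from itertools import islice


def frame_line_range(frame, n_lipids):
    """First and last (1-based) line numbers of `frame` (1-based).

    Assumes exactly n_lipids lines per frame, one frame after
    another, with no header lines.
    """
    first = (frame - 1) * n_lipids + 1
    last = frame * n_lipids
    return first, last


def read_frame(path, frame, n_lipids):
    """Return the lines of `frame` under the same assumed layout."""
    first, last = frame_line_range(frame, n_lipids)
    with open(path) as handle:
        return list(islice(handle, first - 1, last))  # islice is 0-based, end-exclusive
</code>

For the example above, ''frame_line_range(9, 50)'' returns (401, 450), and its last value matches Nline1.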

3.2 Update **3b_cluster-top.gnu**, then run the command ''3b_plot_cluster.scr''.

<code commend>
# Update the left and right box boundaries
Lb=-22.3
Rb=22.3
</code>
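The symmetric values above (Lb=-22.3, Rb=22.3) suggest a box centred on the origin with an edge length of 44.6 Å. Assuming that, and assuming you know the box edge length L (for example from **box-*.txt**, whose format is not described here), the boundaries are -L/2 and L/2, and any head group that has drifted outside can be wrapped back in before plotting. This Python sketch and its function names are hypothetical, not part of the provided scripts.

<code python>
import numpy as np


def box_bounds(box_length):
    """Left and right boundaries of a box of edge `box_length` centred at 0."""
    half = box_length / 2.0
    return -half, half


def wrap_into_box(coords, box_length):
    """Wrap coordinates into [-L/2, L/2) for a box centred at the origin."""
    coords = np.asarray(coords, dtype=float)
    return (coords + box_length / 2.0) % box_length - box_length / 2.0
</code>

For example, ''box_bounds(44.6)'' gives the Lb and Rb values shown above, and ''wrap_into_box([23.0, -23.0], 44.6)'' moves both points back inside the box (to about -21.6 and 21.6).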

**3b_cluster-top.tif** will be written to the top folder. If you use WinSCP, you can open the image directly from there.

=====Lipid Combination=====

This section is optional; complete it only if you have multiple lipids in the same group and want to see the lipid combination data.

4.1 Update the column numbers in **bin-R.py**.
<code commend>
# Make sure to select the correct column numbers, e.g. 6:9 selects the 7th to 9th columns (indices start at 0 and the end index is excluded)
# add PLPA, PLPC, and PLPE to PL; add LLPA, LLPC, and LLPE to LL;
Nlc = data1[:,3]; PL = data1[:,6:9]; LL = data1[:,10:13];
</code>
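NumPy slices start at index 0 and exclude the end index, which is why ''6:9'' selects the 7th to 9th columns. If PL and LL are meant to hold the combined counts of the three lipids (as the comment "add PLPA, PLPC, and PLPE to PL" suggests), the selected columns still have to be summed across each row; whether **bin-R.py** does this later is not shown here. The Python sketch below uses a made-up row whose values equal their column indices, so each result shows which columns were picked.

<code python>
import numpy as np

# One made-up row with 13 columns; each value equals its (0-based) column index.
data1 = np.arange(13, dtype=float).reshape(1, 13)

Nlc = data1[:, 3]          # 4th column
PL_cols = data1[:, 6:9]    # 7th, 8th and 9th columns (indices 6, 7, 8)
LL_cols = data1[:, 10:13]  # 11th, 12th and 13th columns (indices 10, 11, 12)

# Summing across each row combines e.g. PLPA + PLPC + PLPE into one PL value.
PL = PL_cols.sum(axis=1)
LL = LL_cols.sum(axis=1)

print(Nlc, PL_cols, PL)  # [3.] [[6. 7. 8.]] [21.]
</code>

Note that with the slices from bin-R.py the 10th column (index 9) falls between PL and LL and is not used by either.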

4.2 Plot the distribution with the command ''4b_plot_cluster.scr''.