IGB / IGBF-3849

Process SRP454305 Goldstein Irradiation 2024 data set

    Details

    • Type: Task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Following the protocol notes described in IGBF-3790, process data set SRP454305.

      This dataset consists of 12 samples of the tardigrade Hypsibius exemplaris (H. exemplaris). The animals received either a radiation dose or a non-irradiated control treatment.

      Article link:

            Activity

            Ann Loraine (ann.loraine) added a comment (edited)

            FINDJUNCTIONS step

            Similar to the coverage graphs step, made a new subdirectory in /projects/tomato_genome/fnb/dataprocessing/tardigrade/SRP454305/results/star_salmon.

            1) Made the find junctions "working" directory and added symbolic links to BAM and BAM index files in the parent directory with:

            [aloraine@str-i1 star_salmon]$ mkdir find_junctions
            [aloraine@str-i1 star_salmon]$ cd find_junctions/
            [aloraine@str-i1 find_junctions]$ ln -s ../*bam* .
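Before moving on, it may be worth confirming that every linked BAM file has an index next to it. The following is a minimal sketch, not part of the tardigrade repository: `missing_bam_indexes` is a hypothetical helper, and it assumes indexes are named `<name>.bam.bai` or `<name>.bam.csi`.

```shell
# Hypothetical helper: print each BAM in a directory that has no index beside it.
# Assumption: an index is named <name>.bam.bai or <name>.bam.csi.
missing_bam_indexes() (
  cd "$1" || exit 1
  for bam in *.bam; do
    # Skip the literal pattern when nothing matches, and skip dangling links.
    [ -e "$bam" ] || continue
    [ -e "$bam.bai" ] || [ -e "$bam.csi" ] || printf '%s\n' "$bam"
  done
)
```

Running `missing_bam_indexes .` inside find_junctions prints nothing when every BAM has an index.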
            

            2) Download the required input 2bit file into the directory with:

            wget http://lorainelab-quickload.scidas.org/quickload/H_exemplaris_Z151_Apr_2017/H_exemplaris_Z151_Apr_2017.2bit
            

            3) Make symbolic links to scripts and jar file with code:

            ln -s ~/src/tardigrade/src/sbatch-doIt.sh .
            ln -s ~/src/tardigrade/src/find_junctions.sh
            ln -s ~/src/tardigrade/src/find-junctions-1.0.0-jar-with-dependencies.jar .
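A symbolic link created with a relative target resolves relative to the directory holding the link, so a mistyped path produces a link that exists but points nowhere. One way to catch that before launching jobs is sketched below; `dangling_links` is a hypothetical helper name, not something from the repository.

```shell
# Hypothetical helper: print symbolic links in a directory whose targets do not exist.
dangling_links() (
  cd "$1" || exit 1
  for f in *; do
    # -L is true for a symlink; -e follows the link and fails if the target is missing.
    if [ -L "$f" ] && [ ! -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
)
```

`dangling_links .` should print nothing if the scripts and the jar file are all reachable.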
            

            4) Launch jobs with:

            sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
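Once the jobs have finished, a quick check that each BAM produced a junction file could look like the sketch below. It rests on assumptions not stated in these notes: that output names contain ".FJ." (as the later `mv *.FJ.*` command suggests) and that they start with the same text as the BAM name up to its first dot. `bams_without_junctions` is a hypothetical helper.

```shell
# Hypothetical helper: print BAM files with no matching junction output.
# Assumptions: junction files are named <sample>.FJ.* or <sample>.<anything>.FJ.*,
# where <sample> is the BAM file name up to its first dot.
bams_without_junctions() (
  cd "$1" || exit 1
  for bam in *.bam; do
    [ -e "$bam" ] || continue
    sample=${bam%%.*}
    found=no
    # Unmatched patterns stay literal and fail the -e test below.
    for fj in "$sample".FJ.* "$sample".*.FJ.*; do
      if [ -e "$fj" ]; then
        found=yes
      fi
    done
    [ "$found" = yes ] || printf '%s\n' "$bam"
  done
)
```

Any file name it prints is a sample worth looking up in jobs.err.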
            
            Ann Loraine (ann.loraine) added a comment (edited)

            DATA TRANSFER SETUP step

            1) Create directory for transfer in /projects/tomato_genome/fnb/dataprocessing/tardigrade

            [aloraine@str-i1 tardigrade]$ pwd
            /projects/tomato_genome/fnb/dataprocessing/tardigrade
            mkdir for_quickload
            

            We will use this directory to store everything we will transfer to Quickload for this "tardigrade" project.

            2) Make directory for tardigrade genome assembly

            [aloraine@str-i1 for_quickload]$ pwd
            /projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload
            mkdir H_exemplaris_Z151_Apr_2017
            

            Note: the above two steps only need to be done once!

            3) Make subdirectory for this data set, in the genome assembly directory used for alignments:

            [aloraine@str-i1 for_quickload]$ cd H_exemplaris_Z151_Apr_2017/
            [aloraine@str-i1 H_exemplaris_Z151_Apr_2017]$ pwd
            /projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload/H_exemplaris_Z151_Apr_2017
            [aloraine@str-i1 H_exemplaris_Z151_Apr_2017]$ mkdir SRP454305
            [aloraine@str-i1 H_exemplaris_Z151_Apr_2017]$ cd SRP454305/
            [aloraine@str-i1 SRP454305]$ pwd
            /projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload/H_exemplaris_Z151_Apr_2017/SRP454305
            

            4) Move the BAM, scaled coverage graph, and junction files into this location:

            Coverage graphs, from inside the destination directory created in step 3:

            [aloraine@str-i1 SRP454305]$ mv ../../../SRP454305/results/star_salmon/coverage_graphs/*bedgraph* .
            [aloraine@str-i1 SRP454305]$ ls
            SRR25590736.scaled.bedgraph.gz      SRR25590739.scaled.bedgraph.gz      SRR25590742.scaled.bedgraph.gz      SRR25590745.scaled.bedgraph.gz
            SRR25590736.scaled.bedgraph.gz.tbi  SRR25590739.scaled.bedgraph.gz.tbi  SRR25590742.scaled.bedgraph.gz.tbi  SRR25590745.scaled.bedgraph.gz.tbi
            SRR25590737.scaled.bedgraph.gz      SRR25590740.scaled.bedgraph.gz      SRR25590743.scaled.bedgraph.gz      SRR25590746.scaled.bedgraph.gz
            SRR25590737.scaled.bedgraph.gz.tbi  SRR25590740.scaled.bedgraph.gz.tbi  SRR25590743.scaled.bedgraph.gz.tbi  SRR25590746.scaled.bedgraph.gz.tbi
            SRR25590738.scaled.bedgraph.gz      SRR25590741.scaled.bedgraph.gz      SRR25590744.scaled.bedgraph.gz      SRR25590747.scaled.bedgraph.gz
            SRR25590738.scaled.bedgraph.gz.tbi  SRR25590741.scaled.bedgraph.gz.tbi  SRR25590744.scaled.bedgraph.gz.tbi  SRR25590747.scaled.bedgraph.gz.tbi
            

            Bam files, from inside the directory containing them:

            mv *.bam*  ../../../for_quickload/H_exemplaris_Z151_Apr_2017/SRP454305/.
            

            Junction files, from inside the directory containing them:

            mv *.FJ.* ../../../../for_quickload/H_exemplaris_Z151_Apr_2017/SRP454305/.
            

            5) Make all files world-readable and make all directories world-readable and world-executable:

            files:

            [aloraine@str-i1 SRP454305]$ pwd
            /projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload/H_exemplaris_Z151_Apr_2017/SRP454305
            [aloraine@str-i1 SRP454305]$ chmod a+r *
            

            directory:

            [aloraine@str-i1 H_exemplaris_Z151_Apr_2017]$ pwd
            /projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload/H_exemplaris_Z151_Apr_2017
            [aloraine@str-i1 H_exemplaris_Z151_Apr_2017]$ chmod a+rx SRP454305
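The two chmod commands above cover one directory level. If a data set directory ever contains nested folders, the same rule (files readable, directories readable and executable) can be applied recursively. This is a minimal sketch; `make_world_readable` is a hypothetical helper name.

```shell
# Hypothetical helper: make a directory tree readable by everyone.
make_world_readable() (
  dir=${1%/}
  # Directories need both r and x so that others can list and enter them.
  find "$dir" -type d -exec chmod a+rx {} +
  # Regular files only need r.
  find "$dir" -type f -exec chmod a+r {} +
)
```

For example, `make_world_readable SRP454305` run from the genome assembly directory handles the data set directory and everything under it in one pass.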
            
            Ann Loraine (ann.loraine) added a comment (edited)

            RSYNC step

            1) Logged into data.bioviz.org (a virtual machine hosted on UNC Charlotte infrastructure) and moved to data deployment location in the file system there:

            local aloraine$ ssh aloraine@data.bioviz.org
            cd /mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017
            

            Some things to note:

            • I have deployed my public key into the authorized_keys file (~/.ssh/authorized_keys) in my "aloraine" account on data.bioviz.org. This way, I don't have to enter my password.
            • If I did need to enter my password, I would enter my Charlotte.edu password.
            • Anyone else wanting to do this will need to get an account on data.bioviz.org.
            • Note that we are inside a directory named for the reference genome assembly we used.

            2) Make a new directory for this new data set to be deployed:

            aloraine@cci-vm12:/mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017$ pwd
            /mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017
            aloraine@cci-vm12:/mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017$ ls
            SRP450893  SRP484252
            aloraine@cci-vm12:/mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017$ mkdir SRP454305
            

            3) Make sure it is group-writable and that its permissions match the other directories in the same location:

            aloraine@cci-vm12:/mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017$ ls -lh
            total 12K
            drwxrwsr-x 3 aloraine cci-igbquickload_users 4.0K Jul  2 09:52 SRP450893
            drwxr-xr-x 2 aloraine domain users           4.0K Aug  7 20:08 SRP454305
            drwxrwxr-x 2 aloraine domain users           4.0K Jul  3 13:42 SRP484252
            aloraine@cci-vm12:/mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017$ chmod g+w SRP454305
            aloraine@cci-vm12:/mnt/igbdata/tardigrade/H_exemplaris_Z151_Apr_2017$ ls -lh
            total 12K
            drwxrwsr-x 3 aloraine cci-igbquickload_users 4.0K Jul  2 09:52 SRP450893
            drwxrwxr-x 2 aloraine domain users           4.0K Aug  7 20:08 SRP454305
            drwxrwxr-x 2 aloraine domain users           4.0K Jul  3 13:42 SRP484252
            

            4) Start the data transfer using tmux and then rsync:

            tmux:

            tmux new -s transfer
            

            rsync:

            rsync -rtpvz aloraine@hpc.charlotte.edu:/projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload/H_exemplaris_Z151_Apr_2017/SRP454305/* SRP454305/.
            

            Note: You can repeat the above rsync command any time you add new content to the source directory on hpc.charlotte.edu. Only new or changed files will get copied.

            Note: I could probably just "rsync" the entire genome directory. I think that this would automatically copy any new "SRP" directories and their contents over to data.bioviz.org.
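The whole-directory idea in the last note could be sketched as follows. This is untested against the real hosts: `sync_genome` is a hypothetical helper that prints the rsync command by default so it can be reviewed, and only runs it when "run" is passed as the third argument. The flags are the same `-rtpvz` used above.

```shell
# Hypothetical helper: print (default) or run an rsync of a whole genome directory.
# A trailing slash on the source means "the contents of this directory", so each
# SRP subdirectory lands directly inside the destination genome directory.
sync_genome() (
  src=${1%/}/
  dest=${2%/}/
  if [ "${3:-print}" = run ]; then
    rsync -rtpvz "$src" "$dest"
  else
    printf 'rsync -rtpvz %s %s\n' "$src" "$dest"
  fi
)
```

From /mnt/igbdata/tardigrade on data.bioviz.org, a call might look like `sync_genome aloraine@hpc.charlotte.edu:/projects/tomato_genome/fnb/dataprocessing/tardigrade/for_quickload/H_exemplaris_Z151_Apr_2017 H_exemplaris_Z151_Apr_2017`. One thing to watch: because `-p` preserves permissions, syncing at the directory level also applies the source directories' permissions on the destination, so the group-write bit set in step 3 may need to be re-checked afterwards.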

            Ann Loraine (ann.loraine) added a comment (edited)

            ANNOTS.XML step

            1) Opened the run file for this data set in Excel and saved it, in Excel format, to tardigrade/Documentation/inputForMakeAnnotsXml (the tardigrade repository)

            Note: Open SRP48452_for_AnnotsXml as a reference and guide!

            2) Added six new columns to the front of the file, in front of "Run":

            • file name prefix
            • color
            • physical folder
            • study name
            • display name
            • url

            3) Used Excel referencing to insert all the values in "Run" in "file name prefix"

            4) Inserted hexadecimal color codes for each sample. Gave those cells the same fill color as the codes I chose, to help me assess their potential appearance and contrast in IGB.

            5) Inserted the study code (e.g., SRP454305) in "physical folder" column

            6) Used Excel references to insert a human-friendly "study name" - this becomes the name of the folder where the data files will be listed in IGB.

            7) Used Excel references to insert human-friendly "display name" values - these become the checkbox labels in IGB.

            8) Used Excel references to make URLs for each file / data set. Used the "SRX" values in the existing "Experiment" column to construct the URL.

            9) Added new columns as needed after the leading columns to use for sorting. For example, I added "Concentration" and then sorted the spreadsheet by concentration and then by run so that the lower concentration, control samples would appear first in the IGB data display list.

            10) Edited the script makeAnnots.py to include the new spreadsheet in function getSampleSheets. Ran the script, which will add the new data files to annots.xml in tardigrade/ForGenomeBrowsers/quickload.

            11) Checked how it looks by adding the above directory to IGB as a new quickload data source.
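The mechanical parts of steps 3, 5, and 8 can also be done outside Excel. The sketch below is only an illustration under several assumptions: the run file has been exported as tab-separated text with a header row containing "Run" and "Experiment", and the URL is an NCBI SRA link built from the SRX value (the notes above say only that the SRX values were used, not which URL pattern). `add_leading_columns` is a hypothetical helper; color, study name, and display name remain manual choices and are left out.

```shell
# Hypothetical helper: prepend "file name prefix", "physical folder", and "url"
# columns to a tab-separated run table read from standard input.
# $1 is the study code. The SRA URL pattern is an assumption.
add_leading_columns() {
  awk -F '\t' -v OFS='\t' -v study="$1" '
    NR == 1 {
      # Remember where each named column sits.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "file name prefix", "physical folder", "url", $0
      next
    }
    {
      print $(col["Run"]), study, "https://www.ncbi.nlm.nih.gov/sra/" $(col["Experiment"]), $0
    }
  '
}
```

Usage would be along the lines of `add_leading_columns SRP454305 < runs.tsv > runs_with_columns.tsv`, where both file names are placeholders.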

            Ann Loraine (ann.loraine) added a comment (edited)

            CLEANUP step:

            • Removed the "work" directory within SRP454305 because it is ENORMOUS and we no longer need it.
            • Moved the entire SRP454305 directory into tardigrade/DONE
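The two cleanup actions above can be combined into one small function. This is a minimal sketch, and it deletes data, so treat it with care: `archive_study` is a hypothetical helper that reports the size of "work" before removing it and then moves the study directory into the given DONE directory.

```shell
# Hypothetical helper: remove the "work" directory from a study, then archive the study.
# $1 is the study directory, $2 is the DONE directory (created if missing).
archive_study() (
  study_dir=${1%/}
  done_dir=${2%/}
  [ -d "$study_dir" ] || { echo "no such directory: $study_dir" >&2; exit 1; }
  if [ -d "$study_dir/work" ]; then
    # Show how much space is about to be freed.
    du -sh "$study_dir/work"
    rm -rf -- "$study_dir/work"
  fi
  mkdir -p "$done_dir"
  mv -- "$study_dir" "$done_dir/"
)
```

From the tardigrade directory this would be called as `archive_study SRP454305 DONE`.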

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine