  1. IGB
  2. IGBF-3391

Create scaled and junction files for 2022 Palanivelu Lab-generated samples

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq

      • Rename "sorted" names
      • Create scaled coverage graphs
      • Create junction files

      Note: Remember to use 2019 and 2022 2bit files for SL4 and SL5. Also make sure to change find_junctions.sh to use that specific 2019 or 2022 2bit file.
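A minimal sketch of how the 2bit choice could be derived from the results path instead of being edited by hand in find_junctions.sh. It assumes SL4 pairs with the Sep 2019 file and SL5 with the Jun 2022 file, and that the assembly name appears in the path; `twobit_for` is a hypothetical helper, not one of the scripts used for this ticket.

```shell
# twobit_for PATH: print the 2bit file name matching the assembly named in PATH.
# Assumption: SL4 -> Sep 2019 file, SL5 -> Jun 2022 file.
twobit_for() {
    case "$1" in
        *SL4*) echo S_lycopersicum_Sep_2019.2bit ;;
        *SL5*) echo S_lycopersicum_Jun_2022.2bit ;;
        *)     echo "cannot infer assembly from: $1" >&2; return 1 ;;
    esac
}
```

Failing loudly on an unrecognized path is deliberate: a wrong 2bit file is harder to spot later than a script that stops early.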

            Activity

            Mdavis4290 Molly Davis created issue -
            Mdavis4290 Molly Davis made changes -
            Field Original Value New Value
            Epic Link IGBF-3251 [ 22132 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3367 [ IGBF-3367 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3390 [ IGBF-3390 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3392 [ IGBF-3392 ]
            Mdavis4290 Molly Davis made changes -
            Summary Create scaled and junction files for 2022 Palanavelu Lab-generated samples Create scaled and junction files for 2022 Palanivelu Lab-generated samples
            Mdavis4290 Molly Davis made changes -
            Sprint Summer 5 2023 July 10 [ 174 ] Summer 6 2023 July 24 [ 175 ]
            Mdavis4290 Molly Davis made changes -
            Description
            Original Value:
            Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq
            Nf-core run: /nobackup/tomato_genome/ravi-55/sl5-nfcore/results # but there are no scaled or junction files.

            * Rename "sorted" names
            * Create scaled coverage graphs
            * Create junction files

            New Value:
            Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq
            Nf-core run: /nobackup/tomato_genome/ravi-55/sl5-nfcore/results # but there are no scaled or junction files.

            * Rename "sorted" names
            * Create scaled coverage graphs
            * Create junction files

            Note: Remember to use 2019 and 2022 2bit files for SL4 and SL5.
            Mdavis4290 Molly Davis made changes -
            Description
            Original Value:
            Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq
            Nf-core run: /nobackup/tomato_genome/ravi-55/sl5-nfcore/results # but there are no scaled or junction files.

            * Rename "sorted" names
            * Create scaled coverage graphs
            * Create junction files

            Note: Remember to use 2019 and 2022 2bit files for SL4 and SL5.

            New Value:
            Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq

            * Rename "sorted" names
            * Create scaled coverage graphs
            * Create junction files

            Note: Remember to use 2019 and 2022 2bit files for SL4 and SL5.
            Mdavis4290 Molly Davis added a comment - - edited

            Scripts and Files used:

            • renameBams.sh
            • sbatch-doIt.sh
            • S_lycopersicum_Jun_2022.2bit
            • S_lycopersicum_Sep_2019.2bit
            • find_junctions.sh
            • find-junctions-1.0.0-jar-with-dependencies.jar
            • bamCoverage.sh

            Launch renameBams.sh script:

            ./renameBams.sh
            

            Launch Scaled Coverage graphs script:

            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            

            Launch Junction files script:

            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
            

            SL5 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL5/results/star_salmon
            SL4 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            Reviewer:
            Check that files have reasonable sizes (no "zero" size files, for example)
            Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            Check that every bam file has a corresponding "FJ.bed.gz" file
            Check that every bam file has a corresponding "scaled.bedgraph.gz" file
            Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
            If all files are normal please move ticket to done
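The reviewer checklist above could be automated roughly as follows. This is a sketch, assuming each BAM is named `<sample>.bam` and its outputs sit in the same directory as `<sample>.FJ.bed.gz` and `<sample>.scaled.bedgraph.gz` plus their `.tbi` indexes; `check_outputs` is a hypothetical helper name.

```shell
# check_outputs DIR: report missing or zero-size outputs for every BAM in DIR.
# Assumption: outputs are named <sample>.<suffix> next to <sample>.bam.
check_outputs() (
    dir=$1
    status=0
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue          # glob matched nothing
        sample=${bam%.bam}
        for suffix in FJ.bed.gz FJ.bed.gz.tbi scaled.bedgraph.gz scaled.bedgraph.gz.tbi; do
            f=$sample.$suffix
            if [ ! -e "$f" ]; then
                echo "MISSING $f"; status=1
            elif [ ! -s "$f" ]; then
                echo "EMPTY   $f"; status=1
            fi
        done
    done
    exit "$status"                         # non-zero if anything was reported
)
```

It would be run once per results directory, e.g. `check_outputs .` from inside star_salmon. It only catches absent and zero-byte files, so unusually small but non-empty files still need a separate size check.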

            Mdavis4290 Molly Davis made changes -
            Description
            Original Value:
            Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq

            * Rename "sorted" names
            * Create scaled coverage graphs
            * Create junction files

            Note: Remember to use 2019 and 2022 2bit files for SL4 and SL5.

            New Value:
            Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq

            * Rename "sorted" names
            * Create scaled coverage graphs
            * Create junction files

            Note: Remember to use 2019 and 2022 2bit files for SL4 and SL5. Also make sure to change find_junctions.sh to use that specific 2019 or 2022 2bit file.
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            robofjoy Robert Reid made changes -
            Assignee Robert Reid [ robertreid ]
            robofjoy Robert Reid added a comment -

            For /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL5/results/star_salmon:
            There are 1347 total files.
            ll *bed.gz produces 55 gzipped bed files.
            Most of these files are 5-8MB in size.

            But with 2 exceptions:
            -rw-r--r-- 1 mdavi258 tomato_genome 28 Jul 25 14:44 Nagcarlang-R1-3hr-25C.FJ.bed.gz
            -rw-r--r-- 1 mdavi258 tomato_genome 699K Jul 25 14:44 Malintka-R3-8hr-37C.FJ.bed.gz

            These 2 are MUCH smaller than the rest.
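Spotting files that are much smaller than their peers can also be scripted. The sketch below flags files under a chosen percentage of the median size; the 50% default and the name `small_outliers` are illustrative assumptions, not something used in this ticket.

```shell
# small_outliers DIR PATTERN [PERCENT]: list files in DIR matching PATTERN whose
# size is below PERCENT (default 50) of the median size of all matches.
small_outliers() (
    cd "$1" || exit 1
    pct=${3:-50}
    for f in $2; do                        # unquoted on purpose: expand the glob
        [ -f "$f" ] || continue
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n | awk -F'\t' -v pct="$pct" '
        { size[NR] = $1; name[NR] = $2 }
        END {
            if (NR == 0) exit
            median = size[int((NR + 1) / 2)]   # lower median of the sorted sizes
            for (i = 1; i <= NR; i++)
                if (size[i] * 100 < median * pct) print name[i] "\t" size[i]
        }'
)
```

For example, `small_outliers . '*.FJ.bed.gz'` prints each suspiciously small file with its size in bytes. The pattern must be quoted by the caller so that it is expanded inside the target directory.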
            There are 55 .tbi files.

            There are 56 .out files (1 extra for jobs.out so that makes sense).
            All are similar size except for Malintka-R3-8hr-37C.out, which is only 1.9M. Not surprising given that the corresponding bed file is small / truncated. However, the tiny Nag file appears to have a normal-sized out file! A mystery side quest to pursue!

            Checking the .err files (ll *err):
            There are 56 err files as expected.
            All are empty except for:
            -rw-r--r-- 1 mdavi258 tomato_genome 234 Jul 25 13:56 Tamaulipas-R1-3hr-37C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 696 Jul 25 14:44 Nagcarlang-R1-3hr-25C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 655 Jul 25 14:44 Malintka-R3-8hr-37C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 655 Jul 25 14:44 Heinz-R3-8hr-25C.err

            (our 2 previous devious samples and now 2 more!!!)

            The error within:
            Exception in thread "main" java.io.IOException: Cannot send after transport endpoint shutdown
            at java.io.RandomAccessFile.readBytes(Native Method)
            at java.io.RandomAccessFile.read(RandomAccessFile.java:400)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadBits(TwoBitParser.java:205)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.skip(TwoBitParser.java:281)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:196)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:248)
            Heinz-R3-8hr-25C.err lines 1-8/8 (END)
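The manual scan of .err files described above could be reduced to one helper that prints every non-empty file with its first line. A minimal sketch; `nonempty_errs` is a hypothetical name.

```shell
# nonempty_errs DIR: print "<file><TAB><first line>" for each non-empty .err file.
nonempty_errs() (
    for f in "$1"/*.err; do
        [ -s "$f" ] || continue            # skip empty files and an unmatched glob
        printf '%s\t%s\n' "${f##*/}" "$(head -n 1 "$f")"
    done
)
```

Showing the first line makes it easy to group samples by exception type before opening any individual file.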

            SL4 Check
            /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            This time starting with err files:
            Found the expected 56 files. (55 + 1 jobs.err file)
            Only 2 files have issues.
            -rw-r--r-- 1 mdavi258 tomato_genome 370 Jul 25 16:09 Malintka-R2-8hr-37C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 370 Jul 25 16:09 Nagcarlang-R2-8hr-37C.err

            No overlap from the SL5 run it seems.
            But potential issues with these 2.

            I find 55 bed.gz files and 55 tbi files.
            All bed and tbi files are of similar size.

            The err files:

            Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249)
            Malintka-R2-8hr-37C.err lines 1-4/4 (END)

            Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249)
            Nagcarlang-R2-8hr-37C.err lines 1-4/4 (END)

            Same error for both.
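One way to investigate a "position is too high" exception is to compare the sequence lengths recorded in a BAM with those in the 2bit file. A mismatch would suggest the wrong assembly's 2bit was used, but that is only a hypothesis here, not something this ticket confirms. The sketch assumes both length tables already exist as tab-separated "name, length" files; `compare_sizes` is a hypothetical helper.

```shell
# compare_sizes BAM_SIZES TWOBIT_SIZES
# Both inputs are tab-separated "name<TAB>length" tables. Prints one line per
# sequence in BAM_SIZES that is absent from TWOBIT_SIZES or has another length.
compare_sizes() {
    awk -F'\t' '
        NR == FNR     { len[$1] = $2; next }               # first file: 2bit lengths
        !($1 in len)  { print $1 "\tmissing from 2bit"; next }
        len[$1] != $2 { print $1 "\tBAM=" $2 "\t2bit=" len[$1] }
    ' "$2" "$1"
}
```

The two tables could be produced with, for example, `samtools idxstats` (first two columns, dropping the final `*` line) and UCSC `twoBitInfo`. Neither tool is mentioned in this ticket, so treat that as an assumption about the environment. No output means the lengths agree.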

            Let the Scooby mystery begin!

            robofjoy Robert Reid made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Thank you for catching the errors Dr. Reid!

            Next step: Remove the 2bit files, copy them into the directory again, and rerun the junction script for SL4 & SL5. The scaled coverage script also needs to be rerun because some files are missing.
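When re-copying the 2bit files, a byte-for-byte comparison against the source can confirm the new copy is intact before the jobs are resubmitted. A minimal sketch; `copy_verified` is a hypothetical helper and the source location is whatever the files were originally copied from.

```shell
# copy_verified SRC DEST: replace DEST with a fresh copy of SRC and confirm
# the two files are byte-identical.
copy_verified() {
    rm -f "$2" || return 1
    cp "$1" "$2" || return 1
    if cmp -s "$1" "$2"; then
        echo "OK: $2 matches $1"
    else
        echo "MISMATCH: $2 differs from $1" >&2
        return 1
    fi
}
```

Note that this only rules out a damaged copy. An I/O error raised while reading the file may come from the filesystem itself, in which case a rerun could succeed even with an unchanged file.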

            Mdavis4290 Molly Davis added a comment - - edited

            Fixed 2bit errors:
            SL5 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL5/results/star_salmon
            SL4 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            Note: There is an exception/error for two files in SL4 due to positioning (Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)). Due to this being an exception I believe we can move forward with the files. Then there is also one error file in SL5 "Tamaulipas-R1-3hr-37C.err" but this is also a warning and seems to not affect the final results.

            Reviewer:
            Check that files have reasonable sizes (no "zero" size files, for example)
            Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            Check that every bam file has a corresponding "FJ.bed.gz" file
            Check that every bam file has a corresponding "scaled.bedgraph.gz" file
            Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
            If all files are normal please move ticket to done

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Robert Reid [ robertreid ]
            robofjoy Robert Reid added a comment -

            The 2 new locations for the data are now:

            /projects/tomato_genome/fnb/Ravi_2022/Ravi_2022_SL5/results/star_salmon
            /projects/tomato_genome/fnb/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            Our /nobackup is no more.

            ann.loraine Ann Loraine made changes -
            Sprint Summer 6 2023 July 24 [ 175 ] Summer 6 2023 July 24, Summer 7 2023 Aug 7 [ 175, 176 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            robofjoy Robert Reid added a comment -

            SL5:
            I see 55 bedgraph and 55 bed files. All are gzipped and of similar size (e.g., the bed files are 6-8MB and the bedgraphs are bigger).

            The .tbi files follow the same pattern.

            All BAM files are over 1GB in size.
            55 bam files. 55 bai files

            SL4:
            I see:
            55 .tbi files for bed.gz
            55 .tbi files for bedgraph.gz

            All BAM files are over 1GB in size.
            55 bam files. 55 bai files

            All looks great!!!!!
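The per-type counts in this review could be reproduced with a small helper like the one below. It is a sketch that assumes BAM indexes are named `*.bam.bai` and the outputs use the `FJ.bed.gz` and `scaled.bedgraph.gz` suffixes; `count_outputs` is a hypothetical name.

```shell
# count_outputs DIR: print "<suffix><TAB><count>" for each expected file type.
# Assumption: index files are named *.bam.bai and *.gz.tbi.
count_outputs() (
    cd "$1" || exit 1
    for suffix in bam bam.bai FJ.bed.gz FJ.bed.gz.tbi scaled.bedgraph.gz scaled.bedgraph.gz.tbi; do
        n=0
        for f in *."$suffix"; do
            if [ -e "$f" ]; then n=$((n + 1)); fi   # guards against an unmatched glob
        done
        printf '%s\t%s\n' "$suffix" "$n"
    done
)
```

With 55 samples, every row should read 55. Any other number points at the file type that needs a closer look.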

            robofjoy Robert Reid made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            robofjoy Robert Reid made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            robofjoy Robert Reid made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            robofjoy Robert Reid made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            robofjoy Robert Reid made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            robofjoy Robert Reid made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            robofjoy Robert Reid made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: