IGBF-3391

Create scaled and junction files for 2022 Palanivelu Lab-generated samples

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      Directory: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq

      • Rename "sorted" names
      • Create scaled coverage graphs
      • Create junction files

      Note: Use the 2019 and 2022 2bit files for SL4 and SL5, respectively. Also make sure find_junctions.sh is changed to point to the matching 2019 or 2022 2bit file.
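
One way to avoid pairing an alignment directory with the wrong 2bit file is to derive the file name from the assembly label. The sketch below assumes SL4 corresponds to S_lycopersicum_Sep_2019.2bit and SL5 to S_lycopersicum_Jun_2022.2bit, which the note implies but does not spell out. The function name twobit_for_assembly is hypothetical and is not part of find_junctions.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an assembly label to its 2bit file name.
# Assumption: SL4 -> Sep 2019 release, SL5 -> Jun 2022 release.
twobit_for_assembly() {
    case "$1" in
        SL4) echo "S_lycopersicum_Sep_2019.2bit" ;;
        SL5) echo "S_lycopersicum_Jun_2022.2bit" ;;
        *)   echo "unknown assembly: $1" >&2; return 1 ;;
    esac
}

# Example: take the label from a directory name such as Ravi_2022_SL4.
# assembly=${PWD##*Ravi_2022_}; assembly=${assembly%%/*}
# twobit=$(twobit_for_assembly "$assembly") || exit 1
```

Failing on an unknown label, rather than falling back to a default, keeps a typo from silently selecting the wrong genome.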

        Attachments

          Issue Links

            Activity

            Mdavis4290 Molly Davis added a comment - - edited

            Scripts and Files used:

            • renameBams.sh
            • sbatch-doIt.sh
            • S_lycopersicum_Jun_2022.2bit
            • S_lycopersicum_Sep_2019.2bit
            • find_junctions.sh
            • find-junctions-1.0.0-jar-with-dependencies.jar
            • bamCoverage.sh

            Launch renameBams.sh script:

            ./renameBams.sh
            

            Launch Scaled Coverage graphs script:

            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            

            Launch Junction files script:

            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
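
The contents of sbatch-doIt.sh are not shown in this ticket. The sketch below is a hypothetical wrapper that would be consistent with the per-sample .out and .err files described later: it submits one Slurm job per file with the given extension. The function name submit_all, the exported variable names F and S, and the SBATCH override are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a submit-per-file wrapper; NOT the real sbatch-doIt.sh.
# Usage: submit_all EXT SCRIPT   (e.g. submit_all .bam find_junctions.sh)
# Set SBATCH=echo for a dry run that prints the arguments instead of submitting.
submit_all() {
    local ext=$1 script=$2 f sample
    for f in *"$ext"; do
        [ -e "$f" ] || continue          # no matching files: glob stays literal
        sample=$(basename "$f" "$ext")   # e.g. Heinz-R3-8hr-25C
        "${SBATCH:-sbatch}" --export=ALL,F="$f",S="$sample" \
            --job-name="$sample" \
            --output="$sample.out" --error="$sample.err" \
            "$script"
    done
}
```

Both launch commands above redirect to jobs.out and jobs.err, so when they are run from the same directory the second run overwrites the submission log of the first. Using distinct log names per script is one option if both logs are needed.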
            

            SL5 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL5/results/star_salmon
            SL4 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            Reviewer:
            Check that files have reasonable sizes (no "zero" size files, for example)
            Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            Check that every bam file has a corresponding "FJ.bed.gz" file
            Check that every bam file has a corresponding "scaled.bedgraph.gz" file
            Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
            If all files are normal please move ticket to done
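
The reviewer checklist above can be sketched as a small script. It assumes the outputs sit next to the BAM files and are named SAMPLE.FJ.bed.gz and SAMPLE.scaled.bedgraph.gz, each with a .tbi index; the function name check_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the reviewer checklist. Assumes outputs sit next to the BAM files
# and are named SAMPLE.FJ.bed.gz and SAMPLE.scaled.bedgraph.gz (plus .tbi).
check_outputs() {
    local dir=${1:-.} bam sample suffix f problems=0
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue
        sample=${bam%.bam}
        for suffix in FJ.bed.gz FJ.bed.gz.tbi scaled.bedgraph.gz scaled.bedgraph.gz.tbi; do
            f="$sample.$suffix"
            if [ ! -e "$f" ]; then
                echo "MISSING $f"; problems=$((problems + 1))
            elif [ ! -s "$f" ]; then
                echo "EMPTY   $f"; problems=$((problems + 1))
            fi
        done
    done
    [ "$problems" -eq 0 ]   # exit status 0 only when nothing was flagged
}
```

Running check_outputs on each star_salmon directory prints one line per missing or zero-size file and returns a non-zero status if anything was flagged. It does not judge whether a non-empty file has a reasonable size.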

            robofjoy Robert Reid added a comment -

            For /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL5/results/star_salmon:
            There are 1347 total files.
            ll *bed.gz lists 55 gzipped bed files.
            Most of these files are 5-8MB in size.

            But there are 2 exceptions:
            -rw-r--r-- 1 mdavi258 tomato_genome 28 Jul 25 14:44 Nagcarlang-R1-3hr-25C.FJ.bed.gz
            -rw-r--r-- 1 mdavi258 tomato_genome 699K Jul 25 14:44 Malintka-R3-8hr-37C.FJ.bed.gz

            These 2 are MUCH smaller than the rest.
            There are 55 .tbi files.

            There are 56 .out files (1 extra for jobs.out so that makes sense).
            All are similar in size except for Malintka-R3-8hr-37C.out, which is only 1.9M. That is not surprising given that the corresponding bed file is small / truncated. However, the tiny Nag file appears to have a normal-sized out file! A mystery side quest to pursue!
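
Spotting undersized outputs by eye gets harder as the file count grows. The sketch below flags files whose size falls below a chosen percentage of the median size; the function name flag_small_files and the 25 percent threshold in the usage line are arbitrary choices for illustration. As a side note, 28 bytes is the length of a BGZF end-of-file marker on its own, so the Nagcarlang file is probably an empty bgzip archive; that is an inference from the size, not something verified in this ticket.

```shell
#!/usr/bin/env bash
# Sketch: flag files whose size is below a fraction of the median size.
# Usage: flag_small_files PERCENT FILE...   (PERCENT is an arbitrary threshold)
flag_small_files() {
    local percent=$1; shift
    local f size median
    [ "$#" -gt 0 ] || return 0
    # Median size in bytes (upper median for an even number of files).
    median=$(for f in "$@"; do wc -c < "$f"; done | sort -n |
             awk '{ a[NR] = $1 } END { print a[int(NR / 2) + 1] }')
    for f in "$@"; do
        size=$(wc -c < "$f")
        if [ $((size * 100)) -lt $((median * percent)) ]; then
            echo "SMALL $f ($((size)) bytes, median $median)"
        fi
    done
}
```

For example, flag_small_files 25 *.FJ.bed.gz would list any junction file smaller than a quarter of the median. Running gzip -t on a flagged file is a further check that the compressed stream is at least intact.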

            Checking the err files (ll *err):
            There are 56 err files as expected.
            All are empty except for:
            -rw-r--r-- 1 mdavi258 tomato_genome 234 Jul 25 13:56 Tamaulipas-R1-3hr-37C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 696 Jul 25 14:44 Nagcarlang-R1-3hr-25C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 655 Jul 25 14:44 Malintka-R3-8hr-37C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 655 Jul 25 14:44 Heinz-R3-8hr-25C.err

            (our 2 previous devious samples and now 2 more!!!)

            The error within:
            Exception in thread "main" java.io.IOException: Cannot send after transport endpoint shutdown
            at java.io.RandomAccessFile.readBytes(Native Method)
            at java.io.RandomAccessFile.read(RandomAccessFile.java:400)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadBits(TwoBitParser.java:205)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.skip(TwoBitParser.java:281)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:196)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:248)
            Heinz-R3-8hr-25C.err lines 1-8/8 (END)
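
Rather than paging through each err file, the non-empty ones can be summarized in one pass. The sketch below prints the first line of every non-empty .err file in a directory; the function name summarize_err_files is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list non-empty .err files with the first line of each.
summarize_err_files() {
    local dir=${1:-.} f
    for f in "$dir"/*.err; do
        [ -s "$f" ] || continue          # skip empty or missing files
        printf '%s: %s\n' "${f##*/}" "$(head -n 1 "$f")"
    done
}
```

Because a Java stack trace starts with the exception line, the first line is usually enough to group files by failure type before opening any of them.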

            SL4 Check
            /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            This time starting with err files:
            Found the expected 56 files. (55 + 1 jobs.err file)
            Only 2 files have issues.
            -rw-r--r-- 1 mdavi258 tomato_genome 370 Jul 25 16:09 Malintka-R2-8hr-37C.err
            -rw-r--r-- 1 mdavi258 tomato_genome 370 Jul 25 16:09 Nagcarlang-R2-8hr-37C.err

            No overlap with the SL5 run, it seems.
            But there are potential issues with these 2.

            I find 55 bed.gz files and 55 tbi files.
            All bed and tbi files are of similar size.

            The err files:

            Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249)
            Malintka-R2-8hr-37C.err lines 1-4/4 (END)

            Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249)
            Nagcarlang-R2-8hr-37C.err lines 1-4/4 (END)

            Same error for both.

            Let the Scooby mystery begin!

            Mdavis4290 Molly Davis added a comment - - edited

            Thank you for catching the errors, Dr. Reid!

            Next step: Remove the 2bit files, copy them into the directory again, and rerun the junction script for SL4 & SL5. The scaled coverage script also needs to be rerun because some files are missing.
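
The IOException text reported for SL5, "Cannot send after transport endpoint shutdown", is the standard Linux message for the ESHUTDOWN error code, which suggests an I/O failure while reading the 2bit file rather than a problem in the BAM data; that reading is an assumption, not something confirmed in this ticket. When re-copying the 2bit files, a byte-for-byte comparison against the source is a cheap way to confirm the new copy is intact. The function name recopy_and_verify is hypothetical, and the caller supplies both paths.

```shell
#!/usr/bin/env bash
# Sketch: replace a possibly damaged copy and confirm it matches the source.
# Usage: recopy_and_verify SOURCE DEST_DIR   (paths are supplied by the caller)
recopy_and_verify() {
    local src=$1 dest_dir=$2 dest
    dest="$dest_dir/$(basename "$src")"
    rm -f "$dest"
    cp "$src" "$dest" || return 1
    if cmp -s "$src" "$dest"; then
        echo "OK $dest"
    else
        echo "MISMATCH $dest" >&2
        return 1
    fi
}
```

A matching copy only rules out a damaged file on disk. If the filesystem itself drops out again during a job, the same exception can recur.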

            Mdavis4290 Molly Davis added a comment - - edited

            Fixed 2bit errors:
            SL5 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL5/results/star_salmon
            SL4 Directory: /nobackup/tomato_genome/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            Note: Two files in SL4 raise a positioning exception (Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)). Because this is only an exception, I believe we can move forward with the files. There is also one non-empty error file in SL5, "Tamaulipas-R1-3hr-37C.err", but it contains only a warning and does not seem to affect the final results.

            Reviewer:
            Check that files have reasonable sizes (no "zero" size files, for example)
            Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            Check that every bam file has a corresponding "FJ.bed.gz" file
            Check that every bam file has a corresponding "scaled.bedgraph.gz" file
            Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
            If all files are normal please move ticket to done

            robofjoy Robert Reid added a comment -

            The 2 new locations for the data are now:

            /projects/tomato_genome/fnb/Ravi_2022/Ravi_2022_SL5/results/star_salmon
            /projects/tomato_genome/fnb/Ravi_2022/Ravi_2022_SL4/results/star_salmon

            Our /nobackup is no more.

            robofjoy Robert Reid added a comment -

            SL5:
            I see 55 bedgraph and 55 bed files. All are zipped and of similar size (e.g., the bed files are 6-8MB and the bedgraphs are bigger).

            The .tbi files follow the same pattern.

            All BAM files are over 1GB in size.
            55 bam files. 55 bai files

            SL4:
            I see:
            55 .tbi files for bed.gz
            55 .tbi files for bedgraph.gz

            All BAM files are over 1GB in size.
            55 bam files. 55 bai files

            All looks great!!!!!


              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes:
                0
                Watchers:
                2
