  1. IGB
  2. IGBF-3172

Tabulate splicing support by running arabitag algorithm on junction and bam files

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Run the arabitag algorithm to tabulate splice junction support using output from find_junctions.sh in linked issue.

      Ref: See diagram from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6589529/figure/pld3136-fig-0001/
      See paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6589529/

      Rice paper repository: https://bitbucket.org/lorainelab/ricealtsplice/src/master/

      Arabitag algorithm repository: https://bitbucket.org/lorainelab/altspliceanalysis/src/master/

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            ann.loraine Ann Loraine made changes -
            Link This issue is blocked by IGBF-3165 [ IGBF-3165 ]
            ann.loraine Ann Loraine made changes -
            Summary | Original: Run arabitag algorithm on junction and bam files for tabulating splicing support | New: Tabulate splicing support by running arabitag algorithm on junction and bam files
            Mdavis4290 Molly Davis added a comment - - edited

            During our meeting I brought up using deep learning/machine learning to identify splicing events. Here are some papers I found that were related to the idea:

            • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6722613/
            • https://academic.oup.com/bioinformatics/article/33/14/i274/3953982
              code: https://majiq.biociphers.org/jha_et_al_2017/
            • https://www.sciencedirect.com/science/article/pii/S0092867418316295
              code: https://github.com/Illumina/SpliceAI
            ann.loraine Ann Loraine added a comment -

            Running arabitag tabulation algorithm with:

            sbatch --export=BED=S_lycopersicum_Jun_2022_noPRAM.bed,F=5,RIF=10 --output=arabitag.out --error=arabitag.err arabitag.sh 2>jobs.err 1>jobs.out
            

            in:

            /nobackup/tomato_genome/alt_splicing/arabitag
            

            using code from my home directory.

            ann.loraine Ann Loraine added a comment -

            Made changes to arabitag.sh to enable the splicing analysis algorithm to be run in parallel on the slurm cluster.
            Ran it, and now the files are all made.
            Within directory /nobackup/tomato_genome/alt_splicing/arabitag/all, there are multiple files, each named with an "SRR" run identifier as its prefix and splice_support.txt as its suffix.
            The first line of each file is a comment (marked with hash character) giving the flanking base number and other parameters used to run the data processing steps.
            Each row represents an alternative splicing event, with two mutually exclusive choices.

            • chromosome - the location of the event
            • strand - the strand of the event
            • start - interbase coordinates for the start position of the difference region, the alternatively spliced region
            • end - interbase coordinates for the end position of the difference region
            • Ga - gene model lacking the difference region
            • Gp - gene model containing the difference region
            • Ga_[sample name] - the number of alignments from "sample_name" (an RNA-Seq library) that unambiguously support model Ga
            • Gp_[sample name] - the number of alignments from "sample_name" that unambiguously support model Gp

            For the next step, we need to consolidate all of these into a single data frame, or something like that, so that we can then determine how or if support for Ga and Gp splicing choices varies between or among samples.
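
            One possible way to do the consolidation is sketched below in Python, as an illustration only; the ticket's later work used R, and this is not that code. The sketch assumes each splice_support.txt file is tab-delimited, starts with the "#" comment line, and then has a header row using the column names listed above. The delimiter, the header row, the function names, and the sample names SRR_A and SRR_B are all assumptions or made-up values.

```python
import csv
import io
from pathlib import Path

# Columns that identify one alternative splicing event. The names come from
# the column list in the comment above; their exact spelling in the real
# files is an assumption.
KEY_COLUMNS = ["chromosome", "strand", "start", "end", "Ga", "Gp"]


def read_splice_support(handle):
    """Yield (event_key, counts) pairs from one splice_support.txt stream.

    Assumes tab-delimited text with a leading '#' comment line followed by
    a header row. Both points are assumptions about the file layout.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(data_lines, delimiter="\t"):
        key = tuple(row[name] for name in KEY_COLUMNS)
        counts = {
            name: int(value)
            for name, value in row.items()
            if name not in KEY_COLUMNS
        }
        yield key, counts


def add_sample(merged, handle):
    """Add one sample's Ga_/Gp_ counts to the merged table, keyed by event."""
    for key, counts in read_splice_support(handle):
        merged.setdefault(key, {}).update(counts)
    return merged


def consolidate_directory(directory):
    """Merge every file ending in splice_support.txt found in a directory."""
    merged = {}
    for path in sorted(Path(directory).glob("*splice_support.txt")):
        with open(path, newline="") as handle:
            add_sample(merged, handle)
    return merged


# Tiny made-up example (not real data) with two hypothetical samples.
sample_a = io.StringIO(
    "# hypothetical parameter line\n"
    "chromosome\tstrand\tstart\tend\tGa\tGp\tGa_SRR_A\tGp_SRR_A\n"
    "chr1\t+\t100\t150\tmodel.1\tmodel.2\t7\t3\n"
)
sample_b = io.StringIO(
    "# hypothetical parameter line\n"
    "chromosome\tstrand\tstart\tend\tGa\tGp\tGa_SRR_B\tGp_SRR_B\n"
    "chr1\t+\t100\t150\tmodel.1\tmodel.2\t2\t9\n"
)
table = {}
add_sample(table, sample_a)
add_sample(table, sample_b)
```

            Keying on the six event columns puts each event on one row with one Ga_ and one Gp_ count per sample, which is the shape needed to compare support between samples. On the cluster, consolidate_directory would be pointed at the results directory named above, if the layout assumptions hold.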

            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            To review:

            • check that files have reasonable sizes (no "zero" size files, for example)
            • check that every "SRR" bam file in our control and experimental sample directories has a corresponding "splice_support.txt" file
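
            The two review checks above could be automated roughly as follows. This is a minimal sketch, not the procedure used in the review: it assumes bam files are named with the SRR run identifier followed by ".bam" and that each result file name starts with the same identifier. The function names and the example file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# SRA run accessions have the form "SRR" followed by digits.
RUN_ID = re.compile(r"^(SRR\d+)")


def run_id(file_name):
    """Return the leading SRR identifier of a file name, or None."""
    match = RUN_ID.match(file_name)
    return match.group(1) if match else None


def review_outputs(bam_dirs, results_dir, suffix="splice_support.txt"):
    """Return (empty_files, missing_runs).

    empty_files: names of result files whose size is zero.
    missing_runs: SRR identifiers with a bam file but no result file.
    """
    results = [p for p in Path(results_dir).iterdir() if p.name.endswith(suffix)]
    empty_files = sorted(p.name for p in results if p.stat().st_size == 0)
    runs_with_results = {run_id(p.name) for p in results}
    bam_runs = {
        run_id(p.name)
        for directory in bam_dirs
        for p in Path(directory).glob("SRR*.bam")
        if run_id(p.name) is not None
    }
    missing_runs = sorted(bam_runs - runs_with_results)
    return empty_files, missing_runs


# Demonstration on a throwaway directory tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    bams = Path(tmp) / "bams"
    results = Path(tmp) / "results"
    bams.mkdir()
    results.mkdir()
    for run in ("SRR0000001", "SRR0000002"):
        (bams / f"{run}.bam").write_bytes(b"placeholder")
    # One zero-size result file; the second run has no result file at all.
    (results / "SRR0000001.splice_support.txt").write_text("")
    empty_files, missing_runs = review_outputs([bams], results)
```

            Matching on the full run identifier rather than a plain string prefix avoids treating one run as covered by another whose identifier merely begins with the same digits.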
            Mdavis4290 Molly Davis added a comment - - edited

            Review:

            Directory: /nobackup/tomato_genome/alt_splicing/arabitag/all

            • Using the ll command, I checked whether any files had zero size. Only the file 'jobs.err' had zero size; all of the SRR splice support files were non-empty.
            • Every "SRR" bam file in our control and experimental sample directories has a corresponding "splice_support.txt" file. The SRP328042-molly SRR files are interleaved with the SRP252265 files in the directory, so it might be hard to tell which are control and which are experimental.

            [~aloraine]

            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Thanks [~molly]. I decided to store all the files in the same folder and will use an index / table of contents type strategy to distinguish them. Moving to Done.

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Closed [ 6 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine added a comment -

            Re-opening to include new work: tabulating the results and reading them into R for interactive analysis.

            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Sprint Fall 2 2022 Sep 5 [ 154 ] Fall 2 2022 Sep 5, Fall 3 2022 Sep 26 [ 154, 155 ]
            ann.loraine Ann Loraine added a comment -

            Splicing support output files added as "tar" file to https://bitbucket.org/hotpollen/splicing-analysis as TabulateSplicingSupport/results/2022-09-08-results.tar.

            ann.loraine Ann Loraine added a comment - - edited

            Creating R code for organizing and sanity-checking results. Working on this fork and branch: https://bitbucket.org/aloraine/splicing-analysis/branch/IGBF-3172

            ann.loraine Ann Loraine added a comment -

            Merged changes into master branch. Created new folder with consolidated results:

            • TabulateSplicingSupport/results/ containing file 2022-09-08-results.txt
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine