  1. IGB
  2. IGBF-3172

Tabulate splicing support by running arabitag algorithm on junction and bam files

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Run the arabitag algorithm to tabulate splice junction support, using the output from find_junctions.sh in the linked issue.

      Ref: See diagram from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6589529/figure/pld3136-fig-0001/
      See paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6589529/

      Rice paper repository: https://bitbucket.org/lorainelab/ricealtsplice/src/master/

      Arabitag algorithm repository: https://bitbucket.org/lorainelab/altspliceanalysis/src/master/

            Activity

            Mdavis4290 Molly Davis added a comment - edited

            During our meeting I brought up using deep learning/machine learning to identify splicing events. Here are some papers I found that relate to the idea:

            • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6722613/
            • https://academic.oup.com/bioinformatics/article/33/14/i274/3953982 (code: https://majiq.biociphers.org/jha_et_al_2017/)
            • https://www.sciencedirect.com/science/article/pii/S0092867418316295 (code: https://github.com/Illumina/SpliceAI)

            ann.loraine Ann Loraine added a comment -

            Running arabitag tabulation algorithm with:

            sbatch --export=BED=S_lycopersicum_Jun_2022_noPRAM.bed,F=5,RIF=10 --output=arabitag.out --error=arabitag.err arabitag.sh 2>jobs.err 1>jobs.out
            

            in:

            /nobackup/tomato_genome/alt_splicing/arabitag
            

            using code from my home directory.

            ann.loraine Ann Loraine added a comment -

            Changed arabitag.sh so that the splicing analysis algorithm can run in parallel on the Slurm cluster.
            Ran it, and all the output files have now been created.
            The directory /nobackup/tomato_genome/alt_splicing/arabitag/all contains multiple files, each named with an "SRR" identifier prefix and the suffix splice_support.txt.
            The first line of each file is a comment (marked with a hash character) giving the number of flanking bases and the other parameters used in the data processing steps.
            Each remaining row represents an alternative splicing event with two mutually exclusive choices. The fields are:

            • chromosome - the chromosome where the event is located
            • strand - the strand of the event
            • start - start position, in interbase coordinates, of the difference region (the alternatively spliced region)
            • end - end position, in interbase coordinates, of the difference region
            • Ga - gene model lacking the difference region
            • Gp - gene model containing the difference region
            • Ga_[sample name] - the number of alignments from "sample name" (an RNA-Seq library) that unambiguously support model Ga
            • Gp_[sample name] - the number of alignments from "sample name" that unambiguously support model Gp

            For the next step, we need to consolidate all of these files into a single data frame (or a similar structure) so that we can determine whether, and how, support for the Ga and Gp splicing choices varies among samples.
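One way the consolidation step might look is sketched below in Python, using only the standard library. It is not the code used for this ticket. It assumes each splice_support.txt file is tab-delimited, starts with "#" comment lines, and then has a header row with the field names listed above; the ticket does not confirm the delimiter or the header row. The function names, the sample names (SRR0000001, SRR0000002), the gene model names, and the contents of the comment line are placeholders.

```python
import csv
import io

# Fields that identify an event; every other column is a per-sample count.
KEY_FIELDS = ["chromosome", "strand", "start", "end", "Ga", "Gp"]

def read_support(handle):
    """Yield (event_key, sample, ga_count, gp_count) from one file.

    Assumption: tab-delimited text, '#' comment lines first, then a header row.
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    samples = sorted(c[3:] for c in reader.fieldnames if c.startswith("Ga_"))
    for row in reader:
        key = tuple(row[f] for f in KEY_FIELDS)
        for sample in samples:
            yield key, sample, int(row["Ga_" + sample]), int(row["Gp_" + sample])

def consolidate(handles):
    """Merge several files into {event_key: {sample: (ga, gp)}}."""
    table = {}
    for handle in handles:
        for key, sample, ga, gp in read_support(handle):
            table.setdefault(key, {})[sample] = (ga, gp)
    return table

# Two made-up files describing the same event in two samples.
header = "chromosome\tstrand\tstart\tend\tGa\tGp\tGa_{0}\tGp_{0}\n"
demo_a = io.StringIO("# placeholder parameter line\n"
                     + header.format("SRR0000001")
                     + "chr1\t+\t100\t150\tgeneX.1\tgeneX.2\t12\t3\n")
demo_b = io.StringIO("# placeholder parameter line\n"
                     + header.format("SRR0000002")
                     + "chr1\t+\t100\t150\tgeneX.1\tgeneX.2\t7\t9\n")
table = consolidate([demo_a, demo_b])
for event, counts in table.items():
    print(event, counts)
```

Keying on the six event fields puts the Ga and Gp counts for one event from every sample side by side, which is the shape needed to compare support among samples. The same table could be written out as one wide text file for loading into R.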

            ann.loraine Ann Loraine added a comment -

            To review:

            • check that files have reasonable sizes (no "zero" size files, for example)
            • check that every "SRR" bam file in our control and experimental sample directories has a corresponding "splice_support.txt" file
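
The two checks above could also be scripted; the following is a minimal Python sketch, not the procedure used in the review. It assumes the SRR run identifier can be pulled from both the bam and splice_support.txt file names with the pattern `SRR\d+`. The function names and the demo file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

SRR_ID = re.compile(r"SRR\d+")  # assumed naming pattern for run identifiers

def run_ids(paths):
    """Map SRR identifier -> path for file names that contain one."""
    found = {}
    for path in paths:
        match = SRR_ID.search(Path(path).name)
        if match:
            found[match.group()] = Path(path)
    return found

def review(bam_paths, support_paths):
    """Return (SRR ids lacking a support file, support files of zero size)."""
    bams = run_ids(bam_paths)
    supports = run_ids(support_paths)
    missing = sorted(set(bams) - set(supports))
    empty = sorted(str(p) for p in supports.values() if p.stat().st_size == 0)
    return missing, empty

# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name, text in [("SRR0000001.bam", "x"),
                       ("SRR0000002.bam", "x"),
                       ("SRR0000001.splice_support.txt", "")]:
        (tmp / name).write_text(text)
    missing, empty = review(tmp.glob("*.bam"),
                            tmp.glob("*splice_support.txt"))

print("runs without a support file:", missing)
print("zero-size support files:", [Path(p).name for p in empty])
```

For the real data, the two arguments would be globs over the control and experimental bam directories and over the arabitag output directory.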
            Mdavis4290 Molly Davis added a comment - edited

            Review:

            Directory: /nobackup/tomato_genome/alt_splicing/arabitag/all

            • Using the ll command, I checked whether any files had zero size. Only 'jobs.err' was empty; all of the SRR and support files were non-empty.
            • Every "SRR" bam file in our control and experimental sample directories has a corresponding "splice_support.txt" file. The SRR files from SRP328042-molly are interleaved with the SRP252265 files in the directory, so it might be hard to tell which are control and which are experimental.

            [~aloraine]

            ann.loraine Ann Loraine added a comment -

            Thanks [~molly]. I decided to store all the files in the same folder and will use an index / table of contents type strategy to distinguish them. Moving to Done.
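
As an illustration of what an index-based approach could look like, here is a minimal Python sketch. It is not the index actually used for this ticket. It assumes a small comma-separated index with hypothetical columns `run` and `group`, and that each file name contains its SRR identifier. The run identifiers, group labels, and function names are placeholders.

```python
import csv
import io
import re

SRR_ID = re.compile(r"SRR\d+")  # assumed naming pattern for run identifiers

def load_index(handle):
    """Read an index with columns 'run' and 'group' into {run: group}."""
    return {row["run"]: row["group"] for row in csv.DictReader(handle)}

def group_files(file_names, index):
    """Group file names by the label recorded for their SRR run."""
    grouped = {}
    for name in file_names:
        match = SRR_ID.search(name)
        group = index.get(match.group(), "unknown") if match else "unknown"
        grouped.setdefault(group, []).append(name)
    return grouped

# Made-up index and file names for demonstration.
index = load_index(io.StringIO(
    "run,group\n"
    "SRR0000001,control\n"
    "SRR0000002,experimental\n"
))
grouped = group_files(
    ["SRR0000001.splice_support.txt",
     "SRR0000002.splice_support.txt",
     "SRR0000003.splice_support.txt"],
    index,
)
print(grouped)
```

Files whose run is absent from the index land under "unknown", which makes gaps in the index easy to spot.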

            ann.loraine Ann Loraine added a comment -

            Re-opening to include new work: tabulating the results and reading them into R for interactive analysis.

            ann.loraine Ann Loraine added a comment -

            Added the splicing support output files to https://bitbucket.org/hotpollen/splicing-analysis as a tar file: TabulateSplicingSupport/results/2022-09-08-results.tar.

            ann.loraine Ann Loraine added a comment - edited

            Creating R code for organizing and sanity-checking results. Working on this fork and branch: https://bitbucket.org/aloraine/splicing-analysis/branch/IGBF-3172

            ann.loraine Ann Loraine added a comment -

            Merged changes into master branch. Created new folder with consolidated results:

            • TabulateSplicingSupport/results/ containing file 2022-09-08-results.txt

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine