IGBF-3258

Download and process SRP371294 RNA-Seq data

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      This data set from Arabidopsis thaliana (https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294) contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

      This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

      For this task

      • download the data as fastq files from SRA
      • align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
      • for alignment parameters, use the values the authors report in the original publication (referenced above). Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
      • align using same maxIntron parameter reported in the methods section for the paper
      • for the above, make a new "config" file
      • create coverage graphs
      • create junction files
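
      As a sanity check on whatever maxIntron value ends up in the config, the longest intron implied by the annotation can be computed from the exon rows of a GTF. This is only a sketch: the function name is made up for illustration, it assumes tab-separated GTF with transcript_id attributes like the rows quoted in the comments on this ticket, and it does not replace the value reported in the paper's methods section.

```python
from collections import defaultdict


def max_intron_length(gtf_lines):
    """Return the longest intron implied by the exon records of a GTF."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = fields[8]
        if 'transcript_id "' not in attrs:
            continue
        # Group exon coordinates by transcript.
        tid = attrs.split('transcript_id "')[1].split('"')[0]
        exons[tid].append((int(fields[3]), int(fields[4])))
    longest = 0
    for spans in exons.values():
        spans.sort()
        # An intron is the gap between consecutive exons of one transcript.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            longest = max(longest, next_start - prev_end - 1)
    return longest


# First three exons of AT1G01010.1, as quoted in a comment on this ticket.
SAMPLE = [
    "\t".join(["Chr1", "TAIR10", "exon", str(start), str(end), ".", "+", ".",
               'transcript_id "AT1G01010.1"; gene_id "AT1G01010";'])
    for start, end in [(3631, 3913), (3996, 4276), (4486, 4605)]
]
print(max_intron_length(SAMPLE))
```

      On a real file this would be called as max_intron_length(open("Araport11.gtf")). If the result is larger than the configured maxIntron, some annotated junctions cannot be aligned.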

      Use this reference genome for alignment:

      • http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

      Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is twoBitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

      twoBitToFa command:

      twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
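
      After the conversion it can be worth confirming that the sequence names in the new fasta file match the first column of the GTF (the GTF rows quoted in the comments use names like Chr1). A minimal sketch follows; the function name is hypothetical and the toy input stands in for the real file.

```python
def fasta_lengths(lines):
    """Map each sequence name in a FASTA stream to its total length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy input for illustration only; not real genome sequence.
print(fasta_lengths([">Chr1", "ACGT", "AC", ">ChrM", "GG"]))
```

      For the real file: fasta_lengths(open("A_thaliana_Jun_2009.fa")). Any name in the GTF that is missing from the returned dictionary would be a problem for alignment.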
      

            Activity

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3329 [ IGBF-3329 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine added a comment -

            Added files to repository:

            • SRP371294-multiqc_report.html
            • SRP371294-salmon.merged.gene_counts.tsv
            • SRP371294.config (copy of /nobackup/tomato_genome/alt_splicing/SRP371294/Arabidopsis.config)

            Multiqc file looks fine.

            Moving to DONE.

            ann.loraine Ann Loraine added a comment -

            Transferring with:

            scp -J aloraine@hop.renci.org -r SRP371294.transfer \
              aloraine@lorainelab-quickload.scidas.org:/projects/igbquickload/lorainelab/www/main/htdocs/rnaseq/A_thaliana_Jun_2009/SRP371294/.
            
            ann.loraine Ann Loraine added a comment -

            Deploying data to:

            /projects/igbquickload/lorainelab/www/main/htdocs/rnaseq/A_thaliana_Jun_2009/SRP371294

            On RENCI sci-das host.

            Note:

            This will not be part of the hotpollen Quickload but instead will get put into the "rnaseq" quickload.

            ann.loraine Ann Loraine added a comment -

            Copied files to /nobackup/tomato_genome/alt_splicing/SRP371294.transfer (a location where my user has write permission)

            Number of files: 36
            Number of samples: 6
            Size: 15 GB

            ann.loraine Ann Loraine added a comment -

            Read this: https://mason.gmu.edu/~montecin/UNIXpermiss.htm

            ann.loraine Ann Loraine added a comment -

            Permissions errors:

            mkdir: cannot create directory ‘to_transfer’: Permission denied
            [aloraine@str-i1 SRP371294]$ pwd
            /nobackup/tomato_genome/alt_splicing/SRP371294
            
            Mdavis4290 Molly Davis added a comment - - edited

            Ok! Let me know if it is working now!

            chmod +rwx *
            

            [~aloraine]

            ann.loraine Ann Loraine added a comment - - edited

            Request for [~molly]: Please log into the cluster & make sure all the newly made files and directories are group-writable. (So I can make or edit files)
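
            For reference, a sketch of what "group-writable" means in practice: add the group-write bit to a directory and everything under it. The function name is hypothetical and this assumes a POSIX file system. Note that a plain "chmod +rwx *" does not descend into subdirectories, and because no u/g/o/a is given, any bits masked by the user's umask are left unchanged, so group write may not actually get set.

```python
import os
import stat


def make_group_writable(root):
    """Add the group-write bit to root and everything beneath it.

    Returns the list of paths whose mode was changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk visits every directory, so dirpath covers the directories.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if os.path.islink(path):
                continue  # leave symlinks alone
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if not mode & stat.S_IWGRP:
                os.chmod(path, mode | stat.S_IWGRP)
                changed.append(path)
    return changed
```

            Example call: make_group_writable("/nobackup/tomato_genome/alt_splicing/SRP371294"). Only the owner of a file can change its mode, so the person who created the files has to run it.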

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            Mdavis4290 Molly Davis added a comment -

            Review:

            • Do I need to add SRP371294.csv, A_thaliana_Jun_2009.fa, Araport11.gtf, Araport11.bed, and Arabidopsis.config to bitbucket? If so, where?
            • Do I need to add multiqc report and csv files to bitbucket?
            • Do all coverage graphs and junction files work?
            • Should I make new directories on cluster for coverage graphs or junction file results?
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Attachment Successful Pipeline Run 6.png [ 17867 ]
            Mdavis4290 Molly Davis made changes -
            Attachment SRP371294_multiqc_report.html [ 17866 ]
            Mdavis4290 Molly Davis made changes -
            Attachment multiqc_report.html [ 17865 ]
            Mdavis4290 Molly Davis made changes -
            Attachment multiqc_report.html [ 17865 ]
            Mdavis4290 Molly Davis made changes -
            Attachment Screenshot 2023-04-20 at 1.32.08 PM.png [ 17864 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Pipeline ran successfully with the following parameters:

            ./doIt.sh SRP371294.csv A_thaliana_Jun_2009.fa Araport11.gtf Araport11.bed Arabidopsis.config 1> out.6.txt 2> err.6.txt
            

            Notes: Araport11.gtf worked because it contains only exon records with the annotations the pipeline needs. On the cluster, 'wget' can only fetch bitbucket files through their open raw data link.

            Next Steps: Check multiqc report and determine if strandedness was correct for csv file. Remove the word 'sorted' from bam files. Create coverage graphs and junction files.
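
            A sketch of how the csv file could be regenerated once strandedness is known. This assumes SRP371294.csv is an nf-core/rnaseq samplesheet with the columns sample, fastq_1, fastq_2, strandedness (the ticket does not show the file). The sample and file names below are placeholders, not the real run accessions, and the set of accepted strandedness values is an assumption to check against the pipeline version in use.

```python
import csv
import io

# Assumed set of accepted values; check the docs for the pipeline release used.
VALID_STRANDEDNESS = {"unstranded", "forward", "reverse"}


def samplesheet(rows, strandedness="unstranded"):
    """Render (sample, fastq_1, fastq_2) tuples as samplesheet CSV text."""
    if strandedness not in VALID_STRANDEDNESS:
        raise ValueError(f"unexpected strandedness: {strandedness!r}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for sample, fastq_1, fastq_2 in rows:
        # Single-end samples leave fastq_2 empty.
        writer.writerow([sample, fastq_1, fastq_2 or "", strandedness])
    return buf.getvalue()


# Placeholder names for illustration only.
print(samplesheet([("sample_1", "sample_1_R1.fastq.gz", "sample_1_R2.fastq.gz")]))
```

            Changing one argument and rewriting the whole file avoids hand-editing a column across all six samples.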

            scp mdavi258@hpc.uncc.edu:/nobackup/tomato_genome/alt_splicing/SRP371294/results/multiqc/star_salmon/multiqc_report.html ~/Desktop
            

            • Strandedness = unstranded
            • Need to fix csv file and rerun pipeline
            • SRP371294_multiqc_report.html

            Fixed Successful Run:

            Scaled Coverage Graphs:

            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err

            • /nobackup/tomato_genome/alt_splicing/SRP371294/results/star_salmon

            Junction Files:

            • Reference ticket for code IGBF-3165
            • Had to change 2bit file in find_junctions.sh script to 'A_thaliana_Jun_2009.2bit' (see ticket description)
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err

            • /nobackup/tomato_genome/alt_splicing/SRP371294/results/star_salmon
            Mdavis4290 Molly Davis made changes -
            Attachment Screenshot 2023-04-20 at 1.32.08 PM.png [ 17864 ]
            ann.loraine Ann Loraine added a comment -

            TAIR10.gtf (NEW):

            local aloraine$ grep AT1G01010 TAIR10.gtf 
            Chr1	TAIR10	exon	3631	3913	.	+	.	transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1	TAIR10	exon	3996	4276	.	+	.	transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1	TAIR10	exon	4486	4605	.	+	.	transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1	TAIR10	exon	4706	5095	.	+	.	transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1	TAIR10	exon	5174	5326	.	+	.	transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1	TAIR10	exon	5439	5899	.	+	.	transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            
            Source: TAIR10 gene models only, from https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/TAIR10.gtf
            Mdavis4290 Molly Davis added a comment - - edited

            Compare Araport11 GTF with Tomato GTF to find any formatting issues:

            Araport11_GTF_genes_transposons.current.gtf:

            Chr1    Araport11       gene    3631    5899    .       +       .       transcript_id "AT1G01010"; gene_id "AT1G01010";
            Chr1    Araport11       mRNA    3631    5899    .       +       .       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       CDS     3760    3913    .       +       0       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       CDS     3996    4276    .       +       2       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       CDS     4486    4605    .       +       0       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       CDS     4706    5095    .       +       0       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       CDS     5174    5326    .       +       0       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       CDS     5439    5630    .       +       0       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       exon    3631    3913    .       +       .       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            Chr1    Araport11       exon    3996    4276    .       +       .       transcript_id "AT1G01010.1"; gene_id "AT1G01010";
            

            S_lycopersicum_Jun_2022.gtf:

            • Tomato gtf for comparison
            • Directory: /nobackup/tomato_genome/scripts/flavonoid-rnaseq/ExternalDataSets/S_lycopersicum_Jun_2022.gtf
            0       NA      exon    153300  153429  .       -       .       transcript_id "Solyc00T000001.1"; gene_id "Solyc00G000001";
            0       NA      exon    159764  159897  .       -       .       transcript_id "Solyc00T000001.1"; gene_id "Solyc00G000001";
            0       NA      exon    238088  238260  .       +       .       transcript_id "Solyc00T000002.1"; gene_id "Solyc00G000002";
            0       NA      exon    238619  238737  .       +       .       transcript_id "Solyc00T000002.1"; gene_id "Solyc00G000002";
            0       NA      exon    240718  242241  .       -       .       transcript_id "Solyc00T000003.1"; gene_id "Solyc00G000003";
            0       NA      exon    242297  242691  .       -       .       transcript_id "Solyc00T000004.1"; gene_id "Solyc00G000004";
            0       NA      exon    243423  243537  .       -       .       transcript_id "Solyc00T000004.1"; gene_id "Solyc00G000004";
            0       NA      exon    242523  242706  .       +       .       transcript_id "Solyc00T000005.1"; gene_id "Solyc00G000005";
            0       NA      exon    243393  243590  .       +       .       transcript_id "Solyc00T000005.1"; gene_id "Solyc00G000005";
            0       NA      exon    245339  245748  .       -       .       transcript_id "Solyc00T000006.1"; gene_id "Solyc00G000006";
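
            A quick way to make this comparison on whole files is to count the feature types in column 3. The sketch below uses a hypothetical function name and splits on any whitespace, because the rows pasted in this comment are space-separated (real GTF is tab-separated).

```python
from collections import Counter


def feature_counts(gtf_lines):
    """Count how often each feature type (column 3) occurs in a GTF."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace; the 9th field keeps the attributes intact.
        fields = line.split(None, 8)
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Rows copied from the Araport11 listing in this comment.
ARAPORT_ROWS = [
    'Chr1    Araport11       gene    3631    5899    .       +       .       transcript_id "AT1G01010"; gene_id "AT1G01010";',
    'Chr1    Araport11       mRNA    3631    5899    .       +       .       transcript_id "AT1G01010.1"; gene_id "AT1G01010";',
    'Chr1    Araport11       CDS     3760    3913    .       +       0       transcript_id "AT1G01010.1"; gene_id "AT1G01010";',
    'Chr1    Araport11       exon    3631    3913    .       +       .       transcript_id "AT1G01010.1"; gene_id "AT1G01010";',
]
print(feature_counts(ARAPORT_ROWS))
```

            Run on the tomato GTF, this should report only exon rows if the listing above is representative; the Araport11 download also carries gene, mRNA, and CDS rows.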
            

            Link for Araport11_GTF_genes_transposons.current.gtf: https://www.arabidopsis.org/download_files/Genes/Araport11_genome_release/Araport11_GTF_genes_transposons.current.gtf.gz

            [~aloraine]

            Mdavis4290 Molly Davis added a comment - - edited

            Optional file resources:

            • https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes%2FTAIR10_genome_release%2FTAIR10_chromosome_files
            • or https://www.ncbi.nlm.nih.gov/genome/4
            Mdavis4290 Molly Davis made changes -
            Comment [ Pipeline Error 1 solution:
            Araport11_GTF_genes_transposons.current.gtf.gz is working with the fasta file. ]
            Mdavis4290 Molly Davis added a comment - - edited

            Pipeline Error 2:
            out.2.txt

            Command error:
              INFO:    Converting SIF file to temporary sandbox...
              Version Info: ### PLEASE UPGRADE SALMON ###
              ### A newer version of salmon with important bug fixes and improvements is available. ####
              ###
              The newest version, available at https://github.com/COMBINE-lab/salmon/releases
              contains new features, improvements, and bug fixes; please upgrade at your
              earliest convenience.
            
            ...
            
            [2023-04-18 15:58:36.566] [jointLog] [critical] Transcript ATMG01375.1 appeared in the BAM header, but was not in the provided FASTA file
              [2023-04-18 15:58:36.571] [jointLog] [critical] Please provide a reference FASTA file that includes all targets present in the BAM header
              If you have access to the genome FASTA and GTF used for alignment 
              consider generating a transcriptome fasta using a command like: 
              gffread -w output.fa -g genome.fa genome.gtf
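
            To see how widespread this mismatch is before rerunning, one option is to list every transcript_id in the GTF that has no matching header in the transcript fasta. This is a sketch with hypothetical function names; it assumes fasta headers begin with the transcript ID, and it does not explain why a given transcript is missing.

```python
import re


def gtf_transcript_ids(gtf_lines):
    """Collect all transcript_id values found in a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        match = re.search(r'transcript_id "([^"]+)"', line)
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(fasta_lines):
    """Collect the first word of every FASTA header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def missing_from_fasta(gtf_lines, fasta_lines):
    """Transcript IDs annotated in the GTF but absent from the FASTA."""
    return sorted(gtf_transcript_ids(gtf_lines) - fasta_ids(fasta_lines))


# Toy input: one transcript present in the fasta, one absent.
gtf = [
    'Chr1\tAraport11\texon\t3631\t3913\t.\t+\t.\ttranscript_id "AT1G01010.1"; gene_id "AT1G01010";',
    'ChrM\tAraport11\texon\t1\t100\t.\t+\t.\ttranscript_id "ATMG01375.1"; gene_id "ATMG01375";',
]
fasta = [">AT1G01010.1", "ACGT"]
print(missing_from_fasta(gtf, fasta))
```

            The coordinates on the ChrM row are invented for the toy input; only the transcript ID comes from the error message.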
            
            
            ann.loraine Ann Loraine added a comment - - edited

            If the GTF file above can't be used as-is, let's not use it.

            Instead, I think we should run pipeline with GTF files from two sources:

            • hotpollen/splicing-analysis/ExternalData/Araport11.gtf from https://bitbucket.org/hotpollen/flavonoid-rnaseq (see above, [~aloraine] made it)
            • mentioned in previous comment from [~molly]: https://www.arabidopsis.org/download_files/Genes/Araport11_genome_release/Araport11_GTF_genes_transposons.current.gtf.gz

            Running pipeline with both, separately, could expose flaws or limitations we would want to know about.

            Mdavis4290 Molly Davis added a comment - - edited

            Pipeline Error 1:
            out.1.txt

            Command output:
              rsem-extract-reference-transcripts rsem/genome 0 A_thaliana_Jun_2009_genes.gtf None 0 rsem/A_thaliana_Jun_2009.fa
              "rsem-extract-reference-transcripts rsem/genome 0 A_thaliana_Jun_2009_genes.gtf None 0 rsem/A_thaliana_Jun_2009.fa" failed! Plase check if you provide correct parameters/options for the pipeline!
            
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine added a comment -

            GTF file is created and added to hotpollen/splicing-analysis as ExternalData/Araport11.gtf.

            attn: [~molly]

            ann.loraine Ann Loraine added a comment - - edited

            Making GTF file:

            • Downloaded Araport11.bed.gz and added to splicing-analysis repository
            • Used code from repository genomesource to make a GTF version of Araport11.bed.gz:
            gunzip -c ~/src/splicing-analysis/ExternalData/Araport11.bed.gz | bed2gtf.py -g 4 > ~/src/splicing-analysis/ExternalData/Araport11.gtf
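
            For anyone unfamiliar with the conversion, here is a rough sketch of how a BED12 line maps to GTF exon rows. This is not the bed2gtf.py code from genomesource and says nothing about what its -g option does. The function name is hypothetical, deriving gene_id by dropping the ".N" suffix from the transcript name is an assumption, and the example BED line is reconstructed from the AT1G01010.1 exon and CDS coordinates quoted in other comments on this ticket.

```python
def bed12_to_gtf_exons(bed_line, source="Araport11"):
    """Convert one BED12 line into GTF exon rows (one row per block)."""
    f = bed_line.rstrip("\n").split("\t")
    chrom, start, name, strand = f[0], int(f[1]), f[3], f[5]
    sizes = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]
    # Assumption: the gene ID is the transcript name without its ".N" suffix.
    gene_id = name.rsplit(".", 1)[0]
    rows = []
    for size, offset in zip(sizes, offsets):
        exon_start = start + offset + 1   # BED starts are 0-based, GTF is 1-based
        exon_end = start + offset + size  # GTF end coordinates are inclusive
        rows.append("\t".join([
            chrom, source, "exon", str(exon_start), str(exon_end),
            ".", strand, ".",
            f'transcript_id "{name}"; gene_id "{gene_id}";',
        ]))
    return rows


# BED12 line reconstructed from the coordinates quoted in this ticket.
BED = "\t".join([
    "Chr1", "3630", "5899", "AT1G01010.1", "0", "+", "3759", "5630", "0", "6",
    "283,281,120,390,153,461,", "0,365,855,1075,1543,1808,",
])
for row in bed12_to_gtf_exons(BED):
    print(row)
```

            The off-by-one handling is the part most likely to go wrong in any BED to GTF conversion, so it is worth spot-checking one gene against a known annotation.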
            

            Repository with bed2gtf.py:

            local aloraine$ git remote -v
            origin	git@bitbucket.org:lorainelab/genomesource.git (fetch)
            

            Version of bed2gtf.py used:

            5dd6d47 (HEAD -> master, origin/master, origin/HEAD) IGBF-3256 Make conversation file executable
            92eb050 IGBF-3256 Support CHESS GFF format
            34eac2e Improve GTF to/from BED conversion
            

            Oops! "conversation" should be "conversion"

            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment -

            Need approval from [~aloraine] to decide which GTF file to use.

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Spring 6 2023 Mar 20, Spring 7 2023 Apr 10 [ 166, 167 ] Spring 6 2023 Mar 20, Spring 7 2023 Apr 10, Spring 8 2023 Apr 24 [ 166, 167, 168 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Description: replaced the alignment-parameters bullet. Old: "for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter." New: "for alignment parameters, use original publication (referenced above) and their parameters that they used in their experiment." Also removed the bullet "Use reference genome annotations from the Araport 11 data set". The rest of the description is unchanged, including the data set link (https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294) and the reference genome link (http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit).

            ann.loraine Ann Loraine made changes -
            Description: added the bullet "Use reference genome annotations from the Araport 11 data set" after the alignment-parameters bullet. The rest of the description is unchanged, including the data set link (https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294) and the reference genome link (http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit).

            Mdavis4290 Molly Davis made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

            2bitToFa command:

            {code}
            2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
            {code}

            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

            2bitToFa command:

            {code}
            twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
            {code}

            ann.loraine Ann Loraine made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

            2bitToFa command:

            {code}
            2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
            {code}

            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

            2bitToFa command:

            {code}
            2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
            {code}

            ann.loraine Ann Loraine made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

            2bitToFa command:

            {code}
            2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
            {code}

            ann.loraine Ann Loraine made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            Mdavis4290 Molly Davis added a comment - - edited

            Directory: /nobackup/tomato_genome/alt_splicing/SRP371294

            Config parameters from the paper that published the results (link in description): "--dta --min-intronlen 60 --max-intronlen 6000"

            • Should I add the min-intron parameter as well?
            • I couldn't find an "alignIntronMax" parameter in the paper. Do I need to find that one?
            params {
                modules {
                    'star_align' {
                        args = '--alignIntronMax 6000 --quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readF$
                    }
                    'hisat2_align' {
                        args = " --max-intronlen 6000 --met-stderr --new-summary --dta"
                    }
                }
            }
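
On the min-intron question above: HISAT2 takes both bounds as `--min-intronlen` and `--max-intronlen`, and the corresponding STAR options are `--alignIntronMin` and `--alignIntronMax`. The sketch below keeps the two aligners consistent by deriving both argument strings from one pair of values. It assumes the paper's HISAT2 values (60 and 6000) should also apply to STAR, which the paper does not state, and the variable names are illustrative only.

```shell
# Intron bounds taken from the paper's HISAT2 settings quoted above.
MIN_INTRON=60
MAX_INTRON=6000

# Equivalent intron-size options for each aligner.
# Assumption: the same bounds are reused for STAR.
HISAT2_INTRON_ARGS="--min-intronlen ${MIN_INTRON} --max-intronlen ${MAX_INTRON}"
STAR_INTRON_ARGS="--alignIntronMin ${MIN_INTRON} --alignIntronMax ${MAX_INTRON}"

echo "hisat2: ${HISAT2_INTRON_ARGS}"
echo "star:   ${STAR_INTRON_ARGS}"
```

The printed strings can be pasted into the matching `args` entries of the config, so a later change to either bound only has to be made in one place.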
            

            GTF file: https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes%2FAraport11_genome_release

            • Can I use this Araport11 genome gtf file? [~aloraine]

            Bed file: Location of bed file?

            See: http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/Araport11.bed.gz.
            Note that this is a "BED-detail" file and has fields 13 and 14. The nf-core/rnaseq pipeline may not accept this. You may need to modify it to remove fields 13 and 14.

            gunzip -c Araport11.bed.gz | cut -f1-12 > Araport11.bed
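
To confirm the trimmed file really is plain 12-column BED, the column counts can be checked after cutting. This is a minimal sketch using the same `gunzip` and `cut` step as the command above; the helper names `trim_to_bed12` and `column_counts` are hypothetical, and the real file is only processed if it is in the current directory.

```shell
# Trim a gzipped BED-detail file to its first 12 tab-separated columns.
trim_to_bed12() {
    gunzip -c "$1" | cut -f1-12
}

# Print each distinct column count found in a tab-delimited file.
column_counts() {
    awk -F'\t' '{ print NF }' "$1" | sort -un
}

# Run on the real annotation only if it has been downloaded here.
if [ -f Araport11.bed.gz ]; then
    trim_to_bed12 Araport11.bed.gz > Araport11.bed
    column_counts Araport11.bed   # a clean BED12 file should print only 12
fi
```

If more than one number is printed, some lines have missing or extra fields and should be inspected before the file is given to the pipeline.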
            

            CSV file: Make sure to check the MultiQC report to see if strandedness is correct!

            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Spring 6 2023 Mar 20 [ 166 ] Spring 6 2023 Mar 20, Spring 7 2023 Apr 10 [ 166, 167 ]
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Spring 5 2023 Mar 6 [ 165 ] Spring 6 2023 Mar 20 [ 166 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 4 2023 Feb 21 [ 164 ] Spring 5 2023 Mar 6 [ 165 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 2023 Feb 1 [ 163 ] Spring 4 2023 Feb 13 [ 164 ]
            ann.loraine Ann Loraine made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file and version-control it using name of
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            ann.loraine Ann Loraine made changes -
            Description This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes.

            For this task, download the data as fastq files and align them against the Arabidopsis TAIR10 genome. Create coverage graphs and junction files, as per usual.

            Use this reference genome:

            http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

            This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

            For this task

            * download the data as fastq files from SRA
            * align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
            * for alignment parameters, check the paper:
            * align using same maxIntron parameter reported in the methods section for the paper
            * for the above, make a new "config" file and version-control it using name of
            * create coverage graphs
            * create junction files

            Use this reference genome for alignment:

            * http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

            Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)
            ann.loraine Ann Loraine made changes -
            Story Points 2 3
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            ann.loraine Ann Loraine created issue -

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2
