  1. IGB
  2. IGBF-2333

Add wheat genome to IGB quickload system

    Details

    • Story Points:
      2
    • Sprint:
      Spring 7 : 13 Apr to 24 Apr
      Spring 8 : 24 Apr to 8 May
      Spring 8 : 11 May to 25 May
      Spring 9 : 25 May to 8 Jun
      Summer 1: 8 Jun - 19 Jun
      Summer 2: 22 Jun - 3 Jul
      Summer 3: 6 Jul - 17 Jul
      Summer 4: 14 Jul - 28 Jul
      Summer 5: 3 Aug - 14 Aug
      Summer 6: 17 Aug - 28 Aug
      Summer 7: 31 Aug - 11 Sep
      Fall 1: 14 Sep - 25 Sep
      Fall 2: 28 Sep - 9 Oct

      Description

      We received a request to add the wheat genome to IGB Quickload.

            Activity

            ann.loraine Ann Loraine added a comment -

            Bread wheat genome (International Wheat Genome Sequencing Consortium, IWGSC):
            https://www.wheatgenome.org/News/Latest-news/All-IWGSC-data-related-to-the-reference-sequence-of-bread-wheat-IWGSC-RefSeq-v1.0-publicly-available-at-URGI
            https://plants.ensembl.org/Triticum_aestivum/Info/Index
            (These links are all for the RefSeq v1.0 version, which is the most widely accepted and used version in the community today.)

            T4 genome, released only last week (but based on the IWGSC work):
            https://www.ncbi.nlm.nih.gov/bioproject/PRJNA392179
            https://www.biorxiv.org/content/10.1101/2020.04.06.028746v1.full.pdf

            Durum wheat genome:
            https://www.interomics.eu/durum-wheat-genome
            http://plants.ensembl.org/Triticum_turgidum/Info/Index

            Wheat training (a short summary from our group about the wheat genome assemblies):
            http://www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/Genome_assemblies.pdf

            ann.loraine Ann Loraine added a comment - - edited

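            Downloaded the bread wheat genome from Ensembl. Fixed the FASTA headers with:

            # output file name below is illustrative
            sed 's| dna_sm:chromosome chromosome:IWGSC:..:1:.*:1 REF||' Triticum_aestivum.IWGSC.dna_sm.toplevel.fa > Triticum_aestivum.IWGSC.dna_sm.toplevel.fixed.fa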
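For reference, a minimal Python sketch of the same header cleanup. Rather than matching the exact Ensembl description text, it keeps only the first whitespace-delimited token after `>`, which assumes sequence names never contain spaces. The function names are hypothetical, and the sample header in the docstring only illustrates the layout the sed pattern expects.

```python
from typing import Iterable, Iterator


def clean_header(line: str) -> str:
    """Shorten a FASTA header to '>' plus its first whitespace-delimited token.

    Illustrative input (same layout the sed pattern matches):
        '>1A dna_sm:chromosome chromosome:IWGSC:1A:1:1000:1 REF'
    becomes '>1A' followed by a newline. Sequence lines are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    tokens = line[1:].split(maxsplit=1)
    name = tokens[0] if tokens else ""
    return ">" + name + "\n"


def clean_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with every header passed through clean_header."""
    for line in lines:
        yield clean_header(line)
```

Because `clean_fasta` is a generator, something like `sys.stdout.writelines(clean_fasta(open(path)))` streams a cleaned copy without loading the whole genome into memory.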
            

            The IGB name for bread wheat genome version 1 is T_aestivum_Aug_2018.
            Looked for wheat in https://usegalaxy.org/api/genomes (Galaxy-supported genomes) but did not find it.
            A version 2 assembly is coming soon but is not yet publicly available for redistribution in IGB Quickload.
            Converted Triticum_aestivum.IWGSC.dna_sm.toplevel.fa, downloaded from ftp://ftp.ensemblgenomes.org/pub/plants/release-47/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC.dna_sm.toplevel.fa.gz, to 2bit format.
            The resulting 2bit file is 3.6 Gb.
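The 2bit format packs four bases per byte, so the packed sequence dominates the file size; N runs and soft-masked (lowercase) runs from the dna_sm FASTA add a small per-block overhead. A minimal Python sketch, based on the standard (version 0) layout in UCSC's 2bit format description, that estimates the file size and checks whether every record offset fits the 32-bit index. It does not perform the conversion, and the class and function names are hypothetical. For inputs whose offsets would overflow, UCSC's faToTwoBit offers a `-long` option that writes 64-bit offsets.

```python
from dataclasses import dataclass
from typing import List, Tuple

HEADER_BYTES = 16  # signature, version, sequenceCount, reserved (4 bytes each)
OFFSET_BYTES = 4   # the standard 2bit index stores 32-bit record offsets


@dataclass
class SeqSummary:
    """Counts needed to size one sequence record (hypothetical helper type)."""
    name: str
    length: int           # dnaSize: number of bases
    n_blocks: int = 0     # runs of N
    mask_blocks: int = 0  # runs of lowercase (soft-masked) bases


def record_bytes(s: SeqSummary) -> int:
    """Bytes used by one sequence record in a standard 2bit file."""
    return (
        4                          # dnaSize
        + 4 + 8 * s.n_blocks       # nBlockCount + nBlockStarts + nBlockSizes
        + 4 + 8 * s.mask_blocks    # maskBlockCount + maskBlockStarts + maskBlockSizes
        + 4                        # reserved
        + (s.length + 3) // 4      # packedDna: 2 bits per base, rounded up
    )


def twobit_layout(seqs: List[SeqSummary]) -> Tuple[int, int]:
    """Return (total file size, largest record offset) in bytes."""
    # Index: 1-byte name length + name + offset, per sequence.
    pos = HEADER_BYTES + sum(1 + len(s.name.encode()) + OFFSET_BYTES for s in seqs)
    largest_offset = 0
    for s in seqs:
        largest_offset = pos  # the index entry for s points here
        pos += record_bytes(s)
    return pos, largest_offset


def fits_32bit_index(seqs: List[SeqSummary]) -> bool:
    """True if every record starts below 2**32, as the 32-bit index requires."""
    return twobit_layout(seqs)[1] < 2**32
```

Note that the check is on record offsets, not total size: only the start of each record must be addressable by the 32-bit index.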
            Saved the 2bit file in /nobackup on the UNCC cluster.
            Wrote new code for parsing Ensembl GFF.
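The new parsing code itself is not shown in this ticket. As a rough illustration of the kind of conversion involved, here is a minimal Python sketch that turns Ensembl GFF3 gene models into BED12 rows. It assumes Ensembl's ID conventions (`ID=transcript:...` on transcript rows, `Parent=transcript:...` on exon and CDS rows) and a single Parent per exon or CDS; the function names are hypothetical and this is not the code written for this ticket. The key detail is the coordinate shift: GFF3 is 1-based and fully closed, BED is 0-based and half-open.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import unquote


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse GFF3 column 9, e.g. 'ID=transcript:T1;Parent=gene:G1'."""
    attrs = {}
    for part in column9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def gff3_to_bed12(lines: Iterable[str]) -> List[str]:
    """Convert transcripts with exon (and optional CDS) children to BED12 rows."""
    transcripts: Dict[str, Tuple[str, str]] = {}  # transcript ID -> (chrom, strand)
    exons = defaultdict(list)  # transcript ID -> [(start, end)], 0-based half-open
    cds = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            continue
        chrom, ftype, strand = f[0], f[2], f[6]
        start, end = int(f[3]) - 1, int(f[4])  # 1-based closed -> 0-based half-open
        attrs = parse_attributes(f[8])
        parent = attrs.get("Parent")
        if ftype == "exon" and parent:
            exons[parent].append((start, end))
        elif ftype == "CDS" and parent:
            cds[parent].append((start, end))
        elif attrs.get("ID", "").startswith("transcript:"):
            transcripts[attrs["ID"]] = (chrom, strand)

    rows = []
    for tid, (chrom, strand) in transcripts.items():
        blocks = sorted(exons.get(tid, []))
        if not blocks:
            continue
        tx_start, tx_end = blocks[0][0], max(e for _, e in blocks)
        coding = cds.get(tid)
        if coding:
            thick_start = min(s for s, _ in coding)
            thick_end = max(e for _, e in coding)
        else:  # non-coding transcript: empty thick region
            thick_start = thick_end = tx_start
        rows.append("\t".join(str(x) for x in [
            chrom, tx_start, tx_end, tid.split(":", 1)[1], 0, strand,
            thick_start, thick_end, 0, len(blocks),
            ",".join(str(e - s) for s, e in blocks),      # blockSizes
            ",".join(str(s - tx_start) for s, _ in blocks),  # blockStarts
        ]))
    return rows
```

Rows come out in the order transcripts appear in the GFF3, so they still need sorting before compression and indexing.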

            ann.loraine Ann Loraine added a comment -

            The bread wheat chromosomes are too long for .bai and .tbi indexes.
            We need to support .csi indexes.
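BAI and TBI indexes use a fixed binning scheme (smallest bin 2^14 bp, 5 levels), so they can address at most 2^29 bp (536,870,912 bp) per reference sequence. CSI keeps the same scheme but makes the smallest bin size (`min_shift`) and the number of levels (`depth`) parameters, raising the limit to 2^(min_shift + 3*depth). A minimal Python sketch of that arithmetic, with `reg2bin` transcribed from the C function in the CSI specification; `max_ref_length` and `csi_depth` are hypothetical helpers, and htslib chooses the depth itself when it writes a CSI index.

```python
BAI_MIN_SHIFT, BAI_DEPTH = 14, 5  # fixed parameters of the BAI/TBI binning scheme


def max_ref_length(min_shift: int, depth: int) -> int:
    """Largest reference length (bp) a binning index with these parameters covers."""
    return 1 << (min_shift + 3 * depth)


def csi_depth(ref_length: int, min_shift: int = 14) -> int:
    """Smallest depth whose binning scheme covers ref_length bp."""
    depth = 0
    while max_ref_length(min_shift, depth) < ref_length:
        depth += 1
    return depth


def reg2bin(beg: int, end: int, min_shift: int = 14, depth: int = 5) -> int:
    """Bin for the 0-based, half-open interval [beg, end), as in the CSI spec."""
    end -= 1
    level, shift = depth, min_shift
    first_bin = ((1 << (depth * 3)) - 1) // 7  # first bin number on the deepest level
    while level > 0:
        # Use the smallest bin (deepest level) that contains the whole interval.
        if beg >> shift == end >> shift:
            return first_bin + (beg >> shift)
        level -= 1
        shift += 3
        first_bin -= 1 << (level * 3)
    return 0  # bin 0 spans the entire addressable range
```

For a hypothetical 600 Mb chromosome, `csi_depth(600_000_000)` returns 6, one level more than the fixed BAI/TBI scheme provides.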

            ann.loraine Ann Loraine added a comment -

            Added the bgzip-compressed, sorted BED file and its .csi index to Quickload. Did not add the genome 2bit file to the svn repo because it is too big (3.9 Gb).
            The igbquickload.org host is down. Updated the sci-das server and the EC2 server.
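Tabix-style indexes (TBI or CSI) require the bgzip-compressed file to keep each chromosome's records together and sorted by start position. With htslib, a common sequence is `sort -k1,1 -k2,2n`, then `bgzip`, then `tabix --csi -p bed`. A minimal Python sketch of the ordering rule and a check for it; the function names are hypothetical, and `track`, `browser`, and `#` header lines are ignored.

```python
from typing import Iterable, Iterator, List, Tuple

HEADER_PREFIXES = ("#", "track", "browser")


def _records(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, original line) for each BED data line."""
    for line in lines:
        if not line.strip() or line.startswith(HEADER_PREFIXES):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), line


def sort_bed(lines: Iterable[str]) -> List[str]:
    """Sort BED data lines by chromosome name, then start, then end."""
    return [r[3] for r in sorted(_records(lines), key=lambda r: (r[0], r[1], r[2]))]


def is_tabix_sorted(lines: Iterable[str]) -> bool:
    """True if each chromosome's records are contiguous with non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, 0
    for chrom, start, _, _ in _records(lines):
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
        elif start < prev_start:
            return False  # start positions went backwards
        prev_chrom, prev_start = chrom, start
    return True
```

Any chromosome order works for indexing as long as each chromosome's records are contiguous; sorting by name is just the simplest way to guarantee that.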

            ann.loraine Ann Loraine added a comment -

            Attached image shows newly added wheat genome.

            nfreese Nowlan Freese added a comment -

            Tested on Mac with IGB 9.1.6 (Sept 24 master).

            When I load the Triticum aestivum genome in IGB, the gene annotations load for some chromosomes but not others. If I uncheck the IGB Quickload Gene models track, re-select it, and change Load Mode to Genome, the gene models load correctly for all chromosomes.

            This may be because the annotations are not tabix indexed, due to the large size of the wheat chromosomes.


              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: