  IGB / IGBF-3409

Fix wheat genome files and documentation

    Details

    • Type: Task
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      This directory:

      http://lorainelab-quickload.scidas.org/quickload/T_aestivum_Aug_2018/

      contains a ".csi" file instead of a ".tbi" file.

      Not clear why. Investigate and fix.

      Also, modify "annots.xml" to point IGB to a sequence file hosted at EBI (or wherever it is hosted).

      After doing this, update HEADER.md in the quickload SVN repository so that it no longer advises users to retrieve and deploy the wheat genome file.

            Activity

            ann.loraine Ann Loraine created issue -
            pkulzer Paige Kulzer made changes -
            Field Original Value New Value
            Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
            pkulzer Paige Kulzer made changes -
            Epic Link IGBF-1395 [ 17470 ]
            pkulzer Paige Kulzer made changes -
            Sprint Spring 1 [ 210 ]
            pkulzer Paige Kulzer made changes -
            Story Points 0.25 1.25
            pkulzer Paige Kulzer made changes -
            Story Points 1.25 0.25
            pkulzer Paige Kulzer made changes -
            Issue Type Task [ 3 ] Updated Genome [ 11 ]
            pkulzer Paige Kulzer made changes -
            Issue Type Updated Genome [ 11 ] Task [ 3 ]
            pkulzer Paige Kulzer made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer added a comment -

            It appears that the tabix (.tbi) index format can only handle chromosomes up to a certain length (512 Mbp). For chromosomes longer than 512 Mbp, the CSI index format should be used. Per the tabix documentation (https://www.htslib.org/doc/tabix.html):

            The tabix (.tbi) and BAI index formats can handle individual chromosomes up to 512 Mbp (2^29 bases) in length. If your input file might contain data lines with begin or end positions greater than that, you will need to use a CSI index.

            Several bread wheat chromosomes are longer than 512 Mbp, so a CSI index was created for this quickload instead of a tabix index.
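            The size check described above can be sketched in Python. This is a minimal illustration, not part of the quickload tooling: it assumes sequence lengths are read from a samtools .fai index (name in column 1, length in column 2), and the function names are hypothetical.

```python
from __future__ import annotations

# Largest position a .tbi (or BAI) index can address: 2^29 bases (~512 Mbp),
# per the htslib tabix documentation quoted above.
TBI_MAX_LENGTH = 2 ** 29


def choose_index_format(chrom_lengths: dict[str, int]) -> str:
    """Return "csi" if any sequence is too long for a .tbi index, else "tbi"."""
    too_long = [name for name, length in chrom_lengths.items()
                if length > TBI_MAX_LENGTH]
    return "csi" if too_long else "tbi"


def read_fai_lengths(path: str) -> dict[str, int]:
    """Read sequence lengths from a samtools .fai file (hypothetical helper)."""
    lengths: dict[str, int] = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Column 1 is the sequence name, column 2 its length in bases.
            lengths[fields[0]] = int(fields[1])
    return lengths
```

            For example, choose_index_format({"chrA": 600_000_000}) returns "csi" (chrA is a toy name, not a wheat chromosome). When CSI is needed, tabix --csi (short form -C) writes a .csi index instead of a .tbi.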

            pkulzer Paige Kulzer added a comment - - edited

            I modified the annots.xml file to point to the genome sequence file hosted by Ensembl. I also added the label_field attribute so that the ID of each gene model is displayed automatically.

            Note: When I tested these changes in IGB, clicking the Load Sequence button produced an error saying there was no genome sequence to load. I then retrieved the genome sequence file from Ensembl, converted it to 2bit format, and added it to the Triticum aestivum Quickload folder before testing again. Clicking the Load Sequence button no longer throws an error.
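            The label_field change to annots.xml described above could be scripted along these lines. This is a hypothetical sketch: it assumes annots.xml lists tracks as <file> elements under a <files> root, and that the gene-model ID field is named "id". Check both against the actual file before using it.

```python
import xml.etree.ElementTree as ET


def add_label_field(annots_xml: str, field: str = "id") -> str:
    """Return annots.xml text with label_field set on every <file> element.

    Assumes the layout <files><file name="..." .../></files>; the default
    field name "id" is an illustrative assumption.
    """
    root = ET.fromstring(annots_xml)
    for file_elem in root.iter("file"):
        # Keep any label_field already present; otherwise add the default.
        if file_elem.get("label_field") is None:
            file_elem.set("label_field", field)
    return ET.tostring(root, encoding="unicode")
```

            Note that ElementTree drops XML comments and the XML declaration by default, so a hand-edited annots.xml may be easier to diff if the attribute is added manually instead.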

            pkulzer Paige Kulzer added a comment -

            I'm not seeing anything in HEADER.md that advises users to retrieve and deploy the wheat genome file, so maybe this has already been fixed. This Quickload is now ready for testing!

            I've put a zipped copy of the Quickload on Google Drive for review: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

            pkulzer Paige Kulzer made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ]
            pkulzer Paige Kulzer made changes -
            Assignee Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1 [ 210 ] Spring 1, Spring 2 [ 210, 211 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2 [ 210, 211 ] Spring 1, Spring 2, Spring 3 [ 210, 211, 212 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese added a comment -

            Unclear if anything needs to be done for this ticket.

            • See IGBF-2333 for why the CSI index is needed (IGBF-2538 would add support for CSI).
            • The 2bit file is in the QuickLoad, but we did not add it to the SVN repo because of its size (see IGBF-2333).
            • HEADER.md no longer advises users to retrieve and deploy the wheat genome file.

            I think we can close this ticket.

            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] Needs 1st Level Review [ 10005 ]
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            nfreese Nowlan Freese made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            nfreese Nowlan Freese made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            nfreese Nowlan Freese made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine added a comment -

            To-do:

            • Improve header markdown (quickload) to better document where the files came from
            ann.loraine Ann Loraine made changes -
            Summary Fix wheat genome files Fix wheat genome files and documentation
            ann.loraine Ann Loraine made changes -
            Assignee Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese made changes -
            Status Post-merge Testing In Progress [ 10003 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2333 [ IGBF-2333 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2538 [ IGBF-2538 ]
            ann.loraine Ann Loraine made changes -
            Epic Link IGBF-1395 [ 17470 ] IGBF-1765 [ 17855 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2, Spring 3 [ 210, 211, 212 ] Spring 1, Spring 2 [ 210, 211 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher

              People

              • Assignee:
                Unassigned
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated: