  1. IGB
  2. IGBF-3409

Fix wheat genome files and documentation

    Details

    • Type: Task
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      This directory:

      http://lorainelab-quickload.scidas.org/quickload/T_aestivum_Aug_2018/

      contains a ".csi" file instead of a ".tbi" file.

      Not clear why. Investigate and fix.

      Also, modify the "annots.xml" to point IGB to a sequence file stored at EBI or wherever it is.

      After doing this, update HEADER.md in the quickload svn repository to no longer advise users to retrieve and deploy the wheat genome file.

        Attachments

          Issue Links

            Activity

            pkulzer Paige Kulzer added a comment -

            It appears that the tabix (.tbi) index format can only handle up to a certain chromosome size (512Mbp). For chromosome sizes greater than 512Mbp, the CSI index format should be used. Per this tabix documentation (https://www.htslib.org/doc/tabix.html):

            The tabix (.tbi) and BAI index formats can handle individual chromosomes up to 512 Mbp (2^29 bases) in length. If your input file might contain data lines with begin or end positions greater than that, you will need to use a CSI index.

            The bread wheat genome has individual chromosomes longer than 512 Mbp, so a CSI index was created for this Quickload instead of a tabix (.tbi) index.
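
            As a rough illustration of the limit quoted above, the following Python sketch checks chromosome lengths against 2^29 and adds tabix's `--csi` option only when a chromosome is too long for a .tbi index. The function names, the example sizes, and the file name are hypothetical; real lengths would come from the Quickload's chromosome size listing (assumed here to be a simple name-to-length mapping).

```python
# Largest coordinate a .tbi or BAI index can address (2^29 bases, per the tabix docs).
TBI_MAX_LENGTH = 2 ** 29  # 536,870,912


def chromosomes_needing_csi(chrom_sizes):
    """Return names of chromosomes longer than the .tbi limit, sorted by name."""
    return sorted(
        name for name, length in chrom_sizes.items() if length > TBI_MAX_LENGTH
    )


def tabix_command(path, chrom_sizes, preset="gff"):
    """Build a tabix argument list, adding --csi only when a chromosome is too long."""
    command = ["tabix", "-p", preset]
    if chromosomes_needing_csi(chrom_sizes):
        command.append("--csi")
    command.append(path)
    return command


# Hypothetical sizes for illustration only; substitute the real chromosome lengths.
example_sizes = {"chrSmall": 400_000_000, "chrLarge": 800_000_000}
print(chromosomes_needing_csi(example_sizes))
print(tabix_command("annotations.gff.gz", example_sizes))
```

            The command is only built, not executed, so the check can be run before deciding which index format to generate.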

            pkulzer Paige Kulzer added a comment - - edited

            I modified the annots.xml file to point to the genome sequence file hosted by Ensembl. I also added the label_field attribute so that the ID of each gene model is displayed automatically.

            Note: I tested these changes in IGB and found that clicking the Load Sequence button produced an error saying there was no genome sequence to load. I then retrieved the genome sequence file from Ensembl, converted it to 2bit format, and added it to the Triticum aestivum Quickload folder before testing again. Clicking the Load Sequence button no longer produces an error.
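
            For reviewers unfamiliar with the label_field change, here is a minimal Python sketch of what such an entry could look like when generated programmatically. The `<files>`/`<file>` element names and the `name` and `title` attributes are assumptions about the usual annots.xml layout; the file name, title, and the `ID` value are placeholders, not the contents of the actual wheat Quickload.

```python
import xml.etree.ElementTree as ET


def build_annots_xml(entries):
    """Serialize a list of attribute dicts as <file> elements under a <files> root."""
    root = ET.Element("files")
    for attributes in entries:
        ET.SubElement(root, "file", attributes)
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry: the file name, title, and label_field value are placeholders.
entries = [
    {"name": "gene_models.gff.gz", "title": "Gene models", "label_field": "ID"},
]
print(build_annots_xml(entries))
```

            Parsing the result back with `ET.fromstring` is a quick way to confirm that the file is still well-formed after editing.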

            pkulzer Paige Kulzer added a comment -

            I'm not seeing anything in HEADER.md that advises users to retrieve and deploy the wheat genome file, so this may already have been fixed. This Quickload is now ready for testing!

            I've put a zipped copy of the Quickload on Google Drive for review: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

            nfreese Nowlan Freese added a comment -

            Unclear if anything needs to be done for this ticket.

            • See IGBF-2333 for why the CSI index is needed (IGBF-2538 would add support for CSI).
            • 2bit file is found in the QuickLoad, but we did not put it in the SVN repo due to the size (see IGBF-2333).
            • HEADER.md no longer advises users to retrieve and deploy the wheat genome file.

            I think we can close this ticket.

            ann.loraine Ann Loraine added a comment -

            To-do:

            • Improve header markdown (quickload) to better document where the files came from

              People

              • Assignee:
                Unassigned
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated: