IGB / IGBF-2696

Add blacklist regions to IGB quickload for mouse, human, fruit fly

    Details

    • Story Points:
      1.5
    • Sprint:
      Fall 7 Dec 14 - Dec 23, Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Spring 6 2021 May 31 - June 11, Summer 1 2021 Jun 14 - Jun 25, Summer 2 2021 Jun 28 - Jul 9, Summer 3 2021 Jul 12 - Jul 23, Summer 4 2021 Aug 2 - Aug 13, Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Sep 10, Fall 3 2021 Sep 13 - Sep 24

      Description

      "Blacklist" regions are genes or other regions of a genome that give anomalous results, such as artifactually high signal, in sequence-based assays.

      Let's include these in IGB.

      This github repo directory contains bed files with blacklisted regions:

      References:

      Visualizations:

            Activity

            ann.loraine Ann Loraine added a comment -

            To show these in IGB:

            • Create files that IGB can read (e.g., BED4 format)

            The regions are probably simple spans with chromosome name, start, and end position, and maybe a strand indicator. When users select and load data from the file with blacklist regions, they should see the name of the region displayed.
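A minimal Python sketch of the BED4 layout mentioned above, assuming each region arrives as a (chrom, start, end, name) tuple. The `Region`, `to_bed4`, and `parse_bed4` names and the example regions are hypothetical, for illustration only. BED coordinates are 0-based with an exclusive end, and BED4 has no strand column (strand is the sixth BED column, so it would require BED6).

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    """One blacklist span; BED uses a 0-based start and an exclusive end."""
    chrom: str
    start: int
    end: int
    name: str


def to_bed4(regions: Iterable[Region]) -> str:
    """Format regions as tab-separated BED4 lines: chrom, start, end, name."""
    lines = []
    for r in regions:
        if r.start < 0 or r.end <= r.start:
            raise ValueError(f"invalid span: {r}")
        lines.append(f"{r.chrom}\t{r.start}\t{r.end}\t{r.name}")
    return "\n".join(lines) + "\n"


def parse_bed4(text: str) -> List[Region]:
    """Parse BED4 text back into Region tuples, skipping blank and header lines."""
    regions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        regions.append(Region(chrom, int(start), int(end), name))
    return regions


# Hypothetical example regions, not taken from the actual blacklist files.
example = [
    Region("chr1", 1000, 2500, "blacklist_region_1"),
    Region("chr2", 500, 900, "blacklist_region_2"),
]
bed_text = to_bed4(example)
print(bed_text, end="")
assert parse_bed4(bed_text) == example
```

Because the name sits in the fourth column, it is the label a viewer can display for each region.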

            ann.loraine Ann Loraine added a comment -

            Please use the genome_src repository for any code used for this.

            nfreese Nowlan Freese added a comment -

            [~aloraine] - do we want to store the files in IGB Quickload?

            ann.loraine Ann Loraine added a comment - - edited

            Replying to previous comment: Yes, alongside the gene annotations. When you commit to SVN, please include the ticket number in the commit message (for future reference).

            nfreese Nowlan Freese added a comment -

            I have been unable to identify specific patch versions for the blacklist genomes. There is a metadata file for each genome, but it lists only the assembly, not the patch. According to this description of patches, patches do not affect chromosome coordinates, so the patch version should have no effect on the blacklist regions.

            nfreese Nowlan Freese added a comment -

            Heads up: if you are on a Mac and update to Big Sur, SVN is not officially supported, but it can be installed through Homebrew.

            https://developer.apple.com/forums/thread/666689

            nfreese Nowlan Freese added a comment -

            [~aloraine] - I think I found a discrepancy in the naming of one of the Drosophila genomes.

            UCSC lists the Drosophila Aug. 2014 assembly as dm6.
            igbquickload.org/quickload has D_melanogaster_Jul_2014 as a genome version.
            Synonyms.txt lists D_melanogaster_Aug_2014 as being the dm6 assembly.

            The FlyBase site does not make clear which date is correct. NCBI lists the same release date (August 2014) as UCSC for dm6, and states that "The gene annotation is based on Release 6.32 provided by FlyBase." The file on igbquickload.org is Release 6.03.

            Any thoughts on this would be appreciated. I assume that the various versions of dm6 share the same chromosome coordinates, so the blacklist should apply to any version of dm6. I could add the blacklist file to D_melanogaster_Jul_2014. We may then want to update synonyms.txt to point to D_melanogaster_Jul_2014 for dm6, so that the UCSC data can also be loaded.

            nfreese Nowlan Freese added a comment -

            After reviewing the dm6 issue with Dr. Loraine, I have created a new ticket (IGBF-2952) to address the issue.

            Wait to push the dm6 blacklist until IGBF-2952 is complete.

            nfreese Nowlan Freese added a comment -

            All of the blacklist files have been pushed to the SVN repository, with the exception of dm6.

            I have created IGBF-2955 to address pushing the changes to the SciDas and Quickload servers.

            ann.loraine Ann Loraine added a comment -

            Please don't test until sites are updated and deployed. (Sorry - I moved this forward to "ready for testing" prematurely.)

            ann.loraine Ann Loraine added a comment -

            Now it is ready for testing.
            attn: Nowlan Freese

            nfreese Nowlan Freese added a comment -

            Discrepancy: While testing, I noticed that in the C_elegans_oct_2010 (ce10) genome some blacklist regions fell outside the range of their chromosomes (i.e., the region coordinates exceeded the chromosome length). The genome length in IGB (C_elegans_oct_2010) is correct when compared to UCSC. I also re-downloaded the ce10 blacklist file. It appears that some of the blacklist regions are mis-annotated, indicating an issue with the pipeline the blacklist authors used.
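The out-of-range check described in this comment can be sketched in Python, assuming a two-column chromosome-sizes file (chrom, tab, length, the layout of UCSC .chrom.sizes files) and a BED blacklist. The function names and toy inputs are hypothetical and are not the real ce10 sizes or blacklist. Because BED end coordinates are exclusive, an end equal to the chromosome length is still valid; only an end greater than the length is flagged.

```python
from typing import Dict, List, Tuple


def parse_chrom_sizes(text: str) -> Dict[str, int]:
    """Parse two-column 'chrom<TAB>length' text into a dict."""
    sizes = {}
    for line in text.splitlines():
        if line.strip():
            chrom, length = line.split("\t")[:2]
            sizes[chrom] = int(length)
    return sizes


def out_of_range(bed_text: str, sizes: Dict[str, int]) -> List[Tuple[str, int, int, str]]:
    """Return BED records on unknown chromosomes or ending past the chromosome length."""
    bad = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in sizes:
            bad.append((chrom, start, end, "unknown chromosome"))
        elif end > sizes[chrom]:
            bad.append((chrom, start, end, "end exceeds chromosome length"))
    return bad


# Toy inputs for illustration only.
sizes = parse_chrom_sizes("chrI\t1000\nchrII\t2000\n")
bed = "chrI\t100\t200\tr1\nchrI\t900\t1200\tr2\nchrX\t0\t50\tr3\n"
for record in out_of_range(bed, sizes):
    print(record)
```

Run against the toy inputs, this reports the chrI region ending at 1200 (past length 1000) and the region on chrX, which is absent from the sizes table.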

            nfreese Nowlan Freese added a comment -

            Testing complete, no issues.

            Closing ticket.


              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2
