Deep Backlog / DB-130

Edge-matching is being randomly selective (for Fruit Fly Gene Model)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Labels:
      None

      Description

      *Version noticed: 8.2.0

      THIS ISSUE HAS BEEN NOTICED FOR ONLY 1 GENE SO FAR!
      (but still, what is it about this gene that causes the edge-matching to display inappropriately?)

      *From Ivory:
      IGB is being randomly selective about what edges to highlight.
      See attached images EdgeMapA and B. Depending on which feature is selected, the edge mapping indicates that only some of the exons shown are the same (A) or that all exons shown are the same (B).

      To see more instances of inconsistent edge matching, in a case where you would really want to be able to use edge mapping, see the region 3R:21,350,072-21,378,314 in the latest fruit fly genome (Drosophila melanogaster).
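
One way to check the report outside IGB is to compute which exon edges two transcripts share directly from a BED file. With exact coordinate matching, the result should not depend on which of the two transcripts is treated as the selected one. The following is a minimal sketch, not IGB's implementation: `shared_edges` is a hypothetical helper, and it assumes a tab-separated BED12 layout (chromStart in column 2, name in column 4, blockSizes in column 11, blockStarts in column 12) with the transcript ID in the name column.

```shell
# Hypothetical helper: print exon edge coordinates shared by two transcripts.
# Assumes tab-separated BED12; column 4 is assumed to hold the transcript ID.
shared_edges() {  # usage: shared_edges FILE NAME_A NAME_B
  awk -F'\t' -v a="$2" -v b="$3" '
    $4 == a || $4 == b {
      n = split($11, sizes, ",")
      split($12, starts, ",")
      for (i = 1; i <= n; i++) {
        if (sizes[i] == "") continue   # skip the empty field after a trailing comma
        s = $2 + starts[i]             # exon start (0-based)
        e = s + sizes[i]               # exon end (exclusive)
        seen[$4, s] = 1; seen[$4, e] = 1
        edges[s] = 1; edges[e] = 1
      }
    }
    END {
      # Keep only coordinates that are an exon edge in both transcripts.
      for (x in edges)
        if ((a, x) in seen && (b, x) in seen) print x
    }
  ' "$1" | sort -n
}
```

For example, `shared_edges some_genes.bed TRANSCRIPT_A TRANSCRIPT_B` and the same call with the two names swapped should print identical lists (file and transcript names here are placeholders). If the display in IGB differs depending on the selection while this list does not, the asymmetry comes from the viewer rather than from the coordinates.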

        Attachments

        1. EdgeMapA.png
          72 kB
          Mason Meyer
        2. EdgeMapB.png
          72 kB
          Mason Meyer

          Issue Links

            Activity

            mason Mason Meyer (Inactive) created issue -
            mason Mason Meyer (Inactive) made changes -
            Field Original Value New Value
            Summary Edge-matching is being randomly selective Edge-matching is being randomly selective
            mason Mason Meyer (Inactive) made changes -
            Summary Edge-matching is being randomly selective Edge-matching is being randomly selective (for Fruit Fly Gene Model)
            mason Mason Meyer (Inactive) made changes -
            Link This issue relates to IGBF-259 [ IGBF-259 ]
            mason Mason Meyer (Inactive) made changes -
            Resolution Unresolved [ 10101 ]
            mason Mason Meyer (Inactive) added a comment - - edited

            From Ivory:

            These genes are all in the same general area: (grouped in sets of nearly identical genes)
            FBgn0002781
            FBgn0261837
            FBgn0261838
            FBgn0261839
            FBgn0261840
            FBgn0261841
            FBgn0261842
            FBgn0261843
            FBgn0261844
            FBgn0261845

            FBgn0266170
            FBgn0266171
            FBgn0266172

            FBgn0266174
            FBgn0266175
            FBgn0266176
            FBgn0266177
            FBgn0266178

            FBgn0267648
            FBgn0267649

            FBgn0267650
            FBgn0267651
            FBgn0267652

            And these are in one area:
            FBgn0005630
            FBgn0264817
            I don’t understand how this gene (FBgn0264817) can have a trans-spliced exon; it only has one exon. I guess it’s translated in the reverse direction.

            So the comment is limited to two regions on the genome.

            AND, it looks like the strand issue that I noticed with FBgn0002781 (some of its components are listed on the + strand and some on the – strand) is unique to that gene. I assumed it was related to the comment, and that other genes with the comment would have the same issue, but all of the other genes that have this comment list all of their components as being on the same strand. (See "Note 1" below.)

            Maybe the strand issue is just a fluke? Maybe we should just manually curate this gene in the file and move on. (and see if the edge matching is fixed)

            As a test, I downloaded the file and manually changed the FBgn0002781 transcripts that were to ,
            The transcripts were: FBtr0084079, FBtr0084085, FBtr0084084, FBtr0084080, FBtr0084081, FBtr0084082, FBtr0307759, FBtr0307760, FBtr0084083
            Just changing the strand to be (–) didn’t solve it.
            If I select transcript FBtr0084061, all transcripts match at almost all edges.
            If I select transcript FBtr0084083, nearly half of the transcripts do not edge match.
            This holds in the original quickload file as well.

            Attached are a BED file that has ONLY gene FBgn0002781, taken directly from the IGB Quickload file, and a GTF file for just this gene, taken from the FlyBase file.
            Both of these can be viewed in IGB. The GTF file shows all the transcripts on the negative strand.
            Strangely, the two small files do not have the edge matching issue (at least I didn’t notice it using the two transcripts that I noted in the last paragraph), so they may not be as helpful as I had hoped.

            $ grep FBgn0002781 dmel-all-r6.03.gtf > FBgn0002781.gtf
            $ gunzip -c D_melanogaster_Jul_2014.bed.gz | grep FBgn0002781 > FBgn0002781.bed

            *Note 1:

            All of the other genes that have this comment list all of their components as being on the same strand. The number under each gene ID is the count of distinct strand values in its GTF records:
            $ for GENE in $(grep "SO:0000459:gene_with_trans_spliced_transcript" dmel-all-r6.03.gtf | grep -o 'FBgn[0-9]*' | uniq); do echo "$GENE"; grep "$GENE" dmel-all-r6.03.gtf | cut -f7 | sort | uniq | wc -l; done
            FBgn0002781
            2
            FBgn0005630
            1
            FBgn0261837
            1
            FBgn0261838
            1
            FBgn0261839
            1
            FBgn0261840
            1
            FBgn0261841
            1
            FBgn0261842
            1
            FBgn0261843
            1
            FBgn0261844
            1
            FBgn0261845
            1
            FBgn0264817
            1
            FBgn0266170
            1
            FBgn0266171
            1
            FBgn0266172
            1
            FBgn0266173
            1
            FBgn0266174
            1
            FBgn0266175
            1
            FBgn0266176
            1
            FBgn0266177
            1
            FBgn0266178
            1
            FBgn0267648
            1
            FBgn0267649
            1
            FBgn0267650
            1
            FBgn0267651
            1
            FBgn0267652
            1
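
The loop in Note 1 rescans the GTF once per gene and reports only a count. A single-pass variant that also shows which strands are mixed could look like the sketch below. `mixed_strand_genes` is a hypothetical helper, and it assumes the usual GTF layout: tab-separated fields, strand in column 7, and a `gene_id "..."` attribute in column 9. Unlike the loop, it checks every gene in the file, not just those carrying the trans-splicing comment.

```shell
# Hypothetical helper: list genes whose GTF records sit on more than one strand.
# Assumes tab-separated GTF, strand in column 7, gene_id "..." in column 9.
mixed_strand_genes() {  # usage: mixed_strand_genes FILE.gtf
  awk -F'\t' '
    match($9, /gene_id "[^"]+"/) {
      # Strip the leading `gene_id "` (9 chars) and the closing quote.
      gene = substr($9, RSTART + 9, RLENGTH - 10)
      if (!((gene, $7) in seen)) {
        seen[gene, $7] = 1
        strands[gene] = strands[gene] $7   # append each new strand symbol once
      }
    }
    END {
      for (g in strands)
        if (length(strands[g]) > 1) print g "\t" strands[g]
    }
  ' "$1" | sort
}
```

Running `mixed_strand_genes dmel-all-r6.03.gtf` prints one line per gene with mixed strands, followed by the strand symbols in the order they were first seen.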

            mason Mason Meyer (Inactive) made changes -
            Link This issue relates to IGBF-291 [ IGBF-291 ]
            mason Mason Meyer (Inactive) made changes -
            Epic Link IGBF-497 [ 15559 ]
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked lower
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked higher
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked higher
            mason Mason Meyer (Inactive) made changes -
            Resolution Unresolved [ 10101 ]
            Status Open [ 1 ] Open [ 1 ]
            mason Mason Meyer (Inactive) made changes -
            Project IGB [ 10840 ] Deep Backlog [ 11041 ]
            Key IGBF-278 DB-130
            Workflow Loraine Lab Workflow [ 15504 ] jira [ 16837 ]

              People

              • Assignee:
                dcnorris David Norris (Inactive)
                Reporter:
                mason Mason Meyer (Inactive)
              • Votes:
                0
                Watchers:
                1
