From Ivory:
These genes are all in the same general area: (grouped in sets of nearly identical genes)
FBgn0002781
FBgn0261837
FBgn0261838
FBgn0261839
FBgn0261840
FBgn0261841
FBgn0261842
FBgn0261843
FBgn0261844
FBgn0261845
FBgn0266170
FBgn0266171
FBgn0266172
FBgn0266174
FBgn0266175
FBgn0266176
FBgn0266177
FBgn0266178
FBgn0267648
FBgn0267649
FBgn0267650
FBgn0267651
FBgn0267652
And these two are in a second area:
FBgn0005630
FBgn0264817
I don’t understand how this gene (FBgn0264817) can have a trans-spliced exon; it only has one exon. I guess it’s translated in the reverse direction.
So the comment is limited to two regions on the genome.
Also, it looks like the strand issue that I noticed with FBgn0002781 (some of its components are listed on the + strand and some on the – strand) is unique to that gene. I assumed it was related to the comment, and that other genes with the comment would have the same issue, but every other gene that has this comment lists all of its components as being on the same strand. (See "Note 1" below.)
Maybe the strand issue is just a fluke? Maybe we should just manually curate this gene in the file and move on. (and see if the edge matching is fixed)
As a test, I downloaded the file and manually changed the FBgn0002781 transcripts that were on the (+) strand to (–).
The transcripts were: FBtr0084079, FBtr0084085, FBtr0084084, FBtr0084080, FBtr0084081, FBtr0084082, FBtr0307759, FBtr0307760, FBtr0084083
Just changing the strand to be (–) didn’t solve it.
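For anyone who wants to repeat the test without hand-editing, here is a minimal sketch of the same strand change. It assumes the edited file is a tab-delimited BED file with the transcript ID in column 4 and the strand in column 6; I have not confirmed that layout for the quickload file. The function name `flip_to_minus` and the file names in the usage line are made up for illustration.

```shell
# Hypothetical helper: force the strand (assumed BED column 6) to "-"
# for every row whose name (assumed BED column 4) is in a list of transcript IDs.
flip_to_minus() {
  # $1 = file with one transcript ID per line (must not be empty)
  # $2 = tab-delimited BED file
  awk -F'\t' -v OFS='\t' '
    NR == FNR   { ids[$1] = 1; next }  # first file: remember the IDs
    ($4 in ids) { $6 = "-" }           # second file: set strand to minus
                { print }              # print every BED row, changed or not
  ' "$1" "$2"
}
```

Usage would be something like `flip_to_minus plus_transcripts.txt FBgn0002781.bed > FBgn0002781.minus.bed`, where `plus_transcripts.txt` holds the nine transcript IDs listed above.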
If I select transcript FBtr0084061, all transcripts match at almost all edges.
If I select transcript FBtr0084083, nearly half of the transcripts do not edge match.
This holds in the original quickload file as well.
Attached are a BED file that has ONLY gene FBgn0002781, taken directly from the IGB quickload file, and a GTF file for just this gene, taken from the FlyBase file.
Both of these can be viewed in IGB. The GTF file shows all the transcripts on the negative strand.
Strangely, the two small files do not have the edge matching issue (at least I didn’t notice it using the two transcripts that I mentioned above), so they may not be as helpful as I had hoped.
$ grep FBgn0002781 dmel-all-r6.03.gtf > FBgn0002781.gtf
$ gunzip -c D_melanogaster_Jul_2014.bed.gz | grep FBgn0002781 > FBgn0002781.bed
*Note 1:
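Every other gene that has this comment lists all of its components as being on the same strand. The number under each gene ID is the count of distinct strand values (column 7) on the lines that mention that gene.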
$ for GENE in $(grep "SO:0000459:gene_with_trans_spliced_transcript" dmel-all-r6.03.gtf | grep -o 'FBgn[0-9]*' | sort -u); do echo "$GENE"; grep "$GENE" dmel-all-r6.03.gtf | cut -f7 | sort -u | wc -l; done
FBgn0002781
2
FBgn0005630
1
FBgn0261837
1
FBgn0261838
1
FBgn0261839
1
FBgn0261840
1
FBgn0261841
1
FBgn0261842
1
FBgn0261843
1
FBgn0261844
1
FBgn0261845
1
FBgn0264817
1
FBgn0266170
1
FBgn0266171
1
FBgn0266172
1
FBgn0266173
1
FBgn0266174
1
FBgn0266175
1
FBgn0266176
1
FBgn0266177
1
FBgn0266178
1
FBgn0267648
1
FBgn0267649
1
FBgn0267650
1
FBgn0267651
1
FBgn0267652
1
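To see which FBgn0002781 transcripts are on which strand, rather than just the count of distinct strands, something like the sketch below might help. It assumes column 9 of the GTF carries an attribute of the form `transcript_id "FBtr...";`, which is standard GTF but which I have not double-checked against every line of the FlyBase file. The function name `strands_by_transcript` is made up for illustration.

```shell
# Hypothetical helper: print unique "transcript_id<TAB>strand" pairs for one gene.
strands_by_transcript() {
  # $1 = gene ID (e.g. FBgn0002781), $2 = GTF file
  grep -F "$1" "$2" |
    awk -F'\t' '
      match($9, /transcript_id "[^"]+"/) {
        # skip the 15-character prefix (transcript_id, a space, a quote)
        # and drop the closing quote, leaving only the ID itself
        print substr($9, RSTART + 15, RLENGTH - 16) "\t" $7
      }' |
    sort -u
}
```

Running `strands_by_transcript FBgn0002781 dmel-all-r6.03.gtf` should print one line per transcript and strand; any transcript that shows up twice has components on both strands.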