[IGBF-3927] Investigate: GFF parser error - JIRA UNCC

Paige Kulzer (Inactive) created issue - 01/Oct/24 3:53 PM

Paige Kulzer (Inactive) made changes - 01/Oct/24 3:53 PM

Field	Original Value	New Value
Epic Link		IGBF-1765 [ 17855 ]

Paige Kulzer (Inactive) made changes - 01/Oct/24 3:53 PM

Link

This issue relates to ~~IGBF-3924~~ [ ~~IGBF-3924~~ ]

Nowlan Freese made changes - 02/Oct/24 9:40 AM

Sprint

Fall 4 [ 205 ]

Paige Kulzer (Inactive) made changes - 15/Oct/24 10:17 AM

Description

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Here's some more info from the GFF3 documentation (https://gmod.org/wiki/GFF3):

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to Metacerberus output on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK

Here's some more info from the GFF3 documentation (https://gmod.org/wiki/GFF3):

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

Paige Kulzer (Inactive) made changes - 15/Oct/24 10:18 AM

Description

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to Metacerberus output on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK

Here's some more info from the GFF3 documentation (https://gmod.org/wiki/GFF3):

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Here's some more info from the GFF3 documentation (https://gmod.org/wiki/GFF3):

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

----

Example files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK

Paige Kulzer (Inactive) made changes - 15/Oct/24 10:19 AM

Labels

intermediate

Paige Kulzer (Inactive) made changes - 15/Oct/24 10:19 AM

Labels

intermediate

Intermediate

Paige Kulzer (Inactive) made changes - 24/Oct/24 12:11 PM

Link

This issue relates to IGBF-3955 [ IGBF-3955 ]

Paige Kulzer (Inactive) made changes - 24/Oct/24 12:19 PM

Description

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Here's some more info from the GFF3 documentation (https://gmod.org/wiki/GFF3):

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

----

Example files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Here's some more info from the GFF3 documentation - https://gmod.org/wiki/GFF3

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

----

Example files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK

Nowlan Freese made changes - 25/Oct/24 10:34 AM

Assignee

Nowlan Freese [ nfreese ]

Ann Loraine made changes - 01/Nov/24 10:08 AM

Sprint

Fall 4 [ 205 ]

Nowlan Freese made changes - 01/Nov/24 10:09 AM

Sprint

Fall 6 [ 207 ]

Paige Kulzer (Inactive) made changes - 14/Nov/24 8:40 AM

Link

This issue relates to ~~IGBF-3884~~ [ ~~IGBF-3884~~ ]

Paige Kulzer (Inactive) made changes - 14/Nov/24 8:51 AM

Link

This issue relates to ~~IGBF-3884~~ [ ~~IGBF-3884~~ ]

Paige Kulzer (Inactive) made changes - 14/Nov/24 8:51 AM

Link

This issue blocks ~~IGBF-3884~~ [ ~~IGBF-3884~~ ]

Ann Loraine made changes - 09/Dec/24 10:28 AM

Sprint

Fall 6 [ 207 ]

Fall 6, Fall 7 [ 207, 208 ]

Ann Loraine made changes - 09/Dec/24 10:28 AM

Rank

Ranked higher

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:55 PM

Summary

Upgrade GFF parser

Investigate: GFF parser error

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:55 PM

Assignee

Paige Kulzer [ pkulzer ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

In Progress [ 3 ]

Needs 1st Level Review [ 10005 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

Needs 1st Level Review [ 10005 ]

First Level Review in Progress [ 10301 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

First Level Review in Progress [ 10301 ]

Ready for Pull Request [ 10304 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

Ready for Pull Request [ 10304 ]

Pull Request Submitted [ 10101 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

Pull Request Submitted [ 10101 ]

Reviewing Pull Request [ 10303 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

Reviewing Pull Request [ 10303 ]

Merged Needs Testing [ 10002 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Status

Merged Needs Testing [ 10002 ]

Post-merge Testing In Progress [ 10003 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 1:56 PM

Resolution		Done [ 10000 ]
Status	Post-merge Testing In Progress [ 10003 ]	Closed [ 6 ]

Paige Kulzer (Inactive) made changes - 09/Dec/24 4:32 PM

Link

This issue relates to ~~IGBF-4002~~ [ ~~IGBF-4002~~ ]

Nowlan Freese made changes - 16/Dec/24 2:12 PM

Epic Link

IGBF-1765 [ 17855 ]

IGBF-4028 [ 23324 ]

Nowlan Freese made changes - 03/Dec/25 11:13 AM

Description

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Here's some more info from the GFF3 documentation - https://gmod.org/wiki/GFF3

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

----

Example files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK

*Situation:* The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See ~~IGBF-3924~~ for a report on the resulting error when a GFF3 file with this section is loaded into IGB.

Here's some more info from the GFF3 documentation - https://gmod.org/wiki/GFF3

{quote}*GFF3 Sequence Section*
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

{noformat}
##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...
{noformat}

When the GFF3 file is processed the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.

You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.{quote}

*Task:* Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.

----

Example files which are not being parsed correctly:
* prodigal_Lambda_phage_sequences.gff
* FragGeneScan_Lambda_phage_sequences.gff

Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK
Link to files on Loraine Lab Google Drive: https://drive.google.com/drive/folders/1MLsVItXNcskfiCAg62GFmxWc1-NR40Tx?usp=drive_link
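
A minimal sketch of one way to handle the Sequence Section described above. It is not IGB's actual parser code. It assumes tab-separated annotation columns, as in real GFF3 files, and a literal {{##FASTA}} line marking the start of the sequence section. The function names {{split_gff3}} and {{landmarks_with_sequence}} are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def split_gff3(lines: Iterable[str]) -> Tuple[List[List[str]], Dict[str, str]]:
    """Separate annotation rows from an optional trailing FASTA section.

    Returns (features, sequences): features is a list of column lists,
    sequences maps each FASTA header ID to its concatenated sequence.
    """
    features: List[List[str]] = []
    chunks: Dict[str, List[str]] = {}
    in_fasta = False
    current_id = None

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines in either section
        if in_fasta:
            if line.startswith(">"):
                # Assumption: the sequence ID is the first token after '>'.
                tokens = line[1:].split()
                current_id = tokens[0] if tokens else ""
                chunks[current_id] = []
            elif current_id is not None:
                chunks[current_id].append(line.strip())
        elif line.strip() == "##FASTA":
            # Everything after this line is sequence, not annotation.
            in_fasta = True
        elif line.startswith("#"):
            continue  # other directives and comments are ignored here
        else:
            features.append(line.split("\t"))

    sequences = {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
    return features, sequences


def landmarks_with_sequence(features: List[List[str]],
                            sequences: Dict[str, str]) -> List[str]:
    """List column 1 IDs that also have a FASTA entry in the same file."""
    landmarks = {row[0] for row in features if row}
    return sorted(landmarks & set(sequences))
```

The key point is that once {{##FASTA}} is seen, no later line is treated as a nine-column feature row, which is the kind of failure a parser can hit when it does not expect this section. Because the sequence section is optional, a file without it simply yields an empty sequence mapping.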
