Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels:
- Story Points: 1.5
- Epic Link:
- Sprint: Fall 6, Fall 7
Description
Situation: The GFF3 file format may include a Sequence Section in FASTA format at the end of the file, but IGB is not currently able to parse a GFF3 file when this section is present. See IGBF-3924 for a report on the resulting error when a GFF3 file with this section is loaded into IGB.
More information from the GFF3 documentation (https://gmod.org/wiki/GFF3):
GFF3 Sequence Section
GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

##gff-version 3
ctg123 . exon 1300 1500 . + . ID=exon00001
ctg123 . exon 1050 1500 . + . ID=exon00002
ctg123 . exon 3000 3902 . + . ID=exon00003
ctg123 . exon 5000 5500 . + . ID=exon00004
ctg123 . exon 7000 9000 . + . ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
...

When the GFF3 file is processed, the IDs on the header line of FASTA entries are matched with IDs used in column 1 in the annotation section of the file.
You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.
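The section split and ID matching described in the documentation excerpt above can be sketched as follows. This is an illustration in Python, not IGB's parser (IGB is a Java application), and the function name `split_gff3` and its return shape are assumptions made for this example. It assumes tab-separated feature lines, as the GFF3 format specifies.

```python
def split_gff3(text):
    """Split GFF3 text into (feature_rows, sequences).

    feature_rows: list of column lists from the annotation section.
    sequences: dict mapping each FASTA header ID to its joined sequence.
    """
    features = []
    chunks = {}          # FASTA ID -> list of sequence lines
    current_id = None
    in_fasta = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines anywhere
        if line == "##FASTA":
            in_fasta = True               # everything after this is FASTA
            continue
        if in_fasta:
            if line.startswith(">"):
                parts = line[1:].split()
                current_id = parts[0] if parts else None
                if current_id is not None:
                    chunks[current_id] = []
            elif current_id is not None:
                chunks[current_id].append(line)
            continue
        if line.startswith("#"):
            continue                      # other directives and comments
        features.append(line.split("\t"))
    sequences = {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
    return features, sequences


example = "\n".join([
    "##gff-version 3",
    "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon00001",
    "ctg123\t.\texon\t1050\t1500\t.\t+\t.\tID=exon00002",
    "##FASTA",
    ">ctg123",
    "cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg",
])
features, sequences = split_gff3(example)

# Landmarks from column 1 that also have a FASTA entry.
landmarks = {row[0] for row in features}
matched = landmarks & set(sequences)
```

Here `matched` holds the column 1 landmarks for which the file also supplies sequence, which is the matching step the documentation describes.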
Task: Upgrade the GFF parser logic to be able to handle GFF3 files with a Sequence Section.
Example files which are not being parsed correctly:
- prodigal_Lambda_phage_sequences.gff
- FragGeneScan_Lambda_phage_sequences.gff
Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK
Link to files on Loraine Lab Google Drive: https://drive.google.com/drive/folders/1MLsVItXNcskfiCAg62GFmxWc1-NR40Tx?usp=drive_link
Note: The problem appears to be caused by a second newline character after the FASTA section, which triggers a null pointer exception. The FASTA section itself does not appear to be the cause.
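One way to guard against the trailing-newline case in the note is to filter empty lines before they reach the parsing logic. The sketch below is in Python for illustration only; it is not the fix applied in IGB, and the helper name `iter_content_lines` and the sample input are hypothetical.

```python
import io


def iter_content_lines(stream):
    """Yield stripped lines from a text stream, skipping empty ones."""
    for raw in stream:
        line = raw.strip()
        if line:          # empty or whitespace-only lines are dropped here
            yield line


# Hypothetical input: a FASTA section followed by two newline characters.
data = io.StringIO("##FASTA\n>ctg123\ncttctgggcg\n\n")
lines = list(iter_content_lines(data))
```

With this filter in place, downstream code never sees the empty final line, so it does not need its own check for missing content.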