Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:

      Description

      Situation: A GFF3 file may include an optional Sequence Section in FASTA format at the end of the file, but IGB currently cannot parse a GFF3 file when this section is present. See IGBF-3924 for a report of the error that results when such a file is loaded into IGB.

      More information from the GFF3 documentation (https://gmod.org/wiki/GFF3):

      GFF3 Sequence Section
      GFF3 files can also include sequence in FASTA format at the end of the file. The FASTA sequences are preceded by a ##FASTA line. This sequence section is optional. If present, the sequence section can define sequence for any landmark used in column 1 (the frame of reference). For example:

      ##gff-version 3
      ctg123 . exon            1300  1500  .  +  .  ID=exon00001
      ctg123 . exon            1050  1500  .  +  .  ID=exon00002
      ctg123 . exon            3000  3902  .  +  .  ID=exon00003
      ctg123 . exon            5000  5500  .  +  .  ID=exon00004
      ctg123 . exon            7000  9000  .  +  .  ID=exon00005
      ##FASTA
      >ctg123
      cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
      tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
      tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
      aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
      aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
      cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
      ...
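
The layout above suggests one way a parser could tolerate the Sequence Section: stop treating lines as features once the ##FASTA line is seen. The following is a minimal Python sketch of that idea, not IGB's actual implementation. The function name split_gff3 is hypothetical, and the sample input assumes tab-delimited columns (the example above is rendered with spaces).

```python
def split_gff3(text):
    """Split GFF3 text into (annotation_lines, fasta_lines) at the ##FASTA line.

    Hypothetical helper for illustration; it only separates the two sections
    and does not validate either one.
    """
    annotation, fasta = [], []
    in_fasta = False
    for line in text.splitlines():
        if not in_fasta and line.strip() == "##FASTA":
            in_fasta = True  # every following line belongs to the sequence section
            continue
        (fasta if in_fasta else annotation).append(line)
    return annotation, fasta


# Shortened version of the example above, with tab-delimited columns.
example = (
    "##gff-version 3\n"
    "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon00001\n"
    "##FASTA\n"
    ">ctg123\n"
    "cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg\n"
)
annotation_lines, fasta_lines = split_gff3(example)
```

With this split, the existing feature-parsing logic would only ever see annotation_lines, and a file without a ##FASTA line yields an empty fasta_lines list.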
      

      When the GFF3 file is processed, the IDs on the header lines of the FASTA entries are matched with the IDs used in column 1 of the annotation section of the file.
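
The ID matching described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the behavior of any particular parser: the function names are hypothetical, the FASTA ID is taken to be the first whitespace-delimited token after ">", and annotation columns are assumed to be tab-delimited.

```python
def fasta_ids(fasta_lines):
    """Map each FASTA record ID (first token after '>') to its full sequence."""
    records = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def landmark_ids(annotation_lines):
    """Collect the distinct column 1 values from feature lines."""
    ids = set()
    for line in annotation_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        ids.add(line.split("\t")[0])
    return ids


def match_sequences(annotation_lines, fasta_lines):
    """Return (matched, missing): sequences keyed by landmark, and landmarks without one."""
    sequences = fasta_ids(fasta_lines)
    landmarks = landmark_ids(annotation_lines)
    matched = {i: sequences[i] for i in landmarks if i in sequences}
    missing = landmarks - set(sequences)
    return matched, missing


matched, missing = match_sequences(
    [
        "##gff-version 3",
        "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon00001",
        "ctg999\t.\texon\t10\t20\t.\t+\t.\tID=exon00002",  # made-up landmark with no sequence
    ],
    [">ctg123 example contig", "cttctgggcg", "tacccgattc"],
)
```

Here ctg123 is matched to its concatenated sequence, while the made-up landmark ctg999 is reported as missing because the section is optional and need not define every landmark.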

      You don’t have to store the FASTA in the GFF file. You can also store your sequences in a separate file containing only FASTA entries.

      Task: Upgrade the GFF parser logic to handle GFF3 files that include a Sequence Section.


      Example files which are not being parsed correctly:

      • prodigal_Lambda_phage_sequences.gff
      • FragGeneScan_Lambda_phage_sequences.gff

      Link to those files on Google Drive: https://drive.google.com/drive/folders/14noPsmKYMxX9jgHYQhkjqaTGzT8z8bSK
      Link to files on Loraine Lab Google Drive: https://drive.google.com/drive/folders/1MLsVItXNcskfiCAg62GFmxWc1-NR40Tx?usp=drive_link

              People

              • Assignee: Paige Kulzer (pkulzer)
              • Reporter: Paige Kulzer (pkulzer)
              • Votes: 0
              • Watchers: 3
