  1. IGB
  2. IGBF-4362

Fix GFF loading with multiple FASTA sequences

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 10.2.0
    • Labels:
      None

      Description

      Situation: When IGB loads a GFF file that includes multiple FASTA sequences at the bottom of the file, it adds each FASTA sequence as a new chromosome/contig, but only the last chromosome/contig loads its residues correctly. The other chromosomes/contigs are missing their residues.

      Task: Investigate and, if possible, fix this so that every chromosome/contig loads its residues.

      To reproduce the issue:

      1. Start IGB
      2. Do not select a genome version
      3. Drag and drop the attached file (testFile.gff) into IGB (or select File > Open File...)
      4. Select chr3 and click Load Data - a gene model should appear and residues should appear in the Coordinates track
      5. Select chr2 and click Load Data - a gene model should appear, but no residues will be present in the Coordinates track
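
      For readers without access to the attachment, a file of the same general shape can be generated with the sketch below. This is not the attached testFile.gff: the class name, output file name, feature lines, and residues are all made up for illustration, and whether IGB draws these features the same way as the attachment has not been checked.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the attachment: a GFF3 file with one feature per
// sequence and a ##FASTA section holding three sequences.
class MultiFastaGffSketch {

    static String buildGff() {
        String[] seqIds = {"chr1", "chr2", "chr3"};
        StringBuilder sb = new StringBuilder("##gff-version 3\n");
        for (String id : seqIds) {
            // Nine tab-separated GFF3 columns: seqid, source, type, start, end,
            // score, strand, phase, attributes.
            sb.append(String.join("\t", id, "sketch", "gene", "1", "12",
                    ".", "+", ".", "ID=gene_" + id)).append('\n');
        }
        sb.append("##FASTA\n");
        for (String id : seqIds) {
            // 16 residues per sequence, slightly longer than the annotation end (12).
            sb.append('>').append(id).append('\n').append("ACGTACGTACGTACGT\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed output name; any path works.
        Files.writeString(Path.of("multiFastaSketch.gff"), buildGff());
    }
}
```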

            Activity

            nfreese Nowlan Freese added a comment -

            Additional GFF file for testing with multiple FASTA sequences:

            • Link to description: https://www.culturecollections.org.uk/nop/product/staphylococcus-aureus-183
            • Link to file: ftp://ftp.sanger.ac.uk/pub/project/pathogens/NCTC3000/datalinks_manual/ERS654933.gff

            sjagarap saideepthi jagarapu (Inactive) added a comment -

            Code changes: https://bitbucket.org/lorainelab-deepthi/integrated-genome-browser/branch/IGBF-4362

            Tested with the attached GFF test files.

            nfreese Nowlan Freese added a comment -

            Tested Deepthi's branch on Mac.

            Loading the testFile.gff (without loading a genome first) now shows the three chromosomes with their sequence. I also tested the ERS654933.gff file and it worked as expected.

            Note: If a GFF file with embedded FASTA is added to IGB without first loading a genome or a custom genome, the sequence (chromosome/contig) lengths reported in IGB are limited to the GFF annotation lengths. You can see this in testFile.gff, where the chromosome lengths in IGB are slightly shorter than the number of residues in the FASTA section of the file. This behavior is the same in IGB 10.1.0 and is not related to the changes in this ticket.
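
            The gap described in the note can be checked outside IGB with a rough sketch like the one below. It assumes the annotation length corresponds to the largest end coordinate among a sequence's features; that is an assumption about what IGB reports, not something confirmed by this ticket. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: compares annotation extent with FASTA residue count.
class LengthComparisonSketch {

    /** Returns seqId -> {largest feature end, residue count in the FASTA section}. */
    static Map<String, long[]> compare(List<String> gffLines) {
        Map<String, long[]> result = new LinkedHashMap<>();
        boolean inFasta = false;
        String current = null;
        for (String line : gffLines) {
            if (line.startsWith("##FASTA")) {
                inFasta = true;
                continue;
            }
            if (!inFasta) {
                if (line.isEmpty() || line.startsWith("#")) continue;
                String[] cols = line.split("\t");
                if (cols.length < 5) continue; // not a feature line
                long[] entry = result.computeIfAbsent(cols[0], k -> new long[2]);
                entry[0] = Math.max(entry[0], Long.parseLong(cols[4])); // column 5 = end
            } else if (line.startsWith(">")) {
                // seqId is the header text up to the first whitespace.
                current = line.substring(1).trim().split("\\s+")[0];
                result.computeIfAbsent(current, k -> new long[2]);
            } else if (current != null) {
                result.get(current)[1] += line.trim().length();
            }
        }
        return result;
    }
}
```

            A sequence whose second value exceeds its first has residues extending past its last annotation, which matches the situation the note describes.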

            sjagarap saideepthi jagarapu (Inactive) added a comment -

            Summary of code changes:

            • When a FASTA header (>sequenceId) is found, extract the seqID and use a set to check whether a writer already exists for that chromosome.
            • If the chromosome is seen for the first time, addToLists() creates a new output writer for that chromosome and registers it in chrs.
            • bw is updated to point to the writer belonging to the current FASTA sequence ID.
            • fastaWritten.add(fastaSeqId) ensures that ##FASTA is written exactly once per chromosome, preventing duplicate FASTA section headers. The writer treats each sequence as a separate FASTA file, so each chromosome's output needs its own ##FASTA line.
            • Finally, every FASTA line (header or sequence) is written to the correct chromosome file through the updated bw.
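
            The steps above can be sketched roughly as follows. This is not the code from the branch: the class and method names are hypothetical, StringWriter stands in for the per-chromosome output files, and computeIfAbsent stands in for addToLists(). Only the names bw, chrs, fastaWritten, and fastaSeqId are taken from the summary.

```java
import java.io.StringWriter;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the IGB source.
class FastaSplitSketch {

    /** Routes each line of an embedded FASTA section to its own sequence's writer. */
    static Map<String, StringWriter> splitFasta(List<String> fastaLines) {
        Map<String, StringWriter> chrs = new LinkedHashMap<>(); // seqId -> writer
        Set<String> fastaWritten = new HashSet<>();
        StringWriter bw = null; // writer for the sequence currently being read

        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                // seqId is the header text up to the first whitespace.
                String fastaSeqId = line.substring(1).trim().split("\\s+")[0];
                // First sighting creates the writer (stand-in for addToLists()).
                bw = chrs.computeIfAbsent(fastaSeqId, id -> new StringWriter());
                // Set.add returns true only the first time, so ##FASTA is written once.
                if (fastaWritten.add(fastaSeqId)) {
                    bw.write("##FASTA\n");
                }
            }
            if (bw != null) {
                bw.write(line + "\n"); // header or residues go to the current writer
            }
        }
        return chrs;
    }

    public static void main(String[] args) {
        Map<String, StringWriter> out =
                splitFasta(List.of(">chr1", "ACGT", ">chr2", "GGCC"));
        out.forEach((id, w) -> System.out.println(id + ":\n" + w));
    }
}
```

            Switching bw on every header is what keeps the residues of earlier sequences from ending up in, or being lost to, the last chromosome's output.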

            attn: Nowlan Freese Ann Loraine

            nfreese Nowlan Freese added a comment -

            Testing complete.

            Ready for PR.

            sjagarap saideepthi jagarapu (Inactive) added a comment -

            PR: https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/1087

            ann.loraine Ann Loraine added a comment -

            PR is merged.

            nfreese Nowlan Freese added a comment -

            Tested main branch on Mac.

            GFF files that include multiple FASTA sequences now load, and the sequence for each chromosome/contig can be viewed in IGB.

            Closing ticket.


              People

              • Assignee:
                sjagarap saideepthi jagarapu (Inactive)
                Reporter:
                nfreese Nowlan Freese