User Support / HELP-319

Investigate why NCBI GFF files will not open

    Details

    • Story Points:
      1.5

      Description

      Email from user, via SourceForge, from: sjack93@users.sourceforge.net

      The IGB website says their program supports many different file formats, including gff. I have saved a genome from NCBI in several different file formats to my computer (using Ubuntu), but when I go to open the custom genome from IGB, it only recognizes the .fna file type, when I have genomes saved as .gbff and .gff in the same folder. It's like the program won't recognize these file types saved on my computer. Does anyone know why this might be happening? Thanks in advance.

      and direct email from user:

      Hi Ann,

      I found your e-mail address on the troubleshooting page of the IGB (integrated genome browser) website and I'm hoping you can help me. The IGB user guide states that many different file formats are supported for this program (>20 file types including gff, gbff, fna and more). However, when I attempt to open genome from file, it recognizes only about half the file types it claims to be compatible with. I am running IGB on Ubuntu. Do you have any idea why this might be happening?

      from: stephanie.jack@unb.ca

      Reply from Dr. Loraine:

      Thank you for getting in touch.

      Can you send me URLs of the GFF files you are trying to open?

      We've had some issues with NCBI's GFF files in the past and I think we may have fixed those problems ... but this may need to be updated!

      If we are able to fix the problem, we should be able to roll it out to you fairly quickly as an "early access" IGB release. We are setting up the early access mechanism in the next few weeks, so hopefully we can get your problem addressed in a few weeks, as well.

            Activity

            nfreese Nowlan Freese added a comment -

            User help thread on SourceForge:

            Stephanie:
            The IGB website says their program supports many different file formats, including gff. I have saved a genome from NCBI in several different file formats to my computer (using Ubuntu), but when I go to open the custom genome from IGB, it only recognizes the .fna file type, when I have genomes saved as .gbff and .gff in the same folder. It's like the program won't recognize these file types saved on my computer. Does anyone know why this might be happening? Thanks in advance.

            Nowlan:
            To load a custom genome, IGB is looking for sequence files such as fasta, fna, or 2bit. If the genome for your data is not available, and you do not have a sequence file, you can drag and drop the .gff file directly into IGB and then click Load Data.

            Stephanie:
            Thanks for the info. I loaded the genome as a fna file, then the annotation as a .gff file, which was recommended to me on another online forum. The program has been "retrieving chromosomes" for many hours, do you know why this might be? Also, do you know what the "maximum heap size" is? It's displayed on the bottom right corner of the IGB interface, and the proportion of max heap size being used continues to change as the program attempts to retrieve chromosomes.

            Dr. Loraine:
            Are there a lot of reference sequences mentioned in the GFF file?
            I would recommend opening the same file in an IDE with a debugger to see where the hang-up occurs.
            It would be nice if IGB could handle the various issues that come up with NCBI GFF files, since NCBI is a major clearinghouse for genomic data that many people use.

            Nowlan:
            The issue appears to be with the gff file you are trying to view. The file does not appear to contain gene annotations mapped to a genome.

            The Anguilla rostrata (American eel) annotation is available from Dryad. If you unpack the download, there is a file called american_eel_genome_v5.gff that appears to contain the annotation. I sorted, compressed, and indexed the file (attached). Try loading it (american_eel_genome_v5.sorted.gff.gz) in IGB and let me know if it works for you.
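
One way to check whether a GFF file contains gene annotations is to tally the feature types in its third column. The following is a minimal Python sketch, not part of IGB; the function name tally_feature_types and the sample lines are hypothetical. It assumes tab-delimited, nine-column GFF lines and stops reading at a ##FASTA directive if one is present.

```python
from collections import Counter


def tally_feature_types(lines):
    """Count the values in column 3 (feature type) of GFF lines.

    Comment lines, blank lines, and lines with fewer than nine
    tab-separated fields are skipped.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "##gff-version 3\n",
    "scaf1\texample\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "scaf1\texample\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
    "scaf2\texample\tgene\t5\t50\t.\t-\t.\tID=g2\n",
]
for feature_type, n in tally_feature_types(sample).most_common():
    print(f"{n}\t{feature_type}")
```

A file whose tally shows only "region" or similar entries, with no gene, mRNA, exon, or CDS features, probably does not carry the annotations the user expects to see.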

            nfreese Nowlan Freese added a comment -

            User help thread on Biostars

            nfreese Nowlan Freese added a comment -

            There seem to be a couple of issues. The gff file from NCBI she was trying to load is oddly formatted, and it is unclear if it contains actual annotations.

            The other issue is that the file contains ~12,000 assemblies/chromosomes. This causes the following exception in IGB:

            Feb 20, 2019 3:45:51 PM com.affymetrix.genometry.quickload.QuickLoadSymLoader logException
            SEVERE: Too many open files
            java.io.IOException: Too many open files
            at java.io.UnixFileSystem.createFileExclusively(Native Method)
            at java.io.File.createTempFile(File.java:2024)
            at java.io.File.createTempFile(File.java:2070)
            at com.affymetrix.genometry.symloader.SymLoader.addToLists(SymLoader.java:421)
            at com.affymetrix.genometry.symloader.GFF3.parseLines(GFF3.java:215)
            at com.affymetrix.genometry.symloader.SymLoader.buildIndex(SymLoader.java:241)
            at com.affymetrix.genometry.symloader.GFF3.init(GFF3.java:76)
            at com.affymetrix.genometry.symloader.GFF3.getChromosomeList(GFF3.java:83)
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.getChromosomeList(QuickLoadSymLoader.java:254)
            at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1002)
            at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:996)
            at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73)
            at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at javax.swing.SwingWorker.run(SwingWorker.java:334)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
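
To see how many reference sequences a GFF file mentions, and whether that number could plausibly run into the per-process open-file limit, something like the following Python sketch could be used. It assumes, based only on the createTempFile call in the trace above, that IGB may hold roughly one temporary file per sequence while indexing; that is an inference, not confirmed behavior. The function names are hypothetical, and the resource module is available on Unix-like systems only.

```python
import resource


def distinct_seqids(lines):
    """Return the set of sequence IDs (column 1) found in GFF feature lines."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def may_exceed_open_file_limit(n_seqids, soft_limit=None):
    """Return True if n_seqids reaches the soft limit on open files.

    Assumption: about one open file per sequence. When soft_limit is
    not given, the current process limit (RLIMIT_NOFILE) is used.
    """
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return n_seqids >= soft_limit
```

For example, with the roughly 12,000 sequences reported above and a soft limit of 1024 (a value chosen only for illustration), may_exceed_open_file_limit(12000, soft_limit=1024) returns True.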

            nfreese Nowlan Freese added a comment -

            The best current solution is to download the annotation file from Dryad, which is the annotation source listed in the original American eel paper. The file needs to be sorted, bgzipped, and tabix indexed, or else IGB will throw an exception: Too many open files.

            Unpack the file.
            Navigate to the folder containing american_eel_genome_v5.gff
            (grep ^"#" american_eel_genome_v5.gff; grep -v ^"#" american_eel_genome_v5.gff | sort -k1,1 -k4,4n) > american_eel_genome_v5.sorted.gff
            bgzip american_eel_genome_v5.sorted.gff
            tabix -p gff american_eel_genome_v5.sorted.gff.gz
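
Where a Unix sort is not available, the grep/sort step above can be approximated in Python. This is a sketch under stated assumptions: the function name sort_gff_lines is hypothetical, sequence names are ordered by code point (which may differ from the locale-dependent order that sort produces), blank lines are dropped, and the file is assumed to have no ##FASTA section. The output still needs to be compressed with bgzip and indexed with tabix.

```python
def sort_gff_lines(lines):
    """Return header lines first, then feature lines sorted for indexing.

    Header lines (those starting with '#') keep their original order.
    Feature lines are sorted by sequence ID (column 1) and then by
    numeric start position (column 4).
    """
    header = [line for line in lines if line.startswith("#")]
    records = [
        line for line in lines
        if line.strip() and not line.startswith("#")
    ]

    def sort_key(line):
        fields = line.split("\t")
        return (fields[0], int(fields[3]))

    return header + sorted(records, key=sort_key)
```

A possible usage is to read the unsorted file with readlines(), pass the result to sort_gff_lines, and write the returned lines to the .sorted.gff file with writelines().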


              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine