  1. IGB
  2. IGBF-1883

Find Daphnia genome and annotation

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Story Points:
      1
    • Epic Link:
    • Sprint:
      Summer 2019 Sprint 10, Summer 2019 Sprint 11

      Description

      Find the Daphnia magna and Daphnia pulex genomes and annotations so that they can be added to the IGB Quickload.

        Attachments

          Issue Links

            Activity

            nfreese Nowlan Freese created issue -
            nfreese Nowlan Freese made changes -
            Field Original Value New Value
            Status Open [ 1 ] In Progress [ 3 ]
            nfreese Nowlan Freese added a comment - - edited

            Daphnia pulex assembly version 1.0 (February 2011)
            From Ensembl: https://metazoa.ensembl.org/Daphnia_pulex/Info/Index
            Assembly: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-44/fasta/daphnia_pulex/dna/Daphnia_pulex.V1.0.dna.toplevel.fa.gz
            Annotation: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-44/gff3/daphnia_pulex/Daphnia_pulex.V1.0.44.gff3.gz

            Daphnia magna assembly version daphmag2.4 (April 2010)
            From Ensembl: http://metazoa.ensembl.org/Daphnia_magna/Info/Index?db=core
            Assembly: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-44/fasta/daphnia_magna/dna/Daphnia_magna.daphmag2.4.dna.toplevel.fa.gz
            Annotation: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-44/gff3/daphnia_magna/Daphnia_magna.daphmag2.4.44.gff3.gz

            nfreese Nowlan Freese made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine added a comment -

            Next steps:

            • Create a HELP ticket containing the user's email sent to Ann and link it to this one (it will still be protected).
            • Google search to see if more recent assemblies can be found.

            I found this paper, which appears to describe a newer, better assembly for Daphnia magna, but I don't know if the data can be obtained:

            The genome of the freshwater water flea Daphnia magna: A potential use for freshwater molecular ecotoxicology
            Link: https://www.researchgate.net/publication/330313926_The_genome_of_the_freshwater_water_flea_Daphnia_magna_A_potential_use_for_freshwater_molecular_ecotoxicology

            Depending on the above, create a new ticket for assembling the Quickload files for the two Daphnias:

            BED-detail file with:

            • Human-friendly text functional annotations in field 14
            • Gene name in field 13
            • Transcript name in field 4

            Two-bit file with genome sequence

            • Reference sequence names should match the reference sequence names used in the BED-detail file above

            genome.txt file with reference sequences and sizes from the 2bit file above

            • Reference sequences should be listed in a user-friendly order; this depends on how the sequences are named and the completeness of the genome

            Also, if the genome(s) are highly fragmented and there are tens of thousands of sequences per assembly, we need to do some filtering to screen out reference sequences that lack annotations or are extremely small. This requires some research to find out how many is too many, i.e., how long it takes the gene models to load into IGB.

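The genome.txt and filtering steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used for this ticket: it reads sizes from FASTA text rather than from a 2bit file, treats column 1 of the BED-detail file as the reference sequence name, uses a natural sort as one possible "user-friendly" order, and takes a `min_length` cutoff whose value is entirely hypothetical. All function names are made up for the example.

```python
import re


def fasta_sizes(fasta_text):
    """Return {sequence_name: length} parsed from FASTA-formatted text."""
    sizes = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the name.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def annotated_names(bed_text):
    """Collect reference sequence names (column 1) from BED-formatted text."""
    names = set()
    for line in bed_text.splitlines():
        if line.strip() and not line.startswith(("#", "track", "browser")):
            names.add(line.split("\t")[0])
    return names


def natural_key(name):
    """Sort key that places scaffold_2 before scaffold_10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def genome_txt_lines(sizes, keep_names, min_length):
    """Build tab-delimited 'name<TAB>size' rows for annotated, non-tiny sequences."""
    kept = [n for n in sizes if n in keep_names and sizes[n] >= min_length]
    return [f"{n}\t{sizes[n]}" for n in sorted(kept, key=natural_key)]


# Tiny made-up inputs, for illustration only.
example_fasta = (
    ">scaffold_10 some description\nACGT\nAC\n"
    ">scaffold_2\nACGTACGTAC\n"
    ">scaffold_3\nAC\n"
    ">scaffold_4\nACGTACGT\n"
)
example_bed = (
    "scaffold_10\t0\t4\ttx1\n"
    "scaffold_2\t1\t9\ttx2\n"
    "scaffold_3\t0\t2\ttx3\n"
)

example_rows = genome_txt_lines(
    fasta_sizes(example_fasta), annotated_names(example_bed), min_length=5
)
print("\n".join(example_rows))
```

In this toy input, scaffold_3 is dropped for being shorter than the cutoff and scaffold_4 is dropped for having no annotations. A sensible cutoff for the real assemblies would still have to come from measuring how long the gene models take to load in IGB.
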
            ann.loraine Ann Loraine added a comment - - edited

            Thanks Nowlan! I have added some new parts of the task. If you are able to take this on, I would appreciate it!

            If not, un-assign it and move it to Sprint 11.

            ann.loraine Ann Loraine made changes -
            Assignee Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine made changes -
            Epic Link IGBF-501 [ 15563 ]
            ann.loraine Ann Loraine made changes -
            Labels Advanced
            ann.loraine Ann Loraine made changes -
            Sprint Summer 2019 Sprint 10 [ 69 ] Summer 2019 Sprint 10, Summer 2019 Sprint 11 [ 69, 70 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            nfreese Nowlan Freese made changes -
            Link This issue relates to HELP-323 [ HELP-323 ]
            nfreese Nowlan Freese added a comment - - edited

            The paper "The genome of the freshwater water flea Daphnia magna: A potential use for freshwater molecular ecotoxicology" links out to both an assembly (NCBI PRJNA490418) and an annotation (GenBank SRP161660), but the annotation does not appear to have been uploaded.

            As the user mentioned the Ensembl files, I think it would be best to start with those as they have both assembly and annotation available for Daphnia magna and pulex genomes. This more recent Daphnia magna genome could be added in the future if the annotation can be found.

            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-1915 [ IGBF-1915 ]
            nfreese Nowlan Freese made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ann.loraine Ann Loraine added a comment -

            Suggestion:

            • Contact the authors of the newer paper and tell them we would like to feature their new genome in Integrated Genome Browser.
            • Ask them when they are planning to upload the annotations file.
            • Encourage them to do it soon! And let them know we would be grateful if they would provide it in BED-detail format rather than GFF.

            My guess is they are still struggling with the evil horror that is GFF.

            ann.loraine Ann Loraine added a comment -

            Also let them know we are collaborating with the Galaxy project and are hoping also to feature their new genome in the Galaxy project system.

            If we do that, then everybody in the world can start using Galaxy to work with their genome.

            I think that if we do that, they will be very happy because their work will get cited more?

            ann.loraine Ann Loraine added a comment -

            Also, we could include all the versions, possibly?

            ann.loraine Ann Loraine added a comment -

            Thank you for doing this!

            This type of thing is hugely time-consuming, and it really requires someone with a PhD and tons of experience to do it. (Yourself.)

            nfreese Nowlan Freese added a comment -

            I emailed the contact person for the paper - Professor Jae-Seong Lee (slee2@skku.edu) to ask if he could provide the annotation file.

            His lab also has a JBrowse site set up to view the Daphnia magna assembly and annotation. While I cannot access the data itself, JBrowse does allow downloading a GFF file for each linkage group (there are 10 linkage groups in addition to hundreds of scaffolds). This could be a way to at least get the linkage group annotations.
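
If the per-linkage-group downloads were used, they would need to be combined into one annotation file. A minimal sketch of that step is below, assuming the exports are GFF3 text that each begin with a `##gff-version` line; the function name and the sample records are hypothetical and were not taken from the JBrowse site.

```python
def merge_gff3(texts):
    """Concatenate GFF3 texts, keeping a single ##gff-version line at the top."""
    merged = ["##gff-version 3"]
    for text in texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            if line.startswith("##gff-version"):
                continue  # drop the repeated version pragma from each file
            merged.append(line)
    return "\n".join(merged) + "\n"


# Made-up single-record exports standing in for two linkage groups.
lg1_text = "##gff-version 3\nLG1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n"
lg2_text = "##gff-version 3\nLG2\tsrc\tgene\t5\t50\t.\t-\t.\tID=g2\n"

merged_text = merge_gff3([lg1_text, lg2_text])
print(merged_text, end="")
```

Other directives (for example `##sequence-region`) are passed through unchanged, so feature IDs that collide across files would still need to be checked by hand.
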

            ann.loraine Ann Loraine added a comment - - edited

            Please allow your computer to mirror the "Data Management for Quickload" Dropbox folder.
            Once it is mirrored on your machine, please add the new folders to the checked-out svn repository there; it is in the folder "genomes".
            Edit the contents.txt file you see there to include the new genome versions.
            Once everything is there, I will be able to commit it to the repository.
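
The contents.txt edit could be scripted along these lines. This sketch assumes contents.txt is a tab-delimited file with one genome version directory per row followed by a display title; the genome version names and titles shown are hypothetical placeholders, not the names that were actually committed.

```python
def add_genomes(contents_text, new_entries):
    """Return contents.txt text with new (version, title) rows appended once."""
    lines = [ln for ln in contents_text.splitlines() if ln.strip()]
    existing = {ln.split("\t")[0] for ln in lines}
    for version, title in new_entries:
        if version not in existing:
            lines.append(f"{version}\t{title}")
            existing.add(version)
    return "\n".join(lines) + "\n"


# Hypothetical current file content and hypothetical new genome versions.
current_contents = "A_thaliana_Jun_2009\tArabidopsis thaliana (Jun 2009)\n"
daphnia_entries = [
    ("D_pulex_Feb_2011", "Daphnia pulex (Feb 2011)"),
    ("D_magna_Apr_2010", "Daphnia magna (Apr 2010)"),
]

updated_contents = add_genomes(current_contents, daphnia_entries)
print(updated_contents, end="")
```

Skipping versions that are already listed makes the update safe to re-run before the commit.
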

            ann.loraine Ann Loraine added a comment - - edited

            Regarding the species.txt file:

            • Please edit the corresponding files in the IGB code base so that when we do a new release of IGB, the new version will pick up the new genome version synonyms.
            ann.loraine Ann Loraine added a comment -

            Sorry, I forgot this request:

            • Please update .htaccess in Quickload as well
            nfreese Nowlan Freese made changes -
            Comment [ I added the daphnia files to the shared dropbox folder: Quickload > daphnia. There are four Daphnia genomes. The contents.txt file contains the current contents.txt as of July 26, 2019 in addition to the four daphnia genomes. The species.txt includes the current species.txt as of July 26, 2019 in addition to the Daphnia magna and Daphnia pulex names. ]
            nfreese Nowlan Freese added a comment -

            Added the four Daphnia assemblies and annotations to the "Data Management for Quickload" Dropbox folder. Updated contents.txt and modified the .htaccess.

            I did not modify the species.txt or synonyms.txt in the Dropbox folder. I created the pull request IGBF-1883-AddDaphnia with modifications to the IGB codebase species.txt and synonyms.txt (core/synonym-lookup/src/main/resources/species.txt and core/synonym-lookup/src/main/resources/synonyms.txt). Tested in IGB and the species/synonyms appeared to be working correctly.

            ann.loraine Ann Loraine added a comment -

            PR merged.

            ann.loraine Ann Loraine made changes -
            Workflow Loraine Lab Workflow [ 18606 ] Fall 2019 Workflow Update [ 20272 ]
            ann.loraine Ann Loraine made changes -
            Workflow Fall 2019 Workflow Update [ 20272 ] Revised Fall 2019 Workflow Update [ 22419 ]

              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                2
