  1. IGB
  2. IGBF-4038

Improve documentation describing Quickload files for painted lady genome assembly

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      We are offering a genome assembly for painted lady as part of our Quickload site - see: http://igbquickload-main.bioviz.org/quickload/V_cardui_Feb_2021/

      However, the HEADER file (see above link) does not have enough information for a user to locate the original files we used to create our IGB-friendly versions of the files. This can become a problem for users who are working with bioinformatics pipelines that use assembly or annotation files. They will need to know whether the files we are showing in IGB are the same as the files they are working with.

      Also, it would be good to link to a research article describing this assembly, to help users learn more about it.

      For this task, improve the documentation so that a user who clicks the "get info" link within IGB will be able to find information needed to locate the original data files we used to create our IGB-friendly versions for the Quickload repository.

      Notes:

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-1765 [ 17855 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3892 [ IGBF-3892 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Description We are offering a genome assembly for painted lady as part of our Quickload site - see: http://igbquickload-main.bioviz.org/quickload/V_cardui_Feb_2021/

            However, the HEADER file (see above link) does not have enough information for a user to locate the original files we used to create our IGB-friendly versions of the files. This can become a problem for users who are working with bioinformatics pipelines that use assembly or annotation files. They will need to know whether the files we are showing in IGB are the same as the files they are working with.

            Also, it would be good to link to a research article describing this assembly, to help users learn more about it.

            For this task, improve the documentation so that a user who clicks the "get info" link within IGB will be able to find information needed to locate the original data files we used to create our IGB-friendly versions for the Quickload repository.

            Notes:

            * It looks like this might be "the" painted lady genome paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC10061037
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ]
            pkulzer Paige Kulzer made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer made changes -
            Attachment HEADER.md [ 18616 ]
            pkulzer Paige Kulzer added a comment -

            I've added a link to the research article describing this assembly to HEADER.md. There, direct links to the sequence files (chromosome-specific FASTAs) on ENA are provided in the Data Availability section. The NCBI Genomes resource is also linked in HEADER.md and the specific genome assembly (ilVanCard2.1) has been specified in the first line so that a user can determine which of the reference genomes we used.

            Ready for review, as well as any more suggestions for improving quickload descriptions!

            pkulzer Paige Kulzer made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment - - edited

            Thank you for making the suggested changes!

            I have a request which I hope will not be too difficult to implement:

            • The link to the article is good, but one thing that can easily happen over time is that the provided link could become obsolete when or if the organization that hosts the paper changes the link pattern. So, instead of simply providing just the link, I recommend using the paper title as the link label, and using the PubMed "perma-URL" as the link. All articles that are published in the scientific literature should have a record in the PubMed database. The pattern of links to those articles looks like: https://pubmed.ncbi.nlm.nih.gov/[ pubmed numeric identifier ].
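A minimal sketch of the link pattern described above, assuming the goal is an HTML anchor whose label is the paper title and whose target is the PubMed URL. The function names and the numeric identifier are placeholders for illustration, not part of any existing IGB tooling.

```python
import html


def pubmed_url(pmid):
    """Build a PubMed record URL from a numeric PubMed identifier."""
    pmid_text = str(pmid).strip()
    if not pmid_text.isdigit():
        raise ValueError(f"PubMed identifiers are numeric, got: {pmid!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid_text}/"


def html_link(title, url):
    """Return an anchor tag that uses the paper title as the link label."""
    safe_url = html.escape(url, quote=True)
    safe_title = html.escape(title)
    return f'<a href="{safe_url}">{safe_title}</a>'


# 12345678 is a placeholder identifier, not a real citation.
example = html_link("Title of the paper", pubmed_url(12345678))
```

Using the title as the label keeps the page readable even if the target URL later has to be swapped for a different one.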

            I tried to test whether the provided information is sufficient to locate the original data files used.

            Here's what I did:

            • The HEADER.md file mentions NCBI Genome, so I visited https://www.ncbi.nlm.nih.gov/home/genomes/
            • I entered "ilVanCard2.1" in the above page's query form. (ilVanCard2.1 is listed in the contents.txt file's second column, which shows up in IGB's title page when displaying this assembly.)
            • The above search showed me a page listing the assembly, with a hyperlink to https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_905220355.1/
            • From there, I tried to locate (a) the original fasta file and (b) the original gene models file that Paige Kulzer used as inputs to create the 2bit and bed.gz files.
            • The above link (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_905220355.1/) had another link labeled "ftp" that opened a web directory with some files. The URL of that web directory is: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/905/220/365/GCF_905220365.1_ilVanCard2.1/README_Vanessa_cardui_annotation_release_100

            The two files I think might be the original data files Paige Kulzer used are:

            genome file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/905/220/365/GCF_905220365.1_ilVanCard2.1/GCF_905220365.1_ilVanCard2.1_genomic.fna.gz

            and

            gene models file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/905/220/365/GCF_905220365.1_ilVanCard2.1/GCF_905220365.1_ilVanCard2.1_genomic.gff.gz

            The above GFF file is probably not the right file because its last update was in 2022, not 2021, as mentioned in the annots.xml. I tried to find something that included the word "refgene", but I saw nothing with that term in the file name.

            Question for Paige Kulzer: Is there some way that you can make it clearer to users where the gene model file comes from? The ftp directory seems to show a more recent data file than what IGB shows, because the release date for what NCBI shows is 2022, not 2021.
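The FTP URLs quoted in this comment appear to follow a regular layout: the accession prefix, the nine accession digits split into groups of three, then a directory named after the accession and assembly name. The sketch below rebuilds those URLs from the two identifiers. The layout is inferred only from the URLs in this ticket, and the function names are hypothetical.

```python
import re

NCBI_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def assembly_dir_url(accession, assembly_name):
    """Return the FTP directory URL for an assembly accession such as GCF_905220365.1."""
    match = re.fullmatch(r"(GC[AF])_(\d{9})\.\d+", accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    prefix, digits = match.groups()
    # Split the nine digits into three path segments: 905 / 220 / 365.
    chunks = [digits[i:i + 3] for i in range(0, 9, 3)]
    return "/".join([NCBI_FTP_ROOT, prefix, *chunks, f"{accession}_{assembly_name}"])


def assembly_file_url(accession, assembly_name, suffix):
    """Return the URL of one file in the assembly directory, e.g. suffix 'genomic.gff.gz'."""
    base = assembly_dir_url(accession, assembly_name)
    return f"{base}/{accession}_{assembly_name}_{suffix}"


fasta_url = assembly_file_url("GCF_905220365.1", "ilVanCard2.1", "genomic.fna.gz")
gff_url = assembly_file_url("GCF_905220365.1", "ilVanCard2.1", "genomic.gff.gz")
```

Recording the accession and assembly name in the Quickload description would let a user reconstruct both file locations this way, although it would not by itself resolve the 2021 versus 2022 date question raised above.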
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1 [ 210 ] Spring 1, Spring 2 [ 210, 211 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            pkulzer Paige Kulzer made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer added a comment -

            I've changed the publication link label to the name of the paper ("The genome sequence of the painted lady, Vanessa cardui Linnaeus 1758") and I've changed the publication link to the PubMed URL (https://pubmed.ncbi.nlm.nih.gov/37008186/).

            Note: The DOI link that I used originally (https://doi.org/10.12688/wellcomeopenres.17358.1) should also serve as a "perma-URL" because DOIs never change. See this FAQ from the UIC Library for more info: https://ask.library.uic.edu/faq/345899

            The HEADER.md file mentions NCBI Genome, but the URL that Ann Loraine used above (https://www.ncbi.nlm.nih.gov/home/genomes/) is different from what is listed in the file (ncbi.nlm.nih.gov/datasets/genome/). This is what resulted in her being unable to find the correct files in her comment above.

            I'm not sure how useful it is to reference annots.xml for more details like we're currently recommending in HEADER.md. Instead, it might be clearer to users where the gene model file comes from if we include the NCBI RefSeq assembly identifier (GCF_905220365.1). Once on that correct page, the Download button is fairly obvious, and the files to download are specified in HEADER.md (GFF, fasta).

            Please review HEADER_V2.md which contains all of these changes and let me know what you think!

            pkulzer Paige Kulzer made changes -
            Attachment HEADER_V2.md [ 18636 ]
            pkulzer Paige Kulzer made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Let's do an on-line session so that I can show you the problems I am having understanding where the data file came from.

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine added a comment -

            Adding the new file to the ticket by pasting in the code here:

            <html>
            <body>
            <h1>Vanessa cardui (Feb 2021) painted lady (ilVanCard2.1) genome assembly</h1>
            <p>
            The files listed below are formatted for visualization in the Integrated Genome
            Browser, available from <a href="https://bioviz.org">BioViz.org</a>.
            </p>
            <p>
            Annotation features (gff) files were downloaded from NCBI Genome (<a href="https://www.ncbi.nlm.nih.gov/datasets/genome/">ncbi.nlm.nih.gov/datasets/genome/</a>; NCBI RefSeq assembly GCF_905220365.1). See the <a href="annots.xml">annots.xml</a> meta-data file in this directory for details.
            </p>
            <p>
            Files with extension .gz were compressed and indexed using bgzip and
            tabix tools from <a href="https://www.htslib.org">htslib.org</a>.
            </p>
            <p>
            The file named V_cardui_Feb_2021.2bit contains sequence data. It was originally downloaded in fasta format from NCBI Genome and then converted to 2bit by running the faToTwoBit program provided by UCSC. The file <a href="genome.txt">genome.txt</a> lists sequences and their sizes. It was made from the V_cardui_Feb_2021.2bit sequence file using the twoBitInfo program.
            </p>
            <p>
            Both twoBitInfo and faToTwoBit are available from <a href="http://hgdownload.cse.ucsc.edu/admin/exe/">http://hgdownload.cse.ucsc.edu/admin/exe/</a>.
            </p>
            <p>
            More information about this genome assembly and its gene models can be found in the following publication: <a href="https://doi.org/10.12688/wellcomeopenres.17358.1">The genome sequence of the painted lady, Vanessa cardui Linnaeus 1758</a>.
            </p>
            </body>
            </html>
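The page above names four command-line tools (faToTwoBit, twoBitInfo, bgzip, tabix) without showing how they are invoked. The sketch below assembles the corresponding command lines, assuming the tools are installed and on the PATH. The input file names and the helper function names are hypothetical; the ticket does not record the exact commands that were run.

```python
import subprocess


def sequence_commands(fasta_path, assembly):
    """Commands to convert a FASTA file to 2bit and list sequence sizes."""
    two_bit = f"{assembly}.2bit"
    return [
        # faToTwoBit <input fasta> <output 2bit>
        ["faToTwoBit", fasta_path, two_bit],
        # twoBitInfo <input 2bit> <output table of sequence names and sizes>
        ["twoBitInfo", two_bit, "genome.txt"],
    ]


def index_commands(bed_path):
    """Commands to compress and index a position-sorted BED file."""
    return [
        # bgzip replaces the input with <bed_path>.gz
        ["bgzip", bed_path],
        # tabix expects bgzip-compressed input sorted by sequence and start
        ["tabix", "-p", "bed", f"{bed_path}.gz"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(sequence_commands("genome.fna", "V_cardui_Feb_2021"))` would produce V_cardui_Feb_2021.2bit and genome.txt in the working directory, where genome.fna stands in for the uncompressed FASTA download.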
            
            pkulzer Paige Kulzer added a comment -

            After our discussion this morning, my task is to rewrite this Quickload description without using past HEADER.md files from UCSC Quickloads as templates because they don't adequately document the process of retrieving files from NCBI. This rewrite should include a description of annots.xml rather than a simple reference to it, an updated link to the NCBI database, instructions for downloading files from NCBI, information about the date these NCBI files were accessed to create this Quickload, and documentation regarding how we converted GFF to BED.
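One detail worth documenting for the GFF to BED step is the coordinate change: GFF uses 1-based inclusive coordinates, while BED uses 0-based half-open coordinates, so the start is reduced by one and the end is unchanged. The sketch below shows that conversion for a single feature line in simplified six-column BED. It is an illustration under those assumptions, not the conversion actually used for this Quickload, which this ticket does not describe; the function name is hypothetical.

```python
from typing import Optional


def gff_line_to_bed6(line: str) -> Optional[str]:
    """Convert one GFF3 feature line to a BED6 line; return None for comments and blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated GFF columns, got {len(fields)}")
    seqid, _source, feature_type, start, end, _score, strand, _phase, attributes = fields

    # Use the ID attribute as the BED name, falling back to the feature type.
    name = feature_type
    for pair in attributes.split(";"):
        key, separator, value = pair.strip().partition("=")
        if separator and key == "ID":
            name = value
            break

    bed_start = int(start) - 1  # 1-based inclusive start -> 0-based start
    bed_end = int(end)          # inclusive end equals the half-open end
    bed_strand = strand if strand in ("+", "-") else "."
    # The score is set to 0 because GFF scores do not map onto BED's 0-1000 range.
    return "\t".join([seqid, str(bed_start), str(bed_end), name, "0", bed_strand])
```

Gene models with exon structure would need the twelve-column BED layout with block sizes and starts, which this single-line sketch leaves out.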

            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2 [ 210, 211 ] Spring 1, Spring 2, Spring 3 [ 210, 211, 212 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment -

            Thank you for your patience with my many questions.
            How about if I make a stab at updating the documentation?
            Then you can see if I got it right or not, and correct as needed?

            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine added a comment - - edited

            New documentation page is deployed to IGB quickload sites and ready for testing.
            See: https://data.bioviz.org/quickload/D_dama_Nov_2023/

            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Assignee Paige Kulzer [ pkulzer ]
            pkulzer Paige Kulzer added a comment - - edited

            Question for Ann Loraine - Did you forget to link to the research article describing this Dama dama assembly (see https://www.sciencedirect.com/science/article/pii/S2666937423000124#da0005)?

            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2, Spring 3 [ 210, 211, 212 ] Spring 1, Spring 2, Spring 3, Spring 4 [ 210, 211, 212, 213 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            pkulzer Paige Kulzer added a comment - - edited

            I've added a reference to the research article mentioned in my previous comment. This update has been pushed to the SVN repository, which is now at revision 228. This new documentation format is clear, concise, and easy to follow. Ready for a final review by Ann Loraine!

            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment - - edited

            Our quickload sites are now updated to revision 228. New documentation is deployed. I checked all the links and they all go to the expected location.

            Moving to Done.

            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]

              People

              • Assignee:
                pkulzer Paige Kulzer
                Reporter:
                ann.loraine Ann Loraine