  1. IGB
  2. IGBF-4354

Add links to 10X Genomics dataset pages to the single-cell QL

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Karthik Raveendran has made a really nice, curated Quickload that IGB users can use to load and explore alignments from several single-cell RNA-Seq "demonstration" datasets provided by 10X Genomics.

      To add his Quickload to IGB, you can enter this URL as a new Quickload using the Settings > Data Sources tab as usual:

      The code for his Quickload is in this repository:

      Note that a "file" tag in a Quickload repository can provide a Web address for each dataset. If provided, then IGB will display a linkout icon in the Available Data Sources menu.

      Also, 10X Genomics provides a very nice, no-login-required Web site for each of its datasets, including the ones Karthik included in his Quickload. To make this single-cell demonstration Quickload 1000 times more useful, let's add those URLs to the Quickload file tags so that users can quickly and easily find the full range of information about those datasets.

      The attribute that is needed is "url" and you can use it to specify the location for the dataset's page.
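
A minimal sketch of how the "url" attribute might be added in bulk, assuming the Quickload's file tags live in an annots.xml-style XML document and that each "file" element is identified by a "name" attribute (an assumption, not confirmed in this ticket). The dataset names and page addresses below are hypothetical placeholders, not values from Karthik's repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical annots.xml content; every "name", "title" and "url" value
# here is a placeholder for illustration only.
SAMPLE_ANNOTS = """<files>
  <file name="example/dataset_a.bam" title="Dataset A"/>
  <file name="example/dataset_b.bam" title="Dataset B" url="https://example.org/old"/>
</files>"""


def add_dataset_urls(annots_xml, page_urls):
    """Return the XML with a "url" attribute set on each matching file tag.

    page_urls maps a file tag's "name" value to its dataset page address.
    File tags whose name is not in page_urls are left unchanged.
    """
    root = ET.fromstring(annots_xml)
    for file_tag in root.iter("file"):
        name = file_tag.get("name")
        if name in page_urls:
            file_tag.set("url", page_urls[name])
    return ET.tostring(root, encoding="unicode")


updated = add_dataset_urls(
    SAMPLE_ANNOTS,
    {"example/dataset_a.bam": "https://example.org/datasets/dataset-a"},
)
print(updated)
```

Keeping the name-to-page mapping in one dictionary makes it easy to review every linkout in a single place before committing the edited file.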

      I hope somebody can do this ASAP because I am preparing a lecture where I'm going to show my students how to browse these datasets in IGB!

      SCOPE CHANGE: Per Dr. Loraine's comment on 11/12, I will now be curating the track labels as part of this ticket, too.

        Attachments

          Activity

          ann.loraine Ann Loraine added a comment - - edited

          The only new change requested was adding a link to a dataset offered with genome assembly S_lycopersicum_Jun_2022, also called SL5. That change has been made.

          To provide a kind of backup for the original repository, I forked it, and the fork is here: https://bitbucket.org/lorainelab/quickload_scrna-seq_10xgenomics/commits/branch/main

          Using the bitbucket interface, I added some bitbucket-only metadata:

          • repository description
          • link pattern for IGB Jira project tickets

          Moving to DONE!

          attn: Karthik Raveendran, Paige Kulzer

          pkulzer Paige Kulzer added a comment -

          PR has been merged. Ann Loraine, ready for final review!

          pkulzer Paige Kulzer added a comment -

          I've incorporated that requested change into my commit and raised a PR: https://bitbucket.org/KarthikRavee91/quickload_scrna-seq_10xgenomics/pull-requests/1

          ann.loraine Ann Loraine added a comment -

          To review the updates, I looked at the genome versions listed as supported in the Quickload's metadata field "supportedGenomeVersionInfo" and checked each data set's alignments. I also checked track metadata by selecting a track label and opening the Selection Info. Doing this let me confirm that the checkbox label, its Web page and its data file URL were consistent.
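
Part of this consistency check could also be scripted. The sketch below assumes the file tags sit in an annots.xml-style document where "title" holds the checkbox label and "name" holds the data file location (both assumptions); the sample values are hypothetical. It lists each label next to its data file and Web page, and flags any file tag that has no linkout.

```python
import xml.etree.ElementTree as ET

# Hypothetical file tags; real labels and addresses would come from the
# Quickload repository.
SAMPLE = """<files>
  <file name="https://example.org/data/a.bam" title="Dataset A" url="https://example.org/pages/a"/>
  <file name="https://example.org/data/b.bam" title="Dataset B"/>
</files>"""


def list_file_tags(annots_xml):
    """Return (title, name, url) for every file tag; url is None if absent."""
    root = ET.fromstring(annots_xml)
    return [(f.get("title"), f.get("name"), f.get("url")) for f in root.iter("file")]


def titles_missing_url(annots_xml):
    """Return the titles of file tags that have no "url" attribute."""
    return [title for title, _name, url in list_file_tags(annots_xml) if not url]


for title, name, url in list_file_tags(SAMPLE):
    print(f"{title}\t{name}\t{url or 'NO LINKOUT'}")
```

A listing like this does not replace looking at the alignments in IGB, but it makes a missing linkout easy to spot.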

          The genome versions with data for this Quickload were:

          S_lycopersicum_Jun_2022, M_musculus_Jun_2020, H_sapiens_Feb_2009, S_lycopersicum_Sep_2019, H_sapiens_Dec_2013

          For the human genome versions, I checked alignments in the region of MEOX1, a gene that encodes a homeobox-containing protein and exhibits a rare exon-skipping event.
          For each dataset I loaded, most of the alignments included very large gaps with flanking regions of alignment that did not match the genomic sequence at all. Not one of the ones that I checked actually matched the genome! And yet, the quality metrics reported on each data set's page on the 10X Web site typically reported very high percentages of sequences mapping to the genome.

          I have just one additional request for Paige Kulzer:

          • S_lycopersicum_Jun_2022 - please add a linkout to the "no max intron value" checkbox - the same as the one above it. This is just because it's not super-obvious that the two datasets are part of the same Galaxy History.

          Once this last change is done, please submit a PR to Karthik's repository.

          Thank you very much for making this QL site lots easier to use!

          pkulzer Paige Kulzer added a comment -

          I've updated my branch with the changes suggested above. All track labels should now be space-separated, and I've fixed the title of the Mouse dataset to reflect that neurons, not splenocytes, were used.

          Ready for review!


            People

            • Assignee:
              pkulzer Paige Kulzer
              Reporter:
              ann.loraine Ann Loraine
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: