  1. IGB
  2. IGBF-4354

Add links to 10X Genomics dataset pages to the single-cell QL

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Karthik Raveendran has made a really nice, curated Quickload that IGB users can use to load and explore alignments from several single-cell RNA-Seq "demonstration" datasets provided by 10X Genomics.

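      To add his Quickload to IGB, you can enter this URL as a new Quickload using the Settings > Data Sources tab as usual:

      • https://bitbucket.org/KarthikRavee91/quickload_scrna-seq_10xgenomics/raw/main/

      The code for his Quickload is in this repository:

      • https://bitbucket.org/KarthikRavee91/quickload_scrna-seq_10xgenomics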

      Note that a "file" tag in a Quickload repository can provide a Web address for each dataset. If provided, then IGB will display a linkout icon in the Available Data Sources menu.

      Also, 10X Genomics provides a very nice Web page, with no login required, for each of its datasets, including the ones Karthik included in his Quickload. To make this single-cell demonstration Quickload 1000 times more useful, let's add those URLs to the Quickload "file" tags so that users can quickly and easily find the full range of information about those datasets.

      The attribute that is needed is "url" and you can use it to specify the location for the dataset's page.
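
A minimal sketch of how the "url" attribute could be added in bulk, assuming the Quickload's annots.xml lists each dataset as a "file" element with "name" and "title" attributes. The file names, titles, and page addresses below are placeholders for illustration, not the real datasets, and the helper add_linkouts is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical annots.xml fragment; names, titles, and URLs are placeholders.
ANNOTS_XML = """<files>
  <file name="https://example.org/data/sample1.bam" title="10X Sample 1" description="Placeholder dataset"/>
  <file name="https://example.org/data/sample2.bam" title="10X Sample 2" description="Placeholder dataset" url="https://example.org/already-set"/>
</files>"""

# Hypothetical mapping from dataset title to that dataset's Web page.
DATASET_PAGES = {
    "10X Sample 1": "https://example.org/datasets/sample-1",
    "10X Sample 2": "https://example.org/datasets/sample-2",
}


def add_linkouts(annots_xml, pages):
    """Return annots XML text with a url attribute set on each file element
    whose title has an entry in pages. Existing url values are left alone."""
    root = ET.fromstring(annots_xml)
    for file_elem in root.iter("file"):
        title = file_elem.get("title")
        if title in pages and file_elem.get("url") is None:
            file_elem.set("url", pages[title])
    return ET.tostring(root, encoding="unicode")


updated = add_linkouts(ANNOTS_XML, DATASET_PAGES)
print(updated)
```

Leaving existing "url" values untouched keeps the script safe to rerun after some linkouts have been curated by hand.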

      I hope somebody can do this ASAP because I am preparing a lecture where I'm going to show my students how to browse these datasets in IGB!

      SCOPE CHANGE: Per Dr. Loraine's comment on 11/12, I will now be curating the track labels as part of this ticket, too.

        Attachments

          Activity

          ann.loraine Ann Loraine created issue -
          ann.loraine Ann Loraine made changes -
          Field Original Value New Value
          Epic Link IGBF-1765 [ 17855 ]
          pkulzer Paige Kulzer added a comment -

          I've added linkouts to the 10X Genomics datasets as requested. I've also modified the tooltips for these datasets such that hovering over the dataset name in IGB displays the full, searchable name of the dataset in 10X Genomics.

          Branch: https://bitbucket.org/pkulzer-lorainelab/quickload_scrna-seq_10xgenomics/branch/IGBF-4354

          pkulzer Paige Kulzer made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ]
          pkulzer Paige Kulzer made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine added a comment -

          Testing by adding the following URL as a new Quickload: https://bitbucket.org/pkulzer-lorainelab/quickload_scrna-seq_10xgenomics/raw/IGBF-4354
          ann.loraine Ann Loraine added a comment -

          Scope change request for Paige Kulzer:

          Please also do the datasets that Karthik Raveendran made available for visualization in these two other genome versions:

          • H_sapiens_Feb_2009
          • M_musculus_Jun_2020
          ann.loraine Ann Loraine made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
          pkulzer Paige Kulzer added a comment -

          Ann Loraine Way ahead of you! Those changes were included with the original commit.

          pkulzer Paige Kulzer made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
          ann.loraine Ann Loraine added a comment -

          Sorry Paige Kulzer, I missed that!

          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine added a comment - - edited

          I have a question about the dataset that appears in the Data Access Panel as "10X_MouseSplenocytes5K".

          The page that opens when I click the "linkout" icon is this one:

          https://www.10xgenomics.com/datasets/10k-Mouse-Neurons-3p-nextgem

          The page says the data are from neurons, but the label on the checkbox implies that these data are from spleen, not brain. I think Paige Kulzer's linkout is actually the correct one because the BAM file link URL contains words like "neuron."

          Also, I see there is an issue with the way the checkbox label is crafted. It is obviously a computer-friendly way of naming a thing because it lacks the spaces that in normal text separate adjacent words.

          e.g., the label is: 10X_MouseSplenocytes5k.

          The more human-friendly text would be something like "10X Mouse Splenocytes 5K"

          This is problematic because when IGB displays the dataset name in a track label, it "knows" to treat underscore characters like spaces and can use them to wrap text. But it can't deal with concatenated words and so has to display "MouseSplenocytes5k" as one unbroken word.

          This makes it harder to take a nice picture of this track because to make the track label readable, I have to expand the width of the track label part of the display.

          I feel it would be kinder to all if we could modify the checkbox labels slightly to include spaces between adjacent words.
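
The relabeling suggested above can be sketched in Python. This is only a heuristic, assuming labels mix underscores with run-together capitalized words as in the example; the function name humanize_label is hypothetical and the results would still need a manual check:

```python
import re


def humanize_label(label):
    """Turn a machine-style dataset name into space-separated words."""
    # Underscores become spaces, matching how IGB already treats them.
    spaced = label.replace("_", " ")
    # Insert a space wherever a lowercase letter is directly followed by
    # an uppercase letter or a digit, e.g. "MouseSplenocytes5K".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z0-9])", " ", spaced)
    # Collapse any repeated whitespace.
    return " ".join(spaced.split())


print(humanize_label("10X_MouseSplenocytes5K"))  # 10X Mouse Splenocytes 5K
```

Splitting only after a lowercase letter leaves tokens such as "10X" and "5K" intact, but it would also split a name like "v3" into "v 3", which is why hand curation remains the safer final step.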

          pkulzer Paige Kulzer made changes -
          Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
          pkulzer Paige Kulzer made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          pkulzer Paige Kulzer made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer made changes -
          Description [~karthik] has made a really nice, curated Quickload that IGB users can use to load and explore alignments from several single-cell RNA-Seq "demonstration" datasets provided by 10X Genomics.

          To add his Quickload to IGB, you can enter this URL as a new Quickload using the Settings > Data Sources tab as usual:

          * https://bitbucket.org/KarthikRavee91/quickload_scrna-seq_10xgenomics/raw/main/

          The code for his Quickload is in this repository:

          * https://bitbucket.org/KarthikRavee91/quickload_scrna-seq_10xgenomics

          Note that a "file" tag in a Quickload repository can provide a Web address for each dataset. If provided, then IGB will display a linkout icon in the Available Data Sources menu.

          Also, 10X Genomics provides a very nice Web page, with no login required, for each of its datasets, including the ones Karthik included in his Quickload. To make this single-cell demonstration Quickload 1000 times more useful, let's add those URLs to the Quickload "file" tags so that users can quickly and easily find the full range of information about those datasets.

          The attribute that is needed is "url" and you can use it to specify the location for the dataset's page.

          I hope somebody can do this ASAP because I am preparing a lecture where I'm going to show my students how to browse these datasets in IGB!

          *SCOPE CHANGE:* Per Dr. Loraine's comment on 11/12, I will now be curating the track labels as part of this ticket, too.
          pkulzer Paige Kulzer added a comment -

          I've updated my branch with the changes suggested above. All track labels should now be space-separated, and I've fixed the title of the Mouse dataset to reflect that neurons, not splenocytes, were used.

          Ready for review!

          pkulzer Paige Kulzer made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine added a comment -

          To review the updates, I looked at the genome versions listed as supported in the Quickload's metadata field "supportedGenomeVersionInfo" and checked each dataset's alignments. I also checked track metadata by selecting a track label and opening the Selection Info. Doing this let me confirm that the checkbox label, its Web page and its data file URL were consistent.
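
Part of this consistency check could be automated. The following Python sketch is not what was done for this review; it assumes each dataset is a "file" element in annots.xml with "name" (data file URL), "title" (checkbox label) and "url" (Web page) attributes, and it flags labels whose longer words do not appear in both URLs. All sample values and the helper review_entries are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical annots.xml fragment; all names and URLs are placeholders.
ANNOTS_XML = """<files>
  <file name="https://example.org/bam/mouse_neuron_10k.bam" title="10X Mouse Splenocytes 5K" url="https://example.org/datasets/10k-mouse-neurons"/>
  <file name="https://example.org/bam/human_pbmc_1k.bam" title="10X Human PBMC 1K" url="https://example.org/datasets/1k-human-pbmc"/>
  <file name="https://example.org/bam/tomato_root.bam" title="Tomato Root"/>
</files>"""


def review_entries(annots_xml, min_len=4):
    """Return (title, problem) pairs for file elements that need a manual look."""
    findings = []
    for elem in ET.fromstring(annots_xml).iter("file"):
        title = elem.get("title", "")
        name = elem.get("name", "").lower()
        url = elem.get("url")
        if url is None:
            findings.append((title, "no url attribute"))
            continue
        page = url.lower()
        # Alphabetic words from the label, lowercased, with trailing "s"
        # characters stripped so that plurals match singular forms.
        words = [w.lower().rstrip("s") for w in re.findall(r"[A-Za-z]+", title)]
        for word in words:
            if len(word) >= min_len and (word not in name or word not in page):
                findings.append((title, f"'{word}' missing from data file or page URL"))
    return findings


findings = review_entries(ANNOTS_XML)
for title, problem in findings:
    print(f"{title}: {problem}")
```

A word-matching heuristic like this only narrows down which entries to inspect; opening the Selection Info and the linked page is still what confirms a match.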

          The genome versions with data for this Quickload were:

          S_lycopersicum_Jun_2022, M_musculus_Jun_2020, H_sapiens_Feb_2009, S_lycopersicum_Sep_2019, H_sapiens_Dec_2013

          For the human genome versions, I checked alignments in the region of MEOX1, which encodes a homeobox-containing protein and exhibits a rare exon-skipping event.
          For each dataset I loaded, most of the alignments included very large gaps with flanking regions of alignment that did not match the genomic sequence at all. Not one of the ones that I checked actually matched the genome! And yet, the quality metrics reported on each data set's page on the 10X Web site typically reported very high percentages of sequences mapping to the genome.

          I have just one additional request for Paige Kulzer:

          • S_lycopersicum_Jun_2022 - please add a linkout to the "no max intron value" checkbox - the same as the one above it. This is just because it's not super-obvious that the two datasets are part of the same Galaxy History.

          Once this last change is done, please submit a PR to Karthik's repository.

          Thank you very much for making this QL site lots easier to use!

          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
          ann.loraine Ann Loraine made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          pkulzer Paige Kulzer added a comment -

          I've incorporated that requested change into my commit and raised a PR: https://bitbucket.org/KarthikRavee91/quickload_scrna-seq_10xgenomics/pull-requests/1

          pkulzer Paige Kulzer made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ] Karthik Raveendran [ karthik ]
          nfreese Nowlan Freese made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          nfreese Nowlan Freese made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          nfreese Nowlan Freese made changes -
          Assignee Karthik Raveendran [ karthik ] Paige Kulzer [ pkulzer ]
          pkulzer Paige Kulzer added a comment -

          PR has been merged. Ann Loraine, ready for final review!

          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
          ann.loraine Ann Loraine added a comment - - edited

          The only new change requested was adding a link to a dataset offered with genome assembly S_lycopersicum_Jun_2022, also called SL5. The change has been made.

          To provide a kind of backup for the original repository, I forked it, and the fork is here: https://bitbucket.org/lorainelab/quickload_scrna-seq_10xgenomics/commits/branch/main

          Using the Bitbucket interface, I added some Bitbucket-only metadata:

          • repository description
          • link pattern for IGB Jira project tickets

          Moving to DONE!

          attn: Karthik Raveendran, Paige Kulzer
          ann.loraine Ann Loraine made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          ann.loraine Ann Loraine made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]

            People

            • Assignee: pkulzer Paige Kulzer
            • Reporter: ann.loraine Ann Loraine
            • Votes: 0
            • Watchers: 2
