  1. IGB
  2. IGBF-2741

Make an App that allows filtering by "id" - part 1

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 3
    • Sprint: Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19, Spring 1 2021 Mar 22 - Apr 2

      Description

      Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This essential step generates the "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering and similar methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from each gene. However, there is a problem with that!

      Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. As a result, the number of sequences observed per gene is only loosely related to the number of mRNAs from that gene that were in the original sample.

      To get around this, the experimental protocols include a step that adds a "UMI" sequence tag, so that every read that came from the same RNA molecule carries the same tag. (UMI stands for "unique molecular identifier.") Instead of counting every single read, the data analysis protocols count the number of unique UMIs per gene. To keep track of UMIs in the data processing steps, the computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is also the "id" attribute.

      In addition to copying the UMI sequence, the pipelines often copy a second string that uniquely identifies the particular cell that the read came from. This is called a "cell barcode" and is also introduced into every read as part of the experimental protocol that produces the data.
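
      The UMI counting idea described above can be illustrated with a small Python sketch. It is not taken from any particular pipeline: the read name layout "READID_CELLBARCODE_UMI", the function names, and the sample data are all assumptions made for illustration, and real pipelines may format read names differently.

```python
from collections import defaultdict


def parse_read_name(read_name):
    """Split a read name of the assumed form 'READID_CELLBARCODE_UMI'."""
    # rsplit keeps any underscores inside the read id itself intact.
    read_id, cell_barcode, umi = read_name.rsplit("_", 2)
    return read_id, cell_barcode, umi


def count_umis(alignments):
    """Count distinct UMIs per (cell barcode, gene) pair.

    alignments: iterable of (read_name, gene) tuples (hypothetical input).
    """
    umis = defaultdict(set)
    for read_name, gene in alignments:
        _, cell_barcode, umi = parse_read_name(read_name)
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Made-up example data: four reads aligned to the same gene.
alignments = [
    ("read1_AAAC_GGTT", "geneA"),
    ("read2_AAAC_GGTT", "geneA"),  # PCR duplicate: same cell, same UMI
    ("read3_AAAC_CCAT", "geneA"),  # same cell, different UMI
    ("read4_TTTG_GGTT", "geneA"),  # same UMI, but a different cell
]
counts = count_umis(alignments)
```

      In this made-up data, three reads come from cell AAAC but only two distinct UMIs are seen, so counting UMIs instead of reads collapses the PCR duplicate.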

      Therefore, in IGB it would be super-useful to have a filter that limits the reads being shown to those matching a specific string that the user enters. This would allow users to get a better sense of how often a given UMI appears in their data. It would also allow them to visualize gene expression for a single cell instead of looking at all of the data at once for every cell.

      Also, this type of thing would be useful for any type of track, not only BAM tracks.

      For this App, please implement a new "filter by name" option that lets a user hide all items in a track that do not match the name.
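
      As a rough illustration of the requested behavior, here is a minimal Python sketch of a "filter by name" step. It does not use IGB's actual filtering API: the TrackItem class, the filter_by_id function, and the choice of substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrackItem:
    """Hypothetical stand-in for an item in a track (not an IGB class)."""
    id: str
    start: int
    end: int


def filter_by_id(items, query):
    """Return only the items whose id contains the query string.

    Substring matching is an assumption; an empty query hides nothing.
    """
    if not query:
        return list(items)
    return [item for item in items if query in item.id]


# Made-up items whose ids follow an assumed 'READID_CELLBARCODE_UMI' layout.
items = [
    TrackItem("read1_AAAC_GGTT", 100, 150),
    TrackItem("read2_AAAC_CCAT", 120, 170),
    TrackItem("read3_TTTG_GGTT", 300, 350),
]

same_umi = filter_by_id(items, "_GGTT")    # reads sharing one UMI
same_cell = filter_by_id(items, "_AAAC_")  # reads from one cell
```

      Because the match runs on the whole id string, the same filter works for a UMI, a cell barcode, or any other part of the name, and it makes no assumption about the track being a BAM track. Including the separator in the query, as above, reduces the chance that a UMI query accidentally matches a cell barcode.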

      Lastly, it is not clear yet whether we can introduce new filters as Apps. So a big part of this task will involve understanding the existing filtering system for tracks to see if an App can be added to implement a new one.

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-1765 [ 17855 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2712 [ IGBF-2712 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such things. The "counts per gene per cell" are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. One problem with this however is that this means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. One problem with this however is that this means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. One problem with this however is that this means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. One problem with this however is that this means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. One problem with this however is that this means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs, the read alignment and processing computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is actually the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs in the data processing steps, the computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is also the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs in the data processing steps, the computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is also the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs in the data processing steps, the computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is also the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from. This is called a "cell barcode" and is also introduced into every read as part of the experimental protocol that produces the data.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            ann.loraine Ann Loraine made changes -
            Description Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs in the data processing steps, the computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is also the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from. This is called a "cell barcode" and is also introduced into every read as part of the experimental protocol that produces the data.

            Therefore in IGB it would be super-useful if we can create a filter that limits the reads being show to a specific string that the user enters.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implemented a new "filter by name" option that lets a user hide all items in a track that do not match the name.
            Nearly every single-cell RNA-Seq pipeline aligns the "raw" sequence data onto a reference, usually a reference genome sequence. This is an essential step in the workflow. This step is done in order to generate "counts per gene per cell" spreadsheets that are then analyzed using unsupervised clustering methods and other such methods. The "counts per gene per cell" numbers are supposed to reflect the number of RNAs observed from the gene. However, there is a problem with that!

            Because the methods for creating the sequence data start with minute amounts of RNA per cell, the protocols use PCR amplification in order to produce enough material for sequencing. This means that the same fragment of RNA will often get copied many, many times. Thanks to this, the number of sequences observed per gene will be only loosely related to the number of mRNAs from that gene that were in the original sample.

            To get around this, the experimental protocols include a step that adds a "UMI" sequence tag to every read that came from the same RNA molecule. (UMI stands for "unique molecular identifier.") So instead of counting every single read, the data analysis protocols instead count the number of unique "UMIs" per gene. To keep track of UMIs in the data processing steps, the computational pipelines typically append the UMI sequence to the read name. In IGB, this "read name" is also the "id" attribute.

            In addition to copying the UMI sequence, the pipelines also often copy another string that uniquely identifies the particular cell that the read came from. This is called a "cell barcode" and is also introduced into every read as part of the experimental protocol that produces the data.

            Therefore, in IGB it would be super-useful if we could create a filter that limits the reads being shown to those matching a specific string that the user enters. This would allow users to get a better sense of how often a given UMI appears in their data. It would also allow them to visualize gene expression for a single cell instead of looking at all of the data at once for every cell.

            Also, this type of thing would be useful for any type of track, not only BAM tracks.

            For this App, please implement a new "filter by name" option that lets a user hide all items in a track that do not match the name.
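To make the counting idea in the description concrete, here is a minimal sketch in Python (not IGB code). It assumes a hypothetical read-name layout of `<read>_<cell barcode>_<UMI>`; real pipelines differ in how they append these tags, and the function names and sample data below are made up for illustration.

```python
from collections import defaultdict


def parse_read_name(read_name):
    """Split a read name of the ASSUMED form '<read>_<cell barcode>_<UMI>'."""
    base, cell_barcode, umi = read_name.rsplit("_", 2)
    return base, cell_barcode, umi


def umi_counts(reads):
    """Count distinct UMIs per (cell barcode, gene).

    `reads` is an iterable of (read_name, gene) pairs.
    """
    seen = defaultdict(set)
    for read_name, gene in reads:
        _, cell_barcode, umi = parse_read_name(read_name)
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical example data, not taken from any real experiment.
reads = [
    ("read1_ACGT_TTAA", "geneA"),
    ("read2_ACGT_TTAA", "geneA"),  # PCR duplicate: same cell, same UMI
    ("read3_ACGT_GGCC", "geneA"),
    ("read4_TGCA_TTAA", "geneA"),  # same UMI, but a different cell
]
counts = umi_counts(reads)
print(counts)
```

In this sketch the two reads that share a cell barcode and a UMI are counted once, which is why UMI counts can differ from raw read counts.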
            ann.loraine Ann Loraine made changes -
            Description

            Lastly, it is not clear yet whether we can introduce new filters as Apps. So a big part of this task will involve understanding the existing filtering system for tracks to see if an App can be added to implement a new one.

            inaylor Irvin Naylor (Inactive) made changes -
            Assignee Irvin Naylor [ inaylor ]
            inaylor Irvin Naylor (Inactive) made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine added a comment -

            Assigning to Noor Zahara for consultation.

            ann.loraine Ann Loraine made changes -
            Assignee Irvin Naylor [ inaylor ] Noor Zahara [ noor91zahara ]
            noor91zahara Noor Zahara (Inactive) added a comment -

            From what I have observed, we can implement filter Apps in the same way as the Soft Clips App.

            ann.loraine Ann Loraine added a comment -

            Thank you. Could you create a quick proof-of-concept App showing how the app "plugs in" to IGB?

            noor91zahara Noor Zahara (Inactive) added a comment -

            When I eliminated the inheritance and implemented the interface, I was able to install the app without an Activator class.

            ann.loraine Ann Loraine added a comment -

            Thanks for the update. Could you add a link to the repository here?

            noor91zahara Noor Zahara (Inactive) added a comment - Code Diff - https://bitbucket.org/noorzahara/score-filter/commits/e39088bcc6afbf50a2e9ce873ca367ae5cce0ad6
            ann.loraine Ann Loraine made changes -
            Summary Make an App that allows filtering by "id" Make an App that allows filtering by "id" - part 1
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1 Dec 28 - Jan 8 [ 111 ] Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22 [ 111, 112 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            inaylor Irvin Naylor (Inactive) made changes -
            Assignee Noor Zahara [ noor91zahara ] Irvin Naylor [ inaylor ]
            inaylor Irvin Naylor (Inactive) added a comment -

            "The filter should work like this: Users will enter a string, and any id field that contains the string will be matched and shown to the user. If the id field contains additional text, that's OK. Here is a hypothetical example: let's say the user wants to only see every item in a track with an id containing the string "cat". They would enter "cat" in the interface, and then items with ids such as "My cat is named Mr. Two" and "Everybody loves cats!" and "A cat is sleeping in my chair" would be shown."

            cdias1 Chester Dias (Inactive) added a comment - Irvin Naylor Please approve : https://bitbucket.org/Inaylor01/id-filter-app/pull-requests/1/igbf-2741-added-logic-for-id-filter
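The matching rule quoted above (show any item whose id contains the entered string) can be sketched as follows. This is a Python illustration only, not the code in the pull request; the dictionary-based track items and the function name are hypothetical, and case-sensitive matching is an assumption because the comment does not say how case should be handled.

```python
def filter_by_id(items, query):
    """Return the items whose id contains `query` (ASSUMED case-sensitive)."""
    return [item for item in items if query in item["id"]]


# Hypothetical track items, using the ids from the example in the comment.
track = [
    {"id": "My cat is named Mr. Two"},
    {"id": "Everybody loves cats!"},
    {"id": "A cat is sleeping in my chair"},
    {"id": "Dogs are fine too"},
]
shown = filter_by_id(track, "cat")
for item in shown:
    print(item["id"])
```

Items that do not contain the string are simply left out of the result, which corresponds to hiding them in the track.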
            inaylor Irvin Naylor (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            inaylor Irvin Naylor (Inactive) added a comment - - edited

            Here is our Repo link: https://bitbucket.org/Inaylor01/id-filter-app/src/master/

            [~aloraine] Chester and I have finished the basic implementation of matching IDs and relaying the results back to the users. We have been discussing potential expansions of the feature, such as letting the user enter a partial ID when they don't know the exact number, with the filter displaying the IDs that correspond. We also discussed supporting wildcard characters (for example, "Show me IDs with 1234-xxxx"), and we are curious about your take on where we should go next.
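One way the wildcard idea could work is sketched below in Python, using the standard library's `fnmatch` module. This is an assumption about a possible design, not something the App implements: it treats `?` as "any single character" (standing in for the `x` placeholders in the example) and wraps the pattern in `*` so that it can match anywhere in the id. The function name and the sample ids are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_pattern(item_id, pattern):
    """Case-sensitive wildcard match anywhere in the id.

    '?' matches exactly one character and '*' matches any run of characters.
    """
    return fnmatchcase(item_id, f"*{pattern}*")


# Hypothetical ids for illustration.
ids = ["1234-5678", "1234-abcd", "9999-5678", "read_1234-0001_x"]
hits = [i for i in ids if matches_pattern(i, "1234-????")]
print(hits)
```

Note that `fnmatch` also gives square brackets a special meaning, so ids or queries that contain `[` or `]` would need escaping in a real implementation.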

            inaylor Irvin Naylor (Inactive) made changes -
            Assignee Irvin Naylor [ inaylor ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22 [ 111, 112 ] Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5 [ 111, 112, 113 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment - - edited

            Irvin Naylor:

            Please provide a link to the repository downloads directory that we can use to try out the new app.

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Assignee Irvin Naylor [ inaylor ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            inaylor Irvin Naylor (Inactive) added a comment - [~aloraine] Here is the directory: https://bitbucket.org/Inaylor01/id-filter-app/downloads/
            inaylor Irvin Naylor (Inactive) made changes -
            Assignee Irvin Naylor [ inaylor ]
            ann.loraine Ann Loraine added a comment -

            Is this now ready for review again? If yes, please move it forward. If you leave something in "To-Do", it means that something still needs to be done on it, and I will not look at it.

            ann.loraine Ann Loraine made changes -
            Assignee Irvin Naylor [ inaylor ]
            inaylor Irvin Naylor (Inactive) made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Assignee Irvin Naylor [ inaylor ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5 [ 111, 112, 113 ] Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19 [ 111, 112, 113, 114 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19 [ 111, 112, 113, 114 ] Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5 [ 111, 112, 113, 114, 115 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            inaylor Irvin Naylor (Inactive) made changes -
            Assignee Irvin Naylor [ inaylor ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5 [ 111, 112, 113, 114, 115 ] Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19 [ 111, 112, 113, 114, 115, 116 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment -

            The code looks good. Also, it appears that the filter will allow anything that contains the search string to pass, which is the desired behavior.
            For the next steps, someone should test the functionality in their IGB and confirm that the app functions as expected.
            Also please confirm that the directions on how to use the App and what to expect are sufficiently clear.

            ann.loraine Ann Loraine made changes -
            Assignee Irvin Naylor [ inaylor ]
            rweidenh Logan Weidenhammer (Inactive) made changes -
            Assignee Rachel Weidenhammer [ rweidenh ]
            rweidenh Logan Weidenhammer (Inactive) made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            rweidenh Logan Weidenhammer (Inactive) added a comment -

            Functional Review:
            I right-clicked an alignment track from a loaded BAM file in the Human genome > "Filter..." > "Add" > "Show Only: ID" was already selected.
            In the ID section I tried filtering with characters from the beginning, middle, and end of an ID, and with numbers, letters, underscores, decimals, partial IDs, and whole IDs.

            All filtered as expected. As far as functionality goes, I would recommend this for pull request.

            Irvin Naylor
            One comment I would like to make is that the description/instructions for how to run the app in the "IGB App Manager" window are not as visually clean and easy to read as all the other apps. On the ID-filter app the instructions are chunked together as one paragraph, whereas the other apps present this information with one instruction step per line.

            The description also talks about the user in the third person, as if they aren't there. It feels weird to me as a user to feel the app talk about me instead of to me.

            rweidenh Logan Weidenhammer (Inactive) made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            rweidenh Logan Weidenhammer (Inactive) made changes -
            Assignee Rachel Weidenhammer [ rweidenh ] Irvin Naylor [ inaylor ]
            ann.loraine Ann Loraine added a comment -

            Replying to comment from Irvin Naylor:

            • Kindly improve the description/instructions for how to run the app in the "IGB App Manager" window as you have noted above.

            Regarding voice: Yes, when we write documentation for users to read, we address them directly, as you have noted. This is following advice from the Microsoft Manual of Style, which we follow.

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2817 [ IGBF-2817 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            inaylor Irvin Naylor (Inactive) added a comment -

            [~aloraine] - IGBF-2817 has been completed and is ready for review. However, I do have a question: for apps and repos that don't directly connect to the main IGB repo, how do we go about doing the pull request for this issue, since there isn't anything for the app to merge into?

            inaylor Irvin Naylor (Inactive) made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            inaylor Irvin Naylor (Inactive) made changes -
            Sprint Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19 [ 111, 112, 113, 114, 115, 116 ] Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19, Spring 1 Mar 22 - Apr 2 [ 111, 112, 113, 114, 115, 116, 117 ]

              People

              • Assignee:
                inaylor Irvin Naylor (Inactive)
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                5
