  1. IGB
  2. IGBF-4235

Create IGB App that filters (removes) alignments with overly-long gaps

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Use case:

      We have observed that many alignments from single-cell RNA-Seq experiments look wrong. That is, the aligned sequences contain enormous gaps relative to the genomic sequence. These seemingly spurious alignments appear to support introns that are far too large and are unlikely to be real.

      It would be helpful if a user (me) could apply a filter that would hide such apparently spurious alignments.

      I propose creating an all-new IGB App that would allow a user to hide any alignments with gaps (e.g., introns) that are longer than a user-entered value. Also, to save effort, let's choose a default length of 10,000 bases.
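
      A minimal sketch of the proposed filtering rule, assuming each alignment exposes a SAM-style CIGAR string (as BAM records do) and that only skipped-region ("N") elements count as gaps. The class and method names are hypothetical and are not part of the IGB API. Whether deletions ("D") should also count is left open here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the IGB API. */
final class GapLengthFilter {

    /** Default maximum gap length in bases, as proposed in this issue. */
    static final int DEFAULT_MAX_GAP = 10_000;

    // One CIGAR element: a length followed by an operation code (SAM format).
    private static final Pattern CIGAR_ELEMENT =
            Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns the longest skipped-region (N) length in the CIGAR string, or 0 if there is none. */
    static int longestGap(String cigar) {
        int longest = 0;
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            // Assumption: only 'N' elements (e.g., introns) are treated as gaps.
            if (m.group(2).charAt(0) == 'N') {
                longest = Math.max(longest, Integer.parseInt(m.group(1)));
            }
        }
        return longest;
    }

    /** True if the alignment stays visible, i.e. no gap is longer than maxGap. */
    static boolean keep(String cigar, int maxGap) {
        return longestGap(cigar) <= maxGap;
    }

    public static void main(String[] args) {
        // Gap of 2,000 bases: within the default limit, so the alignment is kept.
        System.out.println(keep("50M2000N48M", DEFAULT_MAX_GAP));
        // Gap of 250,000 bases: exceeds the default limit, so the alignment is hidden.
        System.out.println(keep("30M250000N68M", DEFAULT_MAX_GAP));
    }
}
```

      In this sketch a gap exactly equal to the limit is kept, matching the wording "longer than a user-entered value".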

      See attached for an image showing examples of overly long gaps in an alignments track.

      See this directory for example data:

      Note that the file named "possorted_genome_bam.bam" contains many alignments with extremely long gaps. Also, please note that the reference genome assembly for this data file is the tomato Sept 2019 assembly, also called "SL4". In IGB, this genome assembly version's name is "S_lycopersicum_Sep_2019". It is the second most recent genome version for tomato available from our Quickload sites.

      The data are from an experiment done by Rasha from Mark Johnson's lab and have already been submitted to the Sequence Read Archive under accession SRP538407. The "run_1_S1" directory corresponds to sample run 1, an unpollinated pistil maintained at 28 degrees C (the non-stressed temperature treatment), and has SRA accession SRR30982324.

      A good example gene to focus on during development can be found by using the Advanced Search tab and entering "F3H".

      Info about this gene:

      title: Solyc02g083860.3
      id: Solyc02g083860.3.1
      description: anthocyanin-reduced (are) flavanone 3-hydroxylase (F3H)
      start: 45,124,818
      end: 45,122,807
      length: 2,011

              People

              • Assignee:
                sjagarap saideepthi jagarapu (Inactive)
                Reporter:
                ann.loraine Ann Loraine