  1. IGB
  2. IGBF-4299

Hunting for read depth in m6A sites of interest

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      GOAL: Determine how many reads align to the 3' UTR regions of interest mentioned by Kausik.

      Rob's prevailing thought: No reads align to these areas. Prove him wrong.

      1. Get the file from Kausik's email listing the genes of interest.
      2. Find a relevant reference annotation (GFF, GTF, refFlat, or BED file) that contains the gene name.
      3. Within this file, identify the coordinates of the 3' UTR.
      4. Get the coordinates of the m6A sites related to this gene.
      5. Check how many reads cover these sites and the 3' UTR in general.
      6. Repeat for each gene.
      7. Repeat for each of the 10 files (Ring, Troph, Schiz, etc.).

      To do: figure out how to get the read depth using samtools.
      Can it all be done in Python while still using samtools under the hood?
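One way to drive samtools from Python is a thin `subprocess` wrapper around `samtools depth`. This is only a minimal sketch: the function names are made up for illustration, it assumes `samtools` is on the PATH and the BAM file is coordinate-sorted and indexed, and it assumes a single BAM per call (so each output line is chromosome, 1-based position, depth).

```python
import subprocess

def depth_command(bam_path, chrom, start, end):
    """Build a `samtools depth` call for a 1-based inclusive region."""
    region = f"{chrom}:{start}-{end}"
    # -a reports every position in the region, including zero-depth ones
    return ["samtools", "depth", "-a", "-r", region, bam_path]

def parse_depth(text):
    """Parse `samtools depth` output for one BAM into {position: depth}."""
    depths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _chrom, pos, depth = line.split("\t")[:3]
        depths[int(pos)] = int(depth)
    return depths

def region_depth(bam_path, chrom, start, end):
    """Run samtools and return per-position depth (needs samtools installed)."""
    result = subprocess.run(
        depth_command(bam_path, chrom, start, end),
        capture_output=True, text=True, check=True,
    )
    return parse_depth(result.stdout)

# Parsing can be tried without samtools, using text shaped like its output.
example_output = "Pf3D7_01_v3\t36326\t17\nPf3D7_01_v3\t36327\t0\n"
print(parse_depth(example_output))
```

With a real file the call would look like `region_depth("Schiz6.bam", "Pf3D7_01_v3", 36326, 36650)`, where the BAM file name is a placeholder.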

        Attachments

          Activity

          bbendick Brandon Bendickson added a comment -

          Grab random genome chunks and hunt for m6A:

          1. Typical 3' UTR length (pick a random spot, grab 300 nucleotides from that spot)
          2. Random piece on the same chromosome
          3. m6A predictions in those sections
          4. Get BAM file counts: do we see anything?

          LOTS OF ITERATIONS (like 1000)
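The random-chunk idea above can be sketched roughly as follows. The chromosome length and the predicted site positions are made-up placeholders, windows are treated as 0-based half-open intervals, and the BAM counting in step 4 is left out.

```python
import bisect
import random

def random_windows(chrom_length, n_iter=1000, size=300, seed=None):
    """Draw n_iter random windows of `size` nt as 0-based half-open (start, end)."""
    rng = random.Random(seed)
    last_start = chrom_length - size  # last start that still fits on the chromosome
    windows = []
    for _ in range(n_iter):
        start = rng.randint(0, last_start)
        windows.append((start, start + size))
    return windows

def count_sites(window, sorted_sites):
    """Count predicted sites with start <= pos < end, using binary search."""
    start, end = window
    return bisect.bisect_left(sorted_sites, end) - bisect.bisect_left(sorted_sites, start)

# Hypothetical inputs: the length and the site positions are invented.
chrom_length = 640_000
predicted_sites = sorted([1_200, 1_350, 90_000, 90_100, 400_500])

windows = random_windows(chrom_length, n_iter=1000, size=300, seed=1)
hits = [count_sites(w, predicted_sites) for w in windows]
print(sum(1 for h in hits if h > 0), "of", len(windows), "random windows contain a predicted site")
```

Fixing the seed keeps a run reproducible, which helps when comparing the random background against the real 3' UTR regions.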

          robofjoy Robert Reid added a comment -

          I'll lay out a plan below. Feel free to make comments.

          1. Read in all of Brandon's de novo identified m6A sites.
          2. Read in all 27 genes of interest and their coordinates.
          3. Identify the 3' UTR regions for those genes. (from reference genome GFF)
          4. Identify all putative m6A sites within.
          5. Read in 2 bam files and get the read counts and nucleotide type (ACGT) at the location of interest.
          6. Record where it mutated as a new line in the corresponding bed file.

          The format used in their BED files:
          #chr start end gene score strand control_ratio control_total dart_ratio dart_total conversion
          Pf3D7_01_v3 36325 36326 PF3D7_0100100|CDS|C2U|mut=2|0.5882 0.1176 + 0.2000 10 0.1176 17 C2U
          Pf3D7_01_v3 36649 36650 PF3D7_0100100|CDS|C2U|mut=2|0.9167 0.1667 + 0.1818 11 0.1667 12 C2U
          Pf3D7_01_v3 36775 36776 PF3D7_0100100|CDS|C2U|mut=3|NA 0.1154 + 0 32 0.1154 26 C2U

          We don't actually know what these columns mean:
          control_ratio control_total dart_ratio dart_total conversion

          But maybe we can mimic them from total nucleotide counts: # of control C / # of control U to get the ratio. Repeat for the second experiment, then add the C2U label.
          I see that the last column is always C2U. ( awk '{ print $11 }' plasmodiumsite_Schiz6.bed | sort | uniq )
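For doing the same inspection in Python, here is a small sketch that parses the three example rows above into named fields and tallies the last column, mirroring the awk check. The field names are taken from the header line; the helper names are made up. On these three rows, `mut` divided by `dart_total` happens to reproduce `dart_ratio` (for example 2/17 ≈ 0.1176), but that is only an observation on this sample, not a confirmed definition of the columns.

```python
from collections import Counter, namedtuple

Site = namedtuple(
    "Site",
    "chrom start end name score strand "
    "control_ratio control_total dart_ratio dart_total conversion",
)

def parse_bed_line(line):
    """Split one whitespace-delimited row into typed fields."""
    f = line.split()
    return Site(f[0], int(f[1]), int(f[2]), f[3], float(f[4]), f[5],
                float(f[6]), int(f[7]), float(f[8]), int(f[9]), f[10])

def mut_count(site):
    """Pull the integer after 'mut=' out of the pipe-delimited name field."""
    for part in site.name.split("|"):
        if part.startswith("mut="):
            return int(part[4:])
    return None

rows = """#chr start end gene score strand control_ratio control_total dart_ratio dart_total conversion
Pf3D7_01_v3 36325 36326 PF3D7_0100100|CDS|C2U|mut=2|0.5882 0.1176 + 0.2000 10 0.1176 17 C2U
Pf3D7_01_v3 36649 36650 PF3D7_0100100|CDS|C2U|mut=2|0.9167 0.1667 + 0.1818 11 0.1667 12 C2U
Pf3D7_01_v3 36775 36776 PF3D7_0100100|CDS|C2U|mut=3|NA 0.1154 + 0 32 0.1154 26 C2U"""

sites = [parse_bed_line(l) for l in rows.splitlines() if l and not l.startswith("#")]
conversions = Counter(s.conversion for s in sites)
print(conversions)
for s in sites:
    # Compare the guessed ratio (mut / dart_total) with the reported dart_ratio
    print(s.chrom, s.end, round(mut_count(s) / s.dart_total, 4), s.dart_ratio)
```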

          robofjoy Robert Reid added a comment -

          Another table is coming soon.
          I'll lay out the steps:
          1. Identified all 3' UTR regions in the original reference file.
          2. Filtered these 3' UTR regions by the 27 genes of interest.
          3. Cross-comparing the result of step 2 with Brandon's manually curated large list of m6A sites to get potential m6A sites here.
          4. For each experiment, identifying total reads aligned and the number of A's, C's, G's, and T's (extracted from the BAM files at the locations from step 3).
          5. A fancy table of all this, making it easy to spot what potential m6A activity is happening across the stages/experiments/BAM files.

          Will have this shortly.
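Steps 1 to 3 above could look roughly like the sketch below. It assumes the GFF marks 3' UTRs with the feature type `three_prime_UTR` and that the gene ID appears somewhere in the attributes column; both are assumptions about the reference file, and the chromosome name, gene IDs, and coordinates in the example are invented. GFF coordinates are 1-based inclusive, so they are converted to 0-based half-open to match BED sites.

```python
def utrs_from_gff(gff_text, feature_type="three_prime_UTR"):
    """Yield (chrom, start, end, strand, attributes) as 0-based half-open intervals."""
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        # GFF is 1-based inclusive; shift the start to get BED-style coordinates
        yield cols[0], int(cols[3]) - 1, int(cols[4]), cols[6], cols[8]

def utrs_for_genes(utrs, gene_ids):
    """Keep UTRs whose attribute string mentions one of the gene IDs.

    Plain substring matching is crude: one ID that is a prefix of another would match both.
    """
    return [u for u in utrs if any(g in u[4] for g in gene_ids)]

def sites_in_utrs(utrs, sites):
    """Return the (chrom, start, end) sites lying entirely inside some UTR."""
    hits = []
    for chrom, start, end in sites:
        for u_chrom, u_start, u_end, _strand, _attrs in utrs:
            if chrom == u_chrom and u_start <= start and end <= u_end:
                hits.append((chrom, start, end))
                break
    return hits

# Invented example data, for illustration only.
gff = (
    "##gff-version 3\n"
    "chrDemo\tsrc\tthree_prime_UTR\t101\t400\t.\t+\t.\tParent=geneA.1\n"
    "chrDemo\tsrc\tthree_prime_UTR\t5001\t5200\t.\t-\t.\tParent=geneB.1\n"
    "chrDemo\tsrc\tCDS\t10\t100\t.\t+\t0\tParent=geneA.1\n"
)
m6a_sites = [("chrDemo", 150, 151), ("chrDemo", 900, 901), ("chrDemo", 5050, 5051)]

all_utrs = list(utrs_from_gff(gff))
utrs_of_interest = utrs_for_genes(all_utrs, {"geneA"})
hits = sites_in_utrs(utrs_of_interest, m6a_sites)
print(hits)
```

The nested loop is fine for 27 genes; a larger gene list would call for sorting the intervals or using an interval tree.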

          bbendick Brandon Bendickson added a comment -

          Made comprehensive table for our genes of interest. Results located here: https://docs.google.com/spreadsheets/d/1dcG0J4_dW92fjcNOd7odN1ghqM1PW7scQ9ogI-kegy8/edit?usp=sharing

          robofjoy Robert Reid added a comment -

          New goal!

          Produce a comprehensive table with all the read counts.

          Based on Kausik's comment:
          "One question I have is, now that you have added the UTR reads, can we check and see if the 3’ UTR bed files from Llinas lab matching those read lengths? Then Brandon can see if those common regions have any m6A or not."

          The table:
          Each row is an m6A site based on the bed files provided by the Llinas lab.

          The columns will be:

          1. Gene name
          2. Chromosome
          3. Coordinates
          4. Number of reads at this location overall
          5. Number of A's
          6. Number of C's
          7. Number of G's
          8. Number of T's

          We repeat this for Schiz, Troph-4, etc.
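One possible way to fill the A/C/G/T columns is to parse the read-bases column of `samtools mpileup` output, one line per site. The sketch below is an illustration, not the script used for the spreadsheet: the function names and the example line are invented, and the line is assumed to come from something like `samtools mpileup -f ref.fa -r REGION in.bam`, where `.` and `,` stand for reads matching the reference base. The overall read count is taken as the depth column exactly as mpileup reports it.

```python
import csv
import io

def count_bases(ref_base, read_bases):
    """Count A/C/G/T calls in an mpileup read-bases string."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            i += 2  # read-start marker plus its mapping-quality character
            continue
        if ch in "+-":
            # indel: sign, length digits, then that many inserted/deleted bases
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        base = ref_base.upper() if ch in ".," else ch.upper()
        if base in counts:  # skips '$', '*', '<', '>' and N
            counts[base] += 1
        i += 1
    return counts

def table_row(gene, mpileup_line):
    """Build one row: gene, chromosome, coordinate, reads, A, C, G, T."""
    chrom, pos, ref, depth, bases = mpileup_line.rstrip("\n").split("\t")[:5]
    c = count_bases(ref, bases)
    return [gene, chrom, pos, depth, c["A"], c["C"], c["G"], c["T"]]

# Invented mpileup line: six reads over a reference C, two of them showing T.
line = "chrDemo\t151\tC\t6\t^I.,Tt.$,+2ag\tIIIIII"

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
writer.writerow(["gene", "chrom", "pos", "reads", "A", "C", "G", "T"])
writer.writerow(table_row("geneA", line))
print(out.getvalue())
```

Running this once per BAM file (Schiz, Troph-4, and so on) and appending a stage column would give the layout described above.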


            People

            • Assignee:
              bbendick Brandon Bendickson
              Reporter:
              robofjoy Robert Reid