  1. IGB
  2. IGBF-3701

Muday time course: Check first sequences in renamed fastq VS. what is in SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      To confirm that our renamed fastq files match what the SRA holds, I came up with a simple check: take the first 10 bases of one of the first reads of a given SRA run and search for them at the start of the first few reads in all of our sequence files.

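      For speed, I use head to read only the first 5 reads (20 lines, since each FASTQ record is 4 lines) of each fastq.gz file.
      Use zcat to stream the decompressed reads so the files stay compressed on disk.
      Grep for the sequence of interest, anchored to the start of the line.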

      for f in *gz; do echo "$f"; zcat < "$f" | head -n 20 | grep "^CTGGCTTTTC"; done

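      A given 10-base prefix is one of 4^10 (about 1.05 million) possibilities, so a chance match among the first 5 reads of an unrelated file is unlikely. We should therefore find exactly 1 matching file.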
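
The check above can be wrapped so that it prints only the files that match, which makes the "exactly 1 result" expectation easy to verify. This is a minimal sketch, not the method used on the ticket: the function name find_prefix_matches is hypothetical, gzip -dc stands in for zcat, and the awk filter (an addition not in the original one-liner) restricts the search to sequence lines so that a quality line starting with the same letters cannot produce a false hit.

```shell
# Hypothetical helper: print each gzipped FASTQ file whose first 5 reads
# include a read starting with the given prefix.
# Usage: find_prefix_matches PREFIX FILE...
find_prefix_matches() {
  prefix=$1
  shift
  for f in "$@"; do
    # head -n 20 keeps the first 5 FASTQ records (4 lines each);
    # awk 'NR % 4 == 2' keeps only the sequence line of each record.
    if gzip -dc < "$f" | head -n 20 | awk 'NR % 4 == 2' | grep -q "^$prefix"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, find_prefix_matches CTGGCTTTTC *gz | wc -l counts the matching files; a count other than 1 would suggest a renaming problem or a prefix that is too short to be unique.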


              People

              • Assignee:
                robofjoy Robert Reid
                Reporter:
                robofjoy Robert Reid
