Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 2
- Epic Link:
- Sprint: Fall 4, Fall 5
Description
GOAL: After we successfully removed the UniVec adapters, the TSA came up with a new list of sequences that they fear might be contaminants. We will now write a Python script to remove these regions.
Location of the data on the cluster:
/projects/tomato_genome/fnb/dataprocessing/TSA-transcriptomeShotgunAssembly/kelsieData
Files to use:
-rw-r--r-- 1 rreid2 tomato_genome 80M Oct 14 12:38 Heinz_new_ID_clean.fna
-rw-r--r-- 1 rreid2 tomato_genome 146M Oct 14 12:38 Nagcarlang_new_ID_clean.fna
-rw-r--r-- 1 rreid2 tomato_genome 69M Oct 14 12:38 Malintka_new_ID_clean.fna
-rw-r--r-- 1 rreid2 tomato_genome 52M Oct 14 12:38 Tamaulipas_new_ID_clean.fna
-rw-r--r-- 1 rreid2 tomato_genome 1.4M Oct 15 09:30 contaminant2.txt
FLOW:
# Read in contaminant2.txt as a dictionary, with the header as the key and the region to be removed as the value.
# Iterate through the FASTA file, checking whether each header is in the dictionary.
# If NO, write the sequence out to a new file.
# If YES, chop away the region.
- If the region is in the middle of the sequence, make 2 new sequences.
- If the region is within 50 bp of the beginning or the end, truncate the sequence.
- Write the new sequence(s) to file, renaming the header if making 2 sequences (A and B).
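The flow above can be sketched in Python roughly as follows. This is a minimal sketch, not the script that was merged for this ticket. It assumes (none of this is stated in the ticket) that contaminant2.txt is tab-separated with `header<TAB>start<TAB>end` using 1-based inclusive coordinates, that each header has at most one region, and that a flank of 50 bp or less on either side of the region is discarded. The function names and the `_A`/`_B` header suffixes are hypothetical.

```python
EDGE = 50  # bp threshold from the ticket


def read_regions(path):
    """Map header -> (start, end). Assumed format: header<TAB>start<TAB>end, 1-based inclusive."""
    regions = {}
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            header, start, end = line.rstrip("\n").split("\t")[:3]
            regions[header] = (int(start), int(end))
    return regions


def read_fasta(path):
    """Yield (header, sequence) pairs; the header is the first word after '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim(header, seq, region, edge=EDGE):
    """Remove the region and return the surviving (header, sequence) records."""
    start, end = region
    left, right = seq[:start - 1], seq[end:]
    # A flank of `edge` bp or less means the region sits near that end: drop the flank.
    kept = [piece for piece in (left, right) if len(piece) > edge]
    if len(kept) == 2:
        # Region is in the middle: split into two renamed sequences.
        return [(header + "_A", left), (header + "_B", right)]
    # Zero or one flank survives: keep the original header.
    return [(header, piece) for piece in kept]


def clean(fasta_in, regions, fasta_out):
    """Write every sequence to fasta_out, trimming those listed in regions."""
    with open(fasta_out, "w") as out:
        for header, seq in read_fasta(fasta_in):
            if header in regions:
                records = trim(header, seq, regions[header])
            else:
                records = [(header, seq)]
            for name, piece in records:
                out.write(f">{name}\n{piece}\n")
```

A call such as `clean("Heinz_new_ID_clean.fna", read_regions("contaminant2.txt"), "Heinz_trimmed.fna")` would process one of the files listed above; the output file name here is made up. Note that this sketch writes each sequence on a single line and drops a sequence entirely when both flanks are 50 bp or shorter, which may or may not match what the TSA expects.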
Issue Links
- blocks: IGBF-3928 Shotgun assembly submission to the TSA (Closed)
Activity
| Field | Original Value | New Value |
|---|---|---|
| Epic Link | | IGBF-2993 [ 21429 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Assignee | Brandon Bendickson [ bbendick ] | Robert Reid [ robertreid ] |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | To-Do [ 10305 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Assignee | Robert Reid [ robertreid ] | Brandon Bendickson [ bbendick ] |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Assignee | Brandon Bendickson [ bbendick ] | Robert Reid [ robertreid ] |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | To-Do [ 10305 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Sprint | Fall 4 [ 205 ] | Fall 4, Fall 5 [ 205, 206 ] |
| Rank | | Ranked higher |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
| Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
| Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
| Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
| Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
| Resolution | | Done [ 10000 ] |
| Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |