Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Sprint: Fall 5, Fall 6
Description
GOAL: Run NCBI's new tool for detecting adaptors and contaminants.
We will run this on our TSA (Transcriptome Shotgun Assembly) data and then resubmit.
FCS
The NCBI Foreign Contamination Screen (FCS) is a tool suite for identifying and removing contaminant sequences in genome assemblies. Contaminants are defined as sequences in a dataset that do not originate from the biological source organism and can arise from a variety of environmental and laboratory sources. FCS will help you remove contaminants from genomes before submission to GenBank.
FCS-adaptor
FCS-adaptor detects adaptor and vector contamination in genome sequences. FCS-adaptor is a high-throughput implementation of NCBI VecScreen. The FCS-adaptor executable retrieves a Docker or Singularity container and runs a pipeline to screen input sequences against a database of adaptors and vectors using stringent BLAST searches.
Please read the wiki for instructions on how to run FCS-adaptor.
FCS-GX
FCS-GX detects contamination from foreign organisms in genome sequences using the genome cross-species aligner (GX). The FCS-GX executable retrieves a Docker or Singularity container and runs a pipeline to align sequences to a large database of NCBI genomes through modified k-mer seeds and assign a most likely taxonomic division. FCS-GX classifies sequences as contaminant when their taxonomic assignment is different from the user-provided taxonomic identifier.
Please read the wiki for instructions on how to run FCS-GX.
I successfully ran FCS-adaptor on our Heinz contigs, using Heinz_new_ID_clean.fna as input. I had to truncate the SeqIDs even further because the tool splits the headers and adds a couple of characters (e.g., >lcl|H-contig-1-Solyc01T003576.3-1842). With these extra characters the IDs exceed the 50-character limit, which breaks the BLAST step of the tool. SeqIDs went from >Heinz-contig-5-Solyc09T002714.1-2500 to >H-contig-5-Solyc09T002714.1-2500.
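The renaming described above can be sketched in Python. This is an illustration only, not the commands recorded in fcs-adapt-commands.txt. The function names (shorten_header, shorten_fasta, ids_over_limit) are hypothetical. The ticket does not say exactly how many characters the tool adds, so that overhead is left as a parameter.

```python
from typing import Iterable, Iterator, List

MAX_ID_LEN = 50  # SeqID limit quoted in this ticket


def shorten_header(header: str, old: str = "Heinz-", new: str = "H-") -> str:
    """Swap a leading `old` prefix on the SeqID for `new`; keep any description."""
    seq_id, sep, description = header[1:].partition(" ")
    if seq_id.startswith(old):
        seq_id = new + seq_id[len(old):]
    return ">" + seq_id + sep + description


def shorten_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with header prefixes shortened; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield shorten_header(line) if line.startswith(">") else line


def ids_over_limit(lines: Iterable[str], overhead: int = 0,
                   limit: int = MAX_ID_LEN) -> List[str]:
    """Return SeqIDs whose length plus `overhead` exceeds `limit`.

    `overhead` stands in for the characters the tool adds to each ID;
    its true value is an assumption the caller must supply.
    """
    ids = (l[1:].split()[0] for l in lines if l.startswith(">") and l[1:].strip())
    return [i for i in ids if len(i) + overhead > limit]


# In-memory demo using the header from this ticket and a placeholder sequence.
records = [">Heinz-contig-5-Solyc09T002714.1-2500", "ACGT"]
cleaned = list(shorten_fasta(records))
```

For a real file, the same generator could be fed an open file handle and its output written line by line to a new FASTA file. Running ids_over_limit on the result before launching FCS-adaptor gives an early warning about IDs that may still be too long.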
The final cleaned file, Heinz_cleaned_NCBI.fa, is located in: /projects/tomato_genome/fnb/dataprocessing/TSA-transcriptomeShotgunAssembly/kelsieData/FCS_tools/FCS_adaptor/Heinz_NCBI
The fcs-adapt-commands.txt file lists all the commands used, in the order they should be run. I will run this for the other varieties as well.