  1. IGB
  2. IGBF-3966

Improve our NCBI submissions by implementing their new contaminant tool

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      GOAL: Run NCBI's new tool for dealing with adaptors and contaminants.
      We run it on our TSA data and then resubmit.

      https://github.com/ncbi/fcs

      FCS
      The NCBI Foreign Contamination Screen (FCS) is a tool suite for identifying and removing contaminant sequences in genome assemblies. Contaminants are defined as sequences in a dataset that do not originate from the biological source organism and can arise from a variety of environmental and laboratory sources. FCS will help you remove contaminants from genomes before submission to GenBank.

      FCS-adaptor
      FCS-adaptor detects adaptor and vector contamination in genome sequences. FCS-adaptor is a high-throughput implementation of NCBI VecScreen. The FCS-adaptor executable retrieves a Docker or Singularity container and runs a pipeline to screen input sequences against a database of adaptors and vectors using stringent BLAST searches.

      Please read the wiki for instructions on how to run FCS-adaptor.

      FCS-GX
      FCS-GX detects contamination from foreign organisms in genome sequences using the genome cross-species aligner (GX). The FCS-GX executable retrieves a Docker or Singularity container and runs a pipeline to align sequences to a large database of NCBI genomes through modified k-mer seeds and assign a most likely taxonomic division. FCS-GX classifies sequences as contaminant when their taxonomic assignment is different from the user-provided taxonomic identifier.

      Please read the wiki for instructions on how to run FCS-GX.

          Activity

          bbendick Brandon Bendickson added a comment -

          Successfully ran FCS-adaptor on our Heinz contigs. I used Heinz_new_ID_clean.fna as the input. I had to truncate the seqIDs even further because the tool splits the headers and adds a couple of characters (ex: >lcl|H-contig-1-Solyc01T003576.3-1842). These extra characters push the IDs past the 50-character limit, which breaks the BLAST portion of the tool. SeqIDs went from >Heinz-contig-5-Solyc09T002714.1-2500 to >H-contig-5-Solyc09T002714.1-2500.

          The final cleaned file is Heinz_cleaned_NCBI.fa and it is located in: /projects/tomato_genome/fnb/dataprocessing/TSA-transcriptomeShotgunAssembly/kelsieData/FCS_tools/FCS_adaptor/Heinz_NCBI

          The fcs-adapt-commands.txt file lists all the commands used, in the order they should be run. I will run this for the other varieties as well.
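The header truncation described in this comment could be sketched roughly as follows. This is an illustration only, not the contents of fcs-adapt-commands.txt: the function names are hypothetical, and `overhead` is a placeholder for however many characters the tool adds to each ID, which the comment does not state exactly.

```python
def shorten_id(seq_id, old_prefix="Heinz-", new_prefix="H-"):
    """Swap the leading variety name for a shorter tag, if present."""
    if seq_id.startswith(old_prefix):
        return new_prefix + seq_id[len(old_prefix):]
    return seq_id


def shorten_header(line, old_prefix="Heinz-", new_prefix="H-"):
    """Return a FASTA line (without its newline) with the seqID shortened.

    Sequence lines are returned unchanged apart from the stripped newline.
    """
    line = line.rstrip("\n")
    if not line.startswith(">"):
        return line
    # Keep any description after the first space untouched.
    seq_id, sep, description = line[1:].partition(" ")
    return ">" + shorten_id(seq_id, old_prefix, new_prefix) + sep + description


def ids_over_limit(lines, limit=50, overhead=0):
    """List seqIDs whose length plus `overhead` exceeds `limit`.

    `overhead` is an assumed stand-in for the characters the tool adds.
    """
    over = []
    for line in lines:
        if line.startswith(">"):
            seq_id = line[1:].strip().partition(" ")[0]
            if len(seq_id) + overhead > limit:
                over.append(seq_id)
    return over
```

Running `ids_over_limit` on the renamed file before screening gives a quick check that no header will trip the length limit once the tool has added its extra characters.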

          bbendick Brandon Bendickson added a comment -

          FCS-adaptor results are in /projects/tomato_genome/fnb/dataprocessing/TSA-transcriptomeShotgunAssembly/kelsieData/FCS_tools/FCS_adaptor

          Going to investigate FCS-GX next.

          robofjoy Robert Reid added a comment -

          I think we should run this on a sequence set where we have changed the IDs but applied no additional filtering. That way we can see how well it does before and after each step.
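One simple way to make the before/after comparison suggested here is to count sequences and bases in the FASTA file at each stage. The sketch below is an assumption about how that could be done, not something taken from the ticket; `fasta_stats` is a hypothetical helper.

```python
def fasta_stats(lines):
    """Return (number of sequences, total bases) for FASTA-formatted lines."""
    n_seqs = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1
        else:
            n_bases += len(line)
    return n_seqs, n_bases


def stats_for_file(path):
    """Convenience wrapper that reads a FASTA file from disk."""
    with open(path) as handle:
        return fasta_stats(handle)
```

Calling `stats_for_file` on the renamed-only input and on the output of each screening step shows how many contigs and bases each step removed.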

          robofjoy Robert Reid added a comment -

          Brandon now adding slurm parallelization!

          bbendick Brandon Bendickson added a comment -

          All runs finished. Successfully ran FCS-GX on the cleaned files from FCS-adaptor. Results are located in: /projects/tomato_genome/fnb/dataprocessing/TSA-transcriptomeShotgunAssembly/kelsieData/FCS_tools/FCS_GX/tom_runs/fin_results

          Still working on summarizing the results, but they should be ready for another submission. Going to assign this to Dr. Reid for review.

          bbendick Brandon Bendickson added a comment -

          FCS tool results on tomato contigs: https://docs.google.com/spreadsheets/d/1fWZn0HTq1ZfM0hrL02CjD0nZ_5Q2frCkYH2945yOZDU/edit?usp=sharing

          robofjoy Robert Reid added a comment -

          This looks great. However TSA still complains about the N's at the beginning and the end.

          Which means we need to run your N filter after we do these filters!!!! Silliness.

          This ticket is done, I'll create another.
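The N filter mentioned in this comment is not shown in the ticket. As a rough sketch of the idea only, trimming N's from the start and end of each contig after the FCS steps could look like the following; the function names are hypothetical, and dropping contigs that consist entirely of N's is an assumption.

```python
def trim_terminal_ns(seq):
    """Remove runs of N (either case) from both ends; internal N's are kept."""
    return seq.strip("Nn")


def trim_records(records):
    """Yield (header, trimmed sequence) pairs from (header, sequence) pairs.

    Records that are empty after trimming are dropped (an assumption).
    """
    for header, seq in records:
        trimmed = trim_terminal_ns(seq)
        if trimmed:
            yield header, trimmed
```

Because FCS cleaning can itself expose N's at the ends of a sequence, this step would run last, which matches the ordering the comment calls for.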


            People

            • Assignee:
              robofjoy Robert Reid
              Reporter:
              robofjoy Robert Reid
            • Votes:
              0
              Watchers:
              2
