Uploaded image for project: 'IGB'
  1. IGB
  2. IGBF-3303

Investigate: Distinguish F3H-OX3 from VF36 samples

    Details

    • Type: Task
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

      It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

      We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

      To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

      Previously, we have found that the plant selective marker gene was expressed in soybean seeds in a transgenic line.

      For this task, we'll investigate whether we can use the F3H-OX3 line's plant selection marker gene to distinguish transgenic from non-transgenic lines.

      See:

      Maarten has sent us three files with the construct information in them. These are added to the git repository here:

      • ExternalDataSets/pK7WG2-F3H.fa
      • ExternalDataSets/pK7WG2-F3H.gb

      In this experiment, the plant selection gene was a kanamycin resistance gene.

      For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

      To do this, we need to investigate the different tools that might be available for this task.

      Let's try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-3277 [ 22158 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3290 [ IGBF-3290 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            Ann's first idea on what to do:

            Augment the tomato SL5 genome assembly and gene model annotations by adding a new "fake" chromosome containing the construct sequence and construct annotations provided by Maarten. Then, we use this as the target assembly in a fresh run of the nf-core rnaseq pipeline. This should produce a new "counts" file with counts for the newly added construct gene locations, including counts for the kanamycin resistance gene. This might not be a great idea! We might be able to find a more direct and simple approach that would be less work. But if we can't, I'm pretty sure this would get the job done.

            Show
            ann.loraine Ann Loraine added a comment - - edited Ann's first idea on what to do: Augment the tomato SL5 genome assembly and gene model annotations by adding a new "fake" chromosome containing the construct sequence and construct annotations provided by Maarten. Then, we use this as the target assembly in a fresh run of the nf-core rnaseq pipeline. This should produce a new "counts" file with counts for the newly added construct gene locations, including counts for the kanamycin resistance gene. This might not be a great idea! We might be able to find a more direct and simple approach that would be less work. But if we can't, I'm pretty sure this would get the job done.
            Show
            ann.loraine Ann Loraine added a comment - Added gb and fa file with construct info to the repository. See: https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/ExternalDataSets/pK7WG2-F3H.fa https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/ExternalDataSets/pK7WG2-F3H.gb
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            Also, ask for help on BioStars. Post a question and hope somebody gives us a great answer!

            This might help:

            Show
            ann.loraine Ann Loraine added a comment - - edited Also, ask for help on BioStars. Post a question and hope somebody gives us a great answer! This might help: https://www.biostars.org/p/9492059/
            ann.loraine Ann Loraine made changes -
            Summary Distinguish F3H-OX3 from VF36 samples Investigate: Distinguish F3H-OX3 from VF36 samples
            ann.loraine Ann Loraine made changes -
            Description Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. We will add these to repository in the ExternalDataSets folder, after removing the whitespace in the file names.

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin gene.

            To do this, we need to investigate the different tools that might be available for this task.

            First, let's devote some time to searching for an easy-to-use effective tool that will be able to identify which samples contain "kan" gene sequences.
            Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. We will add these to repository in the ExternalDataSets folder, after removing the whitespace in the file names.

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Fastq files Directory: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples

            Tool Resource: https://jasonjwilliamsny.github.io/shell-genomics/lessons/03_searching_files.html

            Control: Solyc01G000038 (SL5)=Solyc01g005430(SL4)

            grep -B1 -A2 GTTTTAGCAA 7-are-15-min-28C_R1_001.fastq | wc -l
            50044
            

            Kanamycin Gene Sequence Origin:

            grep -B1 -A2 GGCCCTCTAG 7-are-15-min-28C_R1_001.fastq | wc -l
            2357
            
            Show
            Mdavis4290 Molly Davis added a comment - - edited Fastq files Directory: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples Tool Resource: https://jasonjwilliamsny.github.io/shell-genomics/lessons/03_searching_files.html Control: Solyc01G000038 (SL5)=Solyc01g005430(SL4) Control Genomic Sequence: https://solgenomics.net/feature/17666541/details grep -B1 -A2 GTTTTAGCAA 7-are-15-min-28C_R1_001.fastq | wc -l 50044 Kanamycin Gene Sequence Origin: grep -B1 -A2 GGCCCTCTAG 7-are-15-min-28C_R1_001.fastq | wc -l 2357
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            Candidate positive control sequence from the region flanking the first intron of Solyc01G000038 in SL5 assembly

            left side of intron:
            GATTGATAGACAG

            right side of intron:
            ATTAAAGTGTTCTTTTCTGTTCCGG

            Search the two sequences spliced together as:
            GATTGATAGACAGATTAAAGTGTTCTTTTCTGTTCCGG

            Using IGB to count overlapping reads with gaps yields 3260 aligned reads that could match the query sequence. This is the upper limit for what should be returned. If there are more than this, then the sequence is not specific to the gene.

            DONT USE - too complicated

            Show
            ann.loraine Ann Loraine added a comment - - edited Candidate positive control sequence from the region flanking the first intron of Solyc01G000038 in SL5 assembly left side of intron: GATTGATAGACAG right side of intron: ATTAAAGTGTTCTTTTCTGTTCCGG Search the two sequences spliced together as: GATTGATAGACAGATTAAAGTGTTCTTTTCTGTTCCGG Using IGB to count overlapping reads with gaps yields 3260 aligned reads that could match the query sequence. This is the upper limit for what should be returned. If there are more than this, then the sequence is not specific to the gene. DONT USE - too complicated
            Hide
            ann.loraine Ann Loraine added a comment -

            Search a longer sequence from the Kannamycin gene to reduce false positives.

            Show
            ann.loraine Ann Loraine added a comment - Search a longer sequence from the Kannamycin gene to reduce false positives.
            Hide
            ann.loraine Ann Loraine added a comment -

            Don't uncompress the fastq files. To avoid this, use gunzip -c.

            Here is an example of how to do this:

            gunzip -c sample.fa.gz | grep -B1 -A2 | wc -l

            Once you've got it working to your satisfaction, you can write a simple script and run it in parallel on the cluster.

            Show
            ann.loraine Ann Loraine added a comment - Don't uncompress the fastq files. To avoid this, use gunzip -c. Here is an example of how to do this: gunzip -c sample.fa.gz | grep -B1 -A2 | wc -l Once you've got it working to your satisfaction, you can write a simple script and run it in parallel on the cluster.
            Hide
            ann.loraine Ann Loraine added a comment -

            Trying a different positive control query:

            gene: Solyc01G001422 (one intron)

            left side of intron: TTATCAGAGTGCTTCCTTCA
            right side of intron: GTGCCATTGGAAATGGATTT
            query sequence: TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT

            Expected result: No more than 925 but definitely more than 0 for either R1 or R2 fastq files, but not both.

            Show
            ann.loraine Ann Loraine added a comment - Trying a different positive control query: gene: Solyc01G001422 (one intron) left side of intron: TTATCAGAGTGCTTCCTTCA right side of intron: GTGCCATTGGAAATGGATTT query sequence: TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT Expected result: No more than 925 but definitely more than 0 for either R1 or R2 fastq files, but not both.
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Positive Control Query Sequence (R1 vs. R2) Solyc01G001422:

            grep -B1 -A2 TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT 7-are-15-min-28C_R1_001.fastq | wc -l
            4
            
            grep -B1 -A2 TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT 7-are-15-min-28C_R2_001.fastq | wc -l
            1918
            
            Show
            Mdavis4290 Molly Davis added a comment - - edited Positive Control Query Sequence (R1 vs. R2) Solyc01G001422: grep -B1 -A2 TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT 7-are-15-min-28C_R1_001.fastq | wc -l 4 grep -B1 -A2 TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT 7-are-15-min-28C_R2_001.fastq | wc -l 1918
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Slurm Script
            Directory: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples

            #! /bin/bash
            
            #SBATCH --job-name=investigate-labels
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=40gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            #SBATCH --array=1-144
            
            #setting up where to grab files from
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples/runlist_labels.txt)
            
            
            cd  /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples
            
            echo "Positive Control Query Sequence Solyc01G001422: $file";
            
            gunzip -c /nobackup/tomato_genome/muday-144/nfcore-2022/$file | grep -B1 -A2 TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT | wc -l
            
            echo "Kanamycin Gene Sequence Primer-Bind: $file";
            
            gunzip -c /nobackup/tomato_genome/muday-144/nfcore-2022/$file | grep -B1 -A2 GGGGACCACTTTGTACAAGAAAGCTGGGTA | wc -l
            
            echo "----------------------------------------------------"
            
            cat *.out > out2.txt # combine .out files
            

            Results Head:

            ----------------------------------------------------
            Positive Control Query Sequence Solyc01G001422: 7-are-15-min-28C_R1_001.fastq.gz   
            4
            Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-28C_R1_001.fastq.gz   
            0
            ----------------------------------------------------
            Positive Control Query Sequence Solyc01G001422: 7-are-15-min-28C_R2_001.fastq.gz   
            1918
            Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-28C_R2_001.fastq.gz   
            0
            ----------------------------------------------------
            Positive Control Query Sequence Solyc01G001422: 7-are-15-min-34C_R1_001.fastq.gz   
            0
            Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-34C_R1_001.fastq.gz   
            0
            ----------------------------------------------------
            Positive Control Query Sequence Solyc01G001422: 7-are-15-min-34C_R2_001.fastq.gz   
            1812
            Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-34C_R2_001.fastq.gz   
            0
            ----------------------------------------------------
            Positive Control Query Sequence Solyc01G001422: 7-are-30-min-28C_R1_001.fastq.gz   
            9
            Kanamycin Gene Sequence Primer-Bind: 7-are-30-min-28C_R1_001.fastq.gz   
            0
            ----------------------------------------------------
            Positive Control Query Sequence Solyc01G001422: 7-are-30-min-28C_R2_001.fastq.gz   
            1396
            Kanamycin Gene Sequence Primer-Bind: 7-are-30-min-28C_R2_001.fastq.gz   
            0
            ----------------------------------------------------
            

            Comment: Need help finding a sequence query for Kanamycin.

            [~aloraine]

            Show
            Mdavis4290 Molly Davis added a comment - - edited Slurm Script Directory: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples #! /bin/bash #SBATCH --job-name=investigate-labels #SBATCH --partition=Orion #SBATCH --nodes=1 #SBATCH --ntasks-per-node=1 #SBATCH --mem=40gb #SBATCH --output=%x_%j.out #SBATCH --time=24:00:00 #SBATCH --array=1-144 #setting up where to grab files from file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples/runlist_labels.txt) cd /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples echo "Positive Control Query Sequence Solyc01G001422: $file" ; gunzip -c /nobackup/tomato_genome/muday-144/nfcore-2022/$file | grep -B1 -A2 TTATCAGAGTGCTTCCTTCAGTGCCATTGGAAATGGATTT | wc -l echo "Kanamycin Gene Sequence Primer-Bind: $file" ; gunzip -c /nobackup/tomato_genome/muday-144/nfcore-2022/$file | grep -B1 -A2 GGGGACCACTTTGTACAAGAAAGCTGGGTA | wc -l echo "----------------------------------------------------" cat *.out > out2.txt # combine .out files Results Head: ---------------------------------------------------- Positive Control Query Sequence Solyc01G001422: 7-are-15-min-28C_R1_001.fastq.gz 4 Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-28C_R1_001.fastq.gz 0 ---------------------------------------------------- Positive Control Query Sequence Solyc01G001422: 7-are-15-min-28C_R2_001.fastq.gz 1918 Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-28C_R2_001.fastq.gz 0 ---------------------------------------------------- Positive Control Query Sequence Solyc01G001422: 7-are-15-min-34C_R1_001.fastq.gz 0 Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-34C_R1_001.fastq.gz 0 ---------------------------------------------------- Positive Control Query Sequence Solyc01G001422: 7-are-15-min-34C_R2_001.fastq.gz 1812 Kanamycin Gene Sequence Primer-Bind: 7-are-15-min-34C_R2_001.fastq.gz 0 ---------------------------------------------------- Positive Control Query Sequence Solyc01G001422: 7-are-30-min-28C_R1_001.fastq.gz 9 Kanamycin Gene Sequence Primer-Bind: 7-are-30-min-28C_R1_001.fastq.gz 0 ---------------------------------------------------- Positive Control Query Sequence Solyc01G001422: 7-are-30-min-28C_R2_001.fastq.gz 1396 Kanamycin Gene Sequence Primer-Bind: 7-are-30-min-28C_R2_001.fastq.gz 0 ---------------------------------------------------- Comment: Need help finding a sequence query for Kanamycin. [~aloraine]
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            We expect that kanamycin resistance gene RNA-Seq sequences are present in the F3H-OX3 samples and absent in ARE and VF36 samples.

            However, if you get a negative result (no matches to the query sequences) in the F3H-OX3 files, there are at least three possible reasons for a negative result. First reason: the F3H-OX3 files are mislabeled and are not from transgenic lines. Second reason: they could be transgenic lines but the kanamycin gene is not expressed in pollen tube tissue type. Third reason: They are transgenic lines and the kanamycin gene is expressed but the test is not working for some reason.

            You need a positive control that will allow you to rule out option three.

            For this, I recommend using a fastq file from an experiment where we are 100% confident that the kanamycin gene is expressed. Note that previously I mistakenly thought the soybean experimental construct from the paper listed above used the same kanamycin resistance gene as F3H-OX3. It does not. Thus, we do not have an example dataset that can serve as a positive control.

            To rule out reason 2, you need a different positive control, a fastq file from a sample where you are 100% certain the transgene is present. For this, you could use the 120 minute time points samples from F3H-OX3 and/or F3H-OX4. If kanamycin gene is not detected in those samples, but it is detected in the soybean samples, then probably this would mean that the gene is simply not expressed in pollen tubes.

            Show
            ann.loraine Ann Loraine added a comment - - edited We expect that kanamycin resistance gene RNA-Seq sequences are present in the F3H-OX3 samples and absent in ARE and VF36 samples. However, if you get a negative result (no matches to the query sequences) in the F3H-OX3 files, there are at least three possible reasons for a negative result. First reason: the F3H-OX3 files are mislabeled and are not from transgenic lines. Second reason: they could be transgenic lines but the kanamycin gene is not expressed in pollen tube tissue type. Third reason: They are transgenic lines and the kanamycin gene is expressed but the test is not working for some reason. You need a positive control that will allow you to rule out option three. For this, I recommend using a fastq file from an experiment where we are 100% confident that the kanamycin gene is expressed. Note that previously I mistakenly thought the soybean experimental construct from the paper listed above used the same kanamycin resistance gene as F3H-OX3. It does not. Thus, we do not have an example dataset that can serve as a positive control. To rule out reason 2, you need a different positive control, a fastq file from a sample where you are 100% certain the transgene is present. For this, you could use the 120 minute time points samples from F3H-OX3 and/or F3H-OX4. If kanamycin gene is not detected in those samples, but it is detected in the soybean samples, then probably this would mean that the gene is simply not expressed in pollen tubes.
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            Suggestion: modify your script to produce comma-separated data, a single line:

            column 1: name of target file
            column 2: query sequence used
            column 3: gene of origin for the query sequence
            column 4: number of matches found

            Use "squeue-doIt" script in flavonoid rnaseq repository "src" folder to submits jobs for every fastq.gz file in a directory. (These can be symbolic links of course)

            The end result is that you will get many tab-delimited files with one line per file. That is OK. When finished, use cat and redirection operator to bind them into a single file, e.g.,:

            cat *.csv > result.csv
            

            When done, we will place all the scripts and the final results file into a single folder in the repository to document the experiment and the results of the experiment.

            Show
            ann.loraine Ann Loraine added a comment - - edited Suggestion: modify your script to produce comma-separated data, a single line: column 1: name of target file column 2: query sequence used column 3: gene of origin for the query sequence column 4: number of matches found Use "squeue-doIt" script in flavonoid rnaseq repository "src" folder to submits jobs for every fastq.gz file in a directory. (These can be symbolic links of course) The end result is that you will get many tab-delimited files with one line per file. That is OK. When finished, use cat and redirection operator to bind them into a single file, e.g.,: cat *.csv > result.csv When done, we will place all the scripts and the final results file into a single folder in the repository to document the experiment and the results of the experiment.
            ann.loraine Ann Loraine made changes -
            Description Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. We will add these to repository in the ExternalDataSets folder, after removing the whitespace in the file names.

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. We will add these to repository in the ExternalDataSets folder, after removing the whitespace in the file names.

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            ann.loraine Ann Loraine made changes -
            Sprint Spring 6 2023 Mar 20 [ 166 ] Spring 6 2023 Mar 20, Spring 7 2023 Apr 10 [ 166, 167 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            I used the F3H.fasta file that Maarten sent and created some different Kanamycin sequence queries. After running a couple of different sequences with the fastq files, there were two consistent fastq samples that kept producing results with each Kanamycin sequence I gave it. Those are:

            • 7-VF36-45-min-34C_R1_001.fastq.gz
            • 7-VF36-75-min-28C_R1_001.fastq.gz

            Results can be found in this directory: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples

            Next step: Find a control fastq file for Kanamycin to check sequence query.
            https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR15568707&display=download

            Show
            Mdavis4290 Molly Davis added a comment - - edited I used the F3H.fasta file that Maarten sent and created some different Kanamycin sequence queries. After running a couple of different sequences with the fastq files, there were two consistent fastq samples that kept producing results with each Kanamycin sequence I gave it. Those are: 7-VF36-45-min-34C_R1_001.fastq.gz 7-VF36-75-min-28C_R1_001.fastq.gz Results can be found in this directory: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples Next step: Find a control fastq file for Kanamycin to check sequence query. https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR15568707&display=download
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Dr. Reid took chunks out of the functional domain of Kan sequence (kanR2_SeqfromAnthony-funcDomain.fna) and made the following file to make queries with my grep script:
            /nobackup/tomato_genome/muday-144/nfcore-202215nt_chunksof_KanR2.fna

            >Kan-R2 - 5 primed end
            GTCTGGAAAGAAATG
            >Kan-R2-middle1
            CCTTATTTTTGACGA
            >Kan-R2-middle2
            TCTTGCCATCCTATG
            >Kan-R2-middle3
            TAATCCTGATATGAA
            >Kan-R2-middle4
            AAATGGGCTCGCGAT
            >Kan-R2-middle5
            TTGTTTCTGAAACAT
            >Kan-R2-middle6
            GAATTTATGCCTCTT
            >Kan-R2-middle7
            GGAAAAACAGCATTC
            >Kan-R2-middle8
            TGCGCCGGTTGCATT
            >Kan-R2-middle9
            TCACGAATGAATAAC
            >Kan-R2-middle10
            TGGAAAGAAATGCAT
            >Kan-R2-middle11
            TATTTTTGACGAGGG
            >Kan-R2-middle12
            TGCCATCCTATGGAA
            >Kan-R2-middle13
            ATCCTGATATGAATA
            >Kan-R2-3 primed end
            TGCTCGATGAGTTTT
            

            Results Directory for each query: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples

            Show
            Mdavis4290 Molly Davis added a comment - - edited Dr. Reid took chunks out of the functional domain of Kan sequence (kanR2_SeqfromAnthony-funcDomain.fna) and made the following file to make queries with my grep script: /nobackup/tomato_genome/muday-144/nfcore-202215nt_chunksof_KanR2.fna >Kan-R2 - 5 primed end GTCTGGAAAGAAATG >Kan-R2-middle1 CCTTATTTTTGACGA >Kan-R2-middle2 TCTTGCCATCCTATG >Kan-R2-middle3 TAATCCTGATATGAA >Kan-R2-middle4 AAATGGGCTCGCGAT >Kan-R2-middle5 TTGTTTCTGAAACAT >Kan-R2-middle6 GAATTTATGCCTCTT >Kan-R2-middle7 GGAAAAACAGCATTC >Kan-R2-middle8 TGCGCCGGTTGCATT >Kan-R2-middle9 TCACGAATGAATAAC >Kan-R2-middle10 TGGAAAGAAATGCAT >Kan-R2-middle11 TATTTTTGACGAGGG >Kan-R2-middle12 TGCCATCCTATGGAA >Kan-R2-middle13 ATCCTGATATGAATA >Kan-R2-3 primed end TGCTCGATGAGTTTT Results Directory for each query: /nobackup/tomato_genome/muday-144/nfcore-2022/investigate_samples
            Hide
            Mdavis4290 Molly Davis added a comment -

            Dr. Reid had idea to run nextflow on the AreWeThere dataset on SL5 (Gloria's old dataset). One idea was to take the old dataset. Treat those gene expressions as gospel. And then cluster each of the muday144 samples with the old set. And see where they land:

            Directory: /nobackup/tomato_genome/areYouThere-nfcoreSL5

            Show
            Mdavis4290 Molly Davis added a comment - Dr. Reid had idea to run nextflow on the AreWeThere dataset on SL5 (Gloria's old dataset). One idea was to take the old dataset. Treat those gene expressions as gospel. And then cluster each of the muday144 samples with the old set. And see where they land: Directory: /nobackup/tomato_genome/areYouThere-nfcoreSL5
            Hide
            ann.loraine Ann Loraine added a comment -

            We have run these samples through nf-core using SL5 as the reference already. It's in the first SRA batch we downloaded and these data are also available for visualization in IGB.

            Show
            ann.loraine Ann Loraine added a comment - We have run these samples through nf-core using SL5 as the reference already. It's in the first SRA batch we downloaded and these data are also available for visualization in IGB.
            Hide
            robofjoy Robert Reid added a comment -

            Some other notes related to this.

            On Tuesday's tech meeting, the group discussed other possibilities that we could collectively try.
            In the end, not much came up as things we can try.
            Gloria is hunting down the original plasmid that was used in the transformation. Maybe from that there is some sequence we can also hunt for.

            I talked about this with Rick White. He has done plant transformations and was fascinated by the problem. (And impressed how IGB was able to detect the mislabeling by the read count profiles!!). He suspected that we would not be able to detect kanamycin. Maybe in genome sequence but that too would be a long shot.

            Show
            robofjoy Robert Reid added a comment - Some other notes related to this. On Tuesday's tech meeting, the group discussed other possibilities that we could collectively try. In the end, not much came up as things we can try. Gloria is hunting down the original plasmid that was used in the transformation. Maybe from that there is some sequence we can also hunt for. I talked about this with Rick White. He has done plant transformations and was fascinated by the problem. (And impressed how IGB was able to detect the mislabeling by the read count profiles!!). He suspected that we would not be able to detect kanamycin. Maybe in genome sequence but that too would be a long shot.
            Hide
            ann.loraine Ann Loraine added a comment -

            During discussion in scrum we realized that results are inconclusive from the kanamycin grep strategy. Let's halt this approach and try the approach from the soybean paper mentioned above. Making a new Jira issue to pursue this other strategy.

            Closing this "investigate" ticket to try the other strategy.

            Show
            ann.loraine Ann Loraine added a comment - During discussion in scrum we realized that results are inconclusive from the kanamycin grep strategy. Let's halt this approach and try the approach from the soybean paper mentioned above. Making a new Jira issue to pursue this other strategy. Closing this "investigate" ticket to try the other strategy.
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Description Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. We will add these to repository in the ExternalDataSets folder, after removing the whitespace in the file names.

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. These are added to the git repository:



            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            ann.loraine Ann Loraine made changes -
            Description Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. These are added to the git repository:



            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. These are added to the git repository here:

            * ExternalDataSets/pK7WG2-F3H.fa
            * ExternalDataSets/pK7WG2-F3H.gb

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3306 [ IGBF-3306 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Closed [ 6 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ] Robert Reid [ robertreid ]
            Hide
            robofjoy Robert Reid added a comment -

            The STAR run to see if ANY read sequences align to the kanamycin gene that Anthony from Gloria's lab sent.

            He sent it in a Word doc: (KANR2)

            GTCTGGAAAGAAATGCATAAACTTTTGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGATAA
            CCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTGGACGAGTCGGAATCGCAGACCGATACCAGGA
            TCTTGCCATCCTATGGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGTATTGA
            TAATCCTGATATGAATAAATTGCAGTTTCATTTGATGCTCGATGAGTTTTAACATGGATGCTGATTTATATGGGTAT
            AAATGGGCTCGCGATAATGTCGGGCAATCAGGTGCGACAATCTATCGCTTGTATGGGAAGCCCGATGCGCCAGAG
            TTGTTTCTGAAACATGGCAAAGGTAGCGTTGCCAATGATGTTACAGATGAGATGGTCAGACTAAACTGGCTGACG
            GAATTTATGCCTCTTCCGACCATCAAGCATTTTATCCGTACTCCTGATGATGCATGGTTACTCACCACTGCGATCCCC
            GGAAAAACAGCATTCCAGGTATTAGAAGAATATCCTGATTCAGGTGAAAATATTGTTGATGCGCTGGCAGTGTTCC
            TGCGCCGGTTGCATTCGATTCCTGTTTGTAATTGTCCTTTTAACAGCGATCGCGTATTTCGTCTCGCTCAGGCGCAA
            TCACGAATGAATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAATGGCTGGCCTGTTGAACAAGTC
            TGGAAAGAAATGCATAAACTTTTGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGATAACCT
            TATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTGGACGAGTCGGAATCGCAGACCGATACCAGGATCT
            TGCCATCCTATGGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGTATTGATA
            ATCCTGATATGAATAAATTGCAGTTTCATTTGATGCTCGATGAGTTTT

            I made a fasta out of this and is on the cluster at this location:
            /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/kanR2_SeqfromAnthony-fullsequence.fna

            I treated the above sequence as the reference file for a STAR alignment.
            Results to go here:
            /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/star

            First step in STAR is the make a genome index file.
            This is done once via this script:
            /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/star-generategenome.slurm

            And then we run 144 jobs as an array, 1 for each read sequence in the experiment:
            star-chunks.slurm
            (silly name, I first planned to align to small chunks, named it this way)

            We get 144 bam files, all of which are empty. The log files appear that STAR ran successfully.

            Show
            robofjoy Robert Reid added a comment - The STAR run to see if ANY read sequences align to the kanamycin gene that Anthony from Gloria's lab sent. He sent it in a Word doc: (KANR2) GTCTGGAAAGAAATGCATAAACTTTTGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGATAA CCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTGGACGAGTCGGAATCGCAGACCGATACCAGGA TCTTGCCATCCTATGGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGTATTGA TAATCCTGATATGAATAAATTGCAGTTTCATTTGATGCTCGATGAGTTTTAACATGGATGCTGATTTATATGGGTAT AAATGGGCTCGCGATAATGTCGGGCAATCAGGTGCGACAATCTATCGCTTGTATGGGAAGCCCGATGCGCCAGAG TTGTTTCTGAAACATGGCAAAGGTAGCGTTGCCAATGATGTTACAGATGAGATGGTCAGACTAAACTGGCTGACG GAATTTATGCCTCTTCCGACCATCAAGCATTTTATCCGTACTCCTGATGATGCATGGTTACTCACCACTGCGATCCCC GGAAAAACAGCATTCCAGGTATTAGAAGAATATCCTGATTCAGGTGAAAATATTGTTGATGCGCTGGCAGTGTTCC TGCGCCGGTTGCATTCGATTCCTGTTTGTAATTGTCCTTTTAACAGCGATCGCGTATTTCGTCTCGCTCAGGCGCAA TCACGAATGAATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAATGGCTGGCCTGTTGAACAAGTC TGGAAAGAAATGCATAAACTTTTGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGATAACCT TATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTGGACGAGTCGGAATCGCAGACCGATACCAGGATCT TGCCATCCTATGGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGTATTGATA ATCCTGATATGAATAAATTGCAGTTTCATTTGATGCTCGATGAGTTTT I made a fasta out of this and is on the cluster at this location: /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/kanR2_SeqfromAnthony-fullsequence.fna I treated the above sequence as the reference file for a STAR alignment. Results to go here: /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/star First step in STAR is the make a genome index file. This is done once via this script: /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/star-generategenome.slurm And then we run 144 jobs as an array, 1 for each read sequence in the experiment: star-chunks.slurm (silly name, I first planned to align to small chunks, named it this way) We get 144 bam files, all of which are empty. The log files appear that STAR ran successfully.
            Hide
            robofjoy Robert Reid added a comment -

            Blast RUN:

            Just like Star run but for blast.
            Location:
            /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/blast

            Script:
            /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/

            -Need to run seqtk to convert the fastq to fasta to run blast.

            So far there is an issue. 35 jobs complete. And the rest do not.
            Getting kicked off the cluster because of my internet. Hope to resolve from office.
            Of the 35 jobs that completed, no sequences were found to hit at a e-score cutofff = 1e-25.

            Show
            robofjoy Robert Reid added a comment - Blast RUN: Just like Star run but for blast. Location: /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/blast Script: /nobackup/tomato_genome/muday-144/nfcore-2022/kanhunt/ -Need to run seqtk to convert the fastq to fasta to run blast. So far there is an issue. 35 jobs complete. And the rest do not. Getting kicked off the cluster because of my internet. Hope to resolve from office. Of the 35 jobs that completed, no sequences were found to hit at a e-score cutofff = 1e-25.
            Hide
            ann.loraine Ann Loraine added a comment -

            Using kan-r gene in this way does not yield much. Closing this ticket.
            Next strategy to try will be running an RNA-Seq alignment tool on the fastq files and the new plasmid genome, either by itself or with the rest of the tomato SL5 genome assembly, using the fasta file and bed file made by [~molly] in IGBF-3306.

            We will make a new ticket for running this RNA-Seq alignment strategy to detect Kan-r expression.

            Show
            ann.loraine Ann Loraine added a comment - Using kan-r gene in this way does not yield much. Closing this ticket. Next strategy to try will be running an RNA-Seq alignment tool on the fastq files and the new plasmid genome, either by itself or with the rest of the tomato SL5 genome assembly, using the fasta file and bed file made by [~molly] in IGBF-3306. We will make a new ticket for running this RNA-Seq alignment strategy to detect Kan-r expression.
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Description Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the kanamycin selection gene was expressed in soybean seeds in a transgenic line.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. These are added to the git repository here:

            * ExternalDataSets/pK7WG2-F3H.fa
            * ExternalDataSets/pK7WG2-F3H.gb

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            Muday Lab has detected possible sample switching in their time course data from three genotypes: ARE (mutant), F3H-OX3 (transgenic), and VF36 (wild-type, parent strain for F3H-OX3)

            It was easy to identify which samples were ARE and not VF36 or F3H-OX3.

            We need a way to distinguish F3H-OX3 from VF36 samples due to the sample switching issue.

            To do this, we can use the fact that F3H-OX3 contains transgenic construct and VF36 does not.

            Previously, we have found that the plant selective marker gene was expressed in soybean seeds in a transgenic line.

            For this task, we'll investigate whether we can use the F3H-OX3 line's plant selection marker gene to distinguish transgenic from non-transgenic lines.

            See:

            * Open access article link: https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-015-0207-z
            * Code repository: https://bitbucket.org/lorainelab/soyseq

            Maarten has sent us three files with the construct information in them. These are added to the git repository here:

            * ExternalDataSets/pK7WG2-F3H.fa
            * ExternalDataSets/pK7WG2-F3H.gb

            In this experiment, the plant selection gene was a kanamycin resistance gene.

            For this task, use the kanamycin gene either as a query or a target in a search of the fastq files to identify samples that contain RNA-Seq reads from the kanamycin-resistance gene.

            To do this, we need to investigate the different tools that might be available for this task.

            Let's try to find an easy-to-use tool or approach that will let us identify RNA-Seq samples containing "kan" gene sequence.
            ann.loraine Ann Loraine made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ] Robert Reid [ robertreid ]

              People

              • Assignee:
                robofjoy Robert Reid
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0 Vote for this issue
                Watchers:
                Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: