Bug Repository / BUG-604

Try loading a 1000 Genomes file into IGB

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Labels:
      None

      Description

      TASK:

      Write a SHORT vignette (as succinct as possible) for the User's Guide describing how to access 1000 Genomes files from AWS.

      Before you get started, spend an hour reading up on the 1000 Genomes Project (what it is, and so on).

      Also, before you get started, see this post in the igv-help Google Group:

      https://groups.google.com/forum/#!topic/igv-help/Z6vF2n8nzSc[1-25]

      Hi everyone,

      I would like to compare my own bam file with data from the 1000 Genomes list (I usually pull the last one up through "File -> Load from Server").

      I succeeded in writing a batch script that shows my own bam file and saves a snapshot, but how can I add the track from the 1000 Genomes data to this?

      Thanks for any help with this!

      Best,
      ~Lina

      Hi,

      If you know the URL to the file, either ftp or http, you can use "Load from URL..." and paste it in, or the batch load command. For example,

      ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00111/alignment/HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam

      The same data can be loaded from the "cloud" with this URL:

      http://1000genomes.s3.amazonaws.com/data/HG00111/alignment/HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam
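Jim's "batch load command" refers to IGV batch scripts, which is what Lina was writing; this applies to IGV, not IGB. As a minimal sketch, the Python below only assembles the script text from the IGV batch commands `new`, `genome`, `snapshotDirectory`, `load`, `goto`, and `snapshot`. The function name, local BAM path, genome ID (`hg19`), locus, and snapshot names are hypothetical placeholders; check the batch command reference for your IGV version before relying on it.

```python
# Sketch only: assemble an IGV batch script that loads your own BAM and a
# 1000 Genomes BAM by URL, jumps to a region, and saves a snapshot.
# The local path, genome ID, locus, and file names used below are
# hypothetical placeholders, not values from this issue.


def igv_batch_script(own_bam, remote_bam, locus, genome="hg19",
                     snapshot_dir="snapshots", snapshot_name="compare.png"):
    """Return IGV batch commands, one per line."""
    commands = [
        "new",                                # clear the current session
        f"genome {genome}",                   # assumed reference build
        f"snapshotDirectory {snapshot_dir}",  # where snapshots are written
        f"load {own_bam}",                    # your own alignments
        f"load {remote_bam}",                 # 1000 Genomes alignments by URL
        f"goto {locus}",                      # region to compare
        f"snapshot {snapshot_name}",          # save an image of the view
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(igv_batch_script(
        own_bam="/data/my_sample.bam",  # hypothetical local file
        remote_bam=("http://1000genomes.s3.amazonaws.com/data/HG00111/alignment/"
                    "HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam"),
        locus="11:5246000-5250000",     # hypothetical region on chromosome 11
    ))
```

Save the printed text to a file and run it the same way as an existing batch script.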

      In general I recommend using the cloud URLs; performance will be much better. Unfortunately, browsing the cloud dataset is not easy without a tool. However, it usually works to first find the data by browsing the FTP site at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/, then replacing the first part of the URL with the cloud equivalent, i.e., replace

      ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp

      with

      http://1000genomes.s3.amazonaws.com
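The prefix substitution Jim describes can be written as a small function. This is a sketch that hard-codes the two prefixes exactly as given in the post; the function name is hypothetical, and since Jim says this only "usually works", check that the rewritten URL actually loads.

```python
# Sketch: rewrite a 1000 Genomes FTP URL to its cloud (S3) equivalent by
# swapping the prefix, as described in the post. Prefixes are copied from it.
FTP_PREFIX = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp"
CLOUD_PREFIX = "http://1000genomes.s3.amazonaws.com"


def ftp_to_cloud_url(ftp_url):
    """Replace the FTP prefix with the cloud prefix; keep the rest of the path."""
    if not ftp_url.startswith(FTP_PREFIX):
        raise ValueError(f"not a 1000 Genomes FTP URL: {ftp_url}")
    return CLOUD_PREFIX + ftp_url[len(FTP_PREFIX):]


if __name__ == "__main__":
    ftp_url = ("ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00111/alignment/"
               "HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam")
    # Prints the cloud URL given in the post for the same BAM file.
    print(ftp_to_cloud_url(ftp_url))
```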

      For VCF files I recommend just downloading them; they aren't so huge, and performance is better. If you want to load them remotely, the cloud URLs are your only option, because FTP is not supported for tabix-indexed files.

      http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz

      Jim

      (Jim was replying to Lina Faller's post of Thursday, August 2, 2012, 5:54:24 PM UTC-4, quoted above.)


            Activity

            Alyssa Gulledge (Inactive) added a comment:

            The BAM files work just fine, but you HAVE to pull down the .bai index as well (as always).

            The vcf file is giving me trouble - still working on it...

            Ann Loraine added a comment:

            What happens if you open it from a URL?

            Seems like that should work ???

            Alyssa Gulledge (Inactive) added a comment:

            Nope, it does not. I keep getting an error about trying to find chromosomes (from the downloaded file). Could it be a VCF issue? Assigning to Hiral to check...

            Ann Loraine added a comment:

            Check whether the header is missing the chromosome names:

            $ samtools view -H URL
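For a BAM file, `samtools view -H` prints only the SAM header, where each reference sequence appears on an `@SQ` line with its name in the `SN:` field. A minimal sketch of the check Ann describes, assuming you have captured that header text; the function name and sample header are illustrative. An empty result means the header lists no chromosome names. VCF headers use a different layout, so this does not apply to them directly.

```python
# Sketch: list the chromosome (reference sequence) names declared in a SAM
# header, e.g. the text printed by `samtools view -H`. An empty list means
# the header has no @SQ lines, i.e. the chromosome names are missing.


def reference_names(sam_header):
    """Return the SN: value of every @SQ line, in header order."""
    names = []
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG, and other header lines
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[len("SN:"):])
                break
    return names


if __name__ == "__main__":
    # Illustrative header text, not output captured from the files in this issue.
    header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:11\tLN:135006516\n"
    names = reference_names(header)
    print(names if names else "no chromosome names in header")
```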

            Hiral Vora (Inactive) added a comment:

            I just opened http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz and it worked fine.
            Hiral Vora (Inactive) added a comment:

            By the way, http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz is a tabix-indexed file, so you will need to download the corresponding tabix index (.tbi) file as well.
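The index files in this thread follow a simple naming pattern: the tabix index is the VCF URL with `.tbi` appended (as in Richard's download links), and a BAM file needs its `.bai`. A sketch of deriving the index URL to fetch alongside a data file, assuming the index sits next to the data file with the extension appended (`file.bam.bai`); some BAM indexes are named `file.bai` instead, which this hypothetical function does not try.

```python
# Sketch: derive the index URL that must accompany a remote data file.
# Assumes the index is stored next to the data file with its extension
# appended (file.bam.bai, file.vcf.gz.tbi); other naming schemes are not handled.


def index_url(data_url):
    """Return the expected index URL for a BAM or bgzipped VCF URL."""
    if data_url.endswith(".bam"):
        return data_url + ".bai"   # BAM index
    if data_url.endswith(".vcf.gz"):
        return data_url + ".tbi"   # tabix index
    raise ValueError(f"no index naming rule for: {data_url}")


if __name__ == "__main__":
    vcf = ("http://1000genomes.s3.amazonaws.com/release/20110521/"
           "ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz")
    print(index_url(vcf))  # the .tbi URL Richard links to in this thread
```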

            Richard Linchangco (Inactive) added a comment:

            I tested loading both the .bam file directly from its URL:

            http://1000genomes.s3.amazonaws.com/data/HG00111/alignment/HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam PASSED

            and the .vcf file directly from its URL:

            http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz FAILED

            As Hiral commented, the .vcf file requires its accompanying .tbi file. I ended up downloading the .vcf file and the .tbi file as follows, and the file worked in IGB:

            • Open a new browser tab or window, paste each of the following links into the address bar, and press Enter.
            • The files should begin downloading immediately.
              • The .vcf file is 7.24 GB and the .tbi file is 130 KB, so allow plenty of time for the download.

            http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz

            http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz.tbi

            Or just click the links above to download both files.
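A scripted download can be rerun without re-fetching files that already finished. A minimal sketch using only the Python standard library, assuming the index URL is the data URL plus `.tbi` (as in the links above); the function names are hypothetical. It fetches the small index first, writes each file under a `.part` name and renames it only after the transfer completes, and skips files already present. It does not resume an interrupted transfer.

```python
# Sketch: download a bgzipped VCF and its tabix index (.tbi) with the
# standard library. Assumes the index URL is the data URL + ".tbi".
import os
import posixpath
import shutil
import urllib.parse
import urllib.request

VCF_URL = ("http://1000genomes.s3.amazonaws.com/release/20110521/"
           "ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz")


def download(url, dest_dir="."):
    """Stream url into dest_dir and return the local path; skip existing files."""
    name = posixpath.basename(urllib.parse.urlsplit(url).path)
    path = os.path.join(dest_dir, name)
    if os.path.exists(path):
        return path  # already downloaded on an earlier run
    tmp = path + ".part"
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out, 1 << 20)  # copy in 1 MiB chunks
    os.replace(tmp, path)  # rename only after the transfer finished
    return path


def download_with_index(data_url, dest_dir="."):
    """Fetch the small .tbi first, then the data file; return both paths."""
    index_path = download(data_url + ".tbi", dest_dir)
    data_path = download(data_url, dest_dir)
    return data_path, index_path
```

Calling `download_with_index(VCF_URL)` would fetch both files (about 7.24 GB in total) into the current directory, after which the local .vcf.gz can be opened in IGB.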


              People

              • Assignee: Unassigned
              • Reporter: Ann Loraine
              • Votes: 0
              • Watchers: 0
