[BUG-604] try loading 1000 genomes file into IGB - JIRA UNCC

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Labels:
None

Description

TASK:

Write a SHORT vignette (as succinct as possible) for the User's Guide describing how to access a 1000 Genomes file from AWS.

Before you get started, spend an hour reading up on the 1000 genomes project, what it is, and so on.

Also, before you get started, see this post on IGV groups:

https://groups.google.com/forum/#!topic/igv-help/Z6vF2n8nzSc[1-25]

Hi everyone,

I would like to compare my own bam file with data from the 1000 Genomes list (I usually pull the last one up through "File -> Load from Server").

I succeeded in writing a batch script that shows my own bam file and saves a snapshot, but how can I add the track from the 1000 Genomes data to this?

Thanks for any help with this!

Best,
~Lina

Hi,

If you know the URL to the file, either ftp or http, you can use "Load from URL..." and paste it in, or the batch load command. For example,

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00111/alignment/HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam

The same data can be loaded from the "cloud" with this URL:

http://1000genomes.s3.amazonaws.com/data/HG00111/alignment/HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam

In general I recommend using the cloud URLs; performance will be much better. Unfortunately, browsing the cloud dataset is not easy without a tool. However, it usually works to first find the data by browsing the ftp site at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/, then replacing the first part of the URL with the cloud equivalent, i.e. replace

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp

with

http://1000genomes.s3.amazonaws.com
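
The prefix swap described above can be sketched in a few lines of Python. This is only an illustration and not part of IGV or IGB: the function name `to_cloud_url` is hypothetical, and the two prefixes are the ones quoted in this post.

```python
# Prefixes as quoted in the post above.
FTP_PREFIX = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp"
CLOUD_PREFIX = "http://1000genomes.s3.amazonaws.com"


def to_cloud_url(ftp_url: str) -> str:
    """Swap the NCBI FTP prefix for the S3 prefix (hypothetical helper)."""
    if not ftp_url.startswith(FTP_PREFIX):
        raise ValueError(f"not a 1000 Genomes FTP URL: {ftp_url}")
    # Keep everything after the prefix unchanged, including the leading slash.
    return CLOUD_PREFIX + ftp_url[len(FTP_PREFIX):]
```

Passing the HG00111 ftp URL shown earlier should yield the corresponding s3.amazonaws.com URL from the same post.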

For VCF files I recommend just downloading them; they aren't so huge and performance is better. If you want to load them remotely, the cloud URLs are your only option (other than downloading the file), as ftp is not supported for tabix-indexed files.

http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz

Jim
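
For the batch question in the original post, one possible approach is to generate the IGV batch file with a small script. The sketch below is an assumption-laden illustration, not something taken from this thread: the helper name `build_igv_batch`, the default genome ID `hg19`, and all paths and the locus in the usage note are hypothetical. The batch commands it emits (`new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, `exit`) are standard IGV batch commands.

```python
def build_igv_batch(tracks, locus, snapshot_dir, genome="hg19"):
    """Return the text of an IGV batch script (hypothetical helper).

    tracks may mix local paths and http URLs; genome="hg19" is an assumed default.
    """
    lines = ["new", f"genome {genome}"]
    # One load command per track, local file or remote URL alike.
    lines += [f"load {track}" for track in tracks]
    lines += [
        f"snapshotDirectory {snapshot_dir}",
        f"goto {locus}",
        "snapshot",
        "exit",
    ]
    return "\n".join(lines) + "\n"
```

Calling it with a local bam path plus one of the cloud URLs above, then writing the result to a file, would give a script that loads both tracks before taking the snapshot.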

(In reply to Lina Faller's post of Thursday, August 2, 2012 5:54:24 PM UTC-4, quoted in full above.)

Attachments

Issue Links

relates to

BUG-622 File format issues

Closed

Activity

Alyssa Gulledge (Inactive) added a comment - 05/Nov/12 7:03 AM

The bam files work just fine but you HAVE to pull down the bai as well (as always).

The vcf file is giving me trouble - still working on it...

Ann Loraine added a comment - 05/Nov/12 7:07 AM

What happens if you open from url?

Seems like that should work ???

Alyssa Gulledge (Inactive) added a comment - 05/Nov/12 7:52 AM

Nope - it does not - I keep getting an error about trying to find chromosomes (from the downloaded file). Could be a VCF issue? Assigning to Hiral to check...

Ann Loraine added a comment - 05/Nov/12 8:11 AM

Check if the header is missing the chromosome names

$ samtools view -H URL
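
Note that `samtools view -H` prints the header of a BAM file. For a downloaded `.vcf.gz`, a comparable check could be sketched in Python with only the standard library. The function name `vcf_chromosome_names` and the `max_records` cutoff are hypothetical; the sketch assumes a gzip- or bgzip-compressed VCF on local disk.

```python
import gzip


def vcf_chromosome_names(path, max_records=1000):
    """Return (contig IDs declared in the header, names seen in the first records)."""
    declared, seen = [], []
    records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##contig=<"):
                # Header line such as ##contig=<ID=11,length=...>
                body = line.strip()[len("##contig=<"):-1]
                for field in body.split(","):
                    if field.startswith("ID="):
                        declared.append(field[3:])
            elif line.startswith("#"):
                continue  # other header lines
            else:
                # First tab-separated column of a data record is CHROM.
                chrom = line.split("\t", 1)[0]
                if chrom not in seen:
                    seen.append(chrom)
                records += 1
                if records >= max_records:
                    break
    return declared, seen
```

If the names returned here differ from the chromosome names the browser expects for the loaded genome (for example with or without a `chr` prefix), that mismatch is one plausible cause of an error about finding chromosomes.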

Hiral Vora (Inactive) added a comment - 05/Nov/12 8:11 AM

I just opened http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz and it worked fine.

Hiral Vora (Inactive) added a comment - 05/Nov/12 8:12 AM

Btw, http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz is a tabix-indexed file, so you will need to download the corresponding tabix index file as well.

Richard Linchangco (Inactive) added a comment - 28/Nov/12 9:57 AM

I tested using both the .bam file directly from URL:

http://1000genomes.s3.amazonaws.com/data/HG00111/alignment/HG00111.chrom11.ILLUMINA.bwa.GBR.low_coverage.20111114.bam PASSED

and the .vcf file directly from URL:

http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz FAILED

As Hiral commented, the .vcf file requires its accompanying .tbi file. I ended up downloading the .vcf file and the .tbi file as follows, and the file worked in IGB:

Open a new browser tab or window, paste each of the following links into the address bar, and press Enter.
The files should begin downloading immediately.
- The .vcf file is 7.24GB and the .tbi file is 130KB, so allow plenty of time for the download.

http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz

http://1000genomes.s3.amazonaws.com/release/20110521/ALL.chr11.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz.tbi

OR just click on the above links to download both files.
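
The pairing of data file and index file used in this ticket can be sketched as a small Python helper. The name `index_url` is hypothetical. The `.vcf.gz` to `.vcf.gz.tbi` pairing matches the two links above; the `.bam` to `.bam.bai` pairing is an assumption about how the bai mentioned in the earlier comment is named.

```python
# Data-file extension -> suffix appended to form the index file name.
# The .bam entry is an assumed naming convention, not confirmed in this ticket.
INDEX_SUFFIX = {".bam": ".bai", ".vcf.gz": ".tbi"}


def index_url(data_url: str) -> str:
    """Return the URL of the index expected next to a BAM or VCF (hypothetical helper)."""
    for extension, suffix in INDEX_SUFFIX.items():
        if data_url.endswith(extension):
            return data_url + suffix
    raise ValueError(f"no known index type for: {data_url}")
```

Downloading both the data URL and the URL returned by this helper into the same folder mirrors the manual steps described in this comment.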

People

Assignee:

Unassigned

Reporter:

Ann Loraine

Votes:

0

Watchers:

0

Dates

Created:

16/Aug/12 12:20 PM

Updated:

28/Nov/12 10:52 AM

Resolved:

28/Nov/12 10:52 AM