Hi Nowlan! I’ve been looking into your range request issue and wanted to touch base. After converting your commands to curl, I was able to run the following commands successfully, using `-r` to pass in my range request:
```curl -k -H "Authorization: Bearer $tok" https://agave.iplantc.org/files/v2/download/nowlanf/system/data.iplantcollaborative.org/nowlanf/Demo/non-smallCellLungCancer-WT.bam -o agave.txt -r 0-199,1000-199```
```curl -k -H "Authorization: Bearer $tok" https://usegalaxy.org/display_application/bbd44e69cb8906b5c5df29f1196be054/igb_bam/View/ea1cf2b8b07ca721/data/Tophat_on_data_14_and_data_13__accepted_hits.bam -o galaxy.txt -r 0-199,1000-199```
So to answer your general question: yes, Agave should be able to handle range requests. However, when passing in the range you were using (`-r 149,968,314-149,968,384`), the usegalaxy request seems to work, while the Agave request writes to a file that contains 0 bytes. Is the Galaxy file (Tophat_on_data_14_and_data_13__accepted_hits.bam) located somewhere on the Agave storage system as well? I’d like to eliminate the file as a variable.
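As a sanity check on `-r` values like the ones above, here is a minimal shell sketch that computes how many bytes a range spec should return. It assumes the HTTP convention that commas separate individual `first-last` ranges, so a value such as `149,968,314-149,968,384` is read as five comma-separated parts rather than one interval. It only handles closed `first-last` parts, and the name `expected_bytes` is hypothetical, not part of curl or Agave.

```shell
# expected_bytes SPEC
# Hypothetical helper: prints the total byte count that a range spec such as
# "0-199,1000-1199" should cover, assuming each comma-separated part has the
# closed form FIRST-LAST and the file is long enough to satisfy it.
# Exits non-zero when a part is malformed or has LAST < FIRST.
expected_bytes() (
    IFS=,          # split the spec on commas (local to this subshell)
    set -f         # disable globbing while splitting
    total=0
    for part in $1; do
        case $part in
            *[!0-9-]* | -* | *- | *-*-*)
                echo "unsupported range part: $part" >&2; exit 1 ;;
            *-*) ;;   # looks like FIRST-LAST
            *)
                echo "not a FIRST-LAST range: $part" >&2; exit 1 ;;
        esac
        first=${part%-*}
        last=${part#*-}
        if [ "$last" -lt "$first" ]; then
            echo "last < first in range part: $part" >&2; exit 1
        fi
        total=$((total + last - first + 1))
    done
    echo "$total"
)
```

With this sketch, `expected_bytes 0-199` prints 200 and `expected_bytes 0-199,1000-1199` prints 400, while `0-199,1000-199` is rejected because 199 is less than 1000. How a given server reacts to a part like that is not something this sketch can predict.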
Nowlan Freese [12:44 PM]
The Galaxy file and the Agave file are the same file, one stored on Galaxy's servers and the other on Agave.
I.e., the same bam file but stored in two different locations.
I have been testing a lot of different files, and in general anything with an index (.bai, .tbi) seems to have the most issues.
CIC Support [12:55 PM]
oh that’s interesting, when i request the same byte ranges and do a diff on the files they write to, it shows they are different. I have some additional thoughts I can test though, I’ll keep you updated. and thanks for the information!
Nowlan Freese [12:59 PM]
The file was originally generated in Galaxy. I downloaded it from Galaxy and uploaded it using Agave, so in theory they should be the same.
Nowlan Freese [1:45 PM]
So the link I was using before from agave (https://agave.iplantc.....) did not work. However, if I use the link provided by the Discovery Environment it works perfectly fine.
https://data.cyverse.org/dav-anon/iplant/home/nowlanf/Demo/non-smallCellLungCancer-WT.bam
In theory this is the exact same data, uploaded through Agave, and visible in both Agave and CyVerse. But the Agave link does not work correctly.
Works:
samtools view https://data.cyverse.org/dav-anon/iplant/home/nowlanf/Demo/non-smallCellLungCancer-WT.bam -o agave.txt chr3:149,968,314-149,968,384
Does not work:
samtools view https://agave.iplantc.org/files/v2/download/nowlanf/system/data.iplantcollaborative.org/nowlanf/Demo/non-smallCellLungCancer-WT.bam -o agave.txt chr3:149,968,314-149,968,384
CIC Support [2:23 PM]
Ah, this is helpful. So far what I see is that the Agave link works for me if I use smaller ranges. I think the issue is that Agave may not support byte skipping, just based on my testing. These two commands download the same results for me:
```curl -k -H "Authorization: Bearer $tok" https://agave.iplantc.org/files/v2/download/nowlanf/system/data.iplantcollaborative.org/nowlanf/Demo/non-smallCellLungCancer-WT.bam -o agavediff.txt -r 0-199```
```curl -k -H "Authorization: Bearer $tok" https://usegalaxy.org/display_application/bbd44e69cb8906b5c5df29f1196be054/igb_bam/View/ea1cf2b8b07ca721/data/Tophat_on_data_14_and_data_13__accepted_hits.bam -o galaxydiff.txt -r 0-199```
The `-r` flag is the closest curl equivalent to your `chr3:###` request. If I change the `0-199` to `0-199,1000-199`, i.e. requesting 200 bytes from index 0 and 200 bytes from index 1000, the two commands write different results. Using your range request just results in blank output from the Agave link, though the usegalaxy.org request appears to work. My manager also just stated that he doesn’t believe Agave supports range requests beyond a straightforward range. Have you tried testing with simpler ranges at all? Not that you’re doing anything wrong; I’m curious whether you would see the same behavior I am.
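The comparison described here (same range, two servers, then check whether the outputs match) can be sketched as a small shell helper. This is only an illustration: `compare_downloads` is a hypothetical name, and it assumes both curl commands have already written their output files.

```shell
# compare_downloads FILE_A FILE_B
# Hypothetical helper: reports whether two downloaded range responses are
# byte-for-byte identical, and prints their sizes either way.
compare_downloads() {
    size_a=$(wc -c < "$1" | tr -d ' ')   # strip padding some wc builds add
    size_b=$(wc -c < "$2" | tr -d ' ')
    if cmp -s "$1" "$2"; then
        echo "identical ($size_a bytes)"
    else
        echo "different ($size_a vs $size_b bytes)"
        return 1
    fi
}

# Example, using the output files from the two curl commands above:
#   compare_downloads agavediff.txt galaxydiff.txt
```

Printing the sizes alongside the verdict helps separate "wrong bytes" from "wrong number of bytes", which are different symptoms.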
Nowlan Freese [2:25 PM]
I've tested on smaller files with smaller ranges, and it usually, though not always, works.
CIC Support [2:30 PM]
ah okay. so yes we’re seeing some connectivity issues to CyVerse these last few days. I saw a few of mine fail as well, with logs indicating `failed to connect to remote server` so I chalked up my intermittent failures to this. Would it make sense that’s why they sometimes were not working for you, or did you see a different behavior in their failures?
Nowlan Freese [2:36 PM]
I'm seeing consistent failure/working for a specific file, but inconsistent failure/working between files of the same type. For example, I have two small bam files, one of which fails, while the other works correctly, both in Agave. I have also run into the CyVerse connectivity issues, though that seems to be separate (I think).
CIC Support [2:36 PM]
got it, I can look into those bam files if you’d like. Which ones are they?
Nowlan Freese [2:37 PM]
I'm trying to test them again right now, but I just lost connection to CyVerse.
CIC Support [2:37 PM]
yes unfortunately, our smoke tests are indicating issues at this moment.
We’re investigating it. I’ll reach out when things are looking a little better!
Nowlan Freese [2:40 PM]
Sounds good, thank you for looking into this.
Nowlan Freese [3:17 PM]
Hi! Ann Loraine here (sharing computer with Nowlan).
I have a quick question: By "straightforward range" do you mean: "a range that starts at 0 (beginning of the file)?"
CIC Support [3:21 PM]
Hi Ann, that’s a good question! All of my tests started at 0 and based on my conversation with my manager, I would lean towards yes. BUT I will definitely test this for you first before I leave you at that, as it may not be the case. It shouldn’t take me long to test, I just have to wait on the CyVerse connection issues to clear up a bit - it’s a little rough right now unfortunately. I will get back to you on this ASAP though!
CIC Support [10:13 AM]
Hi Ann & Nowlan! It looks like range requests do indeed work best when they begin with 0 for Agave.
CIC Support [10:26 AM]
I am not sure that is functioning as it should be though. I’m going to discuss this behavior with my manager.
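When checking whether a server honors a range at all, the HTTP status code is a useful first signal: under standard HTTP range semantics, 206 means partial content was returned, 200 means the range was ignored and the whole resource was sent, and 416 means the range could not be satisfied. The sketch below maps a status code to a short description. `explain_range_status` and the `$url` variable in the comment are hypothetical, and which codes Agave actually returns for these requests is not stated in this thread.

```shell
# explain_range_status HTTP_CODE
# Hypothetical helper: maps the status code of a ranged GET to a short
# description, following standard HTTP range semantics.
explain_range_status() {
    case $1 in
        206) echo "partial content: the server honored the range" ;;
        200) echo "full response: the server ignored the range" ;;
        416) echo "range not satisfiable for this resource" ;;
        *)   echo "unexpected status $1: not specific to range handling" ;;
    esac
}

# One way to obtain the code with curl ($url is a placeholder for the file URL):
#   code=$(curl -s -k -H "Authorization: Bearer $tok" -o /dev/null \
#          -w '%{http_code}' -r 100-599 "$url")
#   explain_range_status "$code"
```

A 206 response can still carry the wrong bytes, so the status check complements a size or content comparison rather than replacing it.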
Nowlan Freese [9:36 AM]
Thank you for the update.
It seems odd that Agave would only accept range requests that start with 0, as this would make most requests for genomic data impractical.
CIC Support [10:26 AM]
Yes, I can definitely see your point. Today I’m going to test these range requests a little more, do a code review, and bring this to the development team. I know there are plans to rewrite a lot of code, so this may already fall under that umbrella, or it may need to be addressed separately. I’m keeping your ticket open until I can get an answer regarding the long-term solution for these range requests.
Nowlan Freese [10:27 AM]
Thank you!
CIC Support [1:25 PM]
Kind of interesting: through my testing, I found that Agave appears to be, in some sense, doubling the first part of the range (probably not exactly that, but it conveys the behavior I’m seeing).
So `-r 100-599` on usegalaxy returns 500 bytes, while the same flag returns 400 bytes from the Agave storage system. `-r 250-699` returns 450 bytes from usegalaxy, but only 200 bytes from Agave (as if I had passed `-r 500-699` instead). That explains why some of my previous tests, such as `-r 400-600`, just wrote 0 bytes to the file: based on what I’m seeing, that would be treated as `800-600` in Agave.
I’m running this by devs as a preliminary check that I’m not missing something here, but then I expect this to be escalated as a bug. Nothing to do on your end, just wanted to update you.
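The byte counts reported above can be reproduced with a small model of the "doubled start" hypothesis. This is a sketch of the observed behavior only, not a description of Agave's code; both function names are hypothetical.

```shell
# requested_len FIRST LAST
# Bytes a correct server should return for the closed range FIRST-LAST.
requested_len() {
    echo $(( $2 - $1 + 1 ))
}

# doubled_start_len FIRST LAST
# Bytes returned under the hypothesis that the start offset is effectively
# doubled. This models the observation only; it is not taken from Agave's code.
doubled_start_len() {
    start=$(( $1 * 2 ))
    if [ "$start" -gt "$2" ]; then
        echo 0                      # start past the end: nothing is returned
    else
        echo $(( $2 - start + 1 ))
    fi
}
```

Under this model, `doubled_start_len 100 599` gives 400, `doubled_start_len 250 699` gives 200, and `doubled_start_len 400 600` gives 0, matching the Agave results above, while `requested_len` gives the 500 and 450 bytes seen from usegalaxy. It would also be consistent with ranges that start at 0 working, since doubling 0 leaves the start unchanged.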
Hi Dr. Fonner,
I have a question regarding range requests in Agave/iRods. We've been testing data uploaded to CyVerse through Agave, and we're getting some odd results when we try to download data with range requests. So for example if I try and use samtools view to view data from a bam file, I specify the public Agave URL as well as a range, but receive: error closing "myFileURL": -1. I have the same file in Galaxy, and it works perfectly fine.
This works:
samtools view https://usegalaxy.org/display_application/bbd44e69cb8906b5c5df29f1196be054/igb_bam/View/ea1cf2b8b07ca721/data/Tophat_on_data_14_and_data_13__accepted_hits.bam -o galaxy.txt chr3:149,968,314-149,968,384
This does not work:
samtools view https://agave.iplantc.org/files/v2/download/nowlanf/system/data.iplantcollaborative.org/nowlanf/Demo/non-smallCellLungCancer-WT.bam -o agave.txt chr3:149,968,314-149,968,384
Do range requests normally work in Agave/iRods/CyVerse?
John Fonner [10:55 PM]
Hey Nowlan, I am out on vacation this week (I should update my Slack status). I will pass this on and ask someone to take a look. If you hit other blockers this week, I would ping #support. I'll ask @CIC Support to circle back with you.