Details
Type: New Feature
Status: To-Do
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels:
Story Points: 1
Epic Link:
Description
When users open the metadata panel for a file, the species and genome version are only shown if the user has previously set these values.
This is fine, but it means that a user must manually annotate each file with this information. Also, sometimes it is not possible to do so because the metadata can only be edited by the file's owner.
IGB Quickload sites already have file-based databases that map chromosome names and sizes onto genome versions. Some bioinformatics file types (e.g., BAM) also contain this information, and all genome data files contain location coordinates. Sometimes the structure of the data itself, especially transcriptome data, can suggest which species provided the biological material that was used to generate the data. Given all this, it may be possible to build a simple system that guesses which genome assembly the data should be displayed on. I think we could implement a "species and genome version" guesser that fills in these metadata values if the user has not yet provided them.
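As a rough illustration of the guessing idea, here is a minimal Python sketch that scores a file's chromosome names and sizes against a table of known genome versions. Everything named here is an assumption for illustration: the `KNOWN_GENOMES` table, its genome version names and sizes, the `guess_genome` function, and the `min_score` threshold are all hypothetical. A real version would presumably load the table from the Quickload chromosome name/size files.

```python
# Hypothetical lookup table: genome version -> {chromosome name: length}.
# Names and sizes are made up for illustration, not real assemblies.
KNOWN_GENOMES = {
    "Species_A_v1": {"chrA1": 1000, "chrA2": 800, "chrAM": 50},
    "Species_B_v2": {"chrB1": 5000, "chrB2": 4000, "chrBM": 16},
}


def guess_genome(observed, known=KNOWN_GENOMES, min_score=0.9):
    """Return the best-matching genome version, or None if unsure.

    observed: {chromosome name: length} taken from the data file.
    A chromosome counts as a match only if both name and length agree.
    """
    if not observed or not known:
        return None

    scores = {}
    for version, chroms in known.items():
        matches = sum(
            1 for name, size in observed.items() if chroms.get(name) == size
        )
        # Fraction of the file's chromosomes explained by this genome version.
        scores[version] = matches / len(observed)

    best = max(scores, key=scores.get)
    best_score = scores[best]
    # Refuse to guess when the match is weak or several versions tie.
    tied = sum(1 for s in scores.values() if s == best_score) > 1
    if best_score < min_score or tied:
        return None
    return best
```

Returning `None` for weak or tied matches is a deliberate choice in this sketch: leaving the metadata empty seems safer than filling in a wrong genome version.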
This may be very difficult to implement. To my knowledge, we cannot directly read a file's contents using the Terrain API. I think this would be especially true for BAM files, as they are binary and would require running samtools view -H to see the header info. A potential option would be to create an app that takes a BAM file as input and outputs the header info as a file. We would then have to read this file (somehow?) and parse out the species/genome. However, this would require running many jobs for the user (possibly one per file they have?). We would also need to keep track of which files have had the metadata set automatically, which have not, and which have been altered by the user.
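If the header text can be obtained somehow, the parsing step itself looks small. Below is a minimal Python sketch that assumes the header has already been dumped to text (for example by an app running samtools view -H) and handed to us as a string. The function name `parse_sam_header` is hypothetical. The `@SQ` fields it reads (`SN` name, `LN` length, and the optional `AS` assembly and `SP` species tags) come from the SAM format; many files will not carry `AS` or `SP`, so the name/length map is the more dependable output.

```python
def parse_sam_header(header_text):
    """Extract reference names/lengths and optional assembly hints.

    header_text: SAM header as text, one tab-delimited record per line.
    Returns (refs, hints), where refs is {sequence name: length} and
    hints is {"AS": set of assembly ids, "SP": set of species strings}.
    """
    refs = {}
    hints = {"AS": set(), "SP": set()}

    for line in header_text.splitlines():
        # Only @SQ records describe reference sequences.
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            refs[fields["SN"]] = int(fields["LN"])
        for tag in ("AS", "SP"):
            if tag in fields:
                hints[tag].add(fields[tag])

    return refs, hints
```

The resulting name/length map could then be compared with the Quickload chromosome tables to pick a genome version. This sketch does not address the harder problems raised above, namely getting at the file contents in the first place and tracking which files have already been annotated.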