ANNOTS.XML step
1) Opened the run file for this data set in Excel and saved it, in Excel format, to tardigrade/Documentation/inputForMakeAnnotsXml (in the tardigrade repository).
Note: Open SRP48452_for_AnnotsXml as a reference and guide!
2) Added six new columns to the front of the file, in front of "Run":
- file name prefix
- color
- physical folder
- study name
- display name
- url
3) Used Excel cell references to copy all the values from "Run" into "file name prefix".
4) Inserted a hexadecimal color code for each sample. Gave each cell a fill color matching its code, to help me assess how the colors would look and contrast in IGB.
5) Inserted the study code (e.g., SRP454305) in the "physical folder" column.
6) Used Excel references to insert a human-friendly "study name"; this becomes the name of the folder under which the data files are listed in IGB.
7) Used Excel references to insert human-friendly "display name" values - these become the checkbox labels in IGB.
8) Used Excel references to make URLs for each file / data set. Used the "SRX" values in the existing "Experiment" column to construct the URL.
9) Added new columns as needed after these new leading columns to use for sorting. For example, I added "Concentration" and then sorted the spreadsheet by concentration and then by run, so that the lower-concentration control samples would appear first in the IGB data display list.
10) Edited the script makeAnnots.py to include the new spreadsheet in the function getSampleSheets. Ran the script, which adds the new data files to annots.xml in tardigrade/ForGenomeBrowsers/quickload.
11) Checked how it looks by adding the above directory to IGB as a new quickload data source.
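For reference, the spreadsheet edits in steps 2 through 9 could also be sketched in Python rather than done by hand in Excel. This is only an illustration, not what makeAnnots.py does: the run table excerpt, the study name, the colors, the display name format, and the URL template below are all made-up placeholders, and the real URL pattern built from the "SRX" values is not reproduced here.

```python
import csv
import io

# Hypothetical excerpt of a run table; the real file has many more columns.
RUN_TABLE = """Run,Experiment,Concentration
SRR0000003,SRX0000003,10
SRR0000001,SRX0000001,0
SRR0000002,SRX0000002,0
"""

# Assumed values, for illustration only.
STUDY_CODE = "SRP454305"                    # "physical folder" (step 5)
STUDY_NAME = "Example study name"           # placeholder folder label (step 6)
URL_TEMPLATE = "https://example.org/{srx}"  # placeholder, not the real URL pattern
COLORS = {"0": "1F78B4", "10": "E31A1C"}    # made-up hex color per concentration


def add_leading_columns(row):
    """Return a new row with the new columns placed in front of "Run"."""
    new = {
        "file name prefix": row["Run"],                        # step 3
        "color": COLORS[row["Concentration"]],                 # step 4
        "physical folder": STUDY_CODE,                         # step 5
        "study name": STUDY_NAME,                              # step 6
        "display name": f"{row['Concentration']} {row['Run']}",  # step 7
        "url": URL_TEMPLATE.format(srx=row["Experiment"]),     # step 8
    }
    new.update(row)  # original columns follow the new ones
    return new


rows = [add_leading_columns(r) for r in csv.DictReader(io.StringIO(RUN_TABLE))]

# Step 9: sort by concentration, then by run, so controls are listed first.
rows.sort(key=lambda r: (float(r["Concentration"]), r["Run"]))

for r in rows:
    print(r["display name"], r["color"], r["url"])
```

Sorting on a (concentration, run) tuple mirrors the two-level sort in Excel: rows with equal concentration fall back to run order.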
PREFETCH step
Pre-fetching SRA files with:
in:
/projects/tomato_genome/fnb/dataprocessing/tardigrade/SRP454305
Confirmed it worked with:
using prefetch.sh:
using SRP454305_SraRunTable.txt: