[IGBF-3329] Create and test SRP371294 sample sheet spreadsheet for adding data to "hotpollen" annots.xml - JIRA UNCC

Details

Type: Task
Status: To-Do
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
2
Epic Link:
Support NSF pollen grant
Sprint:
Spring 9 2023 May 1, Summer 2 2023 May 29

Description

Now that we have processed the SRP371294 data, we need to make the processed data files available for visualization in IGB.

For this, we will make a sample sheet spreadsheet named SRP371294_sample_sheet.xlsx and then use it as a new input to the script "makeAnnotsXml.py" in the hotpollen repository "genome-browser-visualization", so that users can view the data in IGB.

To do:

Review the existing sample sheets in the "splicing-analysis" repository at https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/
Create a new spreadsheet named SRP371294_sample_sheet.xlsx and fill in the columns using information from Sequence Read Archive records for SRP371294. Use the existing sample spreadsheets as a guide for how best to do this.
Modify the script "makeAnnotsXml.py" in the hotpollen repository "genome-browser-visualization" to include the new spreadsheet. Note that if the spreadsheet contains errors, the script will likely fail. Also note that you will have to modify the script to accommodate a new dataset that is not from tomato and that does not reside in the same physical location as the others.
To check the annots.xml file and the resulting visualizations, follow the instructions in the "README" file at https://bitbucket.org/hotpollen/genome-browser-visualization/src/main/
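Because the script will likely fail on a malformed spreadsheet, it can help to check the sheet before running makeAnnotsXml.py. The following is a minimal sketch, assuming each spreadsheet row has already been read into a dict keyed by column header. Apart from "URL", which the comments below mention, the column names in REQUIRED_COLUMNS are hypothetical placeholders; replace them with the headers used in the existing sample sheets.

```python
import math

# Hypothetical column names ("URL" aside): replace them with the headers used
# in the existing sample sheets in the splicing-analysis repository.
REQUIRED_COLUMNS = ["sample", "description", "URL"]


def is_blank(value):
    """True for None, empty or whitespace-only text, and NaN.

    pandas represents empty spreadsheet cells as NaN, which is a float.
    """
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def find_sheet_problems(rows, required=REQUIRED_COLUMNS):
    """Return a list of problems found in sample-sheet rows.

    rows: list of dicts, one per spreadsheet row, keyed by column header.
    """
    if not rows:
        return ["sheet has no data rows"]
    headers = set(rows[0])
    problems = [f"missing column: {col}" for col in required if col not in headers]
    present = [col for col in required if col in headers]
    # Spreadsheet row 1 holds the headers, so data rows start at row 2.
    for row_number, row in enumerate(rows, start=2):
        for col in present:
            if is_blank(row.get(col)):
                problems.append(f"row {row_number}: blank value in column {col}")
    return problems
```

With pandas and openpyxl installed, `pandas.read_excel("SRP371294_sample_sheet.xlsx").to_dict("records")` produces rows in the expected form. An empty result means that no missing columns or blank cells were found.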

To Test:

Check that every file can be loaded into IGB
Visualize genes SR45A and SR30 to observe possible splicing differences between sample types.

Other relevant information:

The physical location of the data files on our data deployment host is: http://lorainelab-quickload.scidas.org/rnaseq/A_thaliana_Jun_2009/SRP371294/
Genome version for this data set is A_thaliana_Jun_2009
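Before loading files in IGB, a quick reachability check can catch typos in the URLs. This sketch builds URLs under the data directory given above and sends an HTTP HEAD request to each one. A 200 response only shows that the host is serving the file, not that IGB can parse it, so the IGB loading test is still needed. The helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Data directory for this dataset, from the ticket description.
BASE_URL = "http://lorainelab-quickload.scidas.org/rnaseq/A_thaliana_Jun_2009/SRP371294/"


def build_urls(file_names, base_url=BASE_URL):
    """Join each file name onto the dataset's base directory URL."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [base + name for name in file_names]


def check_urls(urls, opener=urllib.request.urlopen, timeout=10):
    """Send an HTTP HEAD request to each URL.

    Returns {url: status}, where status is the HTTP status code or an error
    string. `opener` can be swapped out so the check runs without a network.
    """
    results = {}
    for url in urls:
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener(request, timeout=timeout) as response:
                results[url] = response.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error code such as 404.
            results[url] = err.code
        except OSError as err:
            # URLError (DNS failure, refused connection) and timeouts land here.
            results[url] = f"error: {err}"
    return results
```

For example, `check_urls(build_urls(["sample1.bam", "sample1.bam.bai"]))` (hypothetical file names) would report any file that does not return 200.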

Attachments

SRP371294_sample_sheet.xlsx
7 kB
12/May/23 3:23 PM

Issue Links

relates to

IGBF-3326 Create KP 2023 sample sheet needed for Quickload site and more

Closed

IGBF-3258 Download and process SRP371294 RNA-Seq data

Closed

Activity

Ann Loraine added a comment - 11/May/23 8:27 AM

Check that the URL column exactly matches the URL columns for other datasets from SRA. I'm pretty sure that the URL values for those do not include the "SRP" value.

Look at how the makeAnnotsXml.py script uses the values from the "URL" column when the study is from SRA.
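One way to follow up on this check is to scan the URL column for SRA study accessions. This is a minimal sketch, assuming the URL values have been read into a list. The pattern matches accessions of the form "SRP" followed by digits, such as SRP371294. How makeAnnotsXml.py actually combines these values for SRA studies still has to be confirmed by reading the script.

```python
import re

# SRA study accessions look like "SRP" followed by digits, e.g. SRP371294.
STUDY_ACCESSION = re.compile(r"SRP\d+")


def urls_with_study_accession(url_values):
    """Return (position, value) pairs whose URL value contains an SRP accession.

    Non-string values (e.g. NaN for empty cells) are skipped.
    """
    flagged = []
    for position, value in enumerate(url_values):
        if isinstance(value, str) and STUDY_ACCESSION.search(value):
            flagged.append((position, value))
    return flagged
```

Passing the column as a list, for example `sheet["URL"].tolist()` from a pandas DataFrame, lists the entries that differ from the convention described above.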

Ann Loraine added a comment - 11/May/23 8:47 AM - edited

Documentation for how to run the script "makeAnnotsXml.py" from https://bitbucket.org/hotpollen/genome-browser-visualization/src/main/

1) Clone the repository to your desktop as usual
2) Install or download additional libraries and code needed to run the script.

Required libraries include:

pandas library - follow the pandas installation instructions (google it)
custom code from Loraine Lab - https://bitbucket.org/lorainelab/igbquickload/src/master/ - follow the instructions in the "README". Note that the makeAnnotsXml.py script needs some code modules (not scripts) defined in the Loraine Lab codebase. To ensure that the script can load them, you need to create an environment variable (PYTHONPATH) that specifies their location. Use "google" and see the instructions in the Loraine Lab code README.

3) Test that, when you run the script, it can find the required libraries and the custom code from the previous step.

To run it in Terminal, change into the directory containing the script and enter its name. For example:

local aloraine$ cd src/genome-browser-visualization/
local aloraine$ ./makeAnnotsXml.py
local aloraine$

If the script runs without printing any errors, it did its job properly.

If there are errors, they will likely have to do with the script not being able to locate the required pandas library, the required Loraine Lab custom code, or the required data files.

Read the script and compare what you find to the error messages to figure out the problem. This process will take some time as you get more familiar with Python, the language of the script, and so on.
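You can check for the missing-library errors described above before running the script. This minimal sketch reports which modules the current Python interpreter cannot locate. The module list is assembled from the errors and installs reported in the comments on this issue and may not be complete.

```python
import importlib.util

# Modules that the comments on this issue show makeAnnotsXml.py needing.
# "Quickload" comes from the Loraine Lab igbquickload code and is found via
# PYTHONPATH; openpyxl is what pandas uses to read .xlsx files.
REQUIRED_MODULES = ["pandas", "openpyxl", "requests", "Quickload"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located.

    find_spec only searches for the module; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Cannot find:", ", ".join(missing))
    else:
        print("All required modules found.")
```

Run this with the same Python environment and shell session that you use for makeAnnotsXml.py, because both installed packages and PYTHONPATH can differ between environments.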

Molly Davis added a comment - 11/May/23 4:33 PM - edited

Error:

(panda_env) Mollys-iMac:genome-browser-visualization mollydavis333$ ./makeAnnotsXml.py
Traceback (most recent call last):
  File "/Users/mollydavis333/Desktop/genome-browser-visualization/./makeAnnotsXml.py", line 12, in <module>
    from Quickload import *
ModuleNotFoundError: No module named 'Quickload'

Solution: set the PYTHONPATH environment variable to the location of the igbquickload code.
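Python reads PYTHONPATH when the interpreter starts, so a quick way to confirm that a directory makes a module importable is to check in a fresh Python process. This is a minimal sketch, and the function name is hypothetical. It places the directory at the front of PYTHONPATH for the child process only. Nothing in your own shell changes.

```python
import os
import subprocess
import sys


def importable_with_pythonpath(module_name, directory):
    """Check, in a new Python process, whether module_name can be found
    when directory is placed at the front of PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = directory if not existing else directory + os.pathsep + existing
    # The child exits with status 0 if the module is found, 1 otherwise.
    code = (
        "import importlib.util, sys; "
        f"sys.exit(0 if importlib.util.find_spec({module_name!r}) is not None else 1)"
    )
    result = subprocess.run([sys.executable, "-c", code], env=env)
    return result.returncode == 0
```

For example, `importable_with_pythonpath("Quickload", "/path/to/igbquickload")`, with the path of your own igbquickload clone substituted, should return True once the path is correct.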

Molly Davis added a comment - 12/May/23 10:26 AM - edited

Applied the PYTHONPATH solution, then hit error 2:

(panda_env) Mollys-iMac:genome-browser-visualization mollydavis333$ export PYTHONPATH=/Users/mollydavis333/Desktop/igbquickload
(panda_env) Mollys-iMac:genome-browser-visualization mollydavis333$ ./makeAnnotsXml.py
Traceback (most recent call last):
  File "/Users/mollydavis333/Desktop/genome-browser-visualization/./makeAnnotsXml.py", line 16, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

Solution:

python -m pip install requests

Notes: Clone the repositories containing all of the sample sheets.

pandas also needs openpyxl to read the .xlsx sample sheets:

pip install openpyxl

Script ran successfully!

Next step: add the new sample sheets to the script.

Molly Davis added a comment - 12/May/23 3:24 PM - edited

Sample sheet Draft 2:
SRP371294_sample_sheet.xlsx

Added new URL for SRA dataset

Molly Davis added a comment - 12/May/23 3:44 PM

Sample sheet:

Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3329
Pull Request: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/10

Notes: This only adds the sample sheet. The recent changes were to the URL. It still needs to be tested with makeAnnotsXml.py once it is added to the team repository.

Ann Loraine added a comment - 17/May/23 10:00 AM

Merged. Moving back to To-Do for testing with makeAnnotsXml.py

Molly Davis added a comment - 17/May/23 2:46 PM

Updating makeAnnotsXml.py:

Branch: https://bitbucket.org/mdavis4290/molly-genome-browser-visualization/branch/IGBF-3329

Pull Request: https://bitbucket.org/hotpollen/genome-browser-visualization/pull-requests/1/igbf-3329-create-and-test-srp371294-sample

Ann Loraine added a comment - 18/May/23 9:31 AM

The script currently assumes that the genomes are the same for each dataset.
This was fine until now because the datasets added thus far were from tomato.
The script needs to be further edited to accommodate the fact that this dataset is not tomato.
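One way to remove the single-genome assumption is to record the genome version and data location alongside each sample sheet. The sheets can then be grouped by genome version, because a Quickload site keeps an annots.xml file in each genome version's directory. This is a minimal sketch. The Dataset structure and the tomato entry are hypothetical placeholders. The SRP371294 values come from this issue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    sample_sheet: str    # spreadsheet describing the samples
    genome_version: str  # IGB genome version the data files belong to
    base_url: str        # directory on the data host holding the files


DATASETS = [
    Dataset(
        "SRP371294_sample_sheet.xlsx",
        "A_thaliana_Jun_2009",
        "http://lorainelab-quickload.scidas.org/rnaseq/A_thaliana_Jun_2009/SRP371294/",
    ),
    # Hypothetical placeholder standing in for an existing tomato dataset.
    Dataset("TOMATO_SAMPLE_SHEET.xlsx", "TOMATO_GENOME_VERSION", "TOMATO_BASE_URL/"),
]


def group_by_genome(datasets):
    """Map each genome version to the datasets that belong in its annots.xml."""
    groups = defaultdict(list)
    for dataset in datasets:
        groups[dataset.genome_version].append(dataset)
    return dict(groups)
```

The script could then write one annots.xml per key of `group_by_genome(DATASETS)`, using each dataset's own `base_url` instead of a shared location.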

People

Assignee:

Unassigned

Reporter:

Ann Loraine

Votes:

0

Watchers:

2

Dates

Created:

28/Apr/23 3:09 PM

Updated:

26/Jun/23 10:55 AM