  1. IGB
  2. IGBF-3329

Create and test SRP371294 sample sheet spreadsheet for adding data to "hotpollen" annots.xml

    Details

    • Type: Task
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Now that we have processed the data from SRP371294, we need to make the processed data files available for visualization in IGB.

      For this, we will make a sample spreadsheet named SRP371294_sample_sheet.xlsx and then use it as a new input to the script "makeAnnotsXml.py" residing in hotpollen repo "genome-browser-visualization" to enable users to view the data in IGB.

      To do:

      • Review the existing sample sheets in the "splicing analysis" repository at https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/
      • Create a new spreadsheet named SRP371294_sample_sheet.xlsx and fill in the columns using information from the Sequence Read Archive records for SRP371294. Use the existing sample spreadsheets as a guide for how best to do this.
      • Modify the script "makeAnnotsXml.py" in the hotpollen repo "genome-browser-visualization" to include the new spreadsheet. Note that if you have errors in the spreadsheet, the script will likely fail. Also note that you will have to modify the script to accommodate a new dataset that is not tomato and that does not reside in the same physical location as the others.
      • To check the annots.xml file and the resulting visualizations, follow the instructions in the "README" file at https://bitbucket.org/hotpollen/genome-browser-visualization/src/main/
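Because errors in the spreadsheet will likely make the script fail, it may help to compare the new sheet's column headers against an existing sheet before running anything. The following is a minimal sketch, not the project's method. The function names and the example column names are hypothetical; the real column names come from the existing sample sheets.

```python
def compare_columns(reference_columns, new_columns):
    """Return (missing, extra) column names of a new sheet relative to a reference sheet."""
    reference = [str(c).strip() for c in reference_columns]
    new = [str(c).strip() for c in new_columns]
    missing = [c for c in reference if c not in new]
    extra = [c for c in new if c not in reference]
    return missing, extra


def read_header(xlsx_path):
    """Read only the header row of the first worksheet (needs pandas and openpyxl)."""
    import pandas as pd  # imported lazily so compare_columns works without pandas
    return list(pd.read_excel(xlsx_path, nrows=0).columns)


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    reference = ["sample", "URL", "description"]
    new = ["sample", "url ", "description"]
    missing, extra = compare_columns(reference, new)
    print("missing from new sheet:", missing)
    print("unexpected in new sheet:", extra)
```

With real files, the two header lists would come from calling `read_header` on an existing sheet and on SRP371294_sample_sheet.xlsx. The comparison is case-sensitive on purpose, on the assumption that the script looks columns up by exact name.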

      To Test:

      • Check that every file can be loaded into IGB
      • Visualize genes SR45A and SR30 to observe possible splicing differences between sample types.

      Other relevant information:

            Activity

            ann.loraine Ann Loraine added a comment -

            The script currently assumes that the genomes are the same for each dataset.
            This was fine until now because the datasets added thus far were from tomato.
            The script needs to be further edited to accommodate the fact that this dataset is not tomato.
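One way to remove the single-genome assumption is to record the genome and the data location per dataset and then group datasets by genome, since each genome in a Quickload site has its own annots.xml. The sketch below is illustrative only and does not reflect how makeAnnotsXml.py is actually organized. The class, the genome labels, the tomato sheet name, and the URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    sample_sheet: str    # spreadsheet describing the samples
    genome_version: str  # genome the data were aligned to (placeholder labels below)
    data_root: str       # where the processed files live (placeholder URLs below)


# Hypothetical entries; real values would come from the project repositories.
DATASETS = [
    Dataset("tomato_study_sample_sheet.xlsx", "TOMATO_GENOME", "https://example.org/tomato"),
    Dataset("SRP371294_sample_sheet.xlsx", "NON_TOMATO_GENOME", "https://example.org/other"),
]


def group_by_genome(datasets):
    """Map each genome version to the datasets that belong in its annots.xml."""
    grouped = {}
    for dataset in datasets:
        grouped.setdefault(dataset.genome_version, []).append(dataset)
    return grouped


if __name__ == "__main__":
    for genome, members in group_by_genome(DATASETS).items():
        print(genome, [d.sample_sheet for d in members])
```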

            Mdavis4290 Molly Davis added a comment -

            Updating makeAnnotsXml.py:

            Branch: https://bitbucket.org/mdavis4290/molly-genome-browser-visualization/branch/IGBF-3329
            Pull Request: https://bitbucket.org/hotpollen/genome-browser-visualization/pull-requests/1/igbf-3329-create-and-test-srp371294-sample

            ann.loraine Ann Loraine added a comment -

            Merged. Moving back to To-Do for testing with makeAnnotsXml.py

            Mdavis4290 Molly Davis added a comment -

            Sample sheet:

            Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3329
            Pull Request: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/10

            Notes: This only adds the sample sheet. The most recent changes were to the URL. It still needs to be tested with makeAnnotsXml.py once it is added to the team repo.

            Mdavis4290 Molly Davis added a comment - - edited

            Sample sheet Draft 2:
            SRP371294_sample_sheet.xlsx

            • Added new URL for SRA dataset
            Mdavis4290 Molly Davis added a comment - - edited

            Error 2 (after applying the PYTHONPATH solution):

            (panda_env) Mollys-iMac:genome-browser-visualization mollydavis333$ export PYTHONPATH=/Users/mollydavis333/Desktop/igbquickload
            (panda_env) Mollys-iMac:genome-browser-visualization mollydavis333$ ./makeAnnotsXml.py
            Traceback (most recent call last):
              File "/Users/mollydavis333/Desktop/genome-browser-visualization/./makeAnnotsXml.py", line 16, in <module>
                import requests
            ModuleNotFoundError: No module named 'requests'
            

            Solution:

            python -m pip install requests
            

            Notes: Clone the repos with all of the sample sheets.

            pip install openpyxl
            

            Script ran successfully!

            Next Step: Add new sample sheets to script
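Rather than discovering missing modules one traceback at a time, all of them can be checked up front. This is a minimal sketch using the standard library's `importlib.util.find_spec`. The module list is taken from the errors and installs mentioned in this issue, and the function name is made up.

```python
import importlib.util

# Modules named in this issue: two from tracebacks, two from install notes.
REQUIRED_MODULES = ["requests", "pandas", "openpyxl", "Quickload"]


def find_missing_modules(module_names):
    """Return the top-level module names that Python cannot locate."""
    missing = []
    for name in module_names:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the same interpreter and environment used for makeAnnotsXml.py, since a module installed in one environment is not visible from another.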

            Mdavis4290 Molly Davis added a comment - - edited

            Error:

            (panda_env) Mollys-iMac:genome-browser-visualization mollydavis333$ ./makeAnnotsXml.py
            Traceback (most recent call last):
              File "/Users/mollydavis333/Desktop/genome-browser-visualization/./makeAnnotsXml.py", line 12, in <module>
                from Quickload import *
            ModuleNotFoundError: No module named 'Quickload'
            

            Solution: Set the PYTHONPATH environment variable to the location of the igbquickload code.
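To confirm that PYTHONPATH points at the right place, the directories it lists can be searched for the module directly. The sketch below assumes only that the module is importable as `Quickload`, as the traceback shows. The helper names are hypothetical.

```python
import importlib.machinery
import os


def pythonpath_entries():
    """Return the directories currently listed in the PYTHONPATH environment variable."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def module_found_in(directory, module_name):
    """Return True if module_name can be located directly inside directory."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, [directory])
    return spec is not None


if __name__ == "__main__":
    entries = pythonpath_entries()
    if not entries:
        print("PYTHONPATH is not set.")
    for entry in entries:
        status = "found" if module_found_in(entry, "Quickload") else "not found"
        print(f"{entry}: Quickload {status}")
```

If every entry reports "not found", the variable probably points at the wrong directory, for example the parent of the clone rather than the clone itself.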

            ann.loraine Ann Loraine added a comment - - edited

            Documentation for how to run the script "makeAnnotsXml.py" from https://bitbucket.org/hotpollen/genome-browser-visualization/src/main/

            1) Clone the repository to your desktop as usual
            2) Install or download additional libraries and code needed to run the script.

            Required libraries include:

            • pandas library - follow the installation instructions for the pandas library (search online for them)
            • custom code from Loraine Lab - https://bitbucket.org/lorainelab/igbquickload/src/master/ - follow the instructions in the "README". Note that the makeAnnots script needs some code modules (not scripts) defined in the Loraine Lab codebase. To ensure that the makeAnnots script can load them, you need to create an environment variable that specifies their location. Search online for how to set an environment variable, and see the instructions in the Loraine Lab code README.

            3) Test that when you run the script, it can find the required libraries and custom code from the previous step.

            To run it in Terminal, change into the directory containing the script and enter its name. For example:

            local aloraine$ cd src/genome-browser-visualization/
            local aloraine$ ./makeAnnotsXml.py
            local aloraine$ 
            

            Note that if the script runs and does not print any errors, that means it did its job properly.

            If there are errors, they will likely have to do with the script not being able to locate either the required pandas library, the required Loraine Lab custom code, or the required data files.

            Read the script and compare what you find to the error messages to figure out the problem. This process will take some time as you get more familiar with Python, the language of the script, and so on.
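The "no errors means success" rule above can also be checked from a small wrapper that runs a command and reports whether it exited cleanly without writing to standard error. This is a sketch under that assumption, and the function names are hypothetical. The demo runs a stand-in Python command, not makeAnnotsXml.py itself.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command; return (ok, stderr_text). ok means exit code 0 and empty stderr."""
    result = subprocess.run(command, capture_output=True, text=True)
    ok = result.returncode == 0 and not result.stderr.strip()
    return ok, result.stderr


def last_error_line(stderr_text):
    """Return the final non-empty line of stderr, which in a Python traceback names the error."""
    lines = [line for line in stderr_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


if __name__ == "__main__":
    # Stand-in command; replace with ["./makeAnnotsXml.py"] inside the repository.
    ok, stderr = run_and_report([sys.executable, "-c", "import no_such_module_xyz"])
    print("success" if ok else "failed: " + last_error_line(stderr))
```

The last line of a Python traceback is usually the most useful starting point, because it states the error type and which module or file was not found.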

            ann.loraine Ann Loraine added a comment -

            Check that the URL column exactly matches URL columns for other datasets from SRA. I'm pretty sure that the URL values for those do not include the "SRP" value.

            Look at how the makeAnnotsXml.py script uses the values coming from the "URL" column when the study is from SRA.
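If the URL values for SRA datasets are indeed not supposed to include the "SRP" value, a quick scan can flag the rows that do. This sketch assumes the study accession appears literally as "SRP" followed by digits. The function name and example URLs are hypothetical, and the scan does not replace reading how the script builds its URLs.

```python
import re

# Matches a study accession such as SRP371294 anywhere in a string.
STUDY_ACCESSION = re.compile(r"SRP\d+")


def urls_with_study_accession(urls):
    """Return the URL values that contain an SRP study accession."""
    return [str(u) for u in urls if STUDY_ACCESSION.search(str(u))]


if __name__ == "__main__":
    # Made-up URL column values, for illustration only.
    url_column = [
        "https://example.org/data",
        "https://example.org/data/SRP371294",
    ]
    for url in urls_with_study_accession(url_column):
        print("contains study accession:", url)
```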


              People

              • Assignee:
                Unassigned
                Reporter:
                ann.loraine Ann Loraine