  1. IGB
  2. IGBF-3035

Create Cyverse App to create and index Bedgraph

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      4
    • Sprint:
      Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28, Spring 3 2022 Jan 31 - Feb 11

      Description

      Use case scenario:

      A user has uploaded or created a BAM file with RNA-Seq data. They would like to create a bedgraph file representing a scaled RNA-Seq coverage graph. However, for the bedgraph to be useful, the file needs to be sorted, compressed, and tabix-indexed. Currently, we have a BioViz CyVerse app that creates a bedgraph file using a tool from the deepTools suite, but the bedgraph file is not compressed or tabix-indexed. Rather than just creating the "plain" bedgraph file, let's create it and then sort, compress, and tabix-index it. This will be much more useful to a user than the plain-text, uncompressed, unindexed file alone.

      This will probably require creating a Docker image where the docker "run" command actually runs a script that we write that does all of the above. Note that you can potentially obtain the original Docker image with deepTools and further provision it with tabix/samtools. Also, be sure to save the Docker provisioning file (the Dockerfile).

      Create a Cyverse analyses app that does two things:

      • creates a scaled coverage graph using bamCoverage (from the deepTools suite)
      • sorts and tabix-indexes the resulting bedgraph file (the output from bamCoverage)
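
The two steps above can be sketched as a single shell function. This is only an illustration, not the app's actual script: the function name, the output file names, the CPM scaling, and the bin size of 1 are all assumptions, and the bamCoverage options shown are the ones used by deepTools 3.x.

```shell
#!/usr/bin/env bash
# Sketch only: names, CPM scaling, and bin size are assumptions, not the app's settings.

# make_indexed_bedgraph BAM OUT_PREFIX
# Writes OUT_PREFIX.bedgraph.gz plus its .tbi index and prints the .gz path.
make_indexed_bedgraph() {
    local bam="$1" prefix="$2"
    local raw="${prefix}.unsorted.bedgraph"
    local out="${prefix}.bedgraph.gz"

    # Step 1: scaled coverage graph from the alignments (deepTools).
    bamCoverage -b "$bam" -o "$raw" \
        --outFileFormat bedgraph --normalizeUsing CPM --binSize 1 || return 1

    # Step 2: sort by sequence name, then numerically by start; compress with bgzip.
    LC_ALL=C sort -k1,1 -k2,2n "$raw" | bgzip -c > "$out" || return 1

    # Step 3: index the compressed file; "-p bed" is tabix's preset for
    # 0-based, BED-like files such as bedgraph.
    tabix -p bed "$out" || return 1

    rm -f "$raw"
    echo "$out"
}
```

A call such as `make_indexed_bedgraph sample.bam sample` would then leave `sample.bedgraph.gz` and `sample.bedgraph.gz.tbi` next to each other, which is the pair a genome browser needs.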

      References:

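      bamCoverage documentation: https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.htm
      How to index a bedgraph: https://www.biostars.org/p/121967/
      tabix and bgzip in Samtools: http://www.htslib.org/doc/tabix.html
      Deploying apps in CyVerse: https://learning.cyverse.org/projects/container_camp_workshop_2019/en/latest/cyverse/de_docker.html
      Tool integration in DE: https://learning.cyverse.org/projects/Container-camp-2020/en/latest/cyverse/tool_integration_app_building_DE.html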

      Some examples of scripts that do all of the above can be found in our code repositories in various places. Remind [~aloraine] to provide links to the most relevant repositories and scripts.

            Activity

            karthik Karthik Raveendran created issue -
            karthik Karthik Raveendran made changes -
            Field Original Value New Value
            Epic Link IGBF-2376 [ 18533 ]
            karthik Karthik Raveendran made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine added a comment -

            A docker project for your reference: https://bitbucket.org/lorainelab/integrated-genome-browser-docker/src/master/
            I also have set up a Docker VM that I use for making the IGB docker image. See EC2 list in the lorainelab AWS account.

            karthik Karthik Raveendran made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            karthik Karthik Raveendran made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Sprint Fall 9 2021 Dec 13 - Dec 24 [ 135 ] Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14 [ 135, 136 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            karthik Karthik Raveendran added a comment - - edited

            The igb_tbi_creator application is a bash script that uses the sort and tabix commands from the link in the issue description, How to index a bedgraph. The tabix and bgzip commands come from HTSlib, the C library that Samtools uses to read and write sequencing data files (sort is a standard Unix utility). HTSlib has to be downloaded, built, and added to the PATH variable in Ubuntu, and quite a few dependencies need to be installed first (gcc, apt-utils, make, libbz2-dev, zlib1g-dev, libncurses5-dev, libncursesw5-dev, liblzma-dev, wget, curl, bzip2).
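
Because the script depends on tools that must already be on PATH, a small preflight check can fail early with a clear message. The sketch below is an assumption about how such a check might look; `require_tools` is a hypothetical helper, not part of igb_tbi_creator.

```shell
#!/usr/bin/env bash
# require_tools NAME...
# Hypothetical helper: reports every command that is not on PATH
# and returns non-zero if at least one is missing.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Placing `require_tools sort bgzip tabix || exit 1` at the top of the script would stop the run before any file is touched if the HTSlib programs were not installed correctly.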

            To dockerize the application, I followed the instructions in Deploying apps in CyVerse from the description, but I found the Programming with Mosh tutorial much more helpful. All the dependencies mentioned above are added in the Dockerfile from which the Docker image is created. When running the image in a container, a local host folder (containing the bedgraph file) is mounted to a container folder with the command:

            docker run -v C:/Users/karth/hello-docker/:/app igb_tbi_creator
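
One way to avoid a hard-coded file name is to mount the directory that holds the input file and pass the file's in-container path as an argument. The wrapper below is a sketch under several assumptions: `run_in_container` and the `/data` mount point are hypothetical, it expects a Unix-like shell on the host, and it only works if the image declares the script as its ENTRYPOINT, since arguments given after the image name replace a CMD rather than being passed to it.

```shell
#!/usr/bin/env bash
# run_in_container IMAGE HOST_FILE
# Hypothetical wrapper: mounts HOST_FILE's directory at /data and passes
# /data/<file name> to the container as its only argument.
run_in_container() {
    local image="$1" host_file="$2"
    local host_dir file_name

    # Docker bind mounts need an absolute host path.
    host_dir="$(cd "$(dirname "$host_file")" && pwd)" || return 1
    file_name="$(basename "$host_file")"

    docker run --rm -v "${host_dir}:/data" "$image" "/data/${file_name}"
}
```

With this in place, `run_in_container igb_tbi_creator ./BedGraph_HomoSapien.bedGraph` would hand the script `/data/BedGraph_HomoSapien.bedGraph`, and any output written beside the input appears in the host folder.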

            ann.loraine Ann Loraine added a comment - - edited

            Notes from scrum: Karthik Raveendran mentions not understanding how to pass the file to be processed in the docker run command.

            [~aloraine] notes: The file to be tabix-indexed should not be the argument passed to the image. Instead, the argument should be the alignments BAM file, and the index (.bai file) for that BAM file should be located in the same directory. First, a deepTools command should be run to create a scaled bedgraph file. Next, this output file needs to be sorted and then compressed using bgzip (distributed with Samtools/HTSlib). After that, tabix needs to index the compressed bedgraph file. Because these steps use different programs, you need to make a Docker image that has all of them.
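
Since the BAM file is the input and its index must sit beside it, the script could check for both before running anything expensive. The following is a sketch, not the app's code: `find_bam_index` is a hypothetical helper, and it assumes the index uses one of the two common names, `sample.bam.bai` or `sample.bai`.

```shell
#!/usr/bin/env bash
# find_bam_index BAM
# Hypothetical helper: prints the path of the .bai index located next to BAM,
# or prints an error to stderr and returns 1 if the BAM or its index is missing.
find_bam_index() {
    local bam="$1"
    [ -f "$bam" ] || { echo "BAM file not found: $bam" >&2; return 1; }

    local candidate
    # Try sample.bam.bai first, then sample.bai.
    for candidate in "${bam}.bai" "${bam%.bam}.bai"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done

    echo "no .bai index found next to $bam" >&2
    return 1
}
```

Calling `find_bam_index "$1" >/dev/null || exit 1` as the first line of the container script gives the user an immediate error instead of a failure partway through the coverage step.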

            karthik Karthik Raveendran added a comment - - edited

            Dockerfile

            FROM ubuntu
            COPY . /app
            WORKDIR /app
            # Install the compiler and libraries needed to build htslib.
            # Every package is installed with -y so the build never waits for input.
            RUN apt-get update && apt-get install -y \
                apt-utils gcc make libbz2-dev zlib1g-dev \
                libncurses5-dev libncursesw5-dev liblzma-dev \
                wget curl bzip2
            
            CMD  ./igb_tbi_creator.sh
            

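            igb_tbi_creator.sh

            #!/bin/bash
            # Stop at the first command that fails
            set -e
            cd /app
            # Download and build htslib 1.14, which provides bgzip and tabix
            wget https://github.com/samtools/htslib/releases/download/1.14/htslib-1.14.tar.bz2
            tar -xf htslib-1.14.tar.bz2
            cd htslib-1.14
            ./configure
            make
            make install
            dir
            export PATH="/app/htslib-1.14:$PATH"
            # Sort by sequence name and start position, compress, then index
            filepath="/app/BedGraph_HomoSapien.bedGraph"
            sort -k1,1 -k2,2n "$filepath" | bgzip > "$filepath.gz"
            tabix -s 1 -b 2 -e 3 "$filepath.gz"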
            
            ann.loraine Ann Loraine added a comment -

            We need to get in touch with the person at CyVerse who created the deepTools Docker image. Ramona Walls and Reetu Tuteja did it. Also ask Amanda Cooksey. See https://cyverse.org/team. Also ask for help via the team chat which you have to find somewhere on the CyVerse site. Log in to the Discovery Environment and see the bottom right corner.

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2971 [ IGBF-2971 ]
            ann.loraine Ann Loraine made changes -
            Summary Create Cyverse App to Index Bedgraph Create Cyverse App to create and index Bedgraph
            ann.loraine Ann Loraine made changes -
            Description: replaced the single line "Create a Cyverse analyses app to index Bedgraph." with:

            Create a Cyverse analyses app that does two things:
            * creates a scaled coverage graph using bamCoverage (from the deepTools suite)
            * sorts and tabix-indexes the resulting bedgraph file (the output from bamCoverage)

            and added a "References:" heading above the link list. The rest of the description is unchanged.
            ann.loraine Ann Loraine added a comment -

            There appears to be a Docker image that you can pull and then provision as needed. The image can be located by visiting the deepTools documentation Web site.

            Please see: https://deeptools.readthedocs.io/en/develop/content/installation.html#installation-with-docker

            ann.loraine Ann Loraine made changes -
            Comment [ Also, please see linked issue: IGBF-2971 ]
            ann.loraine Ann Loraine added a comment -

            Let's create a git repository for storing the docker file and the script that it will be provisioned with.

            ann.loraine Ann Loraine added a comment -

            Ask the CyVerse support person who made the original Docker image to send us the Docker file "because we are adding some enhancements".

            karthik Karthik Raveendran added a comment - - edited

            I chatted with Sriram on the CyVerse DE chat portal and told him what we are planning to do. He replied:

            "We support linear workflow but you cannot build another app on top of existing docker image. You will need a separate docker image and that will need to be integrated as separate app. Once that is done, we can help you set up a workflow or You could build a new docker image with both bamcoverage and tabix and then orchestrate the workflow with in the image."

            I asked him whether he could direct me to a repo for the existing bamCoverage Dockerfile; I have yet to hear from him.
            Update: He said: "I don't know any repos off the top of my head."
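
The second option in the reply (one image that contains both tools and runs the whole workflow) could be sketched roughly as follows. This is only a sketch under assumptions: the function names are hypothetical, the script in the repository may differ, bamCoverage, bgzip, and tabix are assumed to be on the PATH inside the image, and the scaling options for bamCoverage are left out because the ticket does not say which ones to use.

```shell
# Sketch only; function names are hypothetical and not taken from the repository.

# Sort bedgraph text from stdin by chromosome name, then numerically by start.
sort_bedgraph() {
    LC_ALL=C sort -k1,1 -k2,2n
}

# Create <prefix>.bedgraph.gz and <prefix>.bedgraph.gz.tbi from a BAM file.
# Runs in a subshell so its variables do not leak into the caller.
make_indexed_bedgraph() (
    set -eu
    bam="$1"
    prefix="$(basename "$bam" .bam)"
    # -of bedgraph asks bamCoverage for plain-text bedgraph output.
    bamCoverage -b "$bam" -o "$prefix.unsorted.bedgraph" -of bedgraph
    sort_bedgraph < "$prefix.unsorted.bedgraph" > "$prefix.bedgraph"
    rm "$prefix.unsorted.bedgraph"
    bgzip -f "$prefix.bedgraph"            # writes $prefix.bedgraph.gz
    tabix -p bed "$prefix.bedgraph.gz"     # writes $prefix.bedgraph.gz.tbi
)

# Usage inside the container (assumed): make_indexed_bedgraph input.bam
```

Keeping the sort step as its own function makes it possible to check the ordering without needing deepTools or htslib installed.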

            ann.loraine Ann Loraine added a comment -

            In that case, probably you need to proceed with using the deepTools image as the foundation and work with it. I bet that's the one they used, as well!

            karthik Karthik Raveendran added a comment -

            Yeah I have started on that.

            ann.loraine Ann Loraine made changes -
            Sprint Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14 [ 135, 136 ] Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28 [ 135, 136, 137 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            karthik Karthik Raveendran added a comment -

            Bitbucket repo for the docker has been created: https://bitbucket.org/KarthikRavee91/cyverse_bamcoverage_docker/src/master/
            The bamCoverage changes have not been added yet because they are still being developed.

            nfreese Nowlan Freese made changes -
            Story Points changed from 2 to 4
            ann.loraine Ann Loraine added a comment - - edited

            Docker hub repository that will store the image: https://hub.docker.com/repository/docker/lorainelab/deeptools-for-cyverse

            User kraveend@uncc.edu at Dockerhub added to IGB group and given write permission to the repository.

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2280 [ IGBF-2280 ]
            ann.loraine Ann Loraine added a comment -

            Instead of using $1 directly to create the name of the output file, I suggest using "basename" to get the input file's prefix.

            See: https://bitbucket.org/lorainelab/bseq_rice/src/master/src/sbatch-doIt.sh for an example.
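
For illustration, the suggestion might look something like this. The helper name output_prefix is hypothetical, and the sketch assumes the input file name ends in .bam.

```shell
# Hypothetical helper: strip the directory and the .bam suffix from the input path.
output_prefix() {
    basename "$1" .bam
}

# Example: output_prefix sample_files/deepToolsTest.bam prints "deepToolsTest"
```

With this, output files can be named "$(output_prefix "$1").bedgraph.gz" and so on, regardless of which directory the input path points into.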

            ann.loraine Ann Loraine added a comment -

            We need to check what inputs bamCoverage requires. Specifically, does it require a .bai index file for the .bam input file?
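
As far as I know, deepTools expects an index alongside the BAM file, so one defensive option is to create the index when it is missing. The sketch below assumes samtools is installed in the image (not confirmed in this ticket) and uses hypothetical function names.

```shell
# Succeeds if an index exists next to the BAM file, under either common name
# (input.bam.bai or input.bai).
has_bam_index() {
    [ -f "$1.bai" ] || [ -f "${1%.bam}.bai" ]
}

# Build the index only when it is missing; samtools index writes "$1.bai".
ensure_bam_index() {
    if ! has_bam_index "$1"; then
        samtools index "$1"
    fi
}
```

Note that samtools index requires a coordinate-sorted BAM file, so an unsorted input would still fail at this step.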

            ann.loraine Ann Loraine made changes -
            Sprint Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28 [ 135, 136, 137 ] Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28, Spring 3 2022 Jan 31 - Feb 11 [ 135, 136, 137, 138 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Description: changed "creates a bedgraph file using genomeCov from the deepTools suite" to "creates a bedgraph file using a tool from the deepTools suite". The rest of the description is unchanged.
            ann.loraine Ann Loraine added a comment -

            We can perhaps test using the dockerhost EC2 instance, which we used to create the IGB Docker image that Bitbucket uses to compile Integrated Genome Browser.
            See also the lorainelab Bitbucket repository: integrated-genome-browser-docker

            ann.loraine Ann Loraine added a comment -

            I increased the EBS volume size on the EC2 with docker installed. IP address is 54.237.97.202. Log in as user ec2-user. You should now be able to use it to test the new Docker image. If you have problems, please let me know.

            attn: Karthik Raveendran and Nowlan Freese

            karthik Karthik Raveendran added a comment -

            Dr. [~aloraine], I am having trouble connecting to the EC2 instance. I tried changing the inbound rules in the security group to "My IP", but I do not have permission to change them.

            karthik Karthik Raveendran added a comment - - edited

            Instructions for testing the Docker image on a local system:

            To pull the latest image to the local system:

            Terminal
            docker pull lorainelab/deeptools-for-cyverse
            

            To run the image and keep the container running:

            Terminal
            docker run -d -t -v <host_directory_absolute_path>:/app --name demo lorainelab/deeptools-for-cyverse
            

            where "<host_directory_absolute_path>" is the host directory that holds the input files and receives the output files, and "--name" sets a container name of your choice.
            For example:

            Terminal
            docker run -d -t -v C:/Users/karth/deeptools-for-cyverse:/app --name demo lorainelab/deeptools-for-cyverse
            

            To check if the container is created and running:

            Terminal
            docker ps -a
            

            To run the script with a .bam file:

            Terminal
            docker exec -d <container_name> ./igb_tbi_creator.sh <file_rel_path>
            

            where "<file_rel_path>" is the path of the file relative to the folder mounted above.
            For example:

            Terminal
            docker exec -d demo ./igb_tbi_creator.sh sample_files/deepToolsTest.bam
            

            sample_files is a folder within the folder mounted above
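
To make the relationship between the mounted host directory and "<file_rel_path>" concrete, here is a small sketch that converts an absolute host path into the relative path the script expects. The helper name rel_path_in_mount is hypothetical and is not part of the image.

```shell
# $1 = absolute host directory passed to -v
# $2 = absolute host path of a file somewhere under that directory
# Prints the path relative to the mounted directory, or fails if the file
# is outside it (and therefore invisible to the container).
rel_path_in_mount() {
    case "$2" in
        "$1"/*) printf '%s\n' "${2#"$1"/}" ;;
        *) echo "error: $2 is not inside $1" >&2; return 1 ;;
    esac
}

# Example:
#   rel_path_in_mount /data/run /data/run/sample_files/deepToolsTest.bam
# prints "sample_files/deepToolsTest.bam"
```

A file outside the mounted directory cannot be reached from inside the container, which is why the helper refuses such paths instead of guessing.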

            To open the container in interactive mode and run the script there:

            Terminal
            docker exec -it <container_name> bash
            
            for example:
            docker exec -it demo bash
            

            The sample_files folder within the app directory has sample files to test on

            karthik Karthik Raveendran added a comment -

            Dr. Nowlan Freese, if you are reviewing this ticket, I would like to know whether this will work with the CyVerse app infrastructure, in particular the way the input path is passed and the location of the output files. I would also like to know whether the output files you get are sufficient or whether more need to be added.

            karthik Karthik Raveendran made changes -
            Assignee Karthik Raveendran [ karthik ]
            karthik Karthik Raveendran made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine added a comment -

            I am a little confused about the -v option.

            To use the image to create a coverage graph output file, must you first run the Docker image with the -v option, mapping the directory that contains your *.bam file (or files) to the directory /app within the image?

            Karthik Raveendran: Is that correct?

            ann.loraine Ann Loraine added a comment -

            The tag "latest" is not useful because when a new docker image is created, it cannot have the same tag. Please research the proper way to indicate docker image versions using tags. Kindly review how we have managed tags and versioning for the IGB docker image used to build IGB in bitbucket. Our build infrastructure depends on the tags being properly versioned to ensure that each version of IGB can be properly built, using the correct software for that version. The same must be done here.
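
One possible way to version the image is sketched below. The MAJOR.MINOR.PATCH scheme and the helper name versioned_image are assumptions for illustration; the scheme actually used for the IGB Docker image should take precedence. Docker treats "latest" as an ordinary tag that moves to whichever image was pushed most recently, so an explicit version tag is what lets a build pin a specific image.

```shell
IMAGE="lorainelab/deeptools-for-cyverse"

# Print a versioned image reference; reject anything that is not MAJOR.MINOR.PATCH.
versioned_image() {
    if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        printf '%s:%s\n' "$IMAGE" "$1"
    else
        echo "error: '$1' is not a MAJOR.MINOR.PATCH version" >&2
        return 1
    fi
}

# Typical use (requires Docker and push access to the repository):
#   docker build -t "$(versioned_image 1.0.0)" .
#   docker push "$(versioned_image 1.0.0)"
```

Rejecting non-numeric tags such as "latest" in the helper keeps an unversioned image from being pushed by accident.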

            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            karthik Karthik Raveendran added a comment - - edited

            Yes. The output files will also appear in that host directory. I will also fix the tags today, following the approach used for the IGB Docker image.

            ann.loraine Ann Loraine made changes -
            Assignee Karthik Raveendran [ karthik ]
            karthik Karthik Raveendran made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine added a comment - - edited

            Karthik Raveendran and [~aloraine] discussed the recent progress:

            • KR has deployed the docker image from lorainelab docker hub account into his private cyverse account space
            • He has tried to run the image but is not getting any output; it looks like the problem has to do with parameter passing, maybe?
            • KR will contact the CyVerse Team to ask about:

            Questions:

            • How do I get access to the condor stderr and stdout?
            • How do I add metadata tags to my app while it is still in my private account and not yet widely available, i.e., "published"?
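
While waiting for an answer, one way to narrow down a suspected parameter-passing problem is to have the script report exactly what it received. The sketch below is an assumption about how that could be done, with hypothetical function names; where CyVerse exposes the resulting stderr is precisely the open question above.

```shell
# Write a timestamped message to stderr.
log() {
    printf '[%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&2
}

# Report how many arguments arrived and what each one was.
# Runs in a subshell so the loop variables stay local.
log_args() (
    log "received $# argument(s)"
    i=1
    for arg in "$@"; do
        log "  arg $i: '$arg'"
        i=$((i + 1))
    done
)

# Fail early, with a clear message, when no readable input file was given.
require_input() {
    if [ $# -lt 1 ] || [ ! -f "$1" ]; then
        log "no readable input file given; expected a path to a .bam file"
        return 1
    fi
}
```

Calling log_args "$@" and require_input "$@" at the top of the script would turn a silent "no output" run into an explicit error message.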
            ann.loraine Ann Loraine added a comment -

            Moving this to Closed; we will make new tickets for the next steps.

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3076 [ IGBF-3076 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                karthik Karthik Raveendran
                Reporter:
                karthik Karthik Raveendran
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: