  1. IGB
  2. IGBF-3673

Draft key concepts/topics to be covered in the ASPB 2024 workshop

    Details

    • Type: Task
    • Status: Closed
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Task: Create a document that outlines the topics we want to cover in our workshop. The proposal that we submitted was very broad, so this new document should provide the information we need to start building a more comprehensive outline of the workshop.

      For reference, here is part of what was submitted as the workshop proposal:
      "In this workshop, we'll demonstrate features that are particularly useful in working with RNA-Seq data by showing off one or two case studies where we found interesting "scenes" in plant bulk RNA-Seq data. At the end, we will share a teaching exercise students can do with IGB to learn about DNA patterns."

        Attachments

          Issue Links

            Activity

            pkulzer Paige Kulzer created issue -
            pkulzer Paige Kulzer made changes -
            Field Original Value New Value
            Epic Link IGBF-3532 [ 22751 ]
            pkulzer Paige Kulzer made changes -
            Link This issue relates to IGBF-3443 [ IGBF-3443 ]
            pkulzer Paige Kulzer made changes -
            Link This issue relates to IGBF-3672 [ IGBF-3672 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 6 [ 190 ] Spring 7 [ 191 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 7 [ 191 ]
            pkulzer Paige Kulzer made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            pkulzer Paige Kulzer made changes -
            Sprint Spring 9 [ 193 ]
            pkulzer Paige Kulzer made changes -
            Sprint Spring 9 [ 193 ] Spring 8 [ 192 ]
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine added a comment - - edited

            As much of the following as possible should be done live, demonstrating with IGB or a Web browser. The experiment description can be a slide.

            Draft outline:

            • Explain that IGB is a genome browser software program you can download to your desktop computer - show the Web download page and mention that the web site produces a download link for the version of IGB suitable for your computer
            • The main "use case" for IGB is for scientists like yourselves to check the quality of your experimental sequence data, check for any problems, such as sample switching, and then to explore and understand your data.
            • In this workshop, we'll focus on analyzing and understanding RNA-Seq data.
            • In a typical RNA-Seq experiment, you create a biological replicate (a sample), and then make an RNA-Seq library out of it. Then, you sequence the library, along with many others representing additional replicates, controls, different sample types, and so on.
            • Once you have the sequence data, you'll send it through a computational pipeline that aligns the sequences to a reference genome and then produces "gene counts" - the number of sequences that aligned to each gene. There are lots of great, well-established pipeline software tools you can use for this! Galaxy is one such system, Nextflow is another, and you can also design your own custom process suitable for your data. RNA-Seq technology has been around for more than ten years, so the processes and tools needed to go from read sequences to gene count tables to differential expression results are very well established! We think that most laboratories, with a little training, can use these methods in their research. Also, the cost of sequencing and library preparation is falling all the time. So this technology is becoming very accessible.
            • However, there is a dark side! Unless you visualize your data, you might make some shocking and embarrassing mistakes. One of the biggest such mistakes is that someone along the line might accidentally mis-label your samples. This has happened to us twice, and we were able to catch the problem by visualizing the data in IGB. We're not going to talk in depth about these examples because we would rather talk about the interesting scientific results IGB helped discover!
            • Today, we're going to show you how we and our collaborators used RNA-Seq to study how heat stress affected gene expression in pollen tubes during pollen tube growth. These data are from an NSF-funded project that is a collaboration between four laboratories - Mark Johnson at Brown, Ravi Palanivelu at the University of Arizona, James Pease at Wake Forest, and Gloria Muday, also at Wake Forest.
            • We know from many different experiments and observations that pollen tube growth during fertilization is sensitive to high temperatures, but some varieties of tomato are more sensitive to stress than others. Interestingly, pollen tubes from a tomato mutant called "anthocyanin reduced" (abbreviated "ARE"), which has a mutation in the flavonoid biosynthetic pathway, are unusually sensitive to heat stress. Pollen tubes from ARE grow much more poorly under heat stress than those from non-mutants. This suggests that flavonoids play a role in maintaining pollen tube growth and integrity. So, our collaborators in the Muday lab designed and executed an RNA-Seq experiment using ARE in hopes of identifying pathways and genes that are important for pollen tube biology under stress.
            • To identify such genes and pathways, they applied a heat stress to pollen tubes growing in vitro from three tomato genotypes: the ARE "anthocyanin-reduced" mutant, VF36, and VF36 transformed with a construct over-expressing the gene that is defective in ARE. They collected samples, made RNA, and had them sequenced at a commercial provider.
            • We then aligned the data using a standard pipeline and set up the data in an IGB Quickload data site, so that we and the Muday lab researchers could look at it. At the same time, we did statistical analyses of the data, looking to identify genes that were higher or lower in heat-treated samples versus the non-heat-treated samples, higher or lower in the mutant versus the non-mutant plants, and so on. Because of the design, there were many diverse comparisons that could be done! We needed ways to sanity-check all of the statistical analysis work. This is where genomic visualization plays a huge role.
            • After running the pipeline, we set up a "Quickload" site on-line and used it heavily during the project. We went through several iterations of the site, and then, when the data were published, we set up a final publication version so that readers of the published article could explore the results for themselves.
            • To see the site and start looking at the data, first start IGB!
            • Next, select the tomato genome by clicking the tomato image on the IGB home screen. When you do this, IGB loads the most recent release of a tomato genome assembly, which we are calling "SL5" and which was published in June of 2022. However, for this study, we actually used two different reference genomes and configured the Quickload site so that you can look at both of them. So, for example, if you are working with tomato genomics data, and you are using the earlier version, which we and others called "SL4," you can still work with these data. (Show the Current Genome Tab genome version menu.)
            • To start, let's take a look at RNA sequence reads aligned to the mutant locus in ARE. The ARE gene encodes an enzyme in the flavonoid biosynthetic pathway. To find it in IGB, we can search for the enzyme name (note: let's improve the description field in the annotations file to make it as easy as possible to find the ARE locus.) In ARE, there is a point mutation affecting the coding region, which ought to be visible in the RNA-Seq sequence data. So, one of the first things we did when we got the data was check that the mutation was present in the "ARE" samples and absent in the others.
            • Now, let's do that! But we will only check two samples, since time is limited. To start, let's import RNA-Seq sequence alignments into the viewer. To do that, we open the folder named for this study and find that there are three folders inside it, named "Reads," "Scaled Coverage Graphs," and "Junctions." To check the sequences, we need to open the "Reads" folder. (Next: show loading and adjusting the view of the read alignment tracks. Find the mutation and zoom into the sequence to observe the effect of the mutation. I think it introduces a stop codon? I can't remember!)
            • As I mentioned previously, we did a number of relatively complex statistical analyses. One of the most interesting analyses looked at how the heat response of the ARE mutant differed from the heat response of the other genotypes. We got a lot of interesting insights out of that, but before proceeding with next steps, such as new experiments, we needed a way to visually check that the statistical results made sense. We're venturing into unknown territory, which is the nature of scientific work! And because this type of work often has to push the boundaries of methods, we need ways to check that the results make sense. (We can show a simpler example, instead. This is the coolest, in my view!)
            • To show you what I mean, I want to show you one of the other folders - the "Scaled Coverage Graphs" folder. This folder contains a different type of data file, called a "scaled genome coverage graph." To start, let's load samples from the ARE genotype and VF36 at 75 minutes post-heat stress, both control and treatment samples. We'll navigate to a gene that we think, based on the statistical analysis, responds differently to the stress in the ARE mutant versus VF36, the non-mutant. Now that the data are loaded, we need to put them all on the same scale. We do that by opening the Graphs tab, pressing "Select All" to select all the Graph tracks, and then setting the upper and lower boundaries of the graphs to the same values.
            • These data files were made using a genomic analysis program called "bamCoverage" that produces a scaled expression value for every position in the reference genome. The scaled value takes into account the total amount of sequence coming from the sample, so that you can visually compare samples in IGB. This makes it possible to compare the graph heights across the tracks in the same location. Genes that are differentially expressed across the samples are obvious because the heights of the graphs are different. Thus, in this case, the differential expression is visible.
            • You can also load much larger regions and spot "by eye" the most highly differentially expressed genes. To see this, let's delete a few of the graphs, leaving just two of the ARE control samples and two of the VF36 samples. Next, let's move to the first chromosome of the genome. We'll then zoom out all the way to show the entire chromosome and then load all the data for the entire chromosome. Next, we'll put them all on the same scale, as before.
            • Now, we can easily locate regions of the genome where expression is higher or lower in ARE relative to the VF36 control. In this way, we can come up with a preliminary list of genes that ought to be found as differentially expressed in the statistical analysis. Right off the bat, this gives our analysis a set of good "positive controls."
            • Thank you for your attention! And to help you and your students use IGB in the way we described, we are providing a tutorial that walks you through all of the above steps. The tutorial is available from our Web site, bioviz.org (provide link - we still need to come up with the best way to deploy and distribute these materials.)
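
            For the tutorial, the idea behind the scaled coverage graphs and the "by eye" positive controls in the outline above could be illustrated with a small sketch. This is not how bamCoverage or IGB work internally. It assumes a simple "depth per million reads" scaling and a fold-change cutoff with a pseudocount, and every sample name, gene name, number, and function name below is hypothetical.

```python
"""Sketch: scale coverage by sequencing depth, share one y-axis, flag candidates."""


def scale_depth(depth, total_reads, per=1_000_000):
    """Convert raw per-position read depth to depth per `per` sequenced reads."""
    return [d * per / total_reads for d in depth]


def shared_range(tracks):
    """Return one (lower, upper) pair for all tracks, like equal graph bounds."""
    upper = max(max(values) for values in tracks.values())
    return 0.0, upper


def candidate_genes(gene_levels, mutant, reference, min_fold=4.0, pseudo=1.0):
    """Return genes whose scaled level differs at least min_fold between genotypes."""
    hits = []
    for gene, levels in gene_levels.items():
        # The pseudocount keeps the ratio defined when one level is zero.
        fold = (levels[mutant] + pseudo) / (levels[reference] + pseudo)
        if fold >= min_fold or fold <= 1.0 / min_fold:
            hits.append(gene)
    return sorted(hits)


# Hypothetical raw depth at three positions; VF36 was sequenced three times deeper.
raw_depth = {"ARE_control": [10, 20, 40], "VF36_control": [30, 60, 120]}
total_reads = {"ARE_control": 10_000_000, "VF36_control": 30_000_000}

scaled = {
    sample: scale_depth(depth, total_reads[sample])
    for sample, depth in raw_depth.items()
}
y_range = shared_range(scaled)

# Hypothetical mean scaled coverage per gene and genotype.
gene_levels = {
    "geneA": {"ARE": 8.0, "VF36": 1.0},
    "geneB": {"ARE": 2.0, "VF36": 2.5},
    "geneC": {"ARE": 0.0, "VF36": 7.0},
}
positive_controls = candidate_genes(gene_levels, "ARE", "VF36")

print(scaled)
print(y_range)
print(positive_controls)
```

            In this toy data the raw depths differ threefold only because one sample was sequenced more deeply, so the scaled tracks come out equal. That is the reason for scaling before comparing graph heights across tracks.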
            ann.loraine Ann Loraine added a comment - - edited

            To do:

            • Set up SRA-downloaded data sets (alignments, coverage graphs, junction files) in the RNA-Seq Quickload (requires fixing sample label problems!)
            • Improve the gene description for the ARE gene in SL4 and SL5 annotations
            • Create a tutorial suitable for teaching classes (and training researchers)
            • Identify a good platform for sharing the materials from the workshop
            • Create visuals for the workshop
            • Rehearse a bunch of times so that *everybody* on the team is comfortable doing the presentation.
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
            pkulzer Paige Kulzer made changes -
            Link This issue relates to IGBF-3711 [ IGBF-3711 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 8 [ 192 ] Spring 8, Spring 9 [ 192, 193 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            pkulzer Paige Kulzer made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            pkulzer Paige Kulzer made changes -
            Link This issue relates to IGBF-3712 [ IGBF-3712 ]
            pkulzer Paige Kulzer added a comment -

            I've created several tickets to address these tasks:

            • Set up SRA-downloaded data sets (alignments, coverage graphs, junction files) in the RNA-Seq Quickload (requires fixing sample label problems!): IGBF-3711
            • Improve the gene description for the ARE gene in SL4 and SL5 annotations: IGBF-3712
            • Create a tutorial suitable for teaching classes (and training researchers)/Create visuals for the workshop: IGBF-3672

            Thank you for all of this detail, Ann Loraine! Having all of these key talking points drafted will make creating the workshop much easier. Closing this ticket.

            pkulzer Paige Kulzer made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            pkulzer Paige Kulzer made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            pkulzer Paige Kulzer made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            pkulzer Paige Kulzer made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            pkulzer Paige Kulzer made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            pkulzer Paige Kulzer made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                pkulzer Paige Kulzer
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: