  1. IGB
  2. IGBF-3826

Investigate BREW3R tool for extending gene annotations using RNA-Seq data

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      BREW3R (https://github.com/lldelisle/BREW3R.r) is a tool designed to extend gene annotations using publicly available data. It could address cases where gene annotations appear truncated relative to coverage graphs in IGB. Integrating it would aim to improve the accuracy of gene models in IGB by updating reference annotations with longer UTRs or additional exons where appropriate.

      Conduct an initial evaluation to assess the feasibility and benefits of integrating BREW3R into IGB. Also, please identify similar tools with the same functionality that we might be able to implement instead, especially those on platforms other than R.

        Attachments

          Activity

          pkulzer Paige Kulzer added a comment -

          FYI: This tool is also available on Galaxy as BREW3R.r.

          ann.loraine Ann Loraine added a comment - - edited

          Thanks Paige Kulzer for letting us know about this tool.

          There isn't really a way to incorporate this tool into IGB, but IGB would definitely be a great tool for checking the output of BREW3R.r.

          I think this would be a great resource for improving gene annotations as part of a genome annotation project.

          • For example, we could use it to improve reference gene model annotations for tardigrade genome assemblies using RNA-Seq data that is available.
          • Or, we could use it to improve reference gene model annotations for tomato SL4 or SL5 genome assemblies.

          The output of either effort would be a new "collection" of RNA-Seq-improved gene models, which we could release ourselves right away as a new reference gene model track in IGB Quickload.

          But getting these models to be used by others in the community would be a challenge.

          Of the two options, tardigrade would be the easier one, as basically no one apart from NCBI and us is supporting a genome browser for tardigrade, so far as I know. Also, the tardigrade researchers themselves don't seem to be using genome assemblies in their work.

          Getting new tomato gene model annotations used and incorporated into other systems would be a massive undertaking, however, as the SolGenomics group at Cornell pretty much "owns" tomato, for better or worse!

          Rather than do this next, I think we need to circle back with the Goldstein lab now and talk about the work we have done with tardigrade. My feeling is this would be our most productive next step.

          ann.loraine Ann Loraine added a comment - - edited

          Sorry! I completely misunderstood what this tool does. According to the description, this is what it does:

          contains a single function which enable to extend three prime of gene annotations using another gene annotation as template.

          I assumed it used RNA-Seq data to extend the 3' and 5' boundaries of gene models, which would definitely be useful for genome annotation.

          Sorry Dylan Marrotte and Paige Kulzer!

          Actually what it does is extend gene model annotations using other gene model annotations.
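
          To make the distinction concrete, here is a minimal Python sketch of the general idea of extending the 3' end of a gene model using a second annotation as a template, with no RNA-Seq data involved. This is not BREW3R.r's code or algorithm (the real tool is written in R and surely handles more cases); the Exon class and the extend_three_prime function are hypothetical names made up for illustration, and coordinates are assumed to be 0-based, half-open.

```python
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class Exon:
    """Hypothetical minimal exon record (0-based, half-open coordinates)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int


def overlaps(a: Exon, b: Exon) -> bool:
    """True if both exons are on the same chromosome and strand and share a base."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def extend_three_prime(last_exon: Exon, template_exons: Iterable[Exon]) -> Exon:
    """Move the 3' end of last_exon out to the farthest 3' end found among
    template exons that overlap it. Only another annotation is consulted;
    the exon is never shortened."""
    result = last_exon
    for t in template_exons:
        if not overlaps(last_exon, t):
            continue
        if last_exon.strand == "+" and t.end > result.end:
            # On the plus strand the 3' end is the larger coordinate.
            result = replace(result, end=t.end)
        elif last_exon.strand == "-" and t.start < result.start:
            # On the minus strand the 3' end is the smaller coordinate.
            result = replace(result, start=t.start)
    return result
```

          For example, a plus-strand last exon at 100-200 with an overlapping same-strand template exon at 150-350 would come back as 100-350. A real tool would also need to avoid extending into neighboring genes, which this sketch ignores.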

          It uses the "GenomicRanges" data structures that are part of Bioconductor. I have looked at these from time to time. I never liked them very much because I find them unwieldy, confusing, and basically unnecessary for the work I normally do in R.

          I recommend closing this "investigate" ticket with the conclusion: Not worthwhile to pursue at this time.

          Assigning to Paige Kulzer to take a look. Was there something you saw in the Galaxy interface or at GCC that piqued your interest about this?

          pkulzer Paige Kulzer added a comment -

          After chatting with the tool's developer at GCC, I had also been under the initial impression that it used RNA-Seq data to extend gene annotations. But yesterday, thanks to Galaxy's interface and re-reading the documentation, I too noticed that it uses another gene annotation to extend gene annotations.

          I agree that this is not very useful for us at this time, so I will close this ticket.


            People

            • Assignee:
              pkulzer Paige Kulzer
              Reporter:
              dmarrott Dylan Marrotte (Inactive)
