  1. IGB
  2. IGBF-3047

Investigate: New splice variant annotations for tomato

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      2
    • Sprint:
      Spring 9 2022 May 9, Summer 1 2022 May 23, Summer 2 2022 June 6, Summer 3 2022 June 21, Summer 4 2022 July 4

      Description

      The S. lycopersicum (cultivated tomato) gene annotations include only one gene model per gene. However, visualizing RNA-Seq data in IGB shows that a large number of genes produce multiple splice forms. At least one other group has noticed this as well. In their article "Expanding Alternative Splicing Identification by Integrating Multiple Sources of Transcription Data in Tomato" (https://www.frontiersin.org/articles/10.3389/fpls.2019.00689/full), a group at Ohio State University led by Prof. Xiangjia (Jack) Min reported using transcriptome data, including ESTs and RNA-Seq data, to assemble new gene models. I downloaded these and deployed them to IGB Quickload; they are one of the available data sets for the next-to-last genome release.
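One way to check the "one gene model per gene" observation, and to compare it against a richer annotation set, is to count transcripts per gene in each annotation file. The following is a minimal sketch, assuming the annotations are available as GFF3 with `mRNA` features that carry a `Parent` attribute; the function names and the example records are hypothetical and are not taken from the tomato annotation files.

```python
from collections import Counter


def transcripts_per_gene(gff3_lines):
    """Count mRNA features per parent gene in an iterable of GFF3 lines.

    Genes that have no mRNA child are not counted at all.
    """
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only well-formed mRNA records are of interest
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent:
            counts[parent] += 1
    return counts


def fraction_multi_isoform(counts):
    """Fraction of counted genes that have more than one transcript."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=geneA",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=geneA.2;Parent=geneA",
    "chr1\tsrc\tgene\t2000\t3000\t.\t-\t.\tID=geneB",
    "chr1\tsrc\tmRNA\t2000\t3000\t.\t-\t.\tID=geneB.1;Parent=geneB",
]
counts = transcripts_per_gene(example)
```

Running the same count on the official annotations and on an assembled set would show how many genes gain additional models, provided both files follow the GFF3 layout assumed here.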

      There may be other groups developing similar datasets for the most recent tomato genome release. To quantify splice variant expression using current methods, it would be extremely helpful to have an up-to-date, accurate-as-possible collection of gene models annotated with functional information. Who else is interested in this and might contribute? Or is this something only our group might care about?

      As part of the pollen NSF project, we are trying to understand how heat stress triggers changes in RNA synthesis in pollen, in pollen tubes, and in other sample types related to plant reproduction, especially in tomato.

      How homogeneous are the RNA-Seq data sets coming from the pollen project? So far, all the data have been from a single cell type: germinating pollen tubes. I do not recall seeing much evidence for alternative splicing in these datasets, at least not as compared with other samples that included many cell types, e.g., root or shoot. Also, are there splice forms that exist mainly in pollen but not in other tissue types? We found some examples of this in the Arabidopsis pollen RNA-Seq data described in our paper "RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing" (https://pubmed.ncbi.nlm.nih.gov/23590974/).

      How many tomato RNA-Seq data sets are there, and how good are they? For the purpose of producing new gene models, the best bulk RNA-Seq data would be paired-end and strand-specific, with very long reads. Are such data available currently, or would we need to create new data to cover the entirety of transcription?
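The data-quality question above could be approached by screening a table of run metadata against the three criteria. This is only a sketch under stated assumptions: the field names (`run_id`, `layout`, `stranded`, `read_length`), the example runs, and the 100-base threshold are all hypothetical, and strandedness in particular is often not recorded in public metadata and may need to be inferred from the alignments.

```python
def suitable_for_gene_models(run, min_read_len=100):
    """Return True if a run is paired-end, strand-specific, and long enough.

    `run` is a dict with hypothetical keys: layout, stranded, read_length.
    `min_read_len` is an arbitrary illustrative threshold, not a recommendation.
    """
    return (
        run["layout"].upper() == "PAIRED"
        and bool(run["stranded"])
        and run["read_length"] >= min_read_len
    )


# Hypothetical metadata records, for illustration only.
runs = [
    {"run_id": "run_A", "layout": "SINGLE", "stranded": True, "read_length": 150},
    {"run_id": "run_B", "layout": "PAIRED", "stranded": True, "read_length": 150},
    {"run_id": "run_C", "layout": "PAIRED", "stranded": False, "read_length": 150},
    {"run_id": "run_D", "layout": "paired", "stranded": True, "read_length": 50},
]

candidates = [r["run_id"] for r in runs if suitable_for_gene_models(r)]
```

If few or no runs pass such a screen, that would argue for generating new data rather than relying on what is already public.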

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Rank Ranked lower
            ann.loraine Ann Loraine made changes -
            Sprint Spring 9 2022 May 9 [ 144 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 9 2022 May 9 [ 144 ] Spring 9 2022 May 9, Summer 1 2022 May 23 [ 144, 147 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Closed [ 6 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 9 2022 May 9, Summer 1 2022 May 23 [ 144, 147 ] Spring 9 2022 May 9, Summer 1 2022 May 23, Summer 2 2022 June 6 [ 144, 147, 148 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 9 2022 May 9, Summer 1 2022 May 23, Summer 2 2022 June 6 [ 144, 147, 148 ] Spring 9 2022 May 9, Summer 1 2022 May 23, Summer 2 2022 June 6, Summer 3 2022 June 20 [ 144, 147, 148, 149 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3135 [ IGBF-3135 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 9 2022 May 9, Summer 1 2022 May 23, Summer 2 2022 June 6, Summer 3 2022 June 21 [ 144, 147, 148, 149 ] Spring 9 2022 May 9, Summer 1 2022 May 23, Summer 2 2022 June 6, Summer 3 2022 June 21, Summer 4 2022 July 4 [ 144, 147, 148, 149, 150 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Description The S. lycopersicon (cultivated tomato) gene annotations include only one gene model per gene. However, visualizing RNA-Seq data in IGB shows that a large number of genes produce multiple splice forms. At least one other group has noticed this, as well. In their article "[Expanding Alternative Splicing Identification by Integrating Multiple Sources of Transcription Data in Tomato|https://www.frontiersin.org/articles/10.3389/fpls.2019.00689/full]", a group at Ohio State University led by Prof. Xiangjia (Jack) Min reported using transcriptome data, including ESTs and RNA-Seq data, to assemble new gene models. I downloaded these and deployed them to IGB Quickload; they are one of the available data sets for the next to last genome release.

            There may be other groups developing similar datasets for the most recent genome release for tomato. And in order to quantify splice variant expression using current methods, it would be extremely helpful to have an up-to-date, accurate-as-possible collection of gene models annotated with functional information. Who else is interested in this and would be interested in contributing? Or is this something only our group might care about.

            As part of the pollen NSF project, we are trying to understand and discover how heat stress triggers changes in RNA synthesis in pollen, in pollen tubes, and in other sample types related to reproduction in plants, especially tomato?

            How homogenous are the RNA-Seq data sets coming from the pollen project? So far, all the data have been from a single cell type: germinating pollen tubes. I do not recall seeing much evidence for alternative splicing in these datasets, at least not as compared with other samples that included many cell types, e.g., root or shoot. Also, are there splice forms that exist mainly in pollen but not other tissue types? We found some examples of this in the Arabidospis pollen RNA-Seq data described in our paper "[RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing|https://pubmed.ncbi.nlm.nih.gov/23590974/]".

            How many tomato RNA-Seq data sets are there, and how good are they? For the purpose of producing new gene models, the best bulk RNA-Seq data would be paired end, very long read lengths, and strand-specific. Are such data available currently, or would we need to create new data to cover the entirety of transcription?
            The S. lycopersicon (cultivated tomato) gene annotations include only one gene model per gene. However, visualizing RNA-Seq data in IGB shows that a large number of genes produce multiple splice forms. At least one other group has noticed this, as well. In their article "[Expanding Alternative Splicing Identification by Integrating Multiple Sources of Transcription Data in Tomato|https://www.frontiersin.org/articles/10.3389/fpls.2019.00689/full]", a group at Ohio State University led by Prof. Xiangjia (Jack) Min reported using transcriptome data, including ESTs and RNA-Seq data, to assemble new gene models. I downloaded these and deployed them to IGB Quickload; they are one of the available data sets for the next to last genome release.

            There may be other groups developing similar datasets for the most recent genome release for tomato. And in order to quantify splice variant expression using current methods, it would be extremely helpful to have an up-to-date, accurate-as-possible collection of gene models annotated with functional information. Who else is interested in this and would be interested in contributing? Or, is this something only our group might care about?

            As part of the pollen NSF project, we are trying to understand and discover how heat stress triggers changes in RNA synthesis in pollen, in pollen tubes, and in other sample types related to reproduction in plants, especially tomato?

            How homogenous are the RNA-Seq data sets coming from the pollen project? So far, all the data have been from a single cell type: germinating pollen tubes. I do not recall seeing much evidence for alternative splicing in these datasets, at least not as compared with other samples that included many cell types, e.g., root or shoot. Also, are there splice forms that exist mainly in pollen but not other tissue types? We found some examples of this in the Arabidospis pollen RNA-Seq data described in our paper "[RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing|https://pubmed.ncbi.nlm.nih.gov/23590974/]".

            How many tomato RNA-Seq data sets are there, and how good are they? For the purpose of producing new gene models, the best bulk RNA-Seq data would be paired end, very long read lengths, and strand-specific. Are such data available currently, or would we need to create new data to cover the entirety of transcription?
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2
