  1. IGB
  2. IGBF-2943

Investigate recent literature on compendium databases with RNA-Seq

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Find five "top papers" (highly cited, influential) in which the authors developed compendium databases from RNA-Seq data, focusing primarily on bulk RNA-Seq.

      Question: Do any sites provide an easy-to-use REST endpoint that enables import of "raw" counts or other types of expression values into interactive computational environments such as RStudio or Jupyter notebooks?

            Activity

            ann.loraine Ann Loraine added a comment -

            Notes:

            • Nick Provart has done some work in this area
            • We have too – see "metabolic network" paper from 2006
            nfreese Nowlan Freese added a comment - - edited

            Published 2018
            NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples

            In this study, we have downloaded 456 primary rice RNA-seq samples from the NCBI Sequence Read Archive (SRA) (see S1 and S2 Datasets for details), with the keywords of “Oryza sativa” [Organism] AND “platform illumina” [Properties] AND “strategy rna seq” [Properties] (accessed on May 29, 2014). These RNA-seq samples contain a wide spread of experimental conditions, tissue types and developmental stages.
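A search like the one quoted above can be scripted against NCBI's E-utilities `esearch` endpoint, which is one way to pull SRA hits into a notebook. The sketch below only builds the request URL and makes no network call. The query string is modeled on the quoted keywords; whether the authors used exactly this field syntax is an assumption, and the helper name `build_sra_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL that queries the SRA database for `term`."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Assumption: the keywords from the paper, rewritten with plain ASCII quotes.
TERM = (
    '"Oryza sativa"[Organism] AND "platform illumina"[Properties] '
    'AND "strategy rna seq"[Properties]'
)

url = build_sra_search_url(TERM)
print(url)
```

Fetching `url` (for example with `urllib.request.urlopen`) should return JSON in which the matching record IDs are listed under `esearchresult` / `idlist`. Results today will differ from the 456 samples the authors found in 2014.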

            nfreese Nowlan Freese added a comment - - edited

            Published 2020
            A Comprehensive Online Database for Exploring 20,000 Public Arabidopsis RNA-Seq Libraries

            Here, we present the Arabidopsis RNA-seq database (ARS) (http://ipf.sustech.edu.cn/pub/athrna/) that integrates 20 068 publicly available Arabidopsis RNA-seq library data deposited at the Gene Expression Omnibus, the Sequence Read Archive, the European Nucleotide Archive, and the DNA Data Bank of Japan databases (Supplemental Table 1) before the end of March 2019 (Figure 1B). We downloaded raw data of all libraries and re-processed them with a standardized pipeline, mapped the reads to the TAIR10 genome, and calculated a normalized expression level in FPKM (fragments per kilobase of transcript per million mapped reads) at each library for all the 37 336 genes annotated in Araport11, and performed coexpression analysis using these data (see Supplemental Information for details).
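For reference, the FPKM value mentioned above can be computed from a fragment count as in the sketch below. This is the standard textbook definition, not necessarily the exact calculation in the ARS pipeline (the paper defers those details to its Supplemental Information). The function name and the example numbers are made up for illustration.

```python
def fpkm(fragments: float, gene_length_bp: float, total_mapped_fragments: float) -> float:
    """FPKM = fragments / (gene length in kb) / (mapped fragments in millions)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) equals multiplying by 1e9.
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Illustrative numbers only: 500 fragments on a 2 kb gene in a 20 M fragment library.
example = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=20_000_000)
print(example)  # 12.5
```

Because FPKM is already normalized for gene length and library size, it cannot be converted back to the "raw" counts that the ticket description asks about without both of those quantities.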

            Ann's note: Searched the website mentioned in the paper (http://ipf.sustech.edu.cn/pub/athrna/) for "AT1G07350", but the search never finished. Is the site broken?

            1st response from authors:

            Dear Nowlan,

            Thank you for help reporting the problem! There seems to be some issue with our campus network which was newly upgraded. We will fix this as soon as possible (it's a holiday break in China now, so may take a few days).

            Meanwhile, if you are interested in the entire dataset. The following link can access the matrices of all RNA-seq FPKM from our database, this file is about 2.3 Gb (~28,000 libraries).
            https://www.dropbox.com/s/q95yqoyqkprnvua/gene_FPKM_200501.csv.gz?dl=0

            Feel free to let us know if you have any questions.

            Best,
            Jixian
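A file of the size described in the first response (about 2.3 GB compressed) is easier to explore by streaming it than by loading it whole. The sketch below pulls the values for one gene out of a gzipped CSV using only the standard library. The layout is an assumption, since the file's structure is not described here: one row per gene, gene ID in the first column, one column per library. The demo runs on a tiny synthetic file with hypothetical library names and made-up values, not on the authors' data.

```python
import csv
import gzip
import os
import tempfile
from typing import Dict, Optional


def read_gene_row(path: str, gene_id: str) -> Optional[Dict[str, float]]:
    """Stream a gzipped CSV and return {library: value} for one gene.

    Assumes one row per gene, with the gene ID in the first column and
    one column per library (not confirmed for the authors' file).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            if row and row[0] == gene_id:
                return {lib: float(val) for lib, val in zip(header[1:], row[1:])}
    return None  # gene not present in the file


# Synthetic stand-in for the real matrix; names and values are illustrative.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_FPKM.csv.gz")
with gzip.open(demo_path, "wt", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["gene", "LIB_A", "LIB_B"])
    writer.writerow(["AT1G01010", "1.5", "0.0"])
    writer.writerow(["AT1G07350", "12.25", "3.5"])

result = read_gene_row(demo_path, "AT1G07350")
print(result)
```

If the real file turns out to be transposed (libraries as rows), the same streaming approach applies, but the lookup would select a column by header name instead of a row by ID.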

            2nd response from authors:

            Dear Nowlan,

            Sorry for the failure to access the site due to CPU overload, the site is now accessible and welcome!

            Thanks,
            Best.

            3rd response from authors:

            Dear Nowlan,

            Yes, we are also working on RNA-seq databases for rice, maize and soybean, which hopefully will be available later this year.

            You can access the test version by clicking the links at the top left of the page:

            nfreese Nowlan Freese added a comment - - edited

            Published 2018
            AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species

            AgriSeqDB (https://expression.latrobe.edu.au/agriseqdb) is a database for viewing, analysing and interpreting developmental and tissue/cell-specific transcriptome data from several species, including major agricultural crops such as wheat, rice, maize, barley and tomato. The disparate manner in which public transcriptome data is often warehoused and the challenge of visualizing raw data are both major hurdles to data reuse. The popular eFP browser does an excellent job of presenting transcriptome data in an easily interpretable view, but previous implementation has been mostly on a case-by-case basis. Here we present an integrated visualisation database of transcriptome data-sets from six species that did not previously have public-facing visualisations. We combine the eFP browser, for gene-by-gene investigation, with the Degust browser, which enables visualisation of all transcripts across multiple samples.

            nfreese Nowlan Freese added a comment - - edited

            Published 2020:
            IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data

            The Information Commons for Rice (IC4R, http://ic4r.org) [15], [16], [17], one of the core resources of National Genomics Data Center (NGDC, http://bigd.big.ac.cn) [18], [19], [20], is a public database integrating multiple omics data for rice and providing high-quality annotations. Here, we perform rice genome reannotation based on integration of large-scale RNA-seq data and consequently release a new annotation system—IC4R-2.0 for O. sativa L. ssp. japonica. IC4R-2.0 presents considerable improvements by enhancing structural completeness of protein-coding genes, incorporating an abundance of functional annotations, and systematically identifying long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) in rice genome.

            The annotations described above are available for download and import into IGB here: http://ic4r.org/download

            To facilitate access to the new annotation system, IC4R provides a series of flat files for public downloading (http://ic4r.org/download), including gene structural annotation (GFF format), nucleotide and protein sequences (FASTA format), correspondence between IC4R-2.0, MSU-7.0, and RAP-DB ID systems (CSV format), predicted CpG island (TSV format), as well as exon–exon junction information (BED format). Furthermore, to make these associated data accessible more efficiently, an open application programming interface (API) (http://ic4r.org/api) is provided for automatic retrieval.
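Before loading one of the GFF files listed above into IGB, it can be useful to inspect it in a notebook. The sketch below is a minimal reader that assumes GFF3-style lines (nine tab-separated columns, `key=value` attributes separated by semicolons). The paper says only "GFF format", so the exact dialect is an assumption, and the sample records are invented rather than taken from IC4R.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    feature_type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping comments and malformed lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a standard nine-column record
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)


# Invented sample records for illustration only.
SAMPLE = [
    "##gff-version 3",
    "Chr1\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=demo",
    "Chr1\texample\tmRNA\t1000\t2500\t.\t+\t.\tID=mrna0001;Parent=gene0001",
]

genes = [f for f in parse_gff3(SAMPLE) if f.feature_type == "gene"]
print(genes[0].attributes["ID"], genes[0].end - genes[0].start + 1)
```

For a downloaded file, pass an open file handle in place of `SAMPLE`. GFF coordinates are 1-based and inclusive, which is why the length calculation adds 1.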

            This paper discusses additional details of the IC4R project.

            nfreese Nowlan Freese added a comment -

            Published 2015:
            TENOR: Database for Comprehensive mRNA-Seq Experiments in Rice

            Here we present TENOR (Transcriptome ENcyclopedia Of Rice, http://tenor.dna.affrc.go.jp ), a database that encompasses large-scale mRNA sequencing (mRNA-Seq) data obtained from rice under a wide variety of conditions. Since the elucidation of the ability of plants to adapt to various growing conditions is a key issue in plant sciences, it is of great interest to understand the regulatory networks of genes responsible for environmental changes. We used mRNA-Seq and performed a time-course transcriptome analysis of rice, Oryza sativa L. (cv. Nipponbare), under 10 abiotic stress conditions (high salinity; high and low phosphate; high, low and extremely low cadmium; drought; osmotic; cold; and flood) and two plant hormone treatment conditions (ABA and jasmonic acid).

            nfreese Nowlan Freese added a comment -

            The most promising result is the website http://ipf.sustech.edu.cn/pub/athrna/. The authors have been very responsive, and while they do not currently offer rice RNA-Seq data, they plan to offer it by the end of 2021.

            nfreese Nowlan Freese added a comment - - edited

            Published 2021:
            Single-cell transcriptome atlas of the leaf and root of rice seedlings

            In this study, we apply single-cell RNA sequencing to both shoot and root of rice seedlings growing in Kimura B nutrient solution or exposed to various abiotic stresses and characterize transcriptomes for a total of 237,431 individual cells. We identify 15 and nine cell types in the leaf and root, respectively, and observe that common transcriptome features are often shared between leaves and roots in the same tissue layer, except for endodermis or epidermis.

            All sequencing data have been deposited to the Genome Sequence Archive (Wang et al., 2017) in BIG Data Center (https://ngdc.cncb.ac.cn/gsa/), Beijing Institute of Genomics, Chinese Academy of Sciences, under the accession number CRA004082. Codes to analyze the data and generate figures are available at GitHub (https://github.com/Yuwang-art/scRNA-seq_in_rice_seedlings) and Zenodo (https://doi.org/10.5281/zenodo.4916334).


              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                ann.loraine Ann Loraine