[IGBF-2943] Investigate recent literature on compendium databases with RNA-Seq - JIRA UNCC

Ann Loraine created issue - 14/Sep/21 10:27 AM

Ann Loraine made changes - 14/Sep/21 10:27 AM

Field	Original Value	New Value
Epic Link		~~IGBF-2883~~ [ 21320 ]

Ann Loraine made changes - 14/Sep/21 10:28 AM

Assignee

Ann Loraine [ aloraine ]


Ann Loraine added a comment - 14/Sep/21 10:29 AM

Notes:

Nick Provart has done some work in this area
We have too – see "metabolic network" paper from 2006


Ann Loraine made changes - 14/Sep/21 10:33 AM

Sprint

Fall 3 2021 Sep 13 - Sep 24 [ 129 ]

Ann Loraine made changes - 14/Sep/21 10:34 AM

Rank

Ranked higher

Nowlan Freese made changes - 14/Sep/21 10:35 AM

Assignee

Nowlan Freese [ nfreese ]

Nowlan Freese made changes - 14/Sep/21 10:35 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]


Nowlan Freese added a comment - 14/Sep/21 3:27 PM - edited

Published 2018
NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples

In this study, we have downloaded 456 primary rice RNA-seq samples from the NCBI Sequence Read Archive (SRA) (see S1 and S2 Datasets for details), with the keywords of “Oryza sativa” [Organism] AND “platform illumina” [Properties] AND “strategy rna seq” [Properties] (accessed on May 29, 2014). These RNA-seq samples contain a wide spread of experimental conditions, tissue types and developmental stages.
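
The search quoted above could be reproduced programmatically. The following is a minimal sketch, assuming the NCBI E-utilities `esearch` endpoint is used to query the SRA database; the helper names `build_sra_query` and `build_esearch_url` are hypothetical, and the code only builds the request URL without contacting NCBI. A search run today would likely return far more than the 456 samples found in 2014.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public service; nothing is fetched here).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_query(organism, platform="illumina", strategy="rna seq"):
    """Compose the Entrez search term quoted in the NetMiner paper (hypothetical helper)."""
    return (
        f'"{organism}"[Organism] AND '
        f'"platform {platform}"[Properties] AND '
        f'"strategy {strategy}"[Properties]'
    )


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for the SRA database; retmax caps the number of IDs returned."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_sra_query("Oryza sativa")
url = build_esearch_url(term)
print(term)
print(url)
```

Swapping the organism argument (for example, "Arabidopsis thaliana") would give the equivalent query for another species.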


Nowlan Freese added a comment - 14/Sep/21 3:53 PM - edited

Published 2020
A Comprehensive Online Database for Exploring 20,000 Public Arabidopsis RNA-Seq Libraries

Here, we present the Arabidopsis RNA-seq database (ARS) (http://ipf.sustech.edu.cn/pub/athrna/) that integrates 20 068 publicly available Arabidopsis RNA-seq library data deposited at the Gene Expression Omnibus, the Sequence Read Archive, the European Nucleotide Archive, and the DNA Data Bank of Japan databases (Supplemental Table 1) before the end of March 2019 (Figure 1B). We downloaded raw data of all libraries and re-processed them with a standardized pipeline, mapped the reads to the TAIR10 genome, and calculated a normalized expression level in FPKM (fragments per kilobase of transcript per million mapped reads) at each library for all the 37 336 genes annotated in Araport11, and performed coexpression analysis using these data (see Supplemental Information for details).
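
For reference, the FPKM value mentioned above can be computed as follows. This is a sketch of the standard definition only, not the authors' pipeline (their details are in the paper's Supplemental Information); the function name `fpkm` and the example numbers are made up for illustration.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragment_count / (kilobases * millions)


# Hypothetical numbers: 500 fragments on a 2 kb gene
# in a library of 20 million mapped fragments.
value = fpkm(500, 2_000, 20_000_000)
print(value)
```

Because FPKM divides by both gene length and library size, values are comparable across genes and libraries only to the extent that those two normalizations are adequate for the comparison at hand.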

Ann's note: Searched the Web site mentioned in the paper (http://ipf.sustech.edu.cn/pub/athrna/) for "AT1G07350", but the search never finished. Is the site broken?

1st response from authors:

Dear Nowlan,

Thank you for help reporting the problem! There seems to be some issue with our campus network which was newly upgraded. We will fix this as soon as possible (it's a holiday break in China now, so may take a few days).

Meanwhile, if you are interested in the entire dataset. The following link can access the matrices of all RNA-seq FPKM from our database, this file is about 2.3 Gb (~28,000 libraries).
https://www.dropbox.com/s/q95yqoyqkprnvua/gene_FPKM_200501.csv.gz?dl=0

Feel free to let us know if you have any questions.

Best,
Jixian
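
A file like the one linked above is large, so one way to pull values into a notebook is to stream it rather than load it whole. The sketch below assumes, without having checked the real file, that it has a header row of library IDs and gene IDs in the first column; the function name `gene_row_from_csv_gz` and the demo data are hypothetical.

```python
import csv
import gzip
import io


def gene_row_from_csv_gz(source, gene_id):
    """Stream a gzipped CSV and return {column name: value} for one gene.

    Assumes (unverified) a header row and gene IDs in the first column.
    `source` may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            if row and row[0] == gene_id:
                return dict(zip(header[1:], map(float, row[1:])))
    return None


# Tiny in-memory stand-in for the real file; library IDs and values are made up.
demo = "gene,LIB_A,LIB_B\nAT1G01010,1.5,0.0\nAT1G07350,12.25,3.0\n"
buffer = io.BytesIO(gzip.compress(demo.encode("utf-8")))
result = gene_row_from_csv_gz(buffer, "AT1G07350")
print(result)
```

If the real matrix turns out to be oriented the other way (libraries as rows, genes as columns), the lookup would need to select a column instead of a row.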

2nd response from authors:

Dear Nowlan,

Sorry for the failure to access the site due to CPU overload, the site is now accessible and welcome!

Thanks,
Best.

3rd response from authors:

Dear Nowlan,

Yes, we are also working on RNA-seq databases for rice, maize and soybean, which hopefully will be available later this year.

You can access the test version by clicking the links at the top left of the page:


Nowlan Freese added a comment - 14/Sep/21 3:57 PM - edited

Published 2018
AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species

AgriSeqDB (https://expression.latrobe.edu.au/agriseqdb) is a database for viewing, analysing and interpreting developmental and tissue/cell-specific transcriptome data from several species, including major agricultural crops such as wheat, rice, maize, barley and tomato. The disparate manner in which public transcriptome data is often warehoused and the challenge of visualizing raw data are both major hurdles to data reuse. The popular eFP browser does an excellent job of presenting transcriptome data in an easily interpretable view, but previous implementation has been mostly on a case-by-case basis. Here we present an integrated visualisation database of transcriptome data-sets from six species that did not previously have public-facing visualisations. We combine the eFP browser, for gene-by-gene investigation, with the Degust browser, which enables visualisation of all transcripts across multiple samples.


Nowlan Freese made changes - 16/Sep/21 10:15 AM

Status

In Progress [ 3 ]

To-Do [ 10305 ]


Nowlan Freese added a comment - 17/Sep/21 9:51 AM - edited

Published 2020:
IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data

The Information Commons for Rice (IC4R, http://ic4r.org) [15], [16], [17], one of the core resources of National Genomics Data Center (NGDC, http://bigd.big.ac.cn) [18], [19], [20], is a public database integrating multiple omics data for rice and providing high-quality annotations. Here, we perform rice genome reannotation based on integration of large-scale RNA-seq data and consequently release a new annotation system—IC4R-2.0 for O. sativa L. ssp. japonica. IC4R-2.0 presents considerable improvements by enhancing structural completeness of protein-coding genes, incorporating an abundance of functional annotations, and systematically identifying long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) in rice genome.

The annotations described above are available for download and import into IGB here: http://ic4r.org/download

To facilitate access to the new annotation system, IC4R provides a series of flat files for public downloading (http://ic4r.org/download), including gene structural annotation (GFF format), nucleotide and protein sequences (FASTA format), correspondence between IC4R-2.0, MSU-7.0, and RAP-DB ID systems (CSV format), predicted CpG island (TSV format), as well as exon–exon junction information (BED format). Furthermore, to make these associated data accessible more efficiently, an open application programming interface (API) (http://ic4r.org/api) is provided for automatic retrieval.

This paper discusses additional details of the IC4R project.
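
Before importing the gene structure file, it can help to inspect a few records. The following sketch parses one feature line, assuming the download follows the usual nine-column, tab-separated GFF layout with GFF3-style `key=value` attributes; the IC4R file itself was not checked, and the function name `parse_gff3_line` and the example line are made up.

```python
from urllib.parse import unquote

GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comment or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[key.strip()] = unquote(value.strip())
    record["attributes"] = attributes
    return record


# Made-up feature line; the ID is a placeholder, not a real IC4R identifier.
example = "chr01\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=demo%20gene"
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```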


Ann Loraine made changes - 20/Sep/21 10:28 AM

Description

Find five "top papers" (much-cited, influential) in which authors have developed compendium data with RNA-Seq data, focusing on bulk RNA-Seq primarily.

Find five "top papers" (much-cited, influential) in which authors have developed compendium data with RNA-Seq data, focusing on bulk RNA-Seq primarily.

Question: Do any sites provide a REST endpoint that is easy to use and enables import of "raw" counts or other type of expression value into interactive computational environments such as RStudio or Jupyter notebooks?

Ann Loraine made changes - 20/Sep/21 10:28 AM

Description

Find five "top papers" (much-cited, influential) in which authors have developed compendium data with RNA-Seq data, focusing on bulk RNA-Seq primarily.

Question: Do any sites provide a REST endpoint that is easy to use and enables import of "raw" counts or other type of expression value into interactive computational environments such as RStudio or Jupyter notebooks?

Nowlan Freese made changes - 23/Sep/21 9:34 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]


Nowlan Freese added a comment - 23/Sep/21 12:07 PM

Published 2015:
TENOR: Database for Comprehensive mRNA-Seq Experiments in Rice

Here we present TENOR (Transcriptome ENcyclopedia Of Rice, http://tenor.dna.affrc.go.jp ), a database that encompasses large-scale mRNA sequencing (mRNA-Seq) data obtained from rice under a wide variety of conditions. Since the elucidation of the ability of plants to adapt to various growing conditions is a key issue in plant sciences, it is of great interest to understand the regulatory networks of genes responsible for environmental changes. We used mRNA-Seq and performed a time-course transcriptome analysis of rice, Oryza sativa L. (cv. Nipponbare), under 10 abiotic stress conditions (high salinity; high and low phosphate; high, low and extremely low cadmium; drought; osmotic; cold; and flood) and two plant hormone treatment conditions (ABA and jasmonic acid).


Nowlan Freese added a comment - 23/Sep/21 12:18 PM

The most promising result is from the website http://ipf.sustech.edu.cn/pub/athrna/. The authors have been very responsive and while they do not currently have rice RNA-Seq data, they plan on offering it by the end of 2021.
