load_dataset: Loading and processing input data

View source: R/load_data.R

load_dataset    R Documentation

Loading and processing input data

Description

Loads a dataset and processes it for ADAGE analysis. Five types of input are currently supported:

1. A file path to a zip file containing the microarray CEL files of a dataset (isProcessed=FALSE, isRNAseq=FALSE).

2. A file path to a folder containing the microarray CEL files of a dataset (isProcessed=FALSE, isRNAseq=FALSE).

3. An ArrayExpress experiment accession number in the format "E-XXXX-n"; the experiment must have been measured on the "A-AFFY-30" platform (isProcessed=FALSE, isRNAseq=FALSE).

4. A file path to a tab-delimited txt file storing the processed gene expression values of a dataset, with gene identifiers in the first column followed by one column per sample (isProcessed=TRUE, isRNAseq=TRUE/FALSE).

5. A data.frame object storing the processed gene expression values of a dataset, with gene identifiers in the first column followed by one column per sample (isProcessed=TRUE, isRNAseq=TRUE/FALSE).
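The layout expected for processed input (gene identifiers in the first column, then one column per sample) can be illustrated with a small base R sketch. The gene identifiers, sample names, expression values and file name below are made up for illustration; nothing here calls load_dataset itself.

```r
# Hypothetical toy data: identifiers, sample names and values are invented.
expr <- data.frame(
  geneID  = c("PA0001", "PA0002", "PA0003"),  # gene identifiers, first column
  sample1 = c(7.2, 5.1, 9.8),                 # one column per sample
  sample2 = c(6.9, 5.4, 10.1),
  stringsAsFactors = FALSE
)

# The same table written as a tab-delimited txt file.
processed_file <- file.path(tempdir(), "processed_expression.txt")
write.table(expr, file = processed_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm that the layout survives the round trip.
expr_from_file <- read.delim(processed_file, stringsAsFactors = FALSE)
```

Either the data.frame `expr` or the path `processed_file` would then correspond to the processed input types described above.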

Usage

load_dataset(input, isProcessed, isRNAseq, model, compendium, quantile_ref,
  download_folder = "./", norm01 = FALSE)

Arguments

input

a file path to the input file or input folder, an ArrayExpress accession number, or a data.frame object

isProcessed

a logical value indicating whether the input has already been processed into expression values at the gene level.

isRNAseq

a logical value indicating whether the processed input is RNAseq data. If TRUE, the processed RNAseq data will be normalized with TDM (Training Distribution Matching) to a range comparable with the microarray-based compendium. If FALSE, the processed input is treated as a microarray dataset and will be quantile normalized to be comparable to the compendium.

model

the ADAGE model used to analyze the input

compendium

the gene expression compendium of an organism

quantile_ref

a vector storing the reference quantile distribution of the input compendium at the microarray probe level. Because input microarray data is normalized against the processed compendium, the compendium and the quantile_ref must match each other.
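To make the idea of a reference quantile distribution concrete, here is a minimal base R sketch of quantile normalization against a fixed reference. It is one common formulation and is not necessarily how load_dataset implements it; the function name `quantile_normalize_to_ref` and the toy numbers are hypothetical.

```r
# Replace each value in x by the reference value that has the same rank.
# Assumption: x and ref have the same length; ties in x are broken by
# order of appearance.
quantile_normalize_to_ref <- function(x, ref) {
  stopifnot(length(x) == length(ref))
  sort(ref)[rank(x, ties.method = "first")]
}

x   <- c(5, 1, 3)      # toy intensities for one sample
ref <- c(10, 20, 30)   # toy reference distribution
normalized <- quantile_normalize_to_ref(x, ref)
# normalized is c(30, 10, 20): the largest value in x receives the
# largest reference value, and so on down the ranks.
```

After this step every sample shares the same distribution of values as the reference, which is why the reference has to come from the same compendium the data will be compared with.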

download_folder

path to the folder in which files downloaded from ArrayExpress are saved when input is an ArrayExpress accession number (default: "./")

norm01

a logical value indicating whether the output should be zero-one normalized (default: FALSE)
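As a rough illustration of zero-one normalization, the sketch below rescales each gene (row) of a toy matrix to the range [0, 1]. It assumes that scaling is done per gene using the minimum and maximum of the data at hand; load_dataset may instead use ranges taken from the compendium, so treat this only as an illustration of the general technique. The helper name `zero_one` and the toy values are hypothetical.

```r
# Linearly rescale a numeric vector to [0, 1].
# A constant vector has no range, so it is mapped to all zeros.
zero_one <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy expression matrix: genes in rows, samples in columns.
expr <- matrix(c(2, 4, 6,
                 10, 10, 10),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# apply() over rows returns genes in columns, so transpose back.
scaled <- t(apply(expr, 1, zero_one))
```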

Value

a data.frame containing the processed gene expression values ready for ADAGE analysis.
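A call for already processed RNAseq data might look like the following sketch. It only wraps the signature shown under Usage; the wrapper name `load_processed_rnaseq` is hypothetical, and the model, compendium and quantile_ref objects are assumed to be supplied by the caller and to match each other.

```r
# Hypothetical convenience wrapper around load_dataset() for a data.frame
# of processed RNAseq values (gene identifiers in the first column).
load_processed_rnaseq <- function(expr_df, model, compendium, quantile_ref) {
  if (!requireNamespace("ADAGEpath", quietly = TRUE)) {
    stop("The ADAGEpath package is required for this sketch.")
  }
  ADAGEpath::load_dataset(
    input = expr_df,
    isProcessed = TRUE,    # values are already at the gene level
    isRNAseq = TRUE,       # request TDM normalization, as documented above
    model = model,
    compendium = compendium,
    quantile_ref = quantile_ref,
    norm01 = FALSE         # keep the default, no zero-one normalization
  )
}
```

According to the Value section, the result would be a data.frame of processed gene expression values ready for ADAGE analysis.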


greenelab/ADAGEpath documentation built on May 25, 2022, 7:11 a.m.