prepareData: Prepare raw abundance/count/length data for later analysis

View source: R/utility_functions.R

prepareData R Documentation

Prepare raw abundance/count/length data for later analysis

Description

prepareData generates the abundance and count data used by later steps of the analysis, notably generateData.

Usage

prepareData(
  abundance,
  counts,
  lengths,
  tx2gene,
  nsamp,
  key = NULL,
  infReps = "none",
  samps = NULL
)

Arguments

abundance

is a data frame with nsamp+1 columns: one column per sample, named Sample1, Sample2, etc., plus a tx_id column (often taken from the row names). Rows are transcript-level quantification estimates. Column names should not include "TPM".

counts

is a data frame with nsamp+1 columns: one column per sample, named Sample1, Sample2, etc., plus a tx_id column (often taken from the row names). Rows are transcript-level quantification estimates. Column names should not include "Cnt".

lengths

is a data frame with nsamp+1 columns: one column per sample, named Sample1, Sample2, etc., plus a tx_id column (often taken from the row names). Rows give transcript-level effective lengths. Column names should not include "Length".

tx2gene

is a data frame that matches transcripts to genes. It can be created by maketx2gene.

nsamp

is the number of biological samples/replicates used in the analysis.

key

is a data.frame with columns "Sample" (the unique biological identifier for the analysis), "Condition" (the condition/treatment variable for the data), and "Identifier", whose entries should be "Sample1", "Sample2", ... up to the number of rows of key. "Identifier" must be created this way even if the observations do not correspond to unique biological samples.

infReps

is a character value indicating which kind of inferential replicates (if any) the current function call should analyze. Must be one of "none", "Boot", or "Gibbs". Default is "none".

samps

is an optional vector containing the sample names. It must be specified whenever the sample names are not exactly paste0("Sample", 1:nsamp), for example when some sample numbers are missing.
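
The input layout described in the arguments above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not code from the package: the transcript IDs, numeric values, sample labels, and conditions are made up, and make_quant_df and toy_matrix are hypothetical helpers that are not part of CompDTUReg.

```r
# Toy inputs only: transcript IDs, values, and conditions are made up.
nsamp <- 3
tx_ids <- c("tx1", "tx2", "tx3", "tx4")
samps <- paste0("Sample", seq_len(nsamp))  # the default sample naming

# Hypothetical helper (not part of CompDTUReg): fill a transcript-by-sample
# matrix column by column with the supplied values.
toy_matrix <- function(values) {
  matrix(values, nrow = length(tx_ids), dimnames = list(tx_ids, samps))
}

# Hypothetical helper (not part of CompDTUReg): turn the matrix into a data
# frame with one column per sample plus a tx_id column from the row names.
make_quant_df <- function(mat) {
  df <- as.data.frame(mat)
  df$tx_id <- rownames(mat)
  rownames(df) <- NULL
  df
}

set.seed(1)
abundance <- make_quant_df(toy_matrix(runif(12, 0, 100)))
counts <- make_quant_df(toy_matrix(rpois(12, 50)))
lengths <- make_quant_df(toy_matrix(rep(c(900, 1200, 450, 2000), nsamp)))

# key: one row per observation, with Identifier following the Sample1..n rule.
key <- data.frame(
  Sample = c("patientA", "patientB", "patientC"),
  Condition = c("control", "control", "treated"),
  Identifier = samps,
  stringsAsFactors = FALSE
)

infReps <- "none"

# Sanity checks mirroring the documented requirements.
stopifnot(
  ncol(abundance) == nsamp + 1,
  ncol(counts) == nsamp + 1,
  ncol(lengths) == nsamp + 1,
  all(samps %in% names(abundance)),
  !any(grepl("TPM|Cnt|Length", names(abundance))),
  identical(key$Identifier, samps),
  infReps %in% c("none", "Boot", "Gibbs")
)

# With CompDTUReg installed and a tx2gene data frame available, the call
# would then follow the Usage signature (not run here):
# prepareData(abundance = abundance, counts = counts, lengths = lengths,
#             tx2gene = tx2gene, nsamp = nsamp, key = key,
#             infReps = infReps, samps = samps)
```

Because the sample names here are exactly paste0("Sample", 1:nsamp), samps could be left at its default of NULL; it is passed explicitly only to show where custom names would go.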

Value

A list of length 2: the first element is the abundance data (abGeneTempF) and the second is the count data (cntGeneTempF), both for use with generateData.


skvanburen/CompDTUReg documentation built on Jan. 23, 2025, 9:01 a.m.