RegspliceData: RegspliceData objects.

View source: R/class_RegspliceData.R

RegspliceData-classR Documentation

RegspliceData objects.

Description

RegspliceData objects contain data in the format required by functions in the regsplice analysis pipeline.

Usage

RegspliceData(counts, gene_IDs = NULL, n_exons = NULL, condition = NULL)

## S4 method for signature 'RegspliceData'
assays(x, withDimnames, ..., value)

countsData(x)

## S4 method for signature 'RegspliceData'
countsData(x)

weightsData(x)

## S4 method for signature 'RegspliceData'
weightsData(x)

## S4 method for signature 'RegspliceData'
rowData(x)

## S4 method for signature 'RegspliceData'
colData(x, ..., value)

## S4 method for signature 'RegspliceData,ANY,ANY,ANY'
x[i, j]

Arguments

counts

RNA-seq read counts or exon microarray intensities (matrix or data frame). Rows are exons, and columns are biological samples. Alternatively, counts also accepts a SummarizedExperiment input object containing all required input data, which may be useful when running regsplice as part of a pipeline with other packages.

gene_IDs

Vector of gene IDs (character vector). Length is equal to the number of genes.

n_exons

Vector of exon lengths (numeric vector of integers), i.e. the number of exon bins per gene. Length is equal to the number of genes.

condition

Experimental condition for each biological sample (character or numeric vector, or factor).

x

RegspliceData object (for accessor or subsetting functions).

withDimnames

See SummarizedExperiment::assays().

...

Additional arguments for replacement with `[<-`.

value

Value for replacement with `[<-`.

i

Gene names (character vector) or row numbers (numeric vector) for subsetting genes or exons. Note that when subsetting whole genes, gene names (character vector) should be provided instead of row numbers, to avoid possible errors due to selecting incorrect row numbers. Row numbers may be provided to subset individual exons.

j

Column numbers (numeric vector) for subsetting biological samples.

Details

The RegspliceData format is based on the SummarizedExperiment container. Initially, objects contain raw data along with meta-data for rows (genes and exons) and columns (biological samples). During subsequent steps in the regsplice analysis pipeline, the data values are modified, and additional data and meta-data are added to the object. Final results are stored in a RegspliceResults object.

RegspliceData objects are created with the constructor function RegspliceData().

Required inputs for the constructor function are counts (matrix or data frame of RNA-seq read counts or exon microarray intensities), gene_IDs (vector of gene IDs), n_exons (vector of exon lengths, i.e. number of exon bins per gene), and condition (vector of experimental conditions for each biological sample).

Alternatively, the inputs can be provided as a SummarizedExperiment object, which will be parsed to extract each of these components. This may be useful when running regsplice as part of a pipeline together with other packages.

See the vignette for an example showing how to construct gene_IDs and n_exons from a column of gene:exon IDs.

Exon microarray intensities should be log2-transformed, which is usually done during pre-processing of microarray data. (RNA-seq counts will be transformed automatically during the regsplice analysis pipeline; see runVoom.)

After creating a RegspliceData object, the wrapper function regsplice can be used to run the analysis pipeline with a single command. Alternatively, you can run the individual functions for each step in the pipeline, beginning with filterZeros (see vignette for details).

Value

Returns a RegspliceData object.

Fields

counts

Matrix of RNA-seq read counts or exon microarray intensities. Rows are exons, and columns are biological samples.

weights

(Optional) Matrix of observation-level weights. Rows are exons, and columns are biological samples. Created by the runVoom function.

rowData

DataFrame of row meta-data. This should contain two columns: gene_IDs and exon_IDs, which are created by the RegspliceData constructor function.

colData

DataFrame of column meta-data. This contains the experimental condition and (optionally) normalization factors for each biological sample. Normalization factors are created by the runVoom function.

Accessor functions

  • countsData(): Accesses the counts data matrix.

  • weightsData(): Accesses the (optional) weights data matrix.

  • rowData(): Accesses the DataFrame of row meta-data. This should contain two columns: gene_IDs and exon_IDs.

  • colData(): Accesses the DataFrame of column meta-data. This contains the experimental condition and (optionally) normalization factors for each biological sample.

Subsetting

Subsetting of RegspliceData objects is performed with square brackets, x[i, j], where x is the name of the object. The subsetting operations are designed to keep data and meta-data in sync.

For subsetting by rows, there are two possibilities:

  • Subsetting genes: To subset whole genes, provide a character vector of gene names to the argument i. The returned object will contain all rows corresponding to these genes. Row numbers should not be used when subsetting whole genes, since this risks potential errors due to selecting incorrect rows.

  • Subsetting exons: To subset individual exons, provide the corresponding row numbers to the argument i.

For subsetting by columns (biological samples), provide the corresponding column numbers to the argument j.

See Also

regsplice filterZeros

Examples

# ---------
# Example 1
# ---------

counts <- matrix(sample(100:200, 14 * 6, replace = TRUE), nrow = 14, ncol = 6)
gene_IDs <- paste0("gene", 1:5)
n_exons <- c(3, 2, 3, 1, 5)
condition <- rep(c(0, 1), each = 3)

rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition)

rs_data
countsData(rs_data)
rowData(rs_data)
colData(rs_data)

rs_data[1, ]
rs_data[1, 1:3]

rs_data["gene1", ]
rs_data["gene1", 1:3]


# --------------------
# Example 2 (Vignette)
# --------------------

file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice")
data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
head(data)

counts <- data[, 2:7]
tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]]))
gene_IDs <- names(tbl_exons)
n_exons <- unname(tbl_exons)
condition <- rep(c("untreated", "treated"), each = 3)

rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition)

rs_data
head(countsData(rs_data))
rowData(rs_data)
colData(rs_data)

rs_data[1, ]
rs_data[1, 1:3]

rs_data["ENSG00000000003", ]
rs_data["ENSG00000000003", 1:3]


lmweber/regsplice documentation built on March 19, 2024, 1:45 p.m.