Linnorm-HVar pipeline for highly variable gene discovery.


Description

This function first applies the Linnorm transformation to the dataset and then performs highly variable gene discovery.

Usage

Linnorm.HVar(datamatrix, input = "Raw", method = "SD", spikein = NULL,
  showinfo = FALSE, perturbation = 10, minZeroPortion = 2/3,
  keepAll = FALSE, log.p = FALSE, sig.value = "p", sig = 0.05)

Arguments

datamatrix

The matrix or data frame that contains your dataset. Each row is a feature (or gene) and each column is a sample (or replicate). Raw counts, CPM, RPKM, FPKM or TPM are supported. Undefined values such as NA are not supported, and log-transformed datasets are not compatible. If a Linnorm-transformed dataset is used, set the "input" argument to "Linnorm".

input

Character. "Raw" or "Linnorm". If your dataset has already been transformed with Linnorm, set input to "Linnorm" and pass the Linnorm-transformed dataset to the "datamatrix" argument. Defaults to "Raw".

method

Character. "SE" or "SD". Use the standard error (SE) or the standard deviation (SD) to calculate p values. Defaults to "SD".
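For reference, the two dispersion measures differ only by a factor of the square root of the number of values. The base-R sketch below computes both for one hypothetical vector of expression values; it does not reproduce how Linnorm.HVar turns either quantity into a p value.

```r
# Hypothetical non-zero expression values of a single gene
x <- c(2.1, 3.4, 2.8, 3.9, 3.0)

sd_x <- sd(x)                   # standard deviation, as with method = "SD"
se_x <- sd_x / sqrt(length(x))  # standard error of the mean, as with method = "SE"

cat("SD:", round(sd_x, 4), " SE:", round(se_x, 4), "\n")
```

Because SE shrinks as the number of values grows, the two choices can rank genes differently when genes have different numbers of non-zero values.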

spikein

Character vector. Row names of the spike-in genes in datamatrix. If provided, the significance test is performed against the spike-in genes. Defaults to NULL.

showinfo

Logical. Show the calculated lambda value. Defaults to FALSE.

perturbation

Integer >= 2. To search for an optimal minimal deviation parameter (see the article), Linnorm uses an iterated local search algorithm that perturbs away from the initial local minimum. The range searched in each perturbation grows exponentially the farther an area is from the initial local minimum, as determined by the area's index. This range is calculated as 10 * (perturbation ^ index).
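A minimal sketch of the documented range formula, 10 * (perturbation ^ index). The helper name `search_range()` is hypothetical, and starting the index at 0 is an assumption; the sketch shows only how the range grows, not the iterated local search itself.

```r
# Hypothetical helper: search range for a given perturbation and index,
# following the documented formula 10 * (perturbation ^ index).
search_range <- function(perturbation, index) {
  stopifnot(perturbation >= 2)
  10 * perturbation^index
}

search_range(2, 0:3)   # 10 20 40 80
search_range(10, 0:3)  # 10, 100, 1000, 10000 (default perturbation = 10)
```

Each step away from the initial local minimum multiplies the searched range by `perturbation`, so larger values explore distant areas much faster.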

minZeroPortion

Double >= 0, <= 1. Genes whose proportion of zero values exceeds minZeroPortion are left out of the calculation of the normalizing parameter; for example, setting minZeroPortion to 0.5 removes genes in which more than half of the values are zero. Because this test is based on variance, which requires more non-zero values, setting it to a larger value is suggested. Defaults to 2/3.
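The filtering rule described above can be sketched in base R. `filter_by_zero_portion()` is a hypothetical helper, not a Linnorm function; keeping a gene whose zero fraction exactly equals minZeroPortion follows the "more than half" wording but is otherwise an assumption.

```r
# Hypothetical helper: drop genes (rows) whose fraction of zero values
# exceeds minZeroPortion.
filter_by_zero_portion <- function(datamatrix, minZeroPortion = 2/3) {
  zero_fraction <- rowMeans(datamatrix == 0)  # per-gene fraction of zeros
  datamatrix[zero_fraction <= minZeroPortion, , drop = FALSE]
}

m <- rbind(
  geneA = c(0, 0, 0, 5),  # 75% zeros
  geneB = c(0, 0, 3, 4),  # 50% zeros
  geneC = c(1, 2, 3, 4)   #  0% zeros
)

rownames(filter_by_zero_portion(m, minZeroPortion = 0.5))  # "geneB" "geneC"
```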

keepAll

Logical. After minZeroPortion filtering is applied, should Linnorm keep all genes in the results? Defaults to FALSE.

log.p

Logical. Output p and q values on the log scale. Defaults to FALSE.

sig.value

Character. "p" or "q". Use p values or q values to highlight significant genes. Defaults to "p".

sig

Double > 0, <= 1. Significance level of the p or q value used for plotting. Defaults to 0.05.
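One way to picture how sig.value and sig interact is sketched below. It assumes q values are Benjamini-Hochberg adjusted p values (`p.adjust(..., method = "BH")`), which this page does not state, and `select_significant()` is a hypothetical helper.

```r
# Hypothetical helper: names of genes whose p or q value is below sig.
# Assumption: q values are Benjamini-Hochberg adjusted p values.
select_significant <- function(p, sig.value = "p", sig = 0.05) {
  stopifnot(sig.value %in% c("p", "q"), sig > 0, sig <= 1)
  values <- if (sig.value == "q") p.adjust(p, method = "BH") else p
  names(values)[values < sig]
}

p <- c(g1 = 0.001, g2 = 0.01, g3 = 0.045, g4 = 0.2)  # hypothetical p values

select_significant(p, sig.value = "p")  # "g1" "g2" "g3"
select_significant(p, sig.value = "q")  # "g1" "g2" (g3's adjusted value is 0.06)
```

Switching sig.value from "p" to "q" at the same sig typically highlights fewer genes, because adjusted values are never smaller than the raw p values.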

Details

This function discovers highly variable genes in the dataset using the Linnorm transformation.

Value

This function returns a list with the following objects:

  • Results: A matrix with the results.

  • plot: Mean vs. standard deviation plot that highlights significant genes.

  • Linnorm: Linnorm transformed and filtered data matrix.

The Results matrix has the following columns:

  • XPM: Average non-zero expression level in XPM. If the input is raw counts or CPM, this column is in CPM units. If the input is RPKM, FPKM or TPM, this column is in TPM units.

  • XPM.SD: Standard deviation of the non-zero expression values in XPM.

  • Transformed.Avg.Exp: Average expression of non-zero Linnorm transformed data.

  • Transformed.SD: Standard deviation of non-zero Linnorm transformed data.

  • Normalized.Log2.SD.Fold.Change: Normalized log2 fold change of the gene's standard deviation.

  • p.value: p value of the statistical test.

  • q.value: q value/false discovery rate/adjusted p value of the statistical test.
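The XPM column rescales each sample so that its values sum to one million: for raw counts this gives CPM, and applying the same rescaling to RPKM or FPKM gives TPM. The sketch below assumes this per-sample rescaling and then averages only the non-zero values of each gene, as the XPM description says; `to_per_million()` is a hypothetical helper, not Linnorm's internal code.

```r
# Hypothetical helper (assumption): rescale each sample (column) to sum to 1e6.
# Raw counts -> CPM; RPKM/FPKM -> TPM.
to_per_million <- function(m) {
  sweep(m, 2, colSums(m), "/") * 1e6
}

counts <- rbind(
  geneA = c(10, 0, 30),
  geneB = c(90, 50, 70)
)  # hypothetical raw counts for 3 samples

xpm <- to_per_million(counts)

# Average non-zero expression per gene: zeros are excluded before averaging
avg_nonzero_xpm <- apply(xpm, 1, function(x) mean(x[x != 0]))
avg_nonzero_xpm  # geneA: 200000, geneB: 800000
```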

Examples

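A hedged usage sketch: it simulates a hypothetical raw count matrix and, only if the Linnorm package is installed, calls Linnorm.HVar with arguments documented on this page. The simulated data, its size and the 0.05 threshold are illustrative assumptions, not the package's own example.

```r
# Simulate a hypothetical raw count matrix: genes in rows, samples in columns
set.seed(1)
n_genes <- 2000
n_samples <- 10
datamatrix <- matrix(
  rnbinom(n_genes * n_samples, mu = 50, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_samples)))
)

if (requireNamespace("Linnorm", quietly = TRUE)) {
  results <- Linnorm::Linnorm.HVar(datamatrix, input = "Raw", method = "SD",
                                   sig.value = "q", sig = 0.05)
  print(head(results$Results))
  # Genes called highly variable at q < 0.05
  hvg <- rownames(results$Results)[which(results$Results[, "q.value"] < 0.05)]
}
```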