Linnorm.DataImput: Linnorm Data Imputation Function. (In development)

View source: R/Linnorm.DataImput.R

Description

This function performs data imputation for (sc)RNA-seq expression data or large-scale count data. It treats every zero count in the dataset as missing data and replaces it with a predicted value.

Usage

Linnorm.DataImput(datamatrix, RowSamples = FALSE, showinfo = FALSE,
  MZP = 0.25, LC_F = "Auto", max_LC_F = 0.75, FG_Recov = 0.5,
  method = "euclidean", VarPortion = 0.75, ...)

Arguments

datamatrix

The matrix or data frame that contains your dataset. It is only compatible with log transformed datasets.

RowSamples

Logical. In the datamatrix, if each row is a sample and each column is a feature, set this to TRUE so that you don't need to transpose it. Defaults to FALSE.

showinfo

Logical. Show algorithm running information. Defaults to FALSE.

MZP

Double >=0, <= 1. Minimum non-Zero Portion Threshold for this function. Genes not satisfying this threshold will be removed. For example, if set to 0.3, genes in which fewer than 30 percent of the samples are non-zero will be removed. Defaults to 0.25.
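The effect of this threshold can be illustrated with a minimal base R sketch. It is not Linnorm's internal code; the toy matrix `expr` and the variable names are assumptions made for illustration, with genes in rows and samples in columns (the default layout).

```r
# Hypothetical toy data: rows are genes, columns are samples.
expr <- rbind(
  geneA = c(1.2, 0.0, 2.3, 0.8),  # 3 of 4 samples non-zero -> 0.75
  geneB = c(0.0, 0.0, 0.0, 1.1),  # 1 of 4 samples non-zero -> 0.25
  geneC = c(0.0, 0.0, 0.0, 0.0)   # 0 of 4 samples non-zero -> 0.00
)

MZP <- 0.3  # threshold used in the example above

# Portion of non-zero samples for each gene.
nonzero_portion <- rowMeans(expr != 0)

# Keep only genes whose non-zero portion reaches the threshold.
filtered <- expr[nonzero_portion >= MZP, , drop = FALSE]
```

With these values only `geneA` would remain, since it is the only gene with at least 30 percent non-zero samples.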

LC_F

Double >= 0.01, <= 0.95 or Character "Auto". Filter this portion of the lowest expressing genes. It can be determined automatically by setting to "Auto". Defaults to "Auto".

max_LC_F

Double >=0, <= 0.95. When LC_F is set to "Auto", this is the maximum threshold that Linnorm will assign. Defaults to 0.75.

FG_Recov

Double >=0, <= 1. In the low count gene filtering algorithm, recover this portion of genes that are filtered. Defaults to 0.5.

method

Character. Method for calculating the distance matrix. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall". Any unambiguous substring can be given. Defaults to "euclidean".

VarPortion

Double >0, <=0.95. Portion of the variance from PCA to be used for data imputation. Defaults to 0.75.
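One plausible reading of "portion of the variance from PCA" is to keep the smallest number of leading principal components whose cumulative explained variance reaches the threshold. The sketch below shows that reading with base R's `prcomp`; it is an assumption about how such a cutoff could be applied, not a copy of Linnorm's implementation, and the matrix `x` is simulated.

```r
set.seed(1)
# Hypothetical log-scale matrix: 20 samples (rows) by 6 features (columns).
x <- matrix(rnorm(120), nrow = 20, ncol = 6)

VarPortion <- 0.75

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of the total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Smallest number of leading components whose cumulative variance
# reaches VarPortion.
n_pc <- which(cumsum(var_explained) >= VarPortion)[1]

# Sample coordinates restricted to those components.
scores <- pca$x[, seq_len(n_pc), drop = FALSE]
```

The resulting `scores` matrix (samples by retained components) is the kind of input from which a sample-to-sample distance matrix could then be computed.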

...

Placeholder for any new arguments.

Details

This function performs data imputation on the dataset. It first generates a distance matrix between samples using principal components from PCA. Then, by default, it uses the distance matrix as weights and predicts the missing values of each gene as an inverse Euclidean distance weighted mean.
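The weighting idea described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not Linnorm's actual code: `impute_idw` is a hypothetical helper, `scores` stands in for sample coordinates on the retained principal components, and each zero is replaced by the mean of the same gene's non-zero values in other samples, weighted by the inverse of the Euclidean distance between samples.

```r
# Hypothetical helper, not part of Linnorm.
# expr:   genes x samples matrix; zeros are treated as missing.
# scores: samples x components matrix (e.g. PCA coordinates).
impute_idw <- function(expr, scores) {
  d <- as.matrix(dist(scores, method = "euclidean"))
  w <- 1 / d
  diag(w) <- 0            # a sample does not inform itself
  w[!is.finite(w)] <- 0   # guard against zero distances between samples
  out <- expr
  for (g in seq_len(nrow(expr))) {
    observed <- expr[g, ] != 0
    # Nothing to do if the gene has no zeros or no observed values.
    if (!any(observed) || all(observed)) next
    for (s in which(!observed)) {
      ws <- w[s, observed]
      if (sum(ws) > 0) {
        # Inverse distance weighted mean of the observed values.
        out[g, s] <- sum(ws * expr[g, observed]) / sum(ws)
      }
    }
  }
  out
}

# Toy example: three samples at positions 0, 1 and 3 on a single component.
scores <- matrix(c(0, 1, 3), ncol = 1)
expr <- rbind(gene1 = c(2, 0, 4))
imputed <- impute_idw(expr, scores)
```

In the toy example, the second sample is at distance 1 from the first and distance 2 from the third, so the weights are 1 and 0.5 and the imputed value is (1 * 2 + 0.5 * 4) / 1.5, or about 2.67. Observed values are left unchanged.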

Value

This function returns a data matrix.

Examples

#Obtain example matrix:
data(Islam2011)
#Transformation:
Transformed <- Linnorm(Islam2011)
#Data imputation
DataImput <- Linnorm.DataImput(Transformed)

Linnorm documentation built on Nov. 8, 2020, 6:48 p.m.