missing_value_impute: Data_impute

Data_impute R Documentation

Data_impute

Description

Data cleaning process: detect and remove outlier samples and impute missing values. The process is as follows:
1. Remove genes whose proportion of missing values is larger than maxNAratio.
2. Detect outlier samples and remove them.
3. Repeat steps 1-2 until the iteration limit is reached or no outlier sample can be detected.
4. Impute the missing values.
The function can also perform only gene filtering, outlier removal, or missing value imputation.
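
Step 1 of the process can be pictured with a few lines of base R. This is a minimal sketch under stated assumptions, not the Data_impute source: the matrix `quant` and its values are made up, 0 is assumed to be the missing-value code, and a row is kept when its proportion of missing values is at most maxNAratio.

```r
# Hypothetical quantification matrix: 4 proteins (rows) x 4 samples (columns).
# Zeros stand in for missing values (assumption for this sketch).
quant <- matrix(c(1, 0, 0,  0,
                  2, 3, 0,  0,
                  4, 5, 6,  0,
                  7, 8, 9, 10),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("P", 1:4), paste0("S", 1:4)))

miss.value <- 0
maxNAratio <- 0.5

# Recode the missing-value marker to NA.
quant[quant == miss.value] <- NA

# Proportion of missing values in each row.
na_ratio <- rowMeans(is.na(quant))

# Keep rows whose missing proportion does not exceed the threshold.
kept <- quant[na_ratio <= maxNAratio, , drop = FALSE]
rownames(kept)
```

Here only the first row, with three of four values missing, exceeds the 0.5 threshold and is dropped.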

Usage

Data_impute(data, inf = "inf", intensity = "LFQ", miss.value = NA,
            splNExt = TRUE, maxNAratio = 0.5,
            removeOutlier = TRUE,
            outlierdata = "intensity", iteration = NA, sdout = 2,
            distmethod = "manhattan", A.IAC = FALSE,
            dohclust = FALSE, treelabels = NA,
            plot = TRUE, filename = NULL,
            text.cex = 0.7, text.col = "red", text.pos = 1,
            text.labels = NA, abline.col = "red", abline.lwd = 2,
            impute = TRUE, verbose = 1, ...)

Arguments

data

MaxQconvert data, or a list containing two data.frames: ID information and quantification data.

inf

The name of the data.frame that contains protein ID information.

intensity

The name of the data.frame that contains only quantification data.

miss.value

The value used to mark missing values in the quantification data. The default is NA. It is usually NA or 0.

splNExt

A logical value indicating whether to extract sample names (suited to MaxQuant quantification data).

maxNAratio

The maximum proportion of missing data allowed in any row (default 0.5, i.e. 50%). Rows with a larger proportion of missing values are deleted.

removeOutlier

A logical value indicating whether to remove outlier samples.

outlierdata

Deprecated. Specifies which data are used for outlier sample detection. This must be (an abbreviation of) one of the strings "intensity", "relative_value", "log2_value".

iteration

A numeric value indicating how many times to run the outlier sample detection and removal loop. NA means the loop repeats until no outlier sample is detected.

sdout

A numeric value giving the threshold used to judge outlier samples. The default, 2, corresponds roughly to a 0.95 confidence interval.

distmethod

The distance measure to be used. This must be (an abbreviation of) one of the strings "manhattan", "euclidean", "canberra", "correlation".

A.IAC

A logical value indicating whether to decrease correlation variance.

dohclust

A logical value indicating whether to perform hierarchical clustering and plot dendrograms.

treelabels

Labels for the dendrograms.

plot

A logical value indicating whether to plot the numbersd scatter diagrams.

filename

The file name of the plot. The number and plot type information are added automatically. The default value is NULL, which means no file is saved. All plots are saved in the "plot" folder in PDF format.

text.cex

Outlier sample annotation text size (scatter diagram parameter).

text.col

Outlier sample annotation color (scatter diagram parameter).

text.pos

Outlier sample annotation position (scatter diagram parameter).

text.labels

Outlier sample annotation labels (scatter diagram parameter).

abline.col

The threshold line color (scatter diagram parameter).

abline.lwd

The threshold line width (scatter diagram parameter).

impute

A logical value indicating whether to perform kNN imputation.

verbose

Integer level of verbosity. 0 means silent; 1 prints some diagnostic messages.

...

Other arguments.
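
To make the roles of sdout and distmethod more concrete, the following base R sketch flags a sample whose mean distance to the other samples is unusually large. It is an illustration under assumptions, not the package's actual procedure: the simulated matrix `quant`, the variable names `mean_dist`, `numbersd` and `outliers`, and the rule "standardized mean distance greater than sdout" are all hypothetical.

```r
set.seed(1)

# Hypothetical log-scale data: 50 proteins x 8 samples; sample S8 is shifted.
quant <- matrix(rnorm(50 * 8), nrow = 50,
                dimnames = list(NULL, paste0("S", 1:8)))
quant[, "S8"] <- quant[, "S8"] + 5

sdout <- 2

# Pairwise Manhattan distances between samples
# (dist() works on rows, so the matrix is transposed).
d <- as.matrix(dist(t(quant), method = "manhattan"))

# Mean distance from each sample to all the others.
mean_dist <- rowSums(d) / (ncol(d) - 1)

# Number of standard deviations each sample lies from the average sample.
numbersd <- (mean_dist - mean(mean_dist)) / sd(mean_dist)

# Samples beyond the threshold are flagged as outliers.
outliers <- names(numbersd)[numbersd > sdout]
outliers
```

With iteration = NA, a loop of this kind would be repeated on the remaining samples until `outliers` comes back empty.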

Details

Detects and removes outlier samples and imputes missing values.
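
The idea behind k-nearest-neighbour imputation can be sketched in base R as follows. This is a simplified illustration and an assumption about the general technique; the routine the function actually calls may differ in its distance measure, neighbour selection and defaults. The matrix `quant` and the choice `k = 2` are hypothetical.

```r
# Hypothetical matrix with one missing value: proteins in rows, samples in columns.
quant <- matrix(c(1.0, 1.1, 0.9, 1.0,
                  1.0, 1.2, 0.8,  NA,
                  1.1, 1.0, 1.0, 1.2,
                  5.0, 5.2, 4.9, 5.1),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("P", 1:4), paste0("S", 1:4)))

k <- 2
imputed <- quant

# Candidate neighbours: rows without any missing value.
cand <- which(rowSums(is.na(quant)) == 0)

for (i in which(rowSums(is.na(quant)) > 0)) {
  miss <- is.na(quant[i, ])
  # Euclidean distance to each candidate, using only the columns observed in row i.
  diffs <- sweep(quant[cand, !miss, drop = FALSE], 2, quant[i, !miss])
  d <- sqrt(rowSums(diffs^2))
  # The k closest candidate rows.
  nn <- cand[order(d)[seq_len(min(k, length(cand)))]]
  # Fill each missing entry with the neighbours' mean in that column.
  imputed[i, miss] <- colMeans(quant[nn, miss, drop = FALSE])
}

imputed["P2", "S4"]
```

Row P2 is closest to P1 and P3, so its missing S4 value is filled with the mean of their S4 values.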

Value

A list of proteomic data with the following elements.

inf

Protein information, including protein IDs and other information.

intensity

Quantification information.

relative_value

intensity divided by geometric mean

log2_value

log2 of relative_value
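
The relationship between intensity, relative_value and log2_value can be illustrated with a small base R sketch. The matrix below is made up, and taking the geometric mean per protein (row) across samples is an assumption of this sketch; the source does not state over which dimension the geometric mean is computed.

```r
# Hypothetical intensity matrix: 2 proteins x 3 samples.
intensity <- matrix(c( 1,  2,  4,
                      10, 10, 10),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("P1", "P2"), c("S1", "S2", "S3")))

# Row-wise geometric mean (assumption: computed per protein across samples).
geo_mean <- exp(rowMeans(log(intensity)))

# Divide each row by its geometric mean, then take log2.
relative_value <- intensity / geo_mean
log2_value <- log2(relative_value)

# Under this assumption each row of log2_value sums to (numerically) zero.
rowSums(log2_value)
```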

Author(s)

Kefu Liu

Examples

library(DDPNA)
data(Dforimpute)
# Run the cleaning and imputation process with the Manhattan distance measure.
data <- Data_impute(Dforimpute, distmethod = "manhattan")

DDPNA documentation built on May 17, 2022, 5:05 p.m.