dist_approx: Distance matrix computation respecting non-random missing...


View source: R/calculate_distance.R

Description

The method calculates the distances between all pairs of samples (or proteins) in the matrix X and returns a distance matrix together with the corresponding uncertainty. Because of the missing values, no exact distance can be calculated. Instead, realistic values for the missing values are considered, and the mean and the corresponding variance are calculated for each distance.
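The idea of averaging over plausible replacement values can be illustrated with a small Monte Carlo sketch in base R. It assumes a Euclidean distance and that every missing value is drawn independently from Normal(mu_mis, var_mis); the function mc_distance() is hypothetical, and this is not how proDD itself computes its estimates.

```r
# Hypothetical Monte Carlo sketch (not proDD's implementation).
# Assumptions: Euclidean distance; each missing value is replaced by an
# independent draw from Normal(mu_mis, var_mis).
mc_distance <- function(x, y, mu_mis, var_mis, n_draws = 10000) {
  stopifnot(length(x) == length(y))
  miss_x <- is.na(x)
  miss_y <- is.na(y)
  draws <- vapply(seq_len(n_draws), function(i) {
    x_i <- x
    y_i <- y
    # fill the gaps with one random realisation of the missing values
    x_i[miss_x] <- rnorm(sum(miss_x), mean = mu_mis, sd = sqrt(var_mis))
    y_i[miss_y] <- rnorm(sum(miss_y), mean = mu_mis, sd = sqrt(var_mis))
    sqrt(sum((x_i - y_i)^2))
  }, numeric(1))
  # summarise the distribution of distances by its mean and variance
  c(mean = mean(draws), var = var(draws))
}

set.seed(1)
x <- c(20.1, 18.4, NA, 22.0)
y <- c(19.8, NA, 17.5, 21.7)
est <- mc_distance(x, y, mu_mis = 17, var_mis = 1)
est
```

If neither vector has missing values, every draw yields the same distance and the variance is zero; the more values are missing, the larger the variance of the estimate tends to be.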

Usage

dist_approx(X, params = NULL, by_sample = TRUE, blind = TRUE,
  mu_mis = NULL, var_mis = NULL)

## S4 method for signature 'SummarizedExperiment'
dist_approx(X, params = NULL,
  by_sample = TRUE, blind = TRUE, mu_mis = NULL, var_mis = NULL)

## S4 method for signature 'MSnSet'
dist_approx(X, params = NULL, by_sample = TRUE,
  blind = TRUE, mu_mis = NULL, var_mis = NULL)
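A minimal usage sketch for the matrix method, passing constant replacement moments instead of a params object, as the arguments below allow. The toy matrix and the values 18 and 1 are illustrative assumptions, not recommended settings; the call is guarded so the snippet also runs where proDD is not installed.

```r
# Usage sketch: toy data and illustrative replacement moments only.
res <- NULL
if (requireNamespace("proDD", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(6 * 4, mean = 20), nrow = 6,
              dimnames = list(paste0("protein_", 1:6), paste0("sample_", 1:4)))
  X[2, 3] <- NA  # introduce two missing values
  X[5, 1] <- NA
  # distances between samples (by_sample = TRUE is the default)
  res <- proDD::dist_approx(X, mu_mis = 18, var_mis = 1)
  str(res)
}
```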

Arguments

X

the numerical data where each column is one sample and each row is one protein. Missing values are coded as NA.

params

an object of class 'prodd_parameters', as returned for example by the fit_parameters() function.

by_sample

boolean. Indicates whether the distances are calculated between samples (columns) or between proteins (rows). Default: TRUE.

blind

boolean. If the params argument inferred by the fit_parameters() function is provided, the feature parameters contain information about the condition of each sample. This can be undesirable when the goal is to infer unsupervised sample similarities for quality control, which is the most common use case for the dist_approx() function. Therefore, by default the function removes the condition information to give unbiased distance estimates, by internally transforming the params object with transform_parameters(params, rep(1, length(params$experimental_design))). Default: TRUE.

mu_mis

mean of the replacement values. Can be a single number, a vector with one number per sample, or a matrix with the same dimensions as X. Can be provided instead of the 'params' argument.

var_mis

variance of the replacement values. Can be a single number, a vector with one number per sample, or a matrix with the same dimensions as X. Can be provided instead of the 'params' argument.

Details

Usually the method is called with the data matrix 'X' and the object returned by fit_parameters(), or with the result of first calling transform_parameters() to remove the group information. The 'params' object must be of class 'prodd_parameters'.

If specific information is available about where the missing values would have been (i.e., a mean and a variance for each missing value), it can be provided instead of the params object, either as two matrices with the same dimensions as X or as individual values (a single number or one number per sample).
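The three accepted shapes for 'mu_mis' and 'var_mis' can be made concrete with a small base-R helper that brings each form to the dimensions of X. The helper expand_to_matrix() is hypothetical and only illustrates the documented shapes; it is not part of proDD.

```r
# Hypothetical helper (not part of proDD): bring mu_mis or var_mis into the
# shape of X. A single number is used everywhere, a vector is treated as one
# value per sample (column), and a matrix must already match dim(X).
expand_to_matrix <- function(value, X) {
  n_prot <- nrow(X)
  n_samp <- ncol(X)
  if (is.matrix(value)) {
    stopifnot(identical(dim(value), dim(X)))
    return(value)
  }
  if (length(value) == 1) {
    return(matrix(value, nrow = n_prot, ncol = n_samp))
  }
  if (length(value) == n_samp) {
    # byrow = TRUE places value[j] in every row of column j
    return(matrix(value, nrow = n_prot, ncol = n_samp, byrow = TRUE))
  }
  stop("value must be a single number, one number per sample, or a matrix like X")
}

X <- matrix(c(20, NA, 19, 21, 18, NA), nrow = 2)  # 2 proteins x 3 samples
mu_mis <- expand_to_matrix(c(17, 16.5, 18), X)
mu_mis
```

Filling by row is the design choice that matters here: R's default column-wise recycling would spread a per-sample vector down the rows instead of across the samples.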

Unlike the stats::dist function, which always calculates the distances between the rows of a matrix (so X must be transposed to obtain the distances between columns), this method uses the by_sample parameter. In 'X' (and correspondingly in 'mu_mis' and 'var_mis') the columns always correspond to the samples and the rows to the proteins. By default the distances are calculated between the samples; to calculate the distances between the proteins, set by_sample = FALSE.
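For comparison, the orientation convention of stats::dist can be shown on a complete toy matrix stored with proteins in rows and samples in columns. The matrix below is made up for illustration.

```r
# stats::dist compares rows. With proteins in rows and samples in columns,
# sample distances therefore require transposing X first.
X <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 3,
            dimnames = list(paste0("protein_", 1:3), c("sample_A", "sample_B")))
sample_dist  <- stats::dist(t(X))  # 2 samples  -> 1 pairwise distance
protein_dist <- stats::dist(X)     # 3 proteins -> 3 pairwise distances
sample_dist
protein_dist
```

dist_approx() keeps X in this orientation in both cases and switches between the two comparisons through by_sample instead of a transpose.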

Value

a list with two elements:

mean

a distance matrix with the mean of the distance estimate

var

the variance (uncertainty) of each distance estimate
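A sketch of how the two elements might be consumed, for example for sample clustering in quality control. The list below is a toy stand-in with made-up numbers, and it assumes both elements can be coerced to symmetric matrices with as.matrix().

```r
# Toy stand-in for a dist_approx() result; the numbers are made up and
# both elements are assumed to be coercible to symmetric matrices.
labels <- c("s1", "s2", "s3")
mean_mat <- matrix(c(0, 2, 5,
                     2, 0, 4,
                     5, 4, 0), nrow = 3, dimnames = list(labels, labels))
var_mat <- matrix(c(0,   0.1, 0.8,
                    0.1, 0,   0.5,
                    0.8, 0.5, 0), nrow = 3, dimnames = list(labels, labels))
res <- list(mean = mean_mat, var = var_mat)

# Standard deviation of each distance estimate
sd_mat <- sqrt(as.matrix(res$var))

# Average-linkage hierarchical clustering on the mean distances
hc <- stats::hclust(stats::as.dist(as.matrix(res$mean)), method = "average")
hc$height
```

Keeping sd_mat next to the clustering makes it possible to judge whether a merge height is clearly separated from the uncertainty of the underlying distances.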

Methods (by class)

See Also

dist


const-ae/proDD documentation built on Jan. 14, 2020, 9:34 a.m.