hybridize: Pre-processing step for PCP_hybrid

View source: R/hybridize.R

hybridize R Documentation

Pre-processing step for PCP_hybrid

Description

hybridize computes the necessary information for the PCP_hybrid method.

Usage

hybridize(D, r, limit)

Arguments

D

The data matrix to hybridize.

r

The rank of the data matrix.

limit

The proportion (between 0 and 1) of each column in D to put under the limit of detection (LOD). For example, if limit = 0.25, then the values in the first quartile of each column in D are put under the LOD. Avoid passing limit = 0 (see Warnings below).

Value

List containing: "M_hybrid", "is_safe", "below_lod_mat". See Methods below for further details.

"M_hybrid"

The hybridized matrix.

"is_safe"

A logical vector of length nrow(D). Entries are TRUE when the corresponding row in D is a "safe" row, and FALSE when the corresponding row in D is an "unsafe" row.

"below_lod_mat"

A binary matrix, where 1's signify that the corresponding entry in D was below the LOD, and 0's signify that the corresponding entry was above the LOD.

Methods

The main idea behind PCP_hybrid is to make use of information on "safe" vs. "unsafe" rows in a data matrix. A safe row is a row with at least r entries above the LOD. An unsafe row is a row with fewer than r entries above the LOD. When hybridizing a matrix, an entry below the LOD is imputed as -1 if it lies in a safe row, and as LOD/sqrt(2) if it lies in an unsafe row.
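The rule above can be sketched in base R. This is only an illustration, not the PCPhelpers source: the function name hybridize_sketch is hypothetical, and it assumes that each column gets its own LOD equal to the limit-quantile of that column, that "below the LOD" means strictly less than that value, and that unsafe entries are imputed with their own column's LOD divided by sqrt(2).

```r
# Hypothetical illustration of the safe/unsafe rule; not the package code.
hybridize_sketch <- function(D, r, limit) {
  # Assumption: one LOD per column, taken as the `limit` quantile of the column.
  lod <- apply(D, 2, quantile, probs = limit, names = FALSE)
  # Expand to a matrix shaped like D, each column filled with its own LOD.
  lod_mat <- matrix(lod, nrow = nrow(D), ncol = ncol(D), byrow = TRUE)

  # Assumption: "below the LOD" is a strict comparison.
  below <- D < lod_mat

  # A row is safe when at least r of its entries are not below the LOD.
  is_safe <- rowSums(!below) >= r
  # Column-major fill repeats is_safe down every column of the matrix.
  safe_mat <- matrix(is_safe, nrow = nrow(D), ncol = ncol(D))

  M <- D
  # Below-LOD entries in safe rows are imputed as -1.
  M[below & safe_mat] <- -1
  # Below-LOD entries in unsafe rows are imputed as LOD / sqrt(2).
  unsafe_below <- below & !safe_mat
  M[unsafe_below] <- lod_mat[unsafe_below] / sqrt(2)

  list(M_hybrid = M, is_safe = is_safe, below_lod_mat = below * 1)
}

# Small usage example on random data.
set.seed(1)
D <- matrix(runif(20), nrow = 5, ncol = 4)
out <- hybridize_sketch(D, r = 3, limit = 0.25)
```

The three list elements mirror the names given under Value, so out$is_safe and out$below_lod_mat can be read the same way as the real return value, subject to the assumptions stated above.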

Warnings

Do not pass limit = 0 to hybridize. The underlying quantile function will still put some values under the LOD, since passing 0 to quantile returns the minimum value, which is then selected as the LOD.
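The behaviour of quantile at probability 0 can be checked directly in base R. The vector x below is made-up illustration data, and the non-strict comparison in the last line is an assumption about how entries are flagged, since this page does not state the exact comparison hybridize uses.

```r
# Made-up column of measurements, for illustration only.
x <- c(4.2, 1.5, 3.3, 2.8)

# With probs = 0, quantile() returns the smallest observed value,
# so the "LOD" is the column minimum rather than "no LOD at all".
lod_zero <- quantile(x, probs = 0, names = FALSE)
is_minimum <- lod_zero == min(x)

# Assumption: under a non-strict comparison, the minimum itself is flagged.
at_or_below <- sum(x <= lod_zero)
```

Here is_minimum confirms that the LOD collapses to the column minimum, which is why limit = 0 does not mean "no values under the LOD".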

Examples

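library(PCPhelpers)
# Simulate a 10 x 10 rank-3 matrix (sigma = 0, no sparse component added)
data <- sim_data(sim_seed = 1, nrow = 10, ncol = 10, rank = 3, sigma = 0, add_sparse = FALSE)
mat <- data$M
# Put the lowest 25% of each column under the LOD and hybridize
hybridize(mat, r = 3, limit = 0.25)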

Columbia-PRIME/PCPhelpers documentation built on April 24, 2022, 7:57 p.m.