BuildFeatureMatrix: Build a feature matrix from data processed with speaq 2.0

View source: R/BuildFeatureMatrix.R

BuildFeatureMatrix    R Documentation

Build a feature matrix from data processed with speaq 2.0

Description

This function converts the grouped peak data to a matrix. The matrix has features (peak groups) in the columns and samples in the rows; each entry holds the value of the chosen variable for the peak of that sample in that feature.
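The conversion described above can be pictured with a small base R sketch. This is not the speaq implementation: the toy data frame is invented for illustration, and it assumes that grouped peaks carry a 'Sample' column and that peaks belonging to the same feature share the same 'peakIndex' value.

```r
# Toy grouped peak data (hypothetical values, not produced by speaq).
# Assumption: peaks in the same group share the same peakIndex.
peaks <- data.frame(
  Sample    = c(1, 1, 2, 2, 3),
  peakIndex = c(100, 250, 100, 400, 250),
  peakValue = c(5.0, 2.0, 6.0, 1.5, 3.0)
)

samples  <- sort(unique(peaks$Sample))
features <- sort(unique(peaks$peakIndex))

# One row per sample, one column per feature; NA marks a missing peak.
feature_matrix <- matrix(
  NA_real_,
  nrow = length(samples),
  ncol = length(features),
  dimnames = list(as.character(samples), as.character(features))
)

# Fill each (sample, feature) cell with the value of the chosen variable.
cells <- cbind(match(peaks$Sample, samples), match(peaks$peakIndex, features))
feature_matrix[cells] <- peaks$peakValue
```

Cells that stay NA are the sample and feature combinations that the 'impute' argument deals with.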

Usage

BuildFeatureMatrix(
  Y.data,
  var = "peakValue",
  impute = "zero",
  imputation_val = NA,
  delete.below.threshold = FALSE,
  baselineThresh = 500,
  snrThres = 3,
  thresholds.pass = "any-to-pass"
)

Arguments

Y.data

The dataset after (at least) peak detection and grouping with speaq 2.0. The dataset after peak filling is recommended.

var

The variable to be used in the feature matrix. This can be any of 'peakIndex', 'peakPPM', 'peakValue' (default), 'peakSNR', 'peakScale', or 'Sample'.

impute

What to impute when a peak is missing for a given sample and feature combination. Options are "zero" (or "zeros", the default), "median" (imputation with the feature median), "randomForest" (imputation with the missForest function from the missForest package), "kNN" followed by the number of neighbours to use, e.g. "kNN5" or "kNN10" (as per the method of Troyanskaya et al., 2001), or lastly "User_value" (this allows the use of any value specified with the imputation_val argument, e.g. the median of the raw spectra). Any other value will produce NAs.

imputation_val

If the "User_value" imputation option is chosen, this value will be used to impute the missing values.
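As a rough illustration of the simpler imputation options, the sketch below fills NA entries of a small matrix in base R. The function name impute_sketch is hypothetical and this is not the package code; the "randomForest" and "kNN" options are left out because they depend on external packages.

```r
# Hypothetical helper mimicking the "zero", "median" and "User_value" options.
impute_sketch <- function(m, impute = "zero", imputation_val = NA) {
  if (impute %in% c("zero", "zeros")) {
    m[is.na(m)] <- 0
  } else if (impute == "median") {
    # Replace missing entries with the median of their feature (column).
    for (j in seq_len(ncol(m))) {
      missing <- is.na(m[, j])
      m[missing, j] <- median(m[, j], na.rm = TRUE)
    }
  } else if (impute == "User_value") {
    m[is.na(m)] <- imputation_val
  }
  # Any other option leaves the NA entries untouched.
  m
}

# Three samples (rows) and two features (columns), filled column by column.
m <- matrix(c(1, NA, 3,
              4, 6, NA), nrow = 3)

zero_filled   <- impute_sketch(m, "zero")
median_filled <- impute_sketch(m, "median")
user_filled   <- impute_sketch(m, "User_value", imputation_val = 9)
```

Median imputation works per feature, so a missing entry is filled from the other samples in the same column rather than from the whole matrix.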

delete.below.threshold

Whether to ignore peaks for which the 'var' variable has a value below 'baselineThresh' (default = FALSE).

baselineThresh

The threshold for the 'var' variable that peaks have to surpass to be included in the feature matrix.

snrThres

The threshold for the signal-to-noise ratio of a peak.

thresholds.pass

Determines whether a peak has to pass both thresholds (snrThres and baselineThresh) or just one of them; the default is "any-to-pass". If peaks do not need to pass any threshold, set 'delete.below.threshold' to FALSE.
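The difference between passing one threshold and passing both can be sketched as follows. The helper passes_thresholds and its all_required flag are hypothetical stand-ins for the 'thresholds.pass' setting, and the use of >= (rather than >) is an assumption, so values exactly on a threshold may be treated differently by the package.

```r
# Hypothetical helper: which peaks survive the two thresholds?
passes_thresholds <- function(value, snr,
                              baselineThresh = 500,
                              snrThres = 3,
                              all_required = FALSE) {
  above_baseline <- value >= baselineThresh  # assumption: inclusive comparison
  above_snr      <- snr >= snrThres          # assumption: inclusive comparison
  if (all_required) {
    above_baseline & above_snr   # a peak must pass both thresholds
  } else {
    above_baseline | above_snr   # passing one threshold is enough
  }
}

value <- c(600, 600, 100, 100)
snr   <- c(5, 1, 5, 1)

keep_any <- passes_thresholds(value, snr)
keep_all <- passes_thresholds(value, snr, all_required = TRUE)
```

With these toy values, only the last peak fails under the "any" rule, while only the first peak survives when both thresholds are required.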

Value

A matrix (data.matrix) with samples in the rows and features in the columns. The values in the matrix are those of the 'var' variable.

Author(s)

Charlie Beirnaert, charlie.beirnaert@uantwerpen.be

References

Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, Vol. 17, no. 6, pages 520-525.

Examples

library(speaq)

subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra <- as.matrix(subset$Spectra)[1:2, ]
subset.ppm <- as.numeric(subset$PPM)

# detect peaks with wavelets on a single core (nCPU = 1)
test.peaks <- getWaveletPeaks(Y.spec = subset.spectra,
                              X.ppm = subset.ppm,
                              nCPU = 1)

# group the detected peaks into features
test.grouped <- PeakGrouper(Y.peaks = test.peaks)

# build the feature matrix with the default settings
test.Features <- BuildFeatureMatrix(test.grouped)

        

speaq documentation built on May 23, 2022, 5:06 p.m.