imputationTest: Test for the various methods
In AurelieGuilbault/VIQCing: Data Processing For Metabolomic Data


View source: R/imputation.R

Tests imputation accuracy on the given dataset with different method options. Computes the NRMSE values over the requested number of tests. If output is not NULL, writes a results file <filename>_Accuracy.txt with the columns

"Method";
"missing_proportion";
"transformation";
and "NRMSE"
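
The NRMSE column compares the imputed values with the known original values at the positions that were masked. The exact normalisation used by imputationTest is not stated on this page; the sketch below assumes one common convention, dividing the mean squared error by the variance of the true values. The function name nrmse and its arguments are hypothetical and only illustrate the idea.

```r
# Hypothetical helper, not the VIQCing implementation.
# Assumption: the error is normalised by the variance of the true values
# at the masked positions.
nrmse <- function(imputed, truth, mask) {
  sqrt(mean((imputed[mask] - truth[mask])^2) / var(truth[mask]))
}

truth   <- c(1, 2, 3, 4, 5, 6)
mask    <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)  # positions set to NA for the test
imputed <- truth
imputed[mask] <- 3                                   # e.g. every gap filled with 3

nrmse(truth, truth, mask)    # perfect imputation gives 0
nrmse(imputed, truth, mask)  # about 0.816
```

Under this convention a value near 0 means the imputed values are close to the originals, and a value near 1 means the imputation does little better than filling in a constant.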

imputationTest(input, output = NULL, k = 2, method = "knn",
  npcs = 3, sigma = 0.1, nbTest = 10, nTree = 30,
  na.string = "NA", missing = 0.05, missingType = "MCAR",
  transformation = "None", sampleStart = 3)

`input`	file containing the test dataset. The sample columns should run from column <sampleStart> to the end of the file. The dataset MUST BE COMPLETE (no NA values) for accurate results;
`output`	default NULL, name of the test results file if not NULL
`k`	default 2, the k used for the knn imputation;
`method`	default "knn", the chosen method for replacing the missing values. Can be "knn", "RF", "QRILC", "SVD", "mean", "median", "HM" or "0". See Details.
`npcs`	default 3, npcs for SVD method;
`sigma`	default 0.1, tune sigma parameter for QRILC method;
`nTree`	default 30, number of trees for the RF method;
`na.string`	default "NA", string to consider as NA in the dataset;
`missing`	default 0.05, proportion of missing values;
`missingType`	default "MCAR" (missing completely at random); can be "MNAR" (missing not at random), which targets the values under the median;
`transformation`	default "None", can be "scale" or "log";
`sampleStart`	default 3, 1st column of the actual data;
`nbTest`	default 10, number of tests to loop over;
`compound`	default NULL, position of the compound column if named otherwise;
`metabolite`	default NULL, position of the metabolite column if named otherwise;
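
The missing and missingType arguments control how values are removed from the complete dataset before imputation. A minimal sketch of the two modes follows. It assumes that "MNAR" draws the masked positions only from values below the overall median; whether the package uses the overall or a per-row median is not stated here. The helper name maskValues is hypothetical.

```r
# Hypothetical helper, not part of VIQCing: replace a proportion of values by NA.
maskValues <- function(mat, missing = 0.05, missingType = "MCAR") {
  nMask <- round(missing * length(mat))
  if (missingType == "MNAR") {
    # Assumption: candidates are the values below the overall median
    candidates <- which(mat < median(mat))
  } else {
    # MCAR: every position is equally likely to be masked
    candidates <- seq_along(mat)
  }
  masked <- mat
  masked[candidates[sample.int(length(candidates), nMask)]] <- NA
  masked
}

set.seed(1)
complete <- matrix(rexp(200, rate = 0.1), nrow = 20)  # 20 x 10 complete matrix
mcar <- maskValues(complete, missing = 0.05, missingType = "MCAR")
mnar <- maskValues(complete, missing = 0.05, missingType = "MNAR")

sum(is.na(mcar))  # 10 values masked
sum(is.na(mnar))  # 10 values masked, all from below the median
```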

Computes the NRMSE (Normalized Root Mean Squared Error) for an imputation test. Available methods:

"knn": From the impute package, uses the k nearest neighbors to impute the values;
"RF": From the missForest package, uses the RandomForest algorithm to impute the values;
"QRILC": From the imputeLCMD package, uses quantile regression to impute the values;
"SVD": From the pcaMethods package, uses the SVDimpute algorithm proposed by Troyanskaya et al., 2001, to impute the values;
"mean", "median", "0", "HM": simple value replacement by the mean, the median, 0 or the half minimum of the row;

resDf, the data frame of NRMSE results.

impute package https://www.rdocumentation.org/packages/impute
missForest package https://www.rdocumentation.org/packages/missForest
imputeLCMD package https://www.rdocumentation.org/packages/imputeLCMD
pcaMethods package https://www.rdocumentation.org/packages/pcaMethods

# For a dataset with the following header: Compound, m/z, Metabolite, RT, Sample #1, ...
imputationTest("dummySet.tsv", method="knn", transformation="log", sampleStart=5)

# For a dataset with the following header: compound, m/z, metabolite, RT, Sample #1, ...
imputationTest("dummySet.tsv", method="knn", transformation="log", metabolite = 3, compound = 1, sampleStart=5)