View source: R/SignatureExtractionLib.R
SignatureExtraction | R Documentation |
Performs signature extraction by applying NMF to the input matrix. Multiple NMF runs and bootstrapping are used for robustness, followed by clustering of the solutions. The range of numbers of signatures to try is set with nsig.
SignatureExtraction(
cat,
outFilePath,
matrix_of_fixed_signatures = NULL,
blacklist = c(),
nrepeats = 10,
nboots = 20,
clusteringMethod = "MC",
completeLinkageFlag = FALSE,
useMaxMatching = TRUE,
filterBestOfEachBootstrap = TRUE,
filterBest_RTOL = 0.001,
filterBest_nmaxtokeep = 10,
nparallel = 1,
nsig = c(3:15),
mut_thr = 0,
type_of_extraction = "subs",
project = "extraction",
parallel = FALSE,
nmfmethod = "brunet",
removeDuplicatesInCatalogue = FALSE,
removeDuplicatesThreshold = 0.98,
normaliseCatalogue = FALSE,
plotCatalogue = FALSE,
plotResultsFromAllClusteringMethods = TRUE
)
cat |
matrix with samples as columns and channels as rows |
outFilePath |
path where the extraction output files should go. Remember to add "/" at the end of the path |
matrix_of_fixed_signatures |
matrix with known signatures as columns and channels as rows. Used for partial extraction with NNLM package, with Lee KLD (brunet) only. If NULL, NMF package is used instead and different nmf methods can be used. |
blacklist |
list of samples (column names) to ignore |
nrepeats |
how many runs for each bootstrap (if filterBestOfEachBootstrap=TRUE with default params, only at most 10 runs within 0.1 percent of best will be considered, so nrepeats should be at least 10) |
nboots |
how many bootstrapped catalogues to use |
clusteringMethod |
one of "HC" (hierarchical clustering), "PAM" (partitioning around the medoids) or "MC" (matched clustering) |
completeLinkageFlag |
if clusteringMethod="HC", use complete linkage instead of default average linkage |
useMaxMatching |
if clusteringMethod="MC", use the assignment problem algorithm (match with max similarity) instead of the stable matching algorithm (any stable match) |
filterBestOfEachBootstrap |
if TRUE, of the nrepeats runs of each bootstrap only those within filterBest_RTOL*best of the best run are kept, and at most filterBest_nmaxtokeep of them |
filterBest_RTOL |
relative tolerance from the best fit within which a run is considered as good as the best; RTOL=0.001 is recommended |
filterBest_nmaxtokeep |
max number of runs that should be kept that are within the relative tolerance from the best |
nparallel |
how many processing units to use |
nsig |
list of number of signatures to try |
mut_thr |
threshold of mutations to remove empty/almost empty rows and columns |
type_of_extraction |
choose among "subs","rearr","generic","dnv" |
project |
give a name to your project |
parallel |
set to TRUE to use parallel computation (Recommended) |
nmfmethod |
choose among "brunet","lee","nsNMF", this choice will be passed to the NMF::nmf function |
removeDuplicatesInCatalogue |
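if TRUE, remove near-duplicate samples, i.e. samples with very high cosine similarity to another sample (described as 0.99 here; presumably the cut-off is removeDuplicatesThreshold, whose default is 0.98) |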
normaliseCatalogue |
scale samples to sum to 1 |
plotCatalogue |
if TRUE, also plot the catalogue. This may crash the library if the catalogue is too big; it should work up to ~300 samples |
plotResultsFromAllClusteringMethods |
if TRUE, all clustering methods are used and results are reported and plotted for all of them. If FALSE, only the requested clustering is reported |
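The filtering controlled by filterBestOfEachBootstrap, filterBest_RTOL and filterBest_nmaxtokeep can be pictured with a small base R sketch. This is not the package's code: filter_best_runs is a hypothetical helper, and it assumes that each run has a positive error value where lower is better and that "within filterBest_RTOL*best" means error <= best * (1 + RTOL).

```r
# Hypothetical helper, not part of the package: illustrates the rule
# "keep at most nmaxtokeep runs that are within rtol * best of the best".
# Assumes `errors` holds one positive error per run (lower is better).
filter_best_runs <- function(errors, rtol = 0.001, nmaxtokeep = 10) {
  best <- min(errors)
  # Indices of the runs whose error is within rtol * best of the best error
  good <- which(errors <= best * (1 + rtol))
  # Sort the retained runs from best to worst, then cap their number
  good <- good[order(errors[good])]
  head(good, nmaxtokeep)
}

# Five made-up run errors for one bootstrap
errors <- c(100.00, 100.05, 100.20, 100.09, 103.00)
kept <- filter_best_runs(errors)
kept   # indices of the runs that survive the filter
```

With the defaults, a bootstrap can contribute at most 10 runs, which is why the nrepeats description suggests using at least 10 repeats.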
result files will be available in the outFilePath directory
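Because outFilePath is expected to end with "/", a small guard can help before calling the function. The sketch below uses only base R; with_trailing_slash is a hypothetical helper and not part of the package, and the names of the files written by a run are not assumed here.

```r
# Hypothetical helper (not part of the package): make sure a directory
# path ends with "/", as the outFilePath argument expects.
with_trailing_slash <- function(path) {
  if (endsWith(path, "/")) path else paste0(path, "/")
}

out_dir <- with_trailing_slash("extraction_test_subs")

# After a run has completed, list whatever files were written there.
# (Returns an empty character vector if the directory does not exist yet.)
list.files(out_dir, recursive = TRUE)
```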
n_row <- 96
n_col <- 50
rnd_matrix <- round(matrix(runif(n_row*n_col,min = 0,max = 50),nrow = n_row,ncol = n_col))
colnames(rnd_matrix) <- paste0("C",1:n_col)
row.names(rnd_matrix) <- paste0("R",1:n_row)
SignatureExtraction(cat = rnd_matrix,
outFilePath = "extraction_test_subs/",
nrepeats = 10,
nboots = 2,
nparallel = 2,
nsig = 2:3,
mut_thr = 0,
type_of_extraction = "subs",
project = "test",
parallel = TRUE,
nmfmethod = "brunet")
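To get a feel for what removeDuplicatesInCatalogue and normaliseCatalogue refer to, the following base R sketch computes the cosine similarity between samples and rescales each sample to sum to 1. It is an illustration only: cos_sim_columns and normalise_columns are hypothetical helpers, the toy matrix is made up, and the package's own implementation may differ.

```r
# Hypothetical helper: cosine similarity between all pairs of columns (samples)
cos_sim_columns <- function(m) {
  norms <- sqrt(colSums(m^2))
  crossprod(m) / outer(norms, norms)
}

# Hypothetical helper: scale each sample (column) so that it sums to 1
normalise_columns <- function(m) {
  sweep(m, 2, colSums(m), "/")
}

# Toy catalogue: 96 channels (rows) by 4 samples (columns)
set.seed(1)
toy <- matrix(runif(96 * 4, min = 0, max = 50), nrow = 96,
              dimnames = list(paste0("R", 1:96), paste0("C", 1:4)))
toy[, "C4"] <- 2 * toy[, "C1"]   # make C4 a scaled copy of C1

sims <- cos_sim_columns(toy)

# Pairs of distinct samples above a similarity threshold; 0.98 is the
# default of removeDuplicatesThreshold in the usage section
hits <- which(sims > 0.98 & upper.tri(sims), arr.ind = TRUE)
hits

# After normalisation every column sums to 1 (up to floating point error)
colSums(normalise_columns(toy))
```

Cosine similarity ignores the overall scale of a sample, so a sample and a scaled copy of it are flagged as duplicates even though their mutation counts differ.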