View source: R/fit_to_signatures_strict.R
| fit_to_signatures_strict | R Documentation | 
Refitting signatures with this function suffers less from overfitting. The strictness of the refitting depends on 'max_delta'. A downside of this method is that it might increase signature misattribution: different signatures might be attributed to similar samples. You can use 'fit_to_signatures_bootstrapped()' to check whether this is happening. Using fewer signatures for the refitting will reduce this issue, as will fitting less strictly.
fit_to_signatures_strict(
  mut_matrix,
  signatures,
  max_delta = 0.004,
  method = c("backwards", "best_subset")
)
| mut_matrix | Mutation count matrix (dimensions: x mutation types X n samples) | 
| signatures | Signature matrix (dimensions: x mutation types X n signatures) | 
| max_delta | The maximum difference in original vs reconstructed cosine similarity between two iterations. | 
| method | The method used to select signatures: 'backwards' (default) or 'best_subset'. | 
Find a non-negative linear combination of mutation signatures that reconstructs the mutation matrix. Signature selection (feature selection) is done to reduce overfitting. This can be done via either a 'backwards' (default) or a 'best_subset' method.
The 'backwards' method starts by achieving an optimal reconstruction via 'fit_to_signatures'. The signature with the lowest contribution is then removed and the refit is repeated. This is done iteratively. At each iteration the cosine similarity between the original and the reconstructed profile is calculated.
The 'best_subset' method also starts by achieving an optimal reconstruction via 'fit_to_signatures'. Signature refitting is then repeated for each combination of n-1 signatures, where n is the number of signatures in the signature matrix. The cosine similarity between the original and the reconstructed profile is calculated for each combination, and the combination with the highest cosine similarity is chosen. This is repeated iteratively for n-2, n-3, etc.
With both methods, iterations stop when the difference in cosine similarity between two consecutive iterations exceeds 'max_delta'. The second-last set of signatures is then used for a final refit.
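The 'backwards' procedure and the 'max_delta' stopping rule can be sketched in base R for a single sample. This is only an illustration under stated assumptions, not the package's implementation: 'cos_sim', 'nnls_fit' and 'backwards_select' are hypothetical helper names, the non-negative fit is approximated with 'optim()' using box constraints, and the signature matrix is assumed to have column names.

```r
# Cosine similarity between an original and a reconstructed profile.
cos_sim <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical stand-in for a non-negative least-squares refit:
# minimise ||profile - signatures %*% w||^2 subject to w >= 0.
nnls_fit <- function(profile, signatures) {
  k <- ncol(signatures)
  obj <- function(w) sum((profile - signatures %*% w)^2)
  grad <- function(w) as.vector(-2 * t(signatures) %*% (profile - signatures %*% w))
  res <- optim(rep(sum(profile) / k, k), obj, grad,
    method = "L-BFGS-B", lower = 0
  )
  w <- res$par
  names(w) <- colnames(signatures)
  w
}

# Backwards selection for one sample (assumes named signature columns).
backwards_select <- function(profile, signatures, max_delta = 0.004) {
  keep <- colnames(signatures)
  w <- nnls_fit(profile, signatures[, keep, drop = FALSE])
  sim_prev <- cos_sim(profile, signatures[, keep, drop = FALSE] %*% w)
  while (length(keep) > 1) {
    # Remove the signature with the lowest contribution and refit.
    candidate <- setdiff(keep, names(which.min(w)))
    w_new <- nnls_fit(profile, signatures[, candidate, drop = FALSE])
    sim_new <- cos_sim(profile, signatures[, candidate, drop = FALSE] %*% w_new)
    # Stop once the similarity drops by more than max_delta;
    # the previous (second-last) set of signatures is kept.
    if (sim_prev - sim_new > max_delta) break
    keep <- candidate
    w <- w_new
    sim_prev <- sim_new
  }
  list(signatures = keep, contribution = w, cos_sim = sim_prev)
}

# Toy data (made up for illustration): the profile is built from A and B only.
sigs <- cbind(
  A = c(0.50, 0.30, 0.10, 0.05, 0.03, 0.02),
  B = c(0.02, 0.03, 0.05, 0.10, 0.30, 0.50),
  C = c(0.10, 0.10, 0.30, 0.30, 0.10, 0.10)
)
profile <- 100 * sigs[, "A"] + 50 * sigs[, "B"]
toy_res <- backwards_select(profile, sigs, max_delta = 0.004)
toy_res$signatures
```

In this toy case the unused signature C can be removed at almost no cost in cosine similarity, whereas removing B lowers the similarity by far more than 'max_delta', so the selection stops with A and B.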
The 'best_subset' method can give more accurate results than the 'backwards' method. However, it becomes very slow when a large number of signatures is used for refitting. We recommend using the 'best_subset' method only when fitting a maximum of 10-15 signatures. When using the 'best_subset' method, a lower 'max_delta' should be used, as the expected differences in cosine similarity are smaller.
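To get a feeling for why 'best_subset' slows down so quickly, the number of candidate refits can be counted. The sketch below assumes that every subset of each size, from n-1 down to 1 signature, is refitted for a sample. That is one reading of the description above rather than a confirmed detail of the implementation, and 'n_subsets' is a hypothetical helper name.

```r
# Number of candidate signature subsets per sample, assuming an exhaustive
# search over every subset size from n - 1 down to 1 (an assumption).
n_subsets <- function(n) sum(choose(n, seq_len(n - 1)))

n_sigs <- c(5, 10, 15, 30)
subset_counts <- sapply(n_sigs, n_subsets)
data.frame(n_signatures = n_sigs, candidate_subsets = subset_counts)
```

Under this assumption the count equals 2^n - 2, so it roughly doubles with every added signature, which is consistent with the recommendation to stay at 10-15 signatures. In practice the search stops early once 'max_delta' is exceeded, so these numbers are an upper bound.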
A list containing a fit_res object, similar to the output of 'fit_to_signatures', and a list of ggplot graphs that show, for each sample, in what order the signatures were removed and how this affected the cosine similarity.
mut_matrix,
fit_to_signatures,
fit_to_signatures_bootstrapped
## See the 'mut_matrix()' example for how we obtained the mutation matrix:
mut_mat <- readRDS(system.file("states/mut_mat_data.rds",
  package = "MutationalPatterns"
))
## Get signatures
signatures <- get_known_signatures()
## Fit to signatures strict
strict_refit <- fit_to_signatures_strict(mut_mat, signatures, max_delta = 0.004)
## fit_res similar to 'fit_to_signatures()'
fit_res <- strict_refit$fit_res
## list of ggplots that shows how the cosine similarity was reduced during the iterations
fig_l <- strict_refit$sim_decay_fig
## Fit to signatures with the best_subset method
## This can be more accurate than the standard backwards method, 
## but can only be used with a limited number of signatures.
## Here we use only 5 signatures to reduce the runtime. 
## In practice up to 10-15 signatures could be used.
best_subset_refit <- fit_to_signatures_strict(mut_mat,
  signatures[, 1:5],
  max_delta = 0.002,
  method = "best_subset"
)