determine_regional_similarity: Determine regional mutation pattern similarity
In UMCUGenetics/MutationalPatterns: Comprehensive genome-wide analysis of mutational processes

View source: R/determine_regional_similarity.R

determine_regional_similarity

R Documentation

Determine regional mutation pattern similarity

Description

Calculate the cosine similarities between the global mutation profile and the mutation profile of smaller genomic windows, using a sliding window approach. Regions with a very different mutation profile can be identified in this way. This function generally requires many mutations (~100,000) to work properly.

Usage

determine_regional_similarity(
  vcf,
  ref_genome,
  chromosomes,
  window_size = 100,
  stepsize = 25,
  extension = 1,
  oligo_correction = FALSE,
  exclude_self_mut_mat = TRUE,
  max_window_size_gen = 2e+07,
  verbose = FALSE
)

Arguments

`vcf`	GRanges object
`ref_genome`	BSgenome reference genome object
`chromosomes`	Vector of chromosome/contig names of the reference genome to be plotted.
`window_size`	The number of mutations in a window. (Default: 100)
`stepsize`	The number of mutations that a window slides in each step. (Default: 25)
`extension`	The number of bases, that's extracted upstream and downstream of the base substitutions, to create the mutation matrices. (Default: 1).
`oligo_correction`	Boolean describing whether oligonucleotide frequency correction should be applied. (Default: FALSE)
`exclude_self_mut_mat`	Boolean describing whether the mutations in a window should be subtracted from the global mutation matrix. (Default: TRUE)
`max_window_size_gen`	The maximum size of a window before it is removed. (Default: 20,000,000)
`verbose`	Boolean determining the verbosity of the function. (Default: FALSE)

Details

First a global mutation matrix is calculated using all mutations. Next, a sliding window is used. This means that we create a window containing the first x mutations. The cosine similarity, between the mutation profiles of this window and the global mutation matrix, is then calculated. The window then slides y mutations to the right and the cosine similarity is again calculated. This process is repeated until the final mutation on a chromosome is reached. This process is performed separately per chromosome. Windows that span a too large region of the genome are removed, because they are unlikely to contain biologically relevant information.

The number of mutations that the window slides to the right in each step is called the stepsize. The best stepsize depends on the window size. In general, we recommend setting the stepsize between 25 window size.

The analysis can be performed for trinucleotides contexts, for a larger context, or for just the base substitutions. A smaller context might miss detailed differences in mutation profiles, but is also less noisy. We recommend using a smaller extension when analyzing small datasets.

It's possible to correct for the oligonucleotide frequency of the windows. This is done by calculating the cosine similarity of the oligonucleotide frequency between each window and the genome. The cosine similarity of the mutation profiles is then divided by the oligonucleotide similarity. This ensures that regions with an abnormal oligonucleotide frequency don't show up as having a very different profile. The oligonucleotide frequency correction slows down the function, so we advise the user to keep it off for exploratory analyses and to only turn it on to validate interesting results.

By default the mutations in a window are subtracted from the global mutation matrix, before calculating the cosine similarity. This increases sensitivity, but could also decrease specificity. This subtraction can be turned of with the 'exclude_self_mut_mat' argument.

Value

A "region_cossim" object containing both the cosine similarities and the settings used in this analysis.

Examples


## See the 'read_vcfs_as_granges()' example for how we obtained the
## following data:
grl <- readRDS(system.file("states/read_vcfs_as_granges_output.rds",
  package = "MutationalPatterns"
))

## We pool all the variants together, because the function doesn't work well
## with a limited number of mutations. Still, in practice we recommend to use
## more mutations that in this example.
gr = unlist(grl)

## Specifiy the chromosomes of interest.
chromosomes <- names(genome(gr)[1:3])

 ## Load the corresponding reference genome.
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library(ref_genome, character.only = TRUE)

## Determine the regional similarities. Here we use a small window size to make the function work.
## In practice, we recommend a larger window size.
regional_sims = determine_regional_similarity(gr,
  ref_genome, 
  chromosomes,
  window_size = 40,
  stepsize = 10,
  max_window_size_gen = 40000000
)

## Here we use an extensiof of 0 to reduce noise.
## We also turned verbosity on, so you can see at what step the function is.
## This can be useful on large datasets.
regional_sims_0_extension = determine_regional_similarity(gr,
  ref_genome, 
  chromosomes,
  window_size = 40,
  stepsize = 10,
  extension = 0,
  max_window_size_gen = 40000000,
  verbose = TRUE
)

UMCUGenetics/MutationalPatterns documentation built on Nov. 24, 2022, 4:31 a.m.

UMCUGenetics/MutationalPatterns index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

UMCUGenetics/MutationalPatterns
Comprehensive genome-wide analysis of mutational processes

determine_regional_similarity: Determine regional mutation pattern similarity
In UMCUGenetics/MutationalPatterns: Comprehensive genome-wide analysis of mutational processes

Determine regional mutation pattern similarity

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to determine_regional_similarity in UMCUGenetics/MutationalPatterns...

R Package Documentation

Browse R Packages

We want your feedback!

UMCUGenetics/MutationalPatterns Comprehensive genome-wide analysis of mutational processes

determine_regional_similarity: Determine regional mutation pattern similarity In UMCUGenetics/MutationalPatterns: Comprehensive genome-wide analysis of mutational processes

Determine regional mutation pattern similarity

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to determine_regional_similarity in UMCUGenetics/MutationalPatterns...

R Package Documentation

Browse R Packages

We want your feedback!

UMCUGenetics/MutationalPatterns
Comprehensive genome-wide analysis of mutational processes

determine_regional_similarity: Determine regional mutation pattern similarity
In UMCUGenetics/MutationalPatterns: Comprehensive genome-wide analysis of mutational processes