
contextendR

A toolbox for exploring kmers in mutation datasets

Installation

This package relies heavily on Bioconductor packages. Prior to installing contextendR you will need to install the following packages through the Bioconductor installer: BiocGenerics, BSgenome, Biostrings, GenomeInfoDb, GenomicRanges and IRanges.

devtools is also required. Installing contextendR and all its dependencies can be done through these commands:

install.packages(c('devtools', 'BiocManager'))
BiocManager::install(c('BiocGenerics', 'BSgenome', 'Biostrings', 'GenomeInfoDb', 'GenomicRanges', 'IRanges'))
devtools::install_git("https://github.com/lindberg-m/contextendR.git", build_vignettes = TRUE)

build_vignettes = TRUE is optional but recommended; it is only needed to access the vignette. Note, however, that building the vignette requires BSgenome.Hsapiens.UCSC.hg19 to be installed.
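
Because building the vignette depends on BSgenome.Hsapiens.UCSC.hg19, that package has to be in place before running install_git with build_vignettes = TRUE. A minimal sketch, assuming BiocManager is already installed as in the commands above:

```r
# Install the hg19 reference genome package needed to build the vignette.
# This is a large download; assumes BiocManager is already installed.
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
```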

Usage

After installing the package, consult the vignette for a detailed outline of the package. Issue the command vignette("contextendR") to view it.

Quick overview

The following functions may be of interest:

extend_positions

A function for extracting sequence contexts and sampling genomic positions.

count_kmers

A function for counting occurrences of fixed-size kmers in a dataset.

kmer_freq

A function for counting kmers and for statistical inference on whether kmers are significantly over- or under-represented for mutation types.

kmer_random_forest

A function for exploring "kmer importance" in a dataset.

kmer_logistic_regression

A function for fitting regularized multinomial logistic regression models to mutation datasets. Uses a trinucleotide "core" and coefficients for modulating kmers.

kmer_position

A function for visualizing the mutations surrounding a certain kmer, and how mutation probabilities at these positions are affected by kmer presence.
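
To illustrate what counting fixed-size kmers means (the idea behind count_kmers), here is a minimal base R sketch. It is not contextendR's implementation and does not use the package's API: count_kmers_sketch and the example sequences are hypothetical, and the real function's arguments and return value should be taken from the vignette.

```r
# Hypothetical helper, NOT part of contextendR: count every overlapping
# kmer of size k across a character vector of sequence contexts.
count_kmers_sketch <- function(seqs, k) {
  kmers <- unlist(lapply(seqs, function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))       # sequence too short for any kmer
    starts <- seq_len(n - k + 1)          # start position of each window
    substring(s, starts, starts + k - 1)  # extract all windows of width k
  }))
  table(kmers)                            # tabulate occurrences per kmer
}

# Made-up sequence contexts, for illustration only
contexts <- c("ACGTA", "CGTAC")
counts <- count_kmers_sketch(contexts, k = 3)
print(counts)
```

The sliding window overlaps by design, so a sequence of length n contributes n - k + 1 kmers.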



lindberg-m/contextendR documentation built on Jan. 8, 2022, 3:16 a.m.