CrisprSet-class: CrisprSet class

Description Arguments Fields Methods Author(s) See Also Examples

Description

A ReferenceClass container for holding a set of narrowed alignments, each corresponding to the same target region. Individual samples are represented as CrisprRun objects. CrisprRun objects with no on-target reads are excluded. CrisprSet objects are constructed with readsToTarget or readsToTargets. For most use cases, a CrisprSet object should not be initialized directly.

Arguments

crispr.runs

A list of CrisprRun objects, typically representing individual samples within an experiment

reference

The reference sequence, must be the same length as the target region

target

The target location (GRanges). Variants will be counted over this region. Need not correspond to the guide sequence.

rc

Should the alignments be reverse complemented, i.e. displayed w.r.t the reverse strand? (default: FALSE)

names

A list of names for each of the samples, e.g. for displaying in plots. If not supplied, the names of the crispr.runs are used, which default to the filenames of the bam files if available (Default: NULL)

renumbered

Should the variants be renumbered using target.loc as the zero point? If TRUE, variants are described by the location of their 5'-most base with respect to the target.loc. A 3bp deletion starting 5bp 5' of the cut site would be labelled as -5:3D (Default: TRUE)

target.loc

The location of the Cas9 cut site with respect to the supplied target. (Or some other central location). Can be displayed on plots and used as the zero point for renumbering variants. For a target region with the PAM location from bases 21-23, the target.loc is base 17 (default: 17)

match.label

Label for sequences with no variants (default: "no variant")

mismatch.label

Label for sequences with only single nucleotide variants (default: "SNV")

bpparam

A BiocParallel parameter object specifying how many cores to use. The parallelisable step is calling SNVs. Parallelisation is by sample. (default: SerialParam, i.e. no parallelization)

verbose

If true, prints information about initialisation progress (default: TRUE)

Fields

crispr_runs

A list of CrisprRun objects, typically corresponding to samples of an experiment.

ref

The reference sequence for the target region, as a Biostrings::DNAString object

cigar_freqs

A matrix of counts for each variant

target

The target location, as a GRanges object

Methods

classifyCodingBySize(var_type, cutoff = 10)

Description: This is a naive classification of variants as frameshift or in-frame Coding indels are summed, and indels with sum divisible by 3 are considered frameshift. Note that this may not be correct for variants that span an intron-exon boundary Input paramters: var_type: A vector of var_type. Only variants with var_type == "coding" are considered. Intended to work with classifyVariantsByLoc cutoff: Variants are divided into those less than and greater than "cutoff" (Default: 10) Result: A character vector with a classification for each variant allele

classifyVariantsByLoc(txdb, add_chr = TRUE, verbose = TRUE, ...)

Description: Uses the VariantAnnotation package to look up the location of the variants. VariantAnnotation allows multiple classification tags per variant, this function returns a single tag. The following preference order is used: spliceSite > coding > intron > fiveUTR > threeUTR > promoter > intergenic

Input parameters: txdb: A BSgenome transcription database add_chr: Add "chr" to chromosome names to make compatible with UCSC (default: TRUE) verbose: Print progress (default: TRUE) ...: Filtering arguments for variantCounts

Return value: A vector of classification tags, matching the rownames of .self$cigar_freqs (the variant count table)

classifyVariantsByType(...)

Description: Classifies variants as insertions, deletions, or complex (combinations). In development Input parameters: ... Optional arguments to "variantCounts" for filtering variants before classification Return value: A named vector classifying variant alleles as insertions, deletions, etc

consensusAlleles(cig_freqs = .self$cigar_freqs, return_nms = TRUE, match.ops = c("M", "X", "="))

Description: Get variants by their cigar string, make the pairwise alignments for the consensus sequence for each variant allele

Input parameters: cig_freqs: A table of variant allele frequencies (by default: .self$cigar_freqs, but could also be filtered) return_nms: If true, return a list of sequences and labels (Default:FALSE) match.ops: CIGAR operations for 1-1 alignments

Return: A DNAStringSet of the consensus sequences for the specified alleles, or a list containing the consensus sequences and names for the labels if return_nms = TRUE

filterUniqueLowQual(min_count = 2, max_n = 0, verbose = TRUE)

Description: Deletes reads containing rare variant combinations and more than a minimum number of ambiguity characters within the target region. These are assumed to be alignment errors.

Input parameters: min_count: the number of times a variant combination must occur across all samples to keep (default: 2, i.e. a variant must occur at least twice in one or more samples to keep) max_n: maximum number of ambiguity ("N") bases a read with a rare variant combination may contain. (default: 0) verbose: If TRUE, print the number of sequences removed (default: TRUE)

filterVariants(cig_freqs = NULL, names = NULL, columns = NULL, include.chimeras = TRUE)

Description: Relabels specified variants in a table of variant allele counts as non-variant, e.g. variants known to exist in control samples. Accepts either a size, e.g. "1D", or a specific mutation, e.g. "-4:3D". For alleles that include one variant to be filtered and one other variant, the other variant will be retained. If SNVs are included, these will be removed entirely, but note that SNVs are only called in reads that do not contain an insertion/deletion variant

Input parameters: cig_freqs: A table of variant allele counts (Default: NULL, i.e. .self$cigar_freqs) names: Labels of variants alleles to remove (Default: NULL) columns: Indices or names of control samples. Remove all variants that occur in these columns. (Default: NULL) include.chimeras: Should chimeric reads be included? (Default: TRUE)

heatmapCigarFreqs(as.percent = TRUE, x.size = 8, y.size = 8, x.axis.title = NULL, x.angle = 90, min.freq = 0, min.count = 0, top.n = nrow(.self$cigar_freqs), type = c("counts", "proportions"), header = c("default", "counts", "efficiency"), order = NULL, alleles = NULL, create.plot = TRUE, ...)

Description: Internal method for CrispRVariants:plotFreqHeatmap, optionally filters the table of variants, then a table of variant counts, coloured by counts or proportions.

Input parameters: as.percent: Should colours represent the percentage of reads per sample (TRUE) or the actual counts (FALSE)? (Default: TRUE) x.size: Font size for x axis labels (Default: 8) y.size: Font size for y axis labels (Default: 8) x.axis.title: Title for x axis min.freq: Include only variants with frequency at least min.freq in at least one sample min.count: Include only variants with count at least min.count in at least one sample top.n: Include only the n most common variants type: Should labels show counts or proportions? (Default: counts) header: What should be displayed in the header of the heatmap. Default: total count for type = "counts" or proportion of reads shown in the matrix for type = "proportions". If "counts" is selected, total counts will be shown for both types. "efficiency" shows the mutation efficiency (calculated with default settings) order: Reorder the columns according to this order (Default: NULL) alleles: Names of alleles to include. Selection of alleles takes place after filtering (Default: NULL). exclude: Names of alleles to exclude (Default: NULL) create.plot: Should the plot be created (TRUE, default), or the data used in the plot returned. ...: Extra filtering or plotting options

Return value: A ggplot2 plot object. Call "print(obj)" to display

See also: CrispRVariants::plotFreqHeatmap

makePairwiseAlns(cig_freqs = .self$cigar_freqs, ...)

Description: Get variants by their cigar string, make the pairwise alignments for the consensus sequence for each variant allele

Input parameters: cig_freqs: A table of variant allele frequencies (by default: .self$cigar_freqs, but could also be filtered) ...: Extra arguments for CrispRVariants::seqsToAln, e.g. which symbol should be used for representing deleted bases

mutationEfficiency(snv = c("non_variant", "include", "exclude"), include.chimeras = TRUE, exclude.cols = NULL, group = NULL, filter.vars = NULL, filter.cols = NULL, count.alleles = FALSE, per.sample = TRUE, min.freq = 0)

Description: Calculates summary statistics for the mutation efficiency, i.e. the percentage of reads that contain a variant. Reads that do not contain and insertion or deletion, but do contain a single nucleotide variant (snv) can be considered as mutated, non-mutated, or not included in efficiency calculations as they are ambiguous. Note: mutationEfficiency does not treat partial alignments differently

Input parameters: snv: One of "include" (consider reads with mismatches to be mutated), "exclude" (do not include reads with snvs in efficiency calculations), and "non_variant" (consider reads with mismatches to be non-mutated). include.chimeras: Should chimeras be counted as variants? (Default: TRUE) exclude.cols: A list of column names to exclude from calculation, e.g. if one sample is a control (default: NULL, i.e. include all columns) group: A grouping variable. Efficiency will be calculated per group, instead of for individual. Cannot be used with exclude.cols. filter.vars: Variants that should not be counted as mutations. filter.cols: Column names to be considered controls. Variants occuring in a control sample will not be counted as mutations. count.alleles: If TRUE, also report statistics about the number of alleles per sample/per group. (Default: FALSE) per.sample: Return efficiencies for each sample (Default: TRUE) min.freq: Minimum frequency for counting alleles. Does not apply to calculating efficiency. To filter when calculating efficiency, first use "variantCounts". (Default: 0, i.e. no filtering) Return value: A vector of efficiency statistics per sample and overall, or a matrix if a group is supplied.

plotVariants(min.freq = 0, min.count = 0, top.n = nrow(.self$cigar_freqs), alleles = NULL, renumbered = .self$pars["renumbered"], add.other = TRUE, create.plot = TRUE, plot.regions = NULL, allow.partial = TRUE, ...)

Description: Internal method for CrispRVariants:plotAlignments, optionally filters the table of variants, then plots variants with respect to the reference sequence, collapsing insertions and displaying insertion sequences below the plot.

Input parameters: min.freq: i( in at least one sample min.count i (integer) include variants that occur at leas i times in at least one sample top.n: n (integer) Plot only the n most frequent variants (default: plot all) Note that if there are ties in variant ranks, top.n only includes ties with all members ranking <= top.n alleles: Alleles to include after filtering. Default NULL means use all alleles that pass filtering. renumbered: If TRUE, the x-axis is numbered with respect to the target (cut) site. If FALSE, x-axis shows genomic locations. (default: TRUE) add.other Add a blank row named "Other" for chimeric alignments, if there are any (Default: TRUE) create.plot Data is plotted if TRUE and returned without if FALSE. (Default: TRUE) plot.regions Subregion of the target to plot (Default: NULL) allow.partial Should partial alignments be allowed? (Default: TRUE) ... additional arguments for plotAlignments

Return value: A ggplot2 plot object. Call "print(obj)" to display

Author(s)

Helen Lindsay

See Also

readsToTarget and readsToTargets for initialising a CrisprSet, CrisprRun

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Load the metadata table
md_fname <- system.file("extdata", "gol_F1_metadata_small.txt", package = "CrispRVariants")
md <- read.table(md_fname, sep = "\t", stringsAsFactors = FALSE)

# Get bam filenames and their full paths
bam_fnames <- sapply(md$bam.filename, function(fn){
 system.file("extdata", fn, package = "CrispRVariants")})

reference <- Biostrings::DNAString("GGTCTCTCGCAGGATGTTGCTGG")
gd <- GenomicRanges::GRanges("18", IRanges::IRanges(4647377, 4647399), strand = "+")

crispr_set <- readsToTarget(bam_fnames, target = gd, reference = reference,
                           names = md$experiment.name, target.loc = 17)

HLindsay/CrispRVariants documentation built on May 28, 2019, 12:40 p.m.