collapseMutantsBySimilarity: Collapse mutants by similarity
In fmicompbio/mutscan: Preprocessing and Analysis of Deep Mutational Scanning Data

Description

These functions collapse variants, either by sequence similarity or according to a pre-defined grouping. collapseMutants and collapseMutantsByAA assume that a grouping variable is available as a column in rowData(se); collapseMutantsByAA is a convenience function for the case where this column is "mutantNameAA", and is provided for backwards compatibility. collapseMutantsBySimilarity instead derives the grouping variable from user-provided thresholds on the sequence similarity (measured by the Hamming distance), and then collapses the variants according to that derived grouping.
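
To make the similarity measure concrete, the following base-R sketch computes pairwise Hamming distances (the number of mismatching positions) between equal-length sequences. The helper `hammingDist` and the toy sequences are assumptions made for illustration only; they are not part of mutscan and do not necessarily reflect how the package computes distances internally.

```r
## Hypothetical helper (not a mutscan function): Hamming distance between
## two sequences of the same length
hammingDist <- function(a, b) {
    stopifnot(nchar(a) == nchar(b))
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

## Toy sequences, all of length 6
seqs <- c("ACGTAC", "ACGTAA", "TCGTAA", "GGGTTT")

## Pairwise distance matrix
d <- outer(seqs, seqs, Vectorize(hammingDist))
dimnames(d) <- list(seqs, seqs)
d
```

With a maximum allowed distance of 2, the first three toy sequences lie within reach of each other, while "GGGTTT" (at distance 4 from all others) would remain separate.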

Usage

collapseMutantsBySimilarity(
  se,
  assayName,
  scoreMethod = "rowSum",
  sequenceCol = "sequence",
  collapseMaxDist = 0,
  collapseMinScore = 0,
  collapseMinRatio = 0,
  verbose = TRUE
)

collapseMutantsByAA(se)

collapseMutants(se, nameCol)

Arguments

`se`	A `SummarizedExperiment` generated by `summarizeExperiment`
`assayName`	The name of the assay that will be used to calculate a "score" (typically derived from the read counts) for each variant.
`scoreMethod`	Character scalar giving the approach used to calculate ranking scores from the assay defined by `assayName`. Currently, this can be one of `"rowSum"` or `"rowMean"`. All filtering criteria will be applied to these scores.
`sequenceCol`	Character scalar giving the name of the column in `rowData(se)` that contains the nucleotide sequence of the variants.
`collapseMaxDist`	Numeric scalar defining the tolerance for collapsing similar sequences. If the value is in [0, 1), it defines the maximal Hamming distance as a fraction of the sequence length, i.e., `round(collapseMaxDist * nchar(sequence))`. A value greater than or equal to 1 is rounded and used directly as the maximum allowed Hamming distance. Note that sequences can only be collapsed if they are all of the same length.
`collapseMinScore`	Numeric scalar, indicating the minimum score for a sequence to be considered for collapsing with similar sequences.
`collapseMinRatio`	Numeric scalar. During collapsing of similar sequences, a low-frequency sequence will be collapsed with a higher-frequency sequence only if the ratio between the high-frequency and the low-frequency scores is at least this high. The default value of 0 indicates that no such check is performed.
`verbose`	Logical, whether to print progress messages.
`nameCol`	A character scalar providing the column of `rowData(se)` that contains the amino acid mutant names (that will be the new row names).
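
The rules described for `collapseMaxDist` and `collapseMinRatio` can be sketched in base R as follows. The helpers `resolveMaxDist` and `ratioOK` are hypothetical names introduced here for illustration; they restate the documented behavior and are not taken from the mutscan source.

```r
## Hypothetical helper: translate collapseMaxDist into an absolute
## Hamming distance threshold, following the documented rule
resolveMaxDist <- function(collapseMaxDist, seqLength) {
    if (collapseMaxDist >= 0 && collapseMaxDist < 1) {
        ## value in [0, 1): fraction of the sequence length
        round(collapseMaxDist * seqLength)
    } else {
        ## value >= 1: rounded and used directly
        round(collapseMaxDist)
    }
}

## Hypothetical helper: a low-frequency sequence is merged into a
## high-frequency one only if the score ratio is at least collapseMinRatio;
## a value of 0 disables the check
ratioOK <- function(highScore, lowScore, collapseMinRatio = 0) {
    collapseMinRatio <= 0 || (highScore / lowScore) >= collapseMinRatio
}

resolveMaxDist(0.1, seqLength = 30)     # fraction of a 30-nt sequence
resolveMaxDist(2, seqLength = 30)       # absolute distance
ratioOK(100, 10, collapseMinRatio = 5)  # ratio 10 meets the threshold
ratioOK(100, 40, collapseMinRatio = 5)  # ratio 2.5 does not
```

Under these assumptions, `collapseMaxDist = 0.1` on sequences of length 30 allows up to 3 mismatches, whereas `collapseMaxDist = 2` allows up to 2 regardless of length.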

Value

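A SummarizedExperiment where counts have been aggregated according to the grouping variable: the mutated amino acid(s) for collapseMutantsByAA, the column given by nameCol for collapseMutants, or the similarity-derived grouping for collapseMutantsBySimilarity.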
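
As a rough picture of what aggregating counts by a grouping variable means, the sketch below sums the rows of a toy count matrix per group using base R's `rowsum()`. The matrix, the variant names and the grouping vector are made up for illustration, and the use of `rowsum()` is an assumption; the actual functions operate on a SummarizedExperiment and may aggregate differently.

```r
## Toy count matrix: three codon-level variants, two samples
counts <- matrix(
    c(10, 5, 3,    # sample S1
      0, 2, 7),    # sample S2
    nrow = 3,
    dimnames = list(c("f.1.GCT", "f.1.GCC", "f.2.AAA"), c("S1", "S2"))
)

## Made-up grouping variable: the first two variants share a group
group <- c("f.1.A", "f.1.A", "f.2.K")

## Sum the counts within each group; the groups become the new row names
agg <- rowsum(counts, group)
agg
```

The result has one row per group, so the two variants assigned to "f.1.A" contribute a single row holding their summed counts.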

Author(s)

Charlotte Soneson, Michael Stadler

Examples

se <- readRDS(system.file("extdata", "GSE102901_cis_se.rds",
                          package = "mutscan"))[1:200, ]
## The rows of this object correspond to individual codon variants
dim(se)
head(rownames(se))

## Collapse by amino acid
sec <- collapseMutantsByAA(se)
## The rows of the collapsed object correspond to amino acid variants
dim(sec)
head(rownames(sec))
## The mutantName column contains the individual codon variants that were 
## collapsed
head(SummarizedExperiment::rowData(sec))

## Collapse similar sequences
sec2 <- collapseMutantsBySimilarity(
    se = se, assayName = "counts", scoreMethod = "rowSum",
    sequenceCol = "sequence", collapseMaxDist = 2,
    collapseMinScore = 0, collapseMinRatio = 0)
dim(sec2)
head(rownames(sec2))
head(SummarizedExperiment::rowData(sec2))
## collapsed count matrix
SummarizedExperiment::assay(sec2, "counts")

fmicompbio/mutscan documentation built on Feb. 22, 2025, 11:47 a.m.