collapseMutantsBySimilarity: Collapse mutants by similarity

View source: R/collapseMutantsByAA.R

collapseMutantsBySimilarity    R Documentation

Collapse mutants by similarity

Description

These functions collapse variants, either by similarity or according to a pre-defined grouping. collapseMutants and collapseMutantsByAA assume that a grouping variable is available as a column in rowData(se); collapseMutantsByAA is a convenience function for the case where this column is "mutantNameAA", and is provided for backwards compatibility. collapseMutantsBySimilarity generates the grouping variable from user-provided thresholds on sequence similarity (defined by the Hamming distance) and then collapses the variants based on the derived grouping.
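To illustrate what collapsing by a grouping variable means, here is a minimal sketch in base R. It is not the package's implementation: the count matrix, the variant names and the grouping vector are made up for illustration, and base::rowsum is used as a stand-in for the aggregation step.

```r
## Toy count matrix (made-up values): 3 codon-level variants x 2 samples.
## matrix() fills column by column, so s1 = (5, 3, 2) and s2 = (1, 0, 4).
counts <- matrix(c(5, 3, 2, 1, 0, 4), nrow = 3,
                 dimnames = list(c("f.1.GCT", "f.1.GCC", "f.2.TTA"),
                                 c("s1", "s2")))

## Hypothetical grouping variable, playing the role of a rowData(se) column:
## the first two variants belong to the same group
group <- c("f.1.A", "f.1.A", "f.2.L")

## Sum the counts of all variants that share a group label
collapsed <- rowsum(counts, group = group)
collapsed
```

The result has one row per group, and the group labels become the new row names.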

Usage

collapseMutantsBySimilarity(
  se,
  assayName,
  scoreMethod = "rowSum",
  sequenceCol = "sequence",
  collapseMaxDist = 0,
  collapseMinScore = 0,
  collapseMinRatio = 0,
  verbose = TRUE
)

collapseMutantsByAA(se)

collapseMutants(se, nameCol)

Arguments

se

A SummarizedExperiment generated by summarizeExperiment.

assayName

The name of the assay that will be used to calculate a "score" (typically derived from the read counts) for each variant.

scoreMethod

Character scalar giving the approach used to calculate ranking scores from the assay defined by assayName. Currently, this can be one of "rowSum" or "rowMean". All filtering criteria will be applied to these scores.
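As a rough illustration of the two scoring options, the sketch below computes row sums and row means on a toy matrix with base R. The helper computeScore and the count values are hypothetical and only mirror the documented choices; the package may compute its scores differently internally.

```r
## Toy count matrix (made-up values): 3 variants x 2 samples.
## matrix() fills column by column, so s1 = (10, 0, 4) and s2 = (30, 2, 6).
counts <- matrix(c(10, 0, 4, 30, 2, 6), nrow = 3,
                 dimnames = list(c("v1", "v2", "v3"), c("s1", "s2")))

## Hypothetical helper mirroring the two documented scoreMethod values
computeScore <- function(mat, scoreMethod = c("rowSum", "rowMean")) {
    scoreMethod <- match.arg(scoreMethod)
    switch(scoreMethod,
           rowSum = rowSums(mat),
           rowMean = rowMeans(mat))
}

scoreSum <- computeScore(counts, "rowSum")    # one score per variant
scoreMean <- computeScore(counts, "rowMean")
```

With equal numbers of samples per variant, both scores rank the variants identically; they differ only in scale, which matters when choosing collapseMinScore.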

sequenceCol

Character scalar giving the name of the column in rowData(se) that contains the nucleotide sequence of the variants.

collapseMaxDist

Numeric scalar defining the tolerance for collapsing similar sequences. If the value is in [0, 1), it defines the maximal Hamming distance as a fraction of the sequence length: round(collapseMaxDist * nchar(sequence)). A value greater than or equal to 1 is rounded and used directly as the maximum allowed Hamming distance. Note that sequences can only be collapsed if they are all of the same length.
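The rule above can be sketched in a few lines of base R. Both helpers below (maxHammingDist and hammingDist) are hypothetical names introduced for illustration and are not part of the package; they only restate the documented behaviour under the assumption that all sequences have the same length.

```r
## Hypothetical helper: translate collapseMaxDist into an integer
## Hamming distance threshold, following the documented rule
maxHammingDist <- function(collapseMaxDist, seqLength) {
    stopifnot(collapseMaxDist >= 0)
    if (collapseMaxDist < 1) {
        ## Value in [0, 1): interpreted as a fraction of the sequence length
        round(collapseMaxDist * seqLength)
    } else {
        ## Value >= 1: rounded and used directly
        round(collapseMaxDist)
    }
}

## Hypothetical helper: Hamming distance between two equal-length sequences
hammingDist <- function(a, b) {
    stopifnot(nchar(a) == nchar(b))
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

thrFraction <- maxHammingDist(0.1, seqLength = 30)  # 10% of 30 positions
thrAbsolute <- maxHammingDist(2, seqLength = 30)    # used as given
d <- hammingDist("ACGTAC", "ACGAAT")                # differs at positions 4 and 6
```

In this sketch, two sequences would be candidates for collapsing when their distance d does not exceed the threshold.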

collapseMinScore

Numeric scalar, indicating the minimum score for the sequence to be considered for collapsing with similar sequences.

collapseMinRatio

Numeric scalar. During collapsing of similar sequences, a low-frequency sequence will be collapsed with a higher-frequency sequence only if the ratio between the high-frequency and the low-frequency scores is at least this high. The default value of 0 indicates that no such check is performed.
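The ratio criterion can be illustrated with a small sketch. The helper passesMinRatio is a hypothetical name, and the handling of edge cases (such as a low-frequency score of zero) is an assumption made here, not something this page specifies.

```r
## Hypothetical helper: decide whether a low-frequency sequence may be
## collapsed into a higher-frequency one, given their scores
passesMinRatio <- function(highScore, lowScore, collapseMinRatio = 0) {
    ## The default of 0 means that no ratio check is performed
    if (collapseMinRatio == 0) {
        return(TRUE)
    }
    ## Otherwise require high/low to be at least collapseMinRatio
    (highScore / lowScore) >= collapseMinRatio
}

ok1 <- passesMinRatio(100, 5, collapseMinRatio = 10)   # ratio 20, passes
ok2 <- passesMinRatio(100, 50, collapseMinRatio = 10)  # ratio 2, fails
ok3 <- passesMinRatio(100, 50)                         # check disabled
```

A high threshold restricts collapsing to cases where the minor sequence is plausibly a sequencing error of a much more abundant one.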

verbose

Logical, whether to print progress messages.

nameCol

A character scalar providing the column of rowData(se) that contains the amino acid mutant names (that will be the new row names).

Value

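A SummarizedExperiment where counts have been aggregated by the grouping variable: the mutated amino acid(s) for collapseMutantsByAA, the column given by nameCol for collapseMutants, or the similarity-derived grouping for collapseMutantsBySimilarity.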

Author(s)

Charlotte Soneson, Michael Stadler

Examples

se <- readRDS(system.file("extdata", "GSE102901_cis_se.rds",
                          package = "mutscan"))[1:200, ]
## The rows of this object correspond to individual codon variants
dim(se)
head(rownames(se))

## Collapse by amino acid
sec <- collapseMutantsByAA(se)
## The rows of the collapsed object correspond to amino acid variants
dim(sec)
head(rownames(sec))
## The mutantName column contains the individual codon variants that were 
## collapsed
head(SummarizedExperiment::rowData(sec))

## Collapse similar sequences
sec2 <- collapseMutantsBySimilarity(
    se = se, assayName = "counts", scoreMethod = "rowSum",
    sequenceCol = "sequence", collapseMaxDist = 2,
    collapseMinScore = 0, collapseMinRatio = 0)
dim(sec2)
head(rownames(sec2))
head(SummarizedExperiment::rowData(sec2))
## collapsed count matrix
SummarizedExperiment::assay(sec2, "counts")


fmicompbio/mutscan documentation built on Jan. 10, 2025, 9:10 a.m.