summariseGermlineDistance: Function to summarise distance-from-germline by clones

View source: R/germlineDistances.R

summariseGermlineDistance R Documentation

Description

The function first summarises, for each clone (optionally together with other metadata columns present in the input distFromGermline data frame), the median distance-from-germline of all sequences in the clone. It then uses this median distance to order the clones from lowest (i.e. fewest mutations) to highest, treats this as the 'expected' order of the clones in terms of mutational level, and calculates for each clone the 'actual' percentile of its median distance over all other clones. The discrepancy between the 'actual' and 'expected' percentiles returned by this function can subsequently be used to plot curves showing the overall mutational level of a repertoire (see Examples), or quantified using the Germline Likeness metric (see ?getGermlineLikeness).
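The first two steps described above (per-clone median, then ordering clones by that median) can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the toy data frame and the column names median_dist and position are hypothetical, and the sketch deliberately stops before the 'actual' percentile because the scaling the package applies is not specified here.

```r
# Hypothetical toy input (not package data): one row per sequence.
toy <- data.frame(
  CloneID = c("a", "a", "a", "b", "b", "c", "c", "c"),
  dist    = c(0.02, 0.04, 0.03, 0.10, 0.14, 0.00, 0.01, 0.01),
  stringsAsFactors = FALSE
)

# Step 1: median distance-from-germline per clone.
clone_summary <- aggregate(dist ~ CloneID, data = toy, FUN = median)
names(clone_summary)[names(clone_summary) == "dist"] <- "median_dist"

# Step 2: order clones from fewest to most mutations and record each
# clone's position in that ordering as a fraction in (0, 1].
# 'position' is a hypothetical name for this illustration.
clone_summary <- clone_summary[order(clone_summary$median_dist), ]
clone_summary$position <- seq_len(nrow(clone_summary)) / nrow(clone_summary)
rownames(clone_summary) <- NULL

print(clone_summary)
```

With real data, the same grouping would presumably be done over the columns named in summarise_variables rather than over CloneID alone.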

Usage

summariseGermlineDistance(
  distFromGermline,
  dist_column = "dist",
  cloneID_column = "CloneID",
  summarise_variables
)

Arguments

distFromGermline

data.frame, the output of getGermlineDistance, annotated with extra columns detailing sample metadata.

dist_column

Character, column name in distFromGermline which holds the distance-from-germline metrics.

cloneID_column

Character, column name in distFromGermline which holds the clone ID to which the given sequence belongs.

summarise_variables

Character vector, names of columns in distFromGermline used to partition the sequences into subsets for which a summary is sought.

Value

A data.frame with each row corresponding to one clone, containing, in addition to metadata columns given in summarise_variables, the following columns:

dist_median

Numeric, between 0 and 1, the percentile of the actual median distance-from-germline of the given clone.

clone_order

Numeric, between 0 and 1, the percentile of the given clone in the distribution where all clones are ordered by their median distance.

Examples

## Not run: 
# The package includes pre-computed germline distances for the sequences in
# the 'input' data frame; they are used here as an example.
distFromGermline <- system.file("extdata/input_GermlineDistances.csv", package = "BrepPhylo")
distFromGermline <- read.csv(distFromGermline)

# Summarise the median germline distance for each clone, and place each clone
# in the distribution of this median distance over all clones.
germlineDistance_summary <- summariseGermlineDistance( 
  distFromGermline, dist_column = "distFromGermline", 
  cloneID_column = "CloneID", 
  summarise_variables = c( "PatientID", "CloneID" ) 
)

## End(Not run)
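
The Description mentions plotting curves from the returned percentiles. A minimal base-graphics sketch of such a curve follows, assuming only the two column names listed under Value (clone_order and dist_median). The data frame summary_df is a hypothetical stand-in with made-up values so that the snippet runs without the package; in practice the output of summariseGermlineDistance would take its place.

```r
# Hypothetical stand-in for the returned summary; only the two column
# names documented under Value are assumed. Values are made up.
summary_df <- data.frame(
  clone_order = c(0.50, 0.25, 1.00, 0.75),
  dist_median = c(0.10, 0.05, 0.90, 0.40)
)

# Sort by the expected order so the line is drawn left to right.
summary_df <- summary_df[order(summary_df$clone_order), ]

plot(
  summary_df$clone_order, summary_df$dist_median,
  type = "l", xlim = c(0, 1), ylim = c(0, 1),
  xlab = "clone_order (expected percentile)",
  ylab = "dist_median (actual percentile)"
)
# Dashed diagonal: where actual and expected percentiles would agree.
abline(a = 0, b = 1, lty = 2)
```

When summarise_variables includes a sample-level column such as PatientID, one curve per sample could be drawn by subsetting the summary on that column before plotting.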

Fraternalilab/BrepPhylo documentation built on Jan. 3, 2025, 10:03 a.m.