identifyClusters: identifyClusters


View source: R/identifyClusters.R

Description

A function that finds and annotates clusters of mutations in a genomic data tibble.

Usage

identifyClusters(dataTable, maxDistance, chromHeader = "chrom",
  sampleIdHeader = "sampleIDs", positionHeader = "start",
  refHeader = "ref", altHeader = "alt",
  contextHeader = "surrounding", mutationSymbol = ".",
  linkPatterns = TRUE, reverseComplement = FALSE,
  searchPatterns = NULL, searchRefHeader = "ref",
  searchAltHeader = "alt", searchContextHeader = "surrounding",
  searchIdHeader = "process", searchDistanceHeader = "maxDistance",
  searchMutationSymbol = ".", searchReverseComplement = TRUE,
  linkClustersOnly = TRUE, renameReverse = FALSE, asTibble = TRUE)

Arguments

dataTable

A data.frame or tibble that contains at least the chromosome name, sample ID and position of each mutation. The data must not contain any NA values. For an example, see testDataSet.

maxDistance

A number giving the maximum distance between two DNA mutations for them to be considered part of the same cluster.

chromHeader

A string with the name of the column that contains the chromosome name. (The values in this column must be notated like "chr2".)

sampleIdHeader

A string with the name of the column with the sample ID.

positionHeader

A string with the name of the column that contains the position of the mutation. (The values in this column must be numeric.)

refHeader

A string with the name of the column that contains the reference nucleotides.

altHeader

A string with the name of the column that contains the alternative nucleotides.

contextHeader

A string with the name of the column that contains the sequence context. Values in this column look like "C.G", where the "." marks the location of the mutation. The symbol used to mark this location is arbitrary, but make sure to set mutationSymbol accordingly when searching for patterns. contextHeader is ignored if linkPatterns is FALSE.

mutationSymbol

A string with the symbol that marks the location of the mutated nucleotide in the context (e.g. "." in "G.C").

linkPatterns

A Boolean indicating whether to try to link the mutations to known mutation patterns. If FALSE, the search... parameters and the linkClustersOnly parameter are ignored. For more information, see linkPatterns.

reverseComplement

A Boolean indicating whether the ref, alt and context also need to be searched as their reverse complement. Ignored if searchReverseComplement = TRUE.

searchPatterns

A tibble with the known mutation patterns. The mutationPatterns table is the default search table.

searchRefHeader

A string with the name of the column in the searchPatterns table that contains the reference nucleotide.

searchAltHeader

A string with the name of the column in the searchPatterns table that contains the alternative nucleotide.

searchContextHeader

A string with the name of the column in the searchPatterns table that contains the context nucleotides.

searchIdHeader

A string with the name of the column in the searchPatterns table that contains the pattern IDs.

searchDistanceHeader

A string with the name of the column in the searchPatterns table that contains the maximum distance between clustered mutations. Not needed if the distance parameter is NULL. NA values are allowed in this column.

searchMutationSymbol

A string with the symbol that marks the location of the mutated nucleotide in the searchContextHeader column (e.g. "." in "G.C").

searchReverseComplement

A Boolean indicating whether to also search for the reverse complement of the patterns in the searchPatterns tibble.

linkClustersOnly

A Boolean indicating whether only the clustered mutations should be linked to the patterns in the searchPatterns table. If FALSE, all mutations are used.

renameReverse

A Boolean indicating whether the process ID should be renamed. If TRUE, the cMut functions will no longer treat the reverse complement and the non-reverse complement of a pattern as the same. This parameter is ignored if searchReverseComplement is FALSE.

asTibble

A Boolean indicating whether the result table should be a tibble. If FALSE, a data.frame is returned.
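The reverseComplement and searchReverseComplement parameters rely on the reverse complement of the ref, alt and context values. As a minimal sketch of that transformation, assuming the context uses non-nucleotide symbols such as "." only to mark the mutation location, the hypothetical helper revComp below (not part of cMut) complements A/C/G/T, reverses the string, and leaves any other symbol in place:

```r
# Hypothetical helper, not part of cMut: reverse complement of nucleotide
# strings. Characters other than A/C/G/T (e.g. the mutation symbol ".")
# are left unchanged, so "T.G" becomes "C.A".
revComp <- function(x) {
  # Complement each nucleotide (upper- and lower-case)
  complemented <- chartr("ACGTacgt", "TGCAtgca", x)
  # Reverse the characters of each string
  vapply(strsplit(complemented, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

# A C>T mutation in context "T.G" reads as G>A in context "C.A"
# on the opposite strand.
revComp(c(ref = "C", alt = "T", context = "T.G"))
```

Because the mutation symbol sits in the middle of a symmetric context such as "T.G", it stays at the same position after reversal.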

Value

The dataTable that was passed to this function, with the extra columns clusterId, is.clustered and the distance to the nearest mutation if that distance is below the maximum distance. The result is a tibble unless asTibble is FALSE.
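To illustrate what these columns mean, here is a minimal base-R sketch of distance-based clustering. It is an assumption about the general idea, not the cMut implementation: mutations are compared only within the same sample and chromosome, a mutation counts as clustered when its nearest neighbour lies within maxDistance, and consecutive clustered mutations share a cluster ID. The function name flagClusters and the column name distance are hypothetical.

```r
# Hypothetical sketch, not the cMut implementation.
# Assumes df has at least one row and no NA values.
flagClusters <- function(df, maxDistance, chromHeader = "chrom",
                         sampleIdHeader = "sampleIDs", positionHeader = "start") {
  stopifnot(nrow(df) > 0)
  # Sort so that neighbouring mutations in the same sample/chromosome are adjacent
  df <- df[order(df[[sampleIdHeader]], df[[chromHeader]], df[[positionHeader]]), ]
  rownames(df) <- NULL
  group <- paste(df[[sampleIdHeader]], df[[chromHeader]], sep = ":")
  pos <- df[[positionHeader]]
  n <- nrow(df)

  # Gap to the previous mutation, NA when it belongs to another sample/chromosome
  samePrev <- c(FALSE, group[-1] == group[-n])
  gapPrev <- c(NA, diff(pos))
  gapPrev[!samePrev] <- NA
  # Gap to the next mutation is the previous gap of the following row
  gapNext <- c(gapPrev[-1], NA)

  # Distance to the nearest mutation, kept only when within maxDistance
  distance <- pmin(gapPrev, gapNext, na.rm = TRUE)
  distance[!is.na(distance) & distance > maxDistance] <- NA
  df$distance <- distance
  df$is.clustered <- !is.na(distance)

  # A new cluster starts at a clustered mutation not linked to the previous one
  linkedPrev <- !is.na(gapPrev) & gapPrev <= maxDistance
  startsCluster <- df$is.clustered & !linkedPrev
  df$clusterId <- ifelse(df$is.clustered, cumsum(startsCluster), NA)
  df
}

toy <- data.frame(sampleIDs = c("s1", "s1", "s1", "s1", "s2"),
                  chrom     = c("chr1", "chr1", "chr1", "chr2", "chr1"),
                  start     = c(100, 150, 10000, 120, 160))
flagClusters(toy, maxDistance = 100)
```

In this toy data only the two s1 mutations on chr1 at positions 100 and 150 form a cluster; the s2 mutation at 160 is close in position but belongs to another sample, so it is not clustered.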

See Also

Examples

library(cMut)

# Example data set:
data <- testDataSet

# Example for just clustering:
results <- identifyClusters(dataTable   = data,
                            maxDistance = 20000,
                            linkPatterns = FALSE)

# Example for clustering and linking patterns with the default searchPatterns table:
results <- identifyClusters(dataTable    = data,
                            maxDistance  = 20000,
                            linkPatterns = TRUE)

# For more information about the added columns, use:
cat(comment(results))

AlexJanse/cMut documentation built on May 25, 2019, 4 a.m.