shuffleMutations: shuffleMutations


View source: R/shuffleMutations.R

Description

Shuffles the reference, alternative and surrounding nucleotides, then runs the identifyClusters and groupClusters functions on the shuffled data and returns a summary of how often each pattern was found.
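As a rough illustration of what shuffling these columns could look like, the base R sketch below permutes the ref, alt and surrounding columns of a small made-up table. Whether shuffleMutations permutes the columns independently or together is not stated here, so the independent permutation, the toy data and the helper name shuffleColumnsSketch are assumptions for illustration only.

```r
# Hypothetical helper, not part of cMut: permute the selected columns of a table.
shuffleColumnsSketch <- function(dataTable,
                                 columns = c("ref", "alt", "surrounding")) {
  shuffled <- dataTable
  for (column in columns) {
    # sample() on a character vector returns a random permutation of it
    shuffled[[column]] <- sample(dataTable[[column]])
  }
  shuffled
}

# Made-up toy data that uses the default column names from the Usage section
toyTable <- data.frame(chrom       = c("chr1", "chr1", "chr2"),
                       start       = c(100, 250, 4000),
                       ref         = c("C", "G", "T"),
                       alt         = c("T", "A", "C"),
                       surrounding = c("A.G", "T.C", "C.A"),
                       stringsAsFactors = FALSE)

set.seed(1)  # only to make the illustration reproducible
shuffledTable <- shuffleColumnsSketch(toyTable)
```

The positions and chromosomes stay untouched in this sketch; only the nucleotide columns are reordered.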

Usage

shuffleMutations(dataTable, chromHeader = "chrom",
  positionHeader = "start", refHeader = "ref", altHeader = "alt",
  contextHeader = "surrounding", sampleIdHeader = "sampleIDs",
  nBootstrap = 1000, maxDistance = 20000, linkPatterns = TRUE,
  reverseComplement = FALSE, searchPatterns = NULL,
  searchRefHeader = "ref", searchAltHeader = "alt",
  searchContextHeader = "surrounding", searchIdHeader = "process",
  searchDistanceHeader = "maxDistance", searchReverseComplement = TRUE,
  asTibble = TRUE, returnEachBootstrap = FALSE,
  searchClusterPatterns = TRUE, renameReverse = FALSE,
  no.cores = parallel::detectCores())

Arguments

dataTable

A table with the reference, alternative and surrounding nucleotides. The best data to use is the output of identifyClusters, filtered to the rows where is.clustered is TRUE.

chromHeader

A string with the name of the column that holds the chromosome name. The values in that column need to be written as, e.g., "chr2".

positionHeader

A string with the name of the column that holds the position of the mutation. The values in that column need to be numeric.
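The two format requirements above can be checked before calling the function. The sketch below is a hypothetical base R helper (checkPositionColumns is not part of cMut) and assumes that every chromosome name starts with "chr", as in the example given.

```r
# Hypothetical helper, not part of cMut: check the chromosome and position columns.
checkPositionColumns <- function(dataTable,
                                 chromHeader    = "chrom",
                                 positionHeader = "start") {
  # Both columns have to exist before their content can be checked
  stopifnot(chromHeader %in% names(dataTable),
            positionHeader %in% names(dataTable))
  chromOk    <- all(grepl("^chr", dataTable[[chromHeader]]))
  positionOk <- is.numeric(dataTable[[positionHeader]])
  chromOk && positionOk
}

# Made-up tables for illustration
goodTable <- data.frame(chrom = c("chr2", "chrX"), start = c(1500, 98000),
                        stringsAsFactors = FALSE)
badTable  <- data.frame(chrom = c("2", "X"), start = c("1500", "98000"),
                        stringsAsFactors = FALSE)

checkPositionColumns(goodTable)  # TRUE
checkPositionColumns(badTable)   # FALSE
```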

refHeader

A string with the column header of the reference nucleotide.

altHeader

A string with the column header of the alternative nucleotide.

contextHeader

A string with the column header of the context nucleotides.

sampleIdHeader

A string with the name of the column with the sample ID.

nBootstrap

The number of bootstraps to execute.

maxDistance

The maximum distance between DNA mutations for them to be considered part of the same cluster.
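To make the distance rule concrete, here is a minimal base R sketch that flags positions on a single chromosome as clustered when a neighbouring mutation lies within maxDistance. The exact rule used by identifyClusters (for example whether the boundary is inclusive, or how samples are kept apart) is not specified here, so the inclusive comparison and the helper name isClusteredSketch are assumptions.

```r
# Hypothetical helper, not part of cMut: flag positions that have a neighbour
# within maxDistance (inclusive boundary is an assumption).
isClusteredSketch <- function(positions, maxDistance = 20000) {
  ord       <- order(positions)
  sortedPos <- positions[ord]
  gaps      <- diff(sortedPos)
  closeToNext <- c(gaps <= maxDistance, FALSE)
  closeToPrev <- c(FALSE, gaps <= maxDistance)
  # Write the flags back in the original order of the input
  clustered <- logical(length(positions))
  clustered[ord] <- closeToNext | closeToPrev
  clustered
}

isClusteredSketch(c(1000, 15000, 90000, 500000, 505000))
# TRUE TRUE FALSE TRUE TRUE
```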

linkPatterns

A Boolean that tells whether to try to link the mutations to patterns. If FALSE, the search... parameters and the linkClustersOnly parameter are irrelevant. For more information see linkPatterns.

reverseComplement

A Boolean that tells whether the ref, alt and context also need to be searched as their reverse complement. Irrelevant if searchReverseComplement = TRUE.
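For readers unfamiliar with the term, the reverse complement of a nucleotide string can be computed in base R as sketched below. The helper name reverseComplementSketch is hypothetical and only handles plain A, C, G and T letters; how cMut itself treats context notation or ambiguity codes is not covered here.

```r
# Hypothetical helper, not part of cMut: reverse complement of plain DNA strings.
reverseComplementSketch <- function(sequence) {
  vapply(sequence, function(s) {
    # Swap each base for its complement, then reverse the character order
    complemented <- chartr("ACGTacgt", "TGCAtgca", s)
    paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

reverseComplementSketch(c("GA", "AACG"))
# "TC" "GCAA"
```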

searchPatterns

A tibble with the known mutation patterns. By default the mutationPatterns table is used.

searchRefHeader

A string with the name of the column in the searchPatterns table that holds the reference nucleotide.

searchAltHeader

A string with the name of the column in the searchPatterns table that holds the alternative nucleotide.

searchContextHeader

A string with the name of the column in the searchPatterns table that holds the context nucleotides.

searchIdHeader

A string with the name of the column that holds the pattern IDs.

searchDistanceHeader

A string with the name of the column that holds the maximum distance between clustered mutations. Not needed if the distance parameter is NULL. NAs within this column are allowed.

searchReverseComplement

A Boolean that tells whether to also search for the patterns in the reverse complement of the searchPatterns tibble.

asTibble

A Boolean that tells whether the returned result needs to be a tibble. Otherwise a data.frame is returned. Irrelevant if returnEachBootstrap == TRUE.

returnEachBootstrap

A Boolean that tells whether the summary of each bootstrap needs to be returned. If TRUE, the return value is a list with a summary of each bootstrap.

searchClusterPatterns

A Boolean that tells whether to also search for cluster patterns (e.g. GA > TT).

renameReverse

A Boolean that tells whether the id of the process needs to be renamed. The effect is that the cMut functions no longer treat the reverse complement and the non-reverse complement as the same. This parameter is irrelevant if searchReverseComplement is FALSE.

no.cores

The number of cores that may be used during shuffling. The default is the maximum number of cores present on the system.
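Because the default uses every core, it can be useful to cap the value on shared machines. The sketch below relies on parallel::detectCores(), which may return NA on some platforms; the helper name chooseCores and the cap of 2 are illustrative assumptions, not part of cMut.

```r
# Hypothetical helper, not part of cMut: pick a core count between 1 and maxCores.
chooseCores <- function(maxCores = 2) {
  available <- parallel::detectCores()
  if (is.na(available)) {
    # Fall back to a single core when the core count cannot be detected
    available <- 1L
  }
  max(1L, min(as.integer(maxCores), available))
}

nCores <- chooseCores(2)
# nCores could then be passed on as: no.cores = nCores
```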

Examples

identResults <- identifyClusters(dataTable    = cMut::testDataSet,
                                 maxDistance  = 20000,
                                 linkPatterns = TRUE)
clusteredMutations <- identResults[identResults$is.clustered, ]

# If only the mutation patterns need to be searched:
shuffleResults <- shuffleMutations(dataTable             = clusteredMutations,
                                   searchClusterPatterns = FALSE,
                                   no.cores              = 2)

# If the cluster patterns also need to be included:
shuffleResults <- shuffleMutations(dataTable = identResults[identResults$is.clustered, ],
                                   no.cores  = 2)

AlexJanse/cMut documentation built on May 25, 2019, 4 a.m.