partition: Partitions data by most inter-dependent positions

View source: R/DepLogoR.R

Description

Partitions data by the nucleotides at the most inter-dependent positions, as measured by pairwise mutual information. Partitioning is performed recursively on the resulting subsets until i) the number of sequences in a partition is less than minElements, ii) the average pairwise dependency between the current position and the numBestForSorting other positions with the largest mutual information values drops below threshold, or iii) maxNum recursive splits have already been performed. If splitting results in partitions smaller than minElements, these are added to the smallest partition with more than minElements sequences.
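
To make the dependency measure concrete, the following base R sketch computes pairwise mutual information between the columns of a toy alignment and then averages, for each position, its largest values. The helper `mutual_information`, the toy sequences, and the variable names are illustrative assumptions and not part of DepLogo; the package may use a different logarithm base or scaling internally.

```r
# Hypothetical helper (not a DepLogo function): mutual information in bits
# between two equally long vectors of symbols.
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)   # joint relative frequencies
  px <- rowSums(joint)               # marginal distribution of x
  py <- colSums(joint)               # marginal distribution of y
  expected <- outer(px, py)          # product of the marginals
  nz <- joint > 0                    # skip empty cells (0 * log 0 is taken as 0)
  sum(joint[nz] * log2(joint[nz] / expected[nz]))
}

# Toy alignment: positions 1 and 3 are perfectly coupled, position 2 is constant.
seqs <- c("AAT", "AAT", "CAG", "CAG", "AAT", "CAG")
cols <- do.call(rbind, strsplit(seqs, ""))  # rows = sequences, columns = positions

# Pairwise mutual information matrix (diagonal left at 0).
n_pos <- ncol(cols)
mi <- matrix(0, n_pos, n_pos)
for (i in seq_len(n_pos)) {
  for (j in seq_len(n_pos)) {
    if (i != j) mi[i, j] <- mutual_information(cols[, i], cols[, j])
  }
}

# For each position, average its num_best largest mutual information values
# (playing the role of numBestForSorting) and pick the most dependent position.
num_best <- 1
avg_dep <- apply(mi, 1, function(row) {
  mean(sort(row, decreasing = TRUE)[seq_len(num_best)])
})
best_pos <- which.max(avg_dep)
```

In this toy case the coupled positions 1 and 3 share one bit of information, while the constant position 2 carries none, so a split would be made at position 1 or 3.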

Usage

partition(
  data,
  minElements = 10,
  threshold = 0.1,
  numBestForSorting = 3,
  maxNum = 6,
  sortByWeights = NULL,
  partition.by = NULL
)

Arguments

data

the data as a DLData object

minElements

the minimum number of elements (sequences) a partition must contain for a further split to be performed

threshold

the threshold on the average mutual information value

numBestForSorting

the number of dependencies to other positions considered

maxNum

the maximum number of recursive splits

sortByWeights

if TRUE, partitions are ordered by their average weight value; if FALSE, by the frequency of symbols at the partitioning position. If NULL, the $sortByWeights value of the DLData object is used

partition.by

specify fixed positions to partition by
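
How minElements, threshold and maxNum interact can be sketched as a single predicate that decides whether a subset is split further. The function `should_split` and its arguments `n_seqs`, `avg_mi` and `depth` are hypothetical names chosen for illustration; this is a reading of the stopping rules given in the Description, not the package's actual code.

```r
# Hypothetical predicate (not a DepLogo function): TRUE if a subset should be
# split again under the three stopping rules from the Description.
should_split <- function(n_seqs, avg_mi, depth,
                         minElements = 10, threshold = 0.1, maxNum = 6) {
  n_seqs >= minElements &&   # i)   enough sequences left in the partition
    avg_mi >= threshold &&   # ii)  average dependency has not dropped below threshold
    depth < maxNum           # iii) fewer than maxNum recursive splits so far
}

should_split(n_seqs = 250, avg_mi = 0.35, depth = 2)  # all criteria allow a split
should_split(n_seqs = 8,   avg_mi = 0.35, depth = 2)  # too few sequences
should_split(n_seqs = 250, avg_mi = 0.05, depth = 2)  # dependency below threshold
should_split(n_seqs = 250, avg_mi = 0.35, depth = 6)  # maxNum splits already done
```

Raising threshold or minElements, or lowering maxNum, therefore yields fewer and larger partitions.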

Value

the partitions as a list of DLData objects

Author(s)

Jan Grau <grau@informatik.uni-halle.de>

Examples

# create DLData object
seqs <- read.table(system.file("extdata", "cjun.txt", package = "DepLogo"),
    stringsAsFactors = FALSE)
data <- DLData(sequences = seqs[, 1], weights = log1p(seqs[,2]) )

# partition data using default parameters
partitions <- partition(data)

# partition data using a threshold of 0.3 on the mutual 
# information value to the most dependent position, 
# sorting the resulting partitions by weight
partitions2 <- partition(data = data, threshold = 0.3,
    numBestForSorting = 1, sortByWeights = TRUE)

DepLogo documentation built on April 17, 2021, 1:07 a.m.