removeDuplicateFeatures: Remove Duplicate Features or Samples from a Binary Matrix...

removeDuplicates    R Documentation

Remove Duplicate Features or Samples from a Binary Matrix Object

Description

The removeDuplicates function removes duplicate columns from a binaryMatrix object in the Mercator package.

Usage

removeDuplicates(object)
removeDuplicateFeatures(object)

Arguments

object

An object of class binaryMatrix.

Details

In some analyses, it may be desirable to remove duplicate features, collapsing a group of identical, related events into a single feature to prevent overweighting when clustering. Similarly, for some clustering applications, it may be useful to remove duplicate samples, that is, those that have an identical feature set. Historically, this function was called removeDuplicateFeatures. That name is retained for backwards compatibility, but it may be deprecated in future versions in favor of removeDuplicates.
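To make the idea of collapsing identical features concrete, the following is a minimal base-R sketch on a plain matrix. It is an illustration only and is not the Mercator implementation; the objects x, is.dup, and x.unique are hypothetical names chosen for this example.

```r
# Base-R sketch only; this is not how Mercator implements removeDuplicates.
set.seed(1)
x <- matrix(rbinom(20 * 6, 1, 0.5), nrow = 20,
            dimnames = list(paste0("R", 1:20), paste0("C", 1:6)))
x <- cbind(x, C7 = x[, 2], C8 = x[, 5])   # append two exact copies

# duplicated() with MARGIN = 2 flags every column that repeats an earlier one
is.dup <- duplicated(x, MARGIN = 2)

# keep only the first occurrence of each distinct column
x.unique <- x[, !is.dup, drop = FALSE]

colnames(x)[is.dup]   # names of the columns that were dropped
dim(x.unique)
```

Because only the first copy of each column is kept, a cluster analysis on x.unique counts each distinct event pattern once.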

Removal of duplicates is optional: the binaryMatrix and Mercator objects and their associated functions do not require it in order to work.

The history slot of the binaryMatrix object documents the removal of duplicates.

Future versions of this package may include functionality to store the identities of any duplicates that were removed.

Value

Returns an object of class binaryMatrix with duplicate columns removed.

Note

Transposing the binaryMatrix can allow the removeDuplicates function to be applied to both features and observations, if desired.
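The transposition idea can be sketched with a plain base-R matrix. This is an assumption-laden illustration, not Mercator code: it uses base t() and duplicated() on an ordinary matrix with hypothetical sample names S1 to S4 and feature names F1 to F4.

```r
# Base-R sketch on a plain matrix; not the Mercator implementation.
x <- rbind(S1 = c(1, 0, 1, 0),
           S2 = c(0, 1, 1, 0),
           S3 = c(1, 0, 1, 0),   # same feature set as S1
           S4 = c(0, 0, 0, 1))
colnames(x) <- paste0("F", 1:4)

# After transposing, samples become columns, so a column-wise
# duplicate check now finds duplicate samples.
tx <- t(x)
tx.unique <- tx[, !duplicated(tx, MARGIN = 2), drop = FALSE]

# Transpose back to restore the original orientation.
x.unique <- t(tx.unique)
rownames(x.unique)
```

The same column-wise routine therefore handles both orientations; only the transposition before and after changes.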

Features containing exclusively 0s or 1s may interfere with performance of removeDuplicates.
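One way to check for such constant features before removing duplicates is sketched below in base R on a plain matrix. Whether to drop them is an analysis decision that this page does not prescribe; the names x, is.constant, and x.informative are hypothetical.

```r
# Base-R sketch on a plain 0/1 matrix; not part of the Mercator API.
x <- cbind(A = c(0, 1, 1, 0),
           B = c(0, 0, 0, 0),   # all 0s
           C = c(1, 1, 1, 1),   # all 1s
           D = c(1, 0, 0, 1))

# For a 0/1 matrix, a column is constant when its mean is exactly 0 or 1.
is.constant <- colMeans(x) %in% c(0, 1)

colnames(x)[is.constant]                        # constant features
x.informative <- x[, !is.constant, drop = FALSE]
```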

Author(s)

Kevin R. Coombes <krc@silicovore.com>, Caitlin E. Coombes

Examples

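library(Mercator)
# simulate a binary matrix with 100 rows and 50 columns
my.matrix <- matrix(rbinom(50*100, 1, 0.15), ncol=50)
my.matrix <- cbind(my.matrix, my.matrix[, 1:5]) # append copies of columns 1-5 as duplicates
dimnames(my.matrix) <- list(paste("R", 1:100, sep=''),
                            paste("C", 1:55, sep=''))
my.binmat <- BinaryMatrix(my.matrix)
dim(my.binmat) # dimensions before removing duplicates
my.binmat <- removeDuplicates(my.binmat)
dim(my.binmat) # dimensions after removing duplicates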

Mercator documentation built on April 27, 2024, 3:01 a.m.