prokluster: Compute prokaryotic clusters for the given matrices and...

View source: R/ProKlust.R

Description

Obtains and filters clusters from identity/similarity matrices, performing maximal clique enumeration (MCE) with a configurable cut-off point for each input matrix. Returns relevant graph information.

Usage

prokluster(
  files,
  cutoffs,
  nodesDictionary = NULL,
  nodesPreviousNames = NULL,
  nodesTranslatedNames = NULL,
  filterRemoveIsolated = FALSE,
  filterRemoveLargerComponent = FALSE,
  filterOnlyLargerComponent = FALSE,
  filterDifferentNamesConnected = FALSE,
  filterSameNamesNotConnected = FALSE
)

Arguments

files

Required tab-delimited pairwise identity/similarity matrix input file(s). File names must be given as a single character string or as a character vector of file names. In the latter case, the order of the file names must match the order of the cutoffs. The input can also be a list/data frame, or a list of data frames/lists. Be advised that a vector of data frames will produce erroneous results, and that the list length must match the number of cutoffs.

cutoffs

Required cutoff value(s) for the given input file(s). Either a single number or a vector of numbers. In the latter case, the order of the cutoffs must match the order of the input files. Take special care that each cutoff uses the same format (scale) as the values in its matrix.
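The pairing of inputs and cutoffs can be illustrated with a small base-R sketch. It is not the package's implementation: the toy matrices are made up, and the rule used here (a pair of genomes is linked only when every matrix reaches its own cutoff, compared with `>=`) is an assumption made for illustration.

```r
# Hypothetical toy data: two symmetric matrices with identical row/column names.
genomes <- c("g1", "g2", "g3")
identity <- matrix(c(1.00, 0.97, 0.80,
                     0.97, 1.00, 0.96,
                     0.80, 0.96, 1.00),
                   nrow = 3, byrow = TRUE, dimnames = list(genomes, genomes))
coverage <- matrix(c(1.00, 0.75, 0.90,
                     0.75, 1.00, 0.60,
                     0.90, 0.60, 1.00),
                   nrow = 3, byrow = TRUE, dimnames = list(genomes, genomes))

matrices <- list(identity, coverage)
thresholds <- c(0.95, 0.70)  # same order as `matrices`

# Assumption: a pair is linked only if every matrix reaches its own cutoff.
passes <- Map(function(m, cutoff) m >= cutoff, matrices, thresholds)
adjacency <- Reduce(`&`, passes)
diag(adjacency) <- FALSE     # ignore self-comparisons

print(adjacency)
```

In this toy case only g1 and g2 remain linked: g2 and g3 pass the identity cutoff but fail the coverage cutoff. The cutoffs are fractions because the matrices hold fractions; a matrix of percentages would need cutoffs such as 95 and 70.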

nodesDictionary

Optional annotation table text file with a specific format: each line must be "(previous name)<Tab>(new name)<New Line>", where (previous name) and (new name) can be alphanumeric and some special characters.

nodesPreviousNames

Optional character vector with the previous names (to be renamed) of the graph nodes. Must be the same length as nodesTranslatedNames.

nodesTranslatedNames

Optional character vector with the new names of the graph nodes. Must be the same length as nodesPreviousNames.
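The relationship between the two name vectors can be sketched in base R as a position-wise mapping. The node labels and species names below are hypothetical, and leaving unmatched nodes with their original label is an assumption, not documented package behaviour.

```r
# Hypothetical node labels and name mapping, for illustration only.
nodes <- c("GCF_001", "GCF_002", "GCF_003")
nodesPreviousNames <- c("GCF_001", "GCF_002")
nodesTranslatedNames <- c("Bacillus subtilis", "Bacillus cereus")

# Both vectors must have the same length: entry i of one maps to entry i of the other.
stopifnot(length(nodesPreviousNames) == length(nodesTranslatedNames))

# Look up each node among the previous names.
idx <- match(nodes, nodesPreviousNames)

# Assumption: nodes without a dictionary entry keep their original label.
renamed <- ifelse(is.na(idx), nodes, nodesTranslatedNames[idx])

print(renamed)
```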

filterRemoveIsolated

Optional boolean parameter that removes isolated nodes (nodes that have no connections) from the graph. If more than one type of filter is chosen, the intersection of the filters (executed in the original graph) will be returned.

filterRemoveLargerComponent

Optional boolean parameter that removes the largest (with the most connections) component/group from the graph. If more than one type of filter is chosen, the intersection of the filters (executed in the original graph) will be returned.

filterOnlyLargerComponent

Optional boolean parameter that allows the preservation of the largest (with the most connections) component/group of the graph. If more than one type of filter is chosen, the intersection of the filters (executed in the original graph) will be returned.

filterDifferentNamesConnected

Optional boolean parameter that allows the preservation of components (complete graphs) containing more than one species name (binomial name). If more than one type of filter is chosen, the intersection of the filters (executed in the original graph) will be returned.

filterSameNamesNotConnected

Optional boolean parameter that allows the preservation of nodes that share the same species name (binomial name) but are not connected. If more than one type of filter is chosen, the intersection of the filters (executed in the original graph) will be returned.
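How several filters combine can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: the six-node graph is made up, `component_ids()` is a hypothetical helper, and "largest component" is taken to mean the component with the most edges. Each filter is evaluated on the original graph, and the kept node sets are then intersected.

```r
# Hypothetical graph: a-b-c form a triangle, d-e are linked, f is isolated.
nodes <- c("a", "b", "c", "d", "e", "f")
adj <- matrix(FALSE, 6, 6, dimnames = list(nodes, nodes))
edges <- rbind(c("a", "b"), c("b", "c"), c("a", "c"), c("d", "e"))
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE  # make the adjacency matrix symmetric

# Hypothetical helper: label connected components with a breadth-first search.
component_ids <- function(adj) {
  ids <- rep(NA_integer_, nrow(adj))
  current <- 0L
  for (start in seq_len(nrow(adj))) {
    if (!is.na(ids[start])) next
    current <- current + 1L
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      if (!is.na(ids[node])) next
      ids[node] <- current
      queue <- c(queue, which(adj[node, ] & is.na(ids)))
    }
  }
  ids
}

ids <- component_ids(adj)

# Assumption: the largest component is the one with the most edges.
edge_count <- sapply(split(seq_along(ids), ids),
                     function(m) sum(adj[m, m, drop = FALSE]) / 2)
largest <- as.integer(names(which.max(edge_count)))

# Each filter is computed on the original graph...
keep_not_isolated <- nodes[rowSums(adj) > 0]
keep_not_largest <- nodes[ids != largest]

# ...and the result is the intersection of the kept node sets.
kept <- intersect(keep_not_isolated, keep_not_largest)
print(kept)
```

Here the first filter drops f, the second drops the triangle a-b-c, and only d and e survive both.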

Value

A list that holds relevant data of the clustering. Possible members of the list are described below.

graph: An igraph graph object. It can be passed to plotc() to plot the desired cluster(s).

maxCliques: The maximal cliques of the graph, that is, the subsets of nodes in which each node is directly connected to every other node in the subset and which cannot be extended by adding another node. An example would be all the possible species groups that could be delimited in the graph, which could result in groups having genomes in common.

components: Contains the isolated nodes or groups formed of complete graphs.
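The remark that maximal cliques can share genomes is easiest to see on a tiny graph. The sketch below enumerates maximal cliques with a plain Bron-Kerbosch recursion in base R; it is an illustration only, not the routine the package uses, and `bron_kerbosch()` and the three-node graph are hypothetical.

```r
# Hypothetical path graph a - b - c: a and c are not directly connected.
nodes <- c("a", "b", "c")
adj <- matrix(FALSE, 3, 3, dimnames = list(nodes, nodes))
adj["a", "b"] <- adj["b", "a"] <- TRUE
adj["b", "c"] <- adj["c", "b"] <- TRUE

# Bron-Kerbosch without pivoting:
# r = current clique, p = candidate nodes, x = nodes already processed.
bron_kerbosch <- function(adj, r = integer(0),
                          p = seq_len(nrow(adj)), x = integer(0)) {
  if (length(p) == 0 && length(x) == 0) return(list(r))
  cliques <- list()
  for (v in p) {
    nb <- unname(which(adj[v, ]))
    cliques <- c(cliques,
                 bron_kerbosch(adj, c(r, v), intersect(p, nb), intersect(x, nb)))
    p <- setdiff(p, v)
    x <- c(x, v)
  }
  cliques
}

cliques <- bron_kerbosch(adj)
cliques_named <- lapply(cliques, function(idx) nodes[idx])
shared <- Reduce(intersect, cliques_named)

print(cliques_named)
print(shared)
```

The graph has two maximal cliques, {a, b} and {b, c}, and node b belongs to both, which is the overlap described above.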

Examples

# Example 1.1: a single matrix file with a single cutoff
basicResult1.1 <- prokluster(files = "ANIb_percentage_identity.tab", cutoffs = 0.9)

# Example 1.2: the same matrix passed as a data frame
percentage <- read.table(file = "ANIb_percentage_identity.tab", header = TRUE, row.names = 1, sep = "\t")
basicResult1.2 <- prokluster(files = percentage, cutoffs = 0.9)

# Example 2.1: two matrix files, one cutoff per file (same order), with renaming and a filter
files <- c("ANIb_percentage_identity.tab", "ANIb_alignment_coverage.tab")
thresholds <- c(0.95, 0.70)
renamedResults1.1 <- prokluster(files = files, cutoffs = thresholds, nodesDictionary = "dictionary.tab", filterRemoveIsolated = TRUE)

# Example 2.2: the same two matrices passed as a list of data frames
coverage <- read.table(file = "ANIb_alignment_coverage.tab", header = TRUE, row.names = 1, sep = "\t")
filesList <- list(percentage, coverage)
basicResult2.2 <- prokluster(files = filesList, cutoffs = thresholds)

# Example 3: keep components containing more than one species name
renamedResults2 <- prokluster(files = files, cutoffs = thresholds, nodesDictionary = "dictionary.tab", filterDifferentNamesConnected = TRUE)

# Example 4: rename nodes from two vectors instead of a dictionary file
nodesNames <- read.table(file = "dictionary.tab", sep = "\t", header = FALSE, stringsAsFactors = FALSE)
renamedResults3 <- prokluster(files = files, cutoffs = thresholds, nodesPreviousNames = nodesNames$V1, nodesTranslatedNames = nodesNames$V2, filterSameNamesNotConnected = TRUE)

camilagazolla/ProKlust documentation built on May 22, 2021, 11:10 p.m.