Similarity Matrices


View source: R/ProKlust.R

Obtains and filters clusters from identity/similarity matrices, identifying them by maximal clique enumeration (MCE) with a configurable cut-off point for each input matrix. Returns the relevant graph information.

prokluster(
  files,
  cutoffs,
  nodesDictionary = NULL,
  nodesPreviousNames = NULL,
  nodesTranslatedNames = NULL,
  filterRemoveIsolated = FALSE,
  filterRemoveLargerComponent = FALSE,
  filterOnlyLargerComponent = FALSE,
  filterDifferentNamesConnected = FALSE,
  filterSameNamesNotConnected = FALSE
)

`files`	Required. Tab-delimited pairwise identity/similarity matrix input(s). File names must be given either as a single character string or as a character vector of file names; in the latter case, the order of the file names must match the order of the cutoffs. The input can also be a data frame (or list), or a list of data frames (or lists). Note that a vector of data frames (rather than a list) will give erroneous results, and that the length of the list must match the number of cutoffs.
`cutoffs`	Required. Cutoff value(s) for the given input file(s): either a single number or a numeric vector. In the latter case, the order of the cutoffs must match the order of the input files. Take care that each cutoff is expressed in the same format as the values of the corresponding matrix.
`nodesDictionary`	Optional annotation table text file in a specific format: each line must be "(previous name)<Tab>(new name)<New Line>", where (previous name) and (new name) may contain alphanumeric characters and some special characters.
`nodesPreviousNames`	Optional character vector with the previous names (to be renamed) of the graph nodes. Must be the same length as nodesTranslatedNames.
`nodesTranslatedNames`	Optional character vector with the new names of the graph nodes. Must be the same length as nodesPreviousNames.
`filterRemoveIsolated`	Optional logical. If TRUE, removes isolated nodes (nodes with no connections) from the graph. If more than one filter is selected, the intersection of the filters (each applied to the original graph) is returned.
`filterRemoveLargerComponent`	Optional logical. If TRUE, removes the largest component/group (the one with the most connections) from the graph. If more than one filter is selected, the intersection of the filters (each applied to the original graph) is returned.
`filterOnlyLargerComponent`	Optional logical. If TRUE, keeps only the largest component/group (the one with the most connections) of the graph. If more than one filter is selected, the intersection of the filters (each applied to the original graph) is returned.
`filterDifferentNamesConnected`	Optional logical. If TRUE, keeps the components (complete graphs) that contain more than one species name (binomial name). If more than one filter is selected, the intersection of the filters (each applied to the original graph) is returned.
`filterSameNamesNotConnected`	Optional logical. If TRUE, keeps the nodes that share the same species name (binomial name) but are not connected. If more than one filter is selected, the intersection of the filters (each applied to the original graph) is returned.
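
To make the pairing of `files` and `cutoffs` concrete, the base R sketch below thresholds two small made-up matrices and combines the results. It assumes that two nodes are connected only when every matrix reaches its own cutoff (using `>=`); that rule, the toy values, and the helper name `thresholdAll` are illustrative assumptions and are not taken from the ProKlust source.

```r
# Toy identity and coverage matrices (made-up values, symmetric for simplicity).
ids <- c("g1", "g2", "g3")
identity <- matrix(c(1.00, 0.97, 0.80,
                     0.97, 1.00, 0.96,
                     0.80, 0.96, 1.00),
                   nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
coverage <- matrix(c(1.00, 0.75, 0.90,
                     0.75, 1.00, 0.60,
                     0.90, 0.60, 1.00),
                   nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical helper (an assumption, not part of ProKlust):
# keep an edge only if every matrix reaches its cutoff.
# Matrices and cutoffs are paired by position.
thresholdAll <- function(matrices, cutoffs) {
  stopifnot(length(matrices) == length(cutoffs))
  passes <- Map(function(m, cutoff) as.matrix(m) >= cutoff, matrices, cutoffs)
  adjacency <- Reduce(`&`, passes)
  diag(adjacency) <- FALSE  # ignore self-comparisons
  adjacency
}

adjacency <- thresholdAll(list(identity, coverage), c(0.95, 0.70))
print(adjacency)
```

Under these assumptions only g1 and g2 end up connected: g2 and g3 pass the identity cutoff but fail the coverage cutoff, which is why the order of the cutoffs has to follow the order of the inputs.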

A list that holds relevant data of the clustering. Possible members of the list are described below.

graph: An igraph graph object. Can be passed to plotc() to plot the desired cluster(s).

maxCliques: The maximal cliques, that is, subsets of nodes in which each node is directly connected to every other node in the subset and which cannot be extended by adding another node. For example, these are all the possible species groups that could be delimited in the graph; different groups may have genomes in common.

components: Contains the isolated nodes or groups formed of complete graphs.
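
The difference between connected components and maximal cliques, and the way cliques can share a node, can be seen on a toy graph. The sketch below uses igraph's own `components()` and `max_cliques()` on a hand-made graph; it illustrates the graph-theory terms only and is an assumption about, not a description of, how prokluster fills the `components` and `maxCliques` members.

```r
library(igraph)

# Toy graph: a chain A - B - C plus an isolated node D.
g <- graph_from_literal(A - B, B - C, D)

# Connected components: one group with A, B and C, and the isolated node D.
comp <- components(g)
split(V(g)$name, comp$membership)

# Maximal cliques: A-B, B-C and the single node D.
# Node B appears in two cliques, so the groups overlap.
cliques <- lapply(max_cliques(g), function(vs) sort(as_ids(vs)))
cliques
```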

# Example 1.1
basicResult1.1 <- prokluster(files = "ANIb_percentage_identity.tab", cutoffs = 0.9)

# Example 1.2
percentage <- read.table(file = "ANIb_percentage_identity.tab", header = TRUE, row.names = 1, sep = "\t")
basicResult1.2 <- prokluster(files = percentage, cutoffs = 0.9)

# Example 2.1
files <- c("ANIb_percentage_identity.tab", "ANIb_alignment_coverage.tab")
thresholds <- c(0.95, 0.70)
renamedResults1.1 <- prokluster(files = files, cutoffs = thresholds, nodesDictionary = "dictionary.tab", filterRemoveIsolated = TRUE)

# Example 2.2
coverage <- read.table(file = "ANIb_alignment_coverage.tab", header = TRUE, row.names = 1, sep = "\t")
filesList <- list(percentage, coverage)
basicResult2.2 <- prokluster(files = filesList, cutoffs = thresholds)

# Example 3
renamedResults2 <- prokluster(files = files, cutoffs = thresholds, nodesDictionary = "dictionary.tab", filterDifferentNamesConnected = TRUE)

# Example 4
nodesNames <- read.table(file = "dictionary.tab", sep = "\t", header = FALSE, stringsAsFactors = FALSE)
renamedResults3 <- prokluster(files = files, cutoffs = thresholds, nodesPreviousNames = nodesNames$V1, nodesTranslatedNames = nodesNames$V2, filterSameNamesNotConnected = TRUE)
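
The examples read a `dictionary.tab` file whose contents are not shown. A minimal way to produce a file in the documented "(previous name)<Tab>(new name)" layout with base R is sketched below. The identifiers and species names are invented placeholders, and a temporary file stands in for `dictionary.tab`.

```r
# Invented placeholder names; replace them with the labels used in your matrices.
dictionary <- data.frame(
  previous = c("genome_001", "genome_002", "genome_003"),
  translated = c("Bacillus subtilis", "Bacillus subtilis", "Bacillus velezensis"),
  stringsAsFactors = FALSE
)

# One "previous<Tab>new" pair per line, with no header and no quoting.
dictionaryFile <- tempfile(fileext = ".tab")
write.table(dictionary, file = dictionaryFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Reading the file back as in Example 4 yields the two name vectors.
nodesNames <- read.table(file = dictionaryFile, sep = "\t", header = FALSE,
                         stringsAsFactors = FALSE)
nodesNames$V1
nodesNames$V2
```

The same two columns can then be supplied either as a file through `nodesDictionary` or as vectors through `nodesPreviousNames` and `nodesTranslatedNames`.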

camilagazolla/ProKlust documentation built on May 22, 2021, 11:10 p.m.


camilagazolla/ProKlust
Prokariotic Clustering based on Genome and Sequence Identity/Similarity Matrices

prokluster: Compute prokariotic clusters for the given matrices and...
In camilagazolla/ProKlust: Prokariotic Clustering based on Genome and Sequence Identity/Similarity Matrices
