cophenetic_generator: Cophenetic Correlation Coefficient Plot Generator
In rmoffitt/aged: Automatic Gene Expression Deconvolution Using NMF

View source: R/cophenetic_generator.R

Description

cophenetic_generator runs non-negative matrix factorization (NMF) for each rank of factorization in a user-specified range and computes the cophenetic correlation coefficient at each rank. This coefficient can help the user decide which rank to use when running NMF: the raw coefficient values, the elbow method, or any other applicable approach can be used to choose a rank. Higher cophenetic correlation coefficients indicate more stable and reproducible NMF results.

In the plot returned by this function, the rank with the highest cophenetic correlation coefficient is highlighted in red. If rank_range is contiguous (consecutive integers), the rank directly before the largest drop in the cophenetic correlation coefficient, among the drops that occur before the curve first increases, is highlighted in cyan. If these two points coincide, the point is highlighted in magenta. In the rare event of a tie, the first index (the lowest rank) is selected. Ultimately, however, it is up to the user to decide which rank best fits their NMF runs.
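The selection rules in the Description can be illustrated with a minimal base-R sketch. It is not the aged implementation: cophenetic_coef and highlight_ranks are hypothetical helpers, the consensus matrix is assumed to be a samples-by-samples matrix of co-clustering frequencies in [0, 1] averaged over the NMF runs, and "before any positive slopes" is interpreted as considering only the drops that precede the first increase in the curve.

```r
# Hypothetical helper (not part of aged): cophenetic correlation of a
# consensus matrix, i.e. the correlation between the consensus distances
# (1 - consensus) and the cophenetic distances of an average-linkage tree.
cophenetic_coef <- function(consensus) {
  d <- as.dist(1 - consensus)
  tree <- hclust(d, method = "average")
  cor(as.vector(d), as.vector(cophenetic(tree)))
}

# Hypothetical helper (not part of aged): decide which ranks to highlight.
# Assumes `ranks` is already sorted and de-duplicated, as rank_range is.
highlight_ranks <- function(ranks, coph) {
  # which.max() returns the first index on ties ("first index is selected")
  max_idx <- which.max(coph)
  elbow_idx <- NA_integer_
  if (length(ranks) > 1 && all(diff(ranks) == 1)) {
    slopes <- diff(coph)                  # slopes[i] = coph[i + 1] - coph[i]
    first_rise <- which(slopes > 0)[1]    # NA if the curve never increases
    n_ok <- if (is.na(first_rise)) length(slopes) else first_rise - 1
    if (n_ok > 0) {
      # Largest drop before the first increase; index i is the rank
      # directly before that drop (assumed interpretation).
      elbow_idx <- which.max(-slopes[seq_len(n_ok)])
    }
  }
  cols <- rep("black", length(ranks))     # default colour is an assumption
  cols[max_idx] <- "red"
  if (!is.na(elbow_idx)) {
    cols[elbow_idx] <- if (elbow_idx == max_idx) "magenta" else "cyan"
  }
  list(max_rank = ranks[max_idx],
       elbow_rank = if (is.na(elbow_idx)) NA else ranks[elbow_idx],
       colors = cols)
}
```

For example, with ranks 2:6 and coefficients c(0.99, 0.98, 0.90, 0.92, 0.85), rank 2 is red and rank 3 is cyan: the curve first rises between ranks 4 and 5, and the largest earlier drop is from rank 3 to rank 4.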

Usage

cophenetic_generator(
  data,
  rank_range = 2:20,
  nrun = 12,
  mvg = 1000,
  nmf_seed = 123456,
  cophenetic = TRUE,
  colors = TRUE,
  clv = 0,
  transformation = 0,
  blind = TRUE,
  ...
)

Arguments

`data`	Gene expression target data, a matrix-like object. The rows should represent genes, and each row must have a unique row name. Each column should represent a different sample.
`rank_range`	A numeric vector of factorization ranks to try; it does not need to be contiguous. Duplicates are removed and the vector is sorted in increasing order before use. All values should be integers greater than 1.
`nrun`	The number of NMF runs. Determining the cophenetic correlation coefficient for each rank does not require as many runs as a normal NMF analysis. The default is 12, but any number of runs can be used.
`mvg`	The number of most variable genes to use during the first steps of FaStaNMF.
`nmf_seed`	The seed used for NMF, for reproducibility.
`cophenetic`	A boolean. If TRUE, the cophenetic correlation coefficient of the dataset is used; if FALSE, the number of genes that cluster stably at each rank is used instead.
`colors`	A boolean. If TRUE, the points described in the Description (the maximum value and the point preceding the largest drop) are highlighted in color; if FALSE, no points are highlighted.
`clv`	A numeric value `x`. Genes with variance < `x` across all samples are removed from the dataset and are not considered at all in the deconvolution. We recommend setting this parameter to 1 if genes with low expression variance across samples should be removed. This filtering is performed before any transformation or other reduction.
`transformation`	A numeric value selecting the transformation applied to the original dataset: 0 for no transformation, 1 for a log transformation using log1p, or 2 for a variance-stabilizing transformation (VST) using varianceStabilizingTransformation. Only 0, 1, or 2 should be used; any other value is treated as no transformation. For FaStaNMF, untransformed data should be log-transformed or VST-transformed.
`blind`	If a VST is performed (transformation = 2), this boolean determines whether the transformation is blind to the experimental design.
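The order of operations described for clv, transformation, blind, and mvg can be sketched in base R as follows. This is a hypothetical helper (preprocess_sketch), not the package's code. It assumes data has at least two samples, that the VST branch calls DESeq2::varianceStabilizingTransformation on a matrix of non-negative integer counts, and that mvg selection happens after the transformation, an ordering the documentation does not specify.

```r
# Hypothetical helper (not part of aged): clv filter -> transformation -> mvg.
preprocess_sketch <- function(data, clv = 0, transformation = 0,
                              blind = TRUE, mvg = 1000) {
  data <- as.matrix(data)
  # clv: drop genes whose variance across samples is below clv (done first)
  gene_var <- apply(data, 1, var)
  data <- data[gene_var >= clv, , drop = FALSE]
  # transformation: 0 = none, 1 = log1p, 2 = VST; any other value = none
  if (isTRUE(transformation == 1)) {
    data <- log1p(data)
  } else if (isTRUE(transformation == 2)) {
    # Assumption: DESeq2 is installed and data holds integer counts
    data <- DESeq2::varianceStabilizingTransformation(data, blind = blind)
  }
  # mvg: keep the mvg most variable genes (placement here is an assumption)
  top <- order(apply(data, 1, var), decreasing = TRUE)
  data[head(top, mvg), , drop = FALSE]
}
```

Filtering before transforming matches the clv description: a gene's variance is judged on the original scale, so a log or VST transformation cannot rescue or remove it.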