View source: R/cophenetic_generator.R
cophenetic_generator | R Documentation |
cophenetic_generator
cophenetic_generator runs non-negative matrix factorization (NMF) to determine the cophenetic correlation coefficient for each rank of factorization in a user-supplied range of ranks. The cophenetic correlation coefficient can help the user decide what rank to use when running NMF: the higher the coefficient, the more stable and reproducible the NMF results are. The raw coefficient value, the elbow method, or any other applicable approach can be used to choose a rank.
In the plot returned by this function, the rank with the highest cophenetic correlation coefficient is highlighted in red. If the input vector for rank_range is continuous (no gaps between consecutive ranks), the rank directly before the biggest drop in cophenetic correlation coefficient, before any positive slopes, is highlighted in cyan. If these two points are the same, the point is highlighted in magenta. In the extremely rare event of a tie in numerical values, the first index is selected. Ultimately, it is up to the user to decide which rank is the best fit for NMF runs.
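The highlighting rules described above can be sketched in base R. This is only an illustration of one reading of those rules, not the package's own code: the coefficient values are made up, the variable names are hypothetical, the default point color of black is an assumption, and "before any positive slopes" is taken to mean that only drops occurring before the first increase are considered.

```r
# Hypothetical cophenetic coefficients for ranks 2..8 (illustrative values only).
ranks <- 2:8
coph  <- c(0.99, 0.98, 0.97, 0.85, 0.83, 0.88, 0.80)

# Rank with the highest coefficient; which.max() returns the first index on ties.
best_rank <- ranks[which.max(coph)]

# slopes[i] is the change in coefficient from ranks[i] to ranks[i + 1].
slopes <- diff(coph)

# Assumption: only slopes before the first positive slope are candidates.
first_up <- which(slopes > 0)[1]   # NA if the curve never rises
n_use <- if (is.na(first_up)) length(slopes) else first_up - 1

# Rank directly before the biggest drop; which.min() also keeps the first tie.
drop_rank <- if (n_use >= 1) ranks[which.min(slopes[seq_len(n_use)])] else NA

# Colors: red for the maximum, cyan for the pre-drop rank, magenta if they coincide.
point_col <- rep("black", length(ranks))
point_col[ranks == best_rank] <- "red"
if (!is.na(drop_rank)) {
  point_col[ranks == drop_rank] <- if (drop_rank == best_rank) "magenta" else "cyan"
}
```

With these made-up values the maximum falls at rank 2 and the largest drop before the first increase starts at rank 4, so the two points receive different colors.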
cophenetic_generator( data, rank_range = 2:20, nrun = 12, mvg = 1000, nmf_seed = 123456, cophenetic = TRUE, colors = TRUE, clv = 0, transformation = 0, blind = TRUE, ... )
data |
Gene expression target data, a matrix-like object. The rows should represent genes, and each row must have a unique row name. Each column should represent a different sample. |
rank_range |
Any numeric vector containing ranks of factorization to try (does not need to be continuous). Duplicates are removed, and the vector is sorted in increasing order before use. All values should be greater than 1. |
nrun |
The desired number of NMF runs. For simply determining the cophenetic correlation coefficient for each rank, it is not strictly necessary to perform a high number of runs, or as many runs as a normal NMF analysis would use. The default is 12, but any number of runs can be used. |
mvg |
A numerical argument determining how many of the most variable genes to look at during the first steps of FaStaNMF. |
nmf_seed |
The seed to use for NMF. |
cophenetic |
A boolean argument determining which measure is used: the cophenetic correlation coefficient of the dataset (TRUE), or the number of genes that cluster stably at different rank values (FALSE). |
colors |
A boolean argument determining whether the points described above (the maximum value and the point preceding the largest drop) are highlighted in color. If TRUE, the points are highlighted. If FALSE, no points are highlighted. |
clv |
A numerical value |
transformation |
A numerical value that determines whether a log or VST transformation is applied to the original dataset. A value of 0 indicates no transformation, a value of 1 indicates a log transformation using log1p, and a value of 2 indicates a VST transformation using varianceStabilizingTransformation. If this argument is used, it should be 0, 1 or 2 only; any other value is treated as no transformation. For FaStaNMF, untransformed data should be log-transformed or VST-transformed. |
blind |
If a VST is to be done, this boolean value determines whether it is blind or not. |
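Two of the argument behaviors described above can be reproduced with base R alone. The sketch below shows what cleaning rank_range and applying transformation = 1 amount to; the input values and variable names are hypothetical, and the function's internal code may differ. The VST branch (transformation = 2) is not shown. DESeq2 exports a function named varianceStabilizingTransformation with a blind argument, which is presumably the one meant, but this page does not say so.

```r
# rank_range: duplicates are removed, then the vector is sorted increasingly.
rank_range <- c(6, 3, 3, 10, 2)
rank_range <- sort(unique(rank_range))

# All ranks should be greater than 1.
stopifnot(all(rank_range > 1))

# Toy expression matrix: genes in rows with unique row names, samples in columns.
counts <- matrix(c(0, 9, 99, 4, 0, 49), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# transformation = 1 corresponds to log1p, i.e. log(1 + x), so zeros stay zero.
logged <- log1p(counts)
```

Because log1p maps 0 to 0, zero counts need no pseudocount handling before the transformation.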
A line graph that displays the cophenetic correlation coefficient for each rank in the selected range of ranks. The function also draws the plot.
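A minimal usage sketch follows, assuming the package that provides cophenetic_generator is already attached (this page does not give its name, so the call is guarded and skipped when the function is not found). The synthetic count matrix, its dimensions, and the argument values are illustrative choices, not recommendations.

```r
# Synthetic count matrix: 200 genes (rows) by 12 samples (columns),
# with unique row names as the data argument requires.
set.seed(1)
expr <- matrix(rpois(200 * 12, lambda = 20), nrow = 200, ncol = 12,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12)))

# Run only if the function is available in the current session.
if (exists("cophenetic_generator", mode = "function")) {
  coph_plot <- cophenetic_generator(
    expr,
    rank_range = 2:6,    # continuous range, so the pre-drop rank can be highlighted
    nrun = 12,           # the documented default
    mvg = 100,           # look at the 100 most variable genes
    transformation = 1   # raw counts, so apply the log1p transformation
  )
}
```

A short, continuous rank_range keeps the run time low while still allowing both highlighted points to appear in the plot.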