Bicluster data with non-random missing values
biclustermd(
data,
row_clusters = floor(sqrt(nrow(data))),
col_clusters = floor(sqrt(ncol(data))),
miss_val = mean(data, na.rm = TRUE),
miss_val_sd = 1,
similarity = "Rand",
row_min_num = floor(nrow(data)/row_clusters),
col_min_num = floor(ncol(data)/col_clusters),
row_num_to_move = 1,
col_num_to_move = 1,
row_shuffles = 1,
col_shuffles = 1,
max.iter = 100,
verbose = FALSE
)
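Several defaults in the usage block are expressions of the data's dimensions rather than constants. The sketch below evaluates those expressions by hand in base R on a toy matrix, to show what a call with no tuning arguments would start from. The matrix `toy` and its dimensions are hypothetical, and nothing here calls the package itself.

```r
# Toy 12 x 105 matrix with some missing cells (hypothetical data, not from the package).
set.seed(1)
toy <- matrix(rnorm(12 * 105), nrow = 12, ncol = 105,
              dimnames = list(paste0("r", 1:12), paste0("c", 1:105)))
toy[sample(length(toy), 200)] <- NA

# Evaluate the default expressions from the usage block by hand.
row_clusters <- floor(sqrt(nrow(toy)))           # floor(sqrt(12))  = 3
col_clusters <- floor(sqrt(ncol(toy)))           # floor(sqrt(105)) = 10
row_min_num  <- floor(nrow(toy) / row_clusters)  # floor(12 / 3)    = 4
col_min_num  <- floor(ncol(toy) / col_clusters)  # floor(105 / 10)  = 10
miss_val     <- mean(toy, na.rm = TRUE)          # mean of the observed cells only

defaults <- c(row_clusters = row_clusters, col_clusters = col_clusters,
              row_min_num = row_min_num, col_min_num = col_min_num)
print(defaults)
```

Because R evaluates default arguments lazily, a default such as row_min_num is computed from whatever row_clusters value is actually supplied, not necessarily from the default row_clusters.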
data |
Dataset to bicluster. Must be a data matrix containing only numbers and missing values. It should have row names and column names. |
row_clusters |
The number of clusters to partition the rows into. The
default is floor(sqrt(nrow(data))). |
col_clusters |
The number of clusters to partition the columns into. The
default is floor(sqrt(ncol(data))). |
miss_val |
Value or function to put in empty cells of the prototype matrix.
If a value, a random normal variable with sd = miss_val_sd is used. |
miss_val_sd |
Standard deviation of the normal distribution used when miss_val is a value. Default is 1. |
similarity |
The metric used to compare two successive clusterings. Can be "Rand" (default), "HA" for the Hubert and Arabie adjusted Rand index or "Jaccard". See RRand for details. |
row_min_num |
Minimum row prototype size in order to be eligible to be
chosen when filling an empty row prototype. Default is floor(nrow(data)/row_clusters). |
col_min_num |
Minimum column prototype size in order to be eligible to be
chosen when filling an empty column prototype. Default is floor(ncol(data)/col_clusters). |
row_num_to_move |
Number of rows to remove from the sampled prototype to put in the empty row prototype. Default is 1. |
col_num_to_move |
Number of columns to remove from the sampled prototype to put in the empty column prototype. Default is 1. |
row_shuffles |
Number of times to shuffle rows in each iteration. Default is 1. |
col_shuffles |
Number of times to shuffle columns in each iteration. Default is 1. |
max.iter |
Maximum number of iterations to let the algorithm run for. Default is 100. |
verbose |
Logical. If TRUE, will report progress. |
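The similarity argument compares the clustering from one iteration with the clustering from the next. As a rough illustration of what the plain Rand index measures, the base R sketch below counts the share of item pairs on which two label vectors agree (both place the pair together, or both place it apart). The function name rand_index and the label vectors are hypothetical; this is not the package's code, which defers to RRand.

```r
# Rand index between two label vectors of equal length (illustrative, base R only).
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  same_a <- outer(a, a, "==")   # TRUE where a pair shares a cluster under a
  same_b <- outer(b, b, "==")   # TRUE where a pair shares a cluster under b
  pairs  <- upper.tri(same_a)   # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

prev_rows <- c(1, 1, 2, 2, 3, 3)  # hypothetical row clustering at one iteration
next_rows <- c(1, 1, 2, 3, 3, 3)  # hypothetical row clustering at the next

same_score    <- rand_index(prev_rows, prev_rows)  # identical clusterings
changed_score <- rand_index(prev_rows, next_rows)  # one row moved
print(c(same = same_score, changed = changed_score))
```

The index depends only on which items share a cluster, so relabeling the clusters leaves it unchanged, and a value of 1 means the two clusterings are identical.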
A list of class biclustermd:
params |
a list of all arguments passed to the function, including defaults. |
data |
the input two-way table of data. |
P0 |
the initial column partition matrix. |
Q0 |
the initial row partition matrix. |
InitialSSE |
the SSE of the original partitioning. |
P |
the final column partition matrix. |
Q |
the final row partition matrix. |
SSE |
a matrix of class biclustermd_sse detailing the SSE recorded at the end of each iteration. |
Similarities |
a data frame of class biclustermd_sim detailing the
value of row and column similarity measures recorded at the end of each
iteration. Contains information for all three similarity measures.
This carries an attribute |
iteration |
the number of iterations the algorithm ran for, whether |
A |
the final prototype matrix which gives the average of each bicluster. |
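To make the relationship between the partition matrices, the prototype matrix and the SSE concrete, here is a minimal base R sketch. It assumes that Q and P are 0/1 indicator matrices with one column per cluster, and that the SSE is taken over observed cells around their bicluster mean. Both are assumptions made for illustration, and the toy data and labels are hypothetical.

```r
# Toy 4 x 4 data matrix with missing cells (hypothetical values).
x <- matrix(c( 1,  2, 10, NA,
               3, NA, 12, 14,
              20, 22, NA, 30,
              NA, 24, 32, 34),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:4)))

# Assumed layout: 0/1 indicator matrices, one column per cluster.
row_labels <- c(1, 1, 2, 2)
col_labels <- c(1, 1, 2, 2)
Q <- outer(row_labels, 1:2, "==") * 1   # rows x row clusters
P <- outer(col_labels, 1:2, "==") * 1   # columns x column clusters

observed <- !is.na(x)
x0 <- ifelse(observed, x, 0)            # zero out missing cells so sums skip them

cell_sums   <- t(Q) %*% x0 %*% P              # sum of observed values per bicluster
cell_counts <- t(Q) %*% (observed * 1) %*% P  # number of observed values per bicluster
A <- cell_sums / cell_counts                  # bicluster means (NaN if nothing observed)

# Sum of squared errors of the observed cells around their bicluster mean.
fitted <- Q %*% A %*% t(P)
sse <- sum((x - fitted)^2, na.rm = TRUE)

print(A)
print(sse)
```

Dividing sums by counts of observed cells is what lets the means ignore missing values without imputing them first.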
Li, J., Reisner, J., Pham, H., Olafsson, S., and Vardeman, S. (2020) Biclustering with Missing Data. Information Sciences, 510, 304–316.
rep_biclustermd, tune_biclustermd
library(biclustermd)
data("synthetic")
# default parameters
bc <- biclustermd(synthetic)
bc
autoplot(bc)
# providing the true number of row and column clusters
bc <- biclustermd(synthetic, col_clusters = 3, row_clusters = 2)
bc
autoplot(bc)
# an example with the nycflights13::flights dataset
library(nycflights13)
data("flights")
library(dplyr)
library(tidyr)  # provides spread()
# keep only the columns needed for a month-by-destination table
flights_bcd <- flights %>%
  select(month, dest, arr_delay)
# mean arrival delay per (month, dest), reshaped to one column per destination
flights_bcd <- flights_bcd %>%
  group_by(month, dest) %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  spread(dest, mean_arr_delay) %>%
  as.data.frame()
# move month into the row names and drop it from the matrix
rownames(flights_bcd) <- flights_bcd$month
flights_bcd <- as.matrix(flights_bcd[, -1])
flights_bc <- biclustermd(data = flights_bcd, col_clusters = 6, row_clusters = 4,
                          row_min_num = 3, col_min_num = 5,
                          max.iter = 20, verbose = TRUE)
flights_bc
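The reshaping step in the flights example turns long records into a matrix with row and column names, where month and destination pairs that never occur become missing cells. A base R sketch of the same idea, using tapply() on a small hypothetical data frame instead of the real flights table, might look like this. It is an alternative illustration, not the pipeline used above.

```r
# Hypothetical long-format records standing in for flights (month, dest, arr_delay).
long <- data.frame(
  month     = c(1, 1, 1, 2, 2, 3),
  dest      = c("ATL", "ATL", "BOS", "ATL", "ORD", "BOS"),
  arr_delay = c(10, NA, 5, -4, 20, 7)
)

# tapply() cross-tabulates the mean delay; month/dest pairs with no records stay NA.
wide <- with(long, tapply(arr_delay, list(month, dest), mean, na.rm = TRUE))

print(is.matrix(wide))  # a numeric matrix, as the data argument requires
print(dimnames(wide))   # months as row names, destinations as column names
print(wide)
```

The result already carries row and column names, so it could be passed as the data argument without the extra as.data.frame() and rownames() steps.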