View source: R/tune_biclustermd.R
Bicluster data over a grid of tuning parameters
tune_biclustermd(
  data,
  nrep = 10,
  parallel = FALSE,
  ncores = 2,
  tune_grid = NULL
)
data |
Dataset to bicluster. Must be a data matrix containing only numbers and missing values. It should have row names and column names. |
nrep |
The number of times to repeat the biclustering for each set of parameters. Default 10. |
parallel |
Logical indicating if the user would like to utilize the
|
ncores |
The number of cores to use when running in parallel. Default 2. |
tune_grid |
A data frame of parameters to tune over. The column names of
this must match the arguments passed to |
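As a rough illustration of how nrep and tune_grid interact, the sketch below builds a small grid with base R's expand.grid() and counts the implied biclustering runs. The column names are borrowed from the examples on this page; the miss_val candidates are hypothetical placeholders, not recommended values.

```r
# Hypothetical candidate values for miss_val; the examples on this page
# derive them from the data with fivenum() instead.
miss_vals <- c(-1, 0, 1)

# One row per parameter combination.
tg <- expand.grid(
  miss_val = miss_vals,
  similarity = c("Rand", "HA", "Jaccard"),
  col_clusters = 3:5,
  row_clusters = 2,
  stringsAsFactors = FALSE
)

nrep <- 10                      # repeats per parameter combination
n_combos <- nrow(tg)            # 3 * 3 * 3 * 1 combinations
total_runs <- n_combos * nrep   # biclusterings implied by this grid
total_runs
```

Because every row of the grid is repeated nrep times, the run count grows multiplicatively with each tuned parameter, which is worth checking before starting a large search.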
A list of:
best_combn |
The best combination of parameters, |
best_bc |
The minimum SSE biclustering using the parameters in
|
grid |
|
runtime |
CPU runtime & elapsed time. |
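The grid element can be explored with ordinary data frame tools. The sketch below uses a small hand-made data frame as a stand-in for it: the mean_sse and sd_sse column names appear in the examples on this page, but the values are invented for illustration, and ranking by lowest mean SSE is an assumption rather than a statement of how best_combn is selected.

```r
# Stand-in for the grid element of the result; values are made up.
toy_grid <- data.frame(
  col_clusters = c(3, 4, 5),
  similarity = c("Rand", "Rand", "Rand"),
  mean_sse = c(12.5, 9.8, 10.4),
  sd_sse = c(1.2, 0.7, 2.1)
)

# Order the combinations from lowest to highest mean SSE.
ranked <- toy_grid[order(toy_grid$mean_sse), ]

# The first row is the combination with the smallest mean SSE.
best_row <- ranked[1, ]
best_row
```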
Li, J., Reisner, J., Pham, H., Olafsson, S., and Vardeman, S. (2019) Biclustering for Missing Data. Information Sciences, Submitted
library(biclustermd)
library(dplyr)
library(ggplot2)

data("synthetic")

# grid of tuning parameters to search over
tg <- expand.grid(
  miss_val = fivenum(synthetic),
  similarity = c("Rand", "HA", "Jaccard"),
  col_min_num = 2,
  row_min_num = 2,
  col_clusters = 3:5,
  row_clusters = 2
)
tg

# in parallel, using two cores:
tbc <- tune_biclustermd(synthetic, nrep = 2, parallel = TRUE, ncores = 2, tune_grid = tg)
tbc

# average standard deviation of SSE by miss_val and number of column clusters
tbc$grid %>%
  group_by(miss_val, col_clusters) %>%
  summarise(avg_sd = mean(sd_sse)) %>%
  ggplot(aes(miss_val, avg_sd, color = col_clusters, group = col_clusters)) +
  geom_line() +
  geom_point()

# sequentially (parallel = FALSE is the default):
tbc <- tune_biclustermd(synthetic, nrep = 2, tune_grid = tg)
tbc

# compare the mean and standard deviation of SSE across similarity measures
boxplot(tbc$grid$mean_sse ~ tbc$grid$similarity)
boxplot(tbc$grid$sd_sse ~ tbc$grid$similarity)

# nycflights13::flights dataset
library(nycflights13)
library(dplyr)
library(tidyr) # provides spread()

data("flights")

# mean arrival delay for each month and destination
flights_bcd <- flights %>%
  select(month, dest, arr_delay)

flights_bcd <- flights_bcd %>%
  group_by(month, dest) %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  spread(dest, mean_arr_delay) %>%
  as.data.frame()

# months as rows
rownames(flights_bcd) <- flights_bcd$month
flights_bcd <- as.matrix(flights_bcd[, -1])

flights_grid <- expand.grid(
  row_clusters = 4,
  col_clusters = c(6, 9, 12),
  miss_val = fivenum(flights_bcd),
  similarity = c("Rand", "Jaccard")
)

# RUN TIME: approximately 40 seconds across two cores.
flights_tune <- tune_biclustermd(
  flights_bcd,
  nrep = 10,
  parallel = TRUE,
  ncores = 2,
  tune_grid = flights_grid
)
flights_tune
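
The parallel calls above hard-code ncores = 2. A minimal sketch of one way to pick the value defensively is shown below, using detectCores() from the parallel package that ships with R. The variable names are illustrative and not part of the package, and the commented call at the end is only an indication of how the values might be passed on.

```r
# Number of cores reported by the machine; detectCores() can return NA
# on some platforms, so fall back to a single core in that case.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

requested <- 2L                               # documented default for ncores
ncores <- max(1L, min(requested, available))  # never exceed what is available
use_parallel <- ncores > 1L                   # only go parallel with 2+ cores

# Possible use (not run here):
# tune_biclustermd(synthetic, nrep = 2, parallel = use_parallel,
#                  ncores = ncores, tune_grid = tg)
```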