In plant breeding, it is common to hold out one population as the test set when training a machine learning model. This package lets the user run a random forest with this leave-one-population-out scheme, as well as with other common cross-validation (CV) strategies (e.g. 5-fold or 10-fold).
RF(dat, cv_method)
dat | A data.frame with columns "Y", "GE_ID", "Parent_A", "Parent_B", followed by one column per genetic marker.
cv_method | Either an integer specifying the number of folds, or "family" to fold according to population.
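The two cv_method modes can be pictured as two ways of assigning a fold label to each row. The sketch below is only an illustration and not the package's actual code: assign_folds is a hypothetical helper, and it assumes that a population ("family") is a unique Parent_A x Parent_B combination, which the documentation does not state explicitly.

```r
# Hypothetical helper, not part of the package: returns one fold label per row.
assign_folds <- function(dat, cv_method) {
  if (identical(cv_method, "family")) {
    # Assumption: a population (family) is a unique Parent_A x Parent_B cross.
    as.integer(factor(paste(dat$Parent_A, dat$Parent_B, sep = "_")))
  } else {
    k <- as.integer(cv_method)
    stopifnot(!is.na(k), k >= 2, k <= nrow(dat))
    # Shuffle balanced fold labels 1..k across the rows.
    sample(rep(seq_len(k), length.out = nrow(dat)))
  }
}

set.seed(1)
toy <- data.frame(
  Parent_A = c("A", "A", "B", "B", "A", "B"),
  Parent_B = c("E", "E", "E", "F", "F", "F")
)
family_folds <- assign_folds(toy, "family")  # one fold per parent combination
kfold_folds  <- assign_folds(toy, 3)         # three random folds of equal size
```

With "family", rows from the same cross always share a fold, so each test set is a population the model has not seen. With an integer, rows are spread at random across the folds.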
A list of two data.frames, one containing the predictions for each GE, and one containing the correlation coefficients between observed and predicted values for each fold.
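As a rough illustration of the second data.frame, a per-fold correlation can be derived from a table of predictions as sketched below. The table preds and its column names (fold, observed, predicted) are assumptions made for this example; the names used in the object returned by RF are not documented here.

```r
# Hypothetical predictions table; the column names are assumptions.
preds <- data.frame(
  fold      = c(1, 1, 1, 2, 2, 2),
  observed  = c(10, 20, 30, 10, 20, 30),
  predicted = c(11, 19, 32, 30, 20, 10)
)

# One Pearson correlation per fold between observed and predicted values.
fold_cor <- do.call(rbind, lapply(split(preds, preds$fold), function(d) {
  data.frame(fold = d$fold[1], r = cor(d$observed, d$predicted))
}))
```

Reporting the correlation per fold, rather than pooled over all rows, shows how much prediction accuracy varies from one held-out population to the next.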
library(magrittr)

set.seed(76123)
row_num <- 1000

# Simulated data: response, IDs and parent labels, plus 10 random marker columns.
ex <- data.frame(
  Y = runif(row_num, 30, 150),
  GE_ID = runif(row_num, 1e8, 2e8),
  Parent_A = sample(LETTERS[1:4], row_num, replace = TRUE),
  Parent_B = sample(LETTERS[5:8], row_num, replace = TRUE)
) %>%
  cbind(replicate(10, runif(row_num, 0, 1)) %>%
    as.data.frame() %>%
    magrittr::set_names(paste0("Marker", seq(10))))

# Fold according to population instead of using a fixed number of folds.
result <- RF(ex, "family")