View source: R/parameter_estimation.R
hyperparameter_random_forest | R Documentation |
Estimate one or more hyperparameters with a random forest, using descriptions of GWAS p-value distributions from
simulated effect size distributions, such as those produced by sim_gen.
hyperparameter_random_forest(
x,
meta,
phenos,
sims,
hyperparameter_to_estimate = c("pi"),
center = T,
hold_percent = 0.25,
num_trees = 1000,
mtry = function(columns) columns,
num_threads = NULL,
parameter_transforms = reasonable_transform(hyperparameter_to_estimate)$forward,
parameter_back_transforms = reasonable_transform(hyperparameter_to_estimate)$back,
importance = "permutation",
scheme = "gwas",
peak_delta = 0.5,
peak_pcut = 5e-04,
window_sigma = 50,
quantiles = seq(0 + 0.001, 1 - 0.001, by = 0.001),
save_rf = FALSE,
pass_windows = NULL,
pass_G = NULL,
GMMAT_infile = NULL,
phased = FALSE,
maf = 0.05,
...
)
x |
numeric matrix. Input genotypes, SNPs as rows, columns as individuals. Genotypes formatted as 0,1,2 for the major homozygote, heterozygote, and minor homozygote, respectively. |
meta |
data.frame. Metadata for SNPs. First two columns must hold chromosome ID and position. Further columns are ignored. |
phenos |
numeric vector. Observed phenotypes, one per individual. |
sims |
data.frame. Data.frame that matches that produced by |
hyperparameter_to_estimate |
character vector, default "pi". Names of the hyperparameters to estimate via random forest. Must match column names in sims. |
center |
logical, default T. Determines if the phenotypes provided
should be centered (have their means set to 0).
This should match what was provided to |
hold_percent |
numeric between 0 and 1 (exclusive), default 0.25. Proportion of sims to hold out from model estimation for use in cross-evaluation. |
num_trees |
numeric, default 1000. Number of trees to grow during the random forest. |
mtry |
function, default function(columns) columns. A function that, when given the number of columns containing distribution summary statistics,
returns the number of variables to consider for splitting at each node of the random forest. For example, function(columns) columns/2 would set mtry
to half the number of summary statistics. See |
num_threads |
numeric, default NULL. Number of processing threads to use for tree growth and cross-evaluation. |
importance |
character, default "permutation". Determines how variable importance is computed, if it is at all. See |
peak_delta |
numeric, default 0.5. Value used to determine spacing between called peaks during peak identification for distribution description. |
peak_pcut |
numeric, default 0.0005. Only p-values below this quantile will be used for peak detection during peak identification for distribution description. |
window_sigma |
numeric, default 50. Size of the windows in megabases to be used during distribution description. |
quantiles |
numeric, default seq(0 + 0.001, 1 - 0.001, by = 0.001). Density quantiles over which to estimate parameter values. |
save_rf |
logical, default FALSE. If TRUE, the raw ranger random forest object is returned. This object can be extremely large and is only needed if different quantiles, predictions, etc. are required later. |
... |
Extra arguments passed to |
parameter_transforms |
Named list of parameter transformation functions or NULL, default the |
parameter_back_transforms |
Named list of parameter back transformation functions or NULL, default |
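The argument descriptions above can be made more concrete with a small base-R sketch of the expected input shapes and a few argument values. Nothing here calls hyperparameter_random_forest itself. The toy data are random, the hold-out split only illustrates what a hold_percent of 0.25 means and is not the package's own code, the floor() in the mtry helper is an addition to keep the result a whole number, and the logit/inverse-logit pair for "pi" is a hypothetical choice that assumes pi is a proportion between 0 and 1 (the actual defaults come from reasonable_transform and are not shown on this page).

```r
set.seed(42)  # reproducible toy data; all values below are made up

# x: numeric genotype matrix, SNPs as rows and individuals as columns,
# coded 0 (major homozygote), 1 (heterozygote), 2 (minor homozygote).
n_snps <- 20
n_inds <- 10
x <- matrix(as.numeric(sample(0:2, n_snps * n_inds, replace = TRUE)),
            nrow = n_snps, ncol = n_inds)

# meta: one row per SNP; the first two columns hold chromosome ID and position.
meta <- data.frame(
  chr = rep(c("chr1", "chr2"), each = n_snps / 2),
  position = rep(seq(1000, by = 5000, length.out = n_snps / 2), times = 2)
)

# phenos: one observed phenotype per individual (one per column of x).
phenos <- rnorm(n_inds, mean = 5, sd = 2)

# center = TRUE means the phenotypes have their mean set to 0, i.e.:
centered_phenos <- phenos - mean(phenos)

# mtry: a function of the number of summary-statistic columns.
# floor() is an addition here so that odd column counts still give an integer.
half_mtry <- function(columns) floor(columns / 2)
half_mtry(40)  # 20

# quantiles: the default grid from the usage section.
quantiles <- seq(0 + 0.001, 1 - 0.001, by = 0.001)

# hold_percent: illustrative split of simulation rows (not the package's code).
n_sims <- 200            # hypothetical number of rows in sims
hold_percent <- 0.25
held_out <- sample(n_sims, size = round(n_sims * hold_percent))
training <- setdiff(seq_len(n_sims), held_out)

# parameter_transforms / parameter_back_transforms: named lists of functions,
# with names matching hyperparameter_to_estimate. The logit pair is an
# assumption for illustration only.
parameter_transforms <- list(pi = qlogis)       # forward: logit
parameter_back_transforms <- list(pi = plogis)  # back: inverse logit
```

Each back transform should undo its forward transform, so that estimates made on the transformed scale can be reported on the original scale.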
William Hemstrom