fit_estimate_partition | R Documentation |
Split the data; on one side train/fit the partition using fit_partition,
and then on the other side estimate subgroup effects.
Tests whether the object is an estimated_partition
object.
fit_estimate_partition(
  y, X, d = NULL,
  tr_split = 0.5,
  max_splits = Inf,
  max_cells = Inf,
  min_size = 3,
  cv_folds = 5,
  potential_lambdas = NULL,
  partition_i = NA,
  verbosity = 0,
  breaks_per_dim = NULL,
  bucket_min_n = NA,
  bucket_min_d_var = FALSE,
  ctrl_method = "",
  pr_cl = NULL,
  alpha = 0.05,
  bump_samples = 0,
  bump_ratio = 1,
  importance_type = "",
  ...
)

is_estimated_partition(x)
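A minimal sketch of a call with the signature above. The simulated data, the effect size, and the object name est_part are illustrative assumptions, not taken from the package. The call itself only runs if the CausalGrid package (assumed here to be the package that exports these functions) is installed.

```r
set.seed(1)
N <- 200
K <- 3

# Simulated inputs in the shapes the arguments describe (illustrative only).
X <- matrix(rnorm(N * K), nrow = N, ncol = K)          # NxK features
d <- matrix(rbinom(N, 1, 0.5), ncol = 1,
            dimnames = list(NULL, "d"))                # Nx1 treatment with colnames
y <- matrix(1 + d * (X[, 1] > 0) + rnorm(N), ncol = 1) # Nx1 outcome

# Guarded call: skipped when the package is not available.
if (requireNamespace("CausalGrid", quietly = TRUE)) {
  est_part <- CausalGrid::fit_estimate_partition(y, X, d,
                                                 tr_split = 0.5,
                                                 cv_folds = 5)
  print(CausalGrid::is_estimated_partition(est_part))
}
```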
y |
Nx1 matrix of outcome (label/target) data. With multiple core estimates see Details below. |
X |
NxK matrix of features (covariates). With multiple core estimates see Details below. |
d |
(Optional) NxP matrix (with colnames) of treatment data. If all treatments are equally important, they should be normalized to have the same variance. With multiple core estimates see Details below. |
tr_split |
Number between 0 and 1, or a vector of indexes. With multiple core estimates over separate samples (DS.MULTI_SAMPLE, see Details below) and index vectors, pass a list of vectors. |
max_splits |
Maximum number of splits even if splits continue to improve OOS fit |
max_cells |
Maximum number of cells even if more splits continue to improve OOS fit |
min_size |
Minimum cell size when building the full grid. cv_tr will use (F-1)/F*min_size; cv_te doesn't use any. |
cv_folds |
Number of CV folds or a vector of foldids. If m_mode==DS.MULTI_SAMPLE, then a list with foldids per dataset. Each must be over the training sample. |
potential_lambdas |
Potential lambdas to search through in CV. |
partition_i |
Default NA. Use this to avoid CV |
verbosity |
0 prints no messages. 1 prints a progress bar for high-level loops. 2 prints detailed output for high-level loops. Nested operations decrease verbosity by 1. |
breaks_per_dim |
NULL (for all possible breaks); K-length vector with the number of breaks (chosen by quantiles); or K-dim list of vectors giving potential split points for non-categorical variables (can put c(0) for categorical). Similar to 'discrete splitting' in CausalTree, though there they use separate split points for treated and controls. |
bucket_min_n |
Minimum number of observations needed between different split checks |
bucket_min_d_var |
Ensure positive variance of d for the observations between different split checks |
ctrl_method |
Method for determining additional control variables. Empty ("") for nothing, "all", "LassoCV", or "RF" |
pr_cl |
Default NULL. Parallel cluster. Used for:
|
alpha |
Significance threshold for confidence intervals. Default=0.05 |
bump_samples |
Number of bump bootstraps (default 0), or a list of such length where each item is a bootstrap sample. If m_mode==DS.MULTI_SAMPLE, then each item is a sublist with such bootstrap samples over each dataset. Each bootstrap sample must be over the train split of the data. |
bump_ratio |
For bootstraps the ratio of sample size to sample (between 0 and 1, default 1) |
importance_type |
Options: single - (smart) redo full fitting removing each possible dimension; interaction - (smart) redo full fitting removing each pair of dimensions; "" - nothing. |
... |
Additional params. |
x |
an R object |
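The index-based arguments above can be prepared by hand. The following is a sketch in base R only, assuming the shapes the argument descriptions state: an index vector for tr_split, fold ids over the training sample for cv_folds, the (F-1)/F*min_size scaling, and quantile-based candidate split points. All variable names are illustrative, and how the package itself draws splits or places breaks is not shown on this page.

```r
set.seed(42)
N <- 100

# tr_split as an index vector: here drawn to match a fraction of 0.5.
tr_frac <- 0.5
index_tr <- sort(sample(N, size = round(tr_frac * N)))
N_tr <- length(index_tr)

# cv_folds as a vector of fold ids over the training sample (length N_tr).
n_folds <- 5
cv_foldid <- sample(rep(seq_len(n_folds), length.out = N_tr))

# Minimum cell size applied to the CV training folds: (F-1)/F * min_size.
min_size <- 10
min_size_cv_tr <- (n_folds - 1) / n_folds * min_size

# Candidate split points for one feature chosen by quantiles,
# in the spirit of a breaks_per_dim entry given as a vector of points.
x1 <- rnorm(N)
n_breaks <- 4
probs <- seq_len(n_breaks) / (n_breaks + 1)
breaks_x1 <- unname(quantile(x1[index_tr], probs = probs))
```

Vectors such as index_tr and cv_foldid would then be passed as tr_split and cv_folds in place of the scalar defaults.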
An object with class "estimated_partition".
partition |
|
cell_stats |
Cell stats from |
importance_weights |
Importance weights for each feature |
interaction_weights |
Interaction weights for each pair of features |
lambda |
lambda used |
is_obj_val_seq |
In-sample objective function values for sequence of partitions |
complexity_seq |
Complexity #s (# cells-1) for sequence of partitions |
partition_i |
Index of Partition selected in sequence |
split_seq |
Sequence of |
index_tr |
Index of training sample (we might have generated it). Order N |
cv_foldid |
CV foldids for the training sample (Size of N_tr) |
varnames |
varnames (or c("X1", "X2",...) if X doesn't have colnames) |
est_plan |
Fitted |
full_stat_df |
Full sample average stats from |
True if x is an estimated_partition
is_estimated_partition: is estimated_partition
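A class test of this kind is commonly written with inherits(). The sketch below uses a hypothetical stand-in, is_estimated_partition_sketch, and a hand-built object. It is not the package's implementation, which this page does not show.

```r
# Hypothetical stand-in for a class check (not the package's own code).
is_estimated_partition_sketch <- function(x) {
  inherits(x, "estimated_partition")
}

# Hand-built object carrying the class attribute, for illustration only.
fake_obj <- structure(list(lambda = 0.1), class = "estimated_partition")

res_obj  <- is_estimated_partition_sketch(fake_obj)  # TRUE
res_list <- is_estimated_partition_sketch(list())    # FALSE
```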
With multiple core estimates (M) there are 3 options (DS.MULTI_D and DS.MULTI_Y use the same sample across treatment effects).
DS.MULTI_SAMPLE: Multiple pairs of (Y_m,W_m). y,X,d are then lists of length M. Each element then has the typical size. The N_m may differ across m. The number of columns of X will be the same across m.
DS.MULTI_D: Multiple treatments and a single outcome. d is then a NxM matrix.
DS.MULTI_Y: A single treatment and multiple outcomes. y is then a NxM matrix.
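The three input layouts can be sketched in base R as follows. The data are simulated and all object names are illustrative assumptions. Only the shapes follow the descriptions above, and how the mode itself is selected is not covered here.

```r
set.seed(7)
M <- 2   # number of core estimates
K <- 3   # number of features

# DS.MULTI_SAMPLE: y, X, d are lists of length M; N_m may differ,
# but every X has the same number of columns.
N_m <- c(80, 120)
X_list <- lapply(N_m, function(n) matrix(rnorm(n * K), nrow = n))
d_list <- lapply(N_m, function(n) {
  matrix(rbinom(n, 1, 0.5), ncol = 1, dimnames = list(NULL, "d"))
})
y_list <- lapply(N_m, function(n) matrix(rnorm(n), ncol = 1))

# DS.MULTI_D: a single outcome and an NxM treatment matrix.
N <- 100
d_multi <- matrix(rbinom(N * M, 1, 0.5), nrow = N,
                  dimnames = list(NULL, c("d1", "d2")))
y_single <- matrix(rnorm(N), ncol = 1)

# DS.MULTI_Y: a single treatment and an NxM outcome matrix.
d_single <- matrix(rbinom(N, 1, 0.5), ncol = 1,
                   dimnames = list(NULL, "d"))
y_multi <- matrix(rnorm(N * M), nrow = N)
```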