fit_estimate_partition: Fit Grid Partition and estimate cell stats
In microsoft/CausalGrid: Analysis of Subgroups

Description

Splits the data: on one side it trains (fits) the partition using fit_partition, and on the other it estimates subgroup effects.

is_estimated_partition tests whether an object is an estimated_partition object.
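The split-then-estimate idea can be illustrated with a toy base-R sketch. This is not CausalGrid's algorithm. The hypothetical helper `toy_honest_split` "fits" a single cut at the training-half median of one feature, then estimates the treated-minus-control difference in mean outcomes within each cell using only the held-out half, so the estimates are not contaminated by the data used to choose the cut.

```r
# Toy illustration of honest sample splitting (NOT CausalGrid's implementation).
# toy_honest_split is a hypothetical helper written only for this example.
toy_honest_split <- function(y, x, d, tr_split = 0.5) {
  n <- length(y)
  index_tr <- sample.int(n, size = floor(tr_split * n))  # training indexes
  index_es <- setdiff(seq_len(n), index_tr)              # held-out estimation indexes
  # "Fit" step: choose a single cut point using the training sample only.
  cut <- stats::median(x[index_tr])
  # "Estimate" step: per-cell difference in means, using the estimation sample only.
  cell <- ifelse(x[index_es] <= cut, "low", "high")
  ye <- y[index_es]
  de <- d[index_es]
  effects <- sapply(c("low", "high"), function(cl) {
    in_cell <- cell == cl
    mean(ye[in_cell & de == 1]) - mean(ye[in_cell & de == 0])
  })
  list(cut = cut, index_tr = index_tr, effects = effects)
}

set.seed(1)
n <- 2000
x <- runif(n)
d <- rbinom(n, 1, 0.5)
y <- 2 * d * (x > 0.5) + rnorm(n)  # simulated: effect is 2 only when x > 0.5
res <- toy_honest_split(y, x, d)
print(res$effects)  # should be near 0 for "low" and near 2 for "high"
```

fit_estimate_partition searches over a full grid and selects complexity by CV, but the separation between the sample used to choose cells and the sample used to estimate within them follows the same pattern.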

Usage

fit_estimate_partition(
  y,
  X,
  d = NULL,
  tr_split = 0.5,
  max_splits = Inf,
  max_cells = Inf,
  min_size = 3,
  cv_folds = 5,
  potential_lambdas = NULL,
  partition_i = NA,
  verbosity = 0,
  breaks_per_dim = NULL,
  bucket_min_n = NA,
  bucket_min_d_var = FALSE,
  ctrl_method = "",
  pr_cl = NULL,
  alpha = 0.05,
  bump_samples = 0,
  bump_ratio = 1,
  importance_type = "",
  ...
)

is_estimated_partition(x)
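A minimal call might look like the following, assuming the CausalGrid package is installed. The simulated data are illustrative only, and only arguments from the Usage block above are used. `breaks_per_dim = c(10, 10)` limits the candidate splits to 10 quantile-based breaks per feature to keep the search small. The call is guarded by `requireNamespace()` so the snippet still runs without the package.

```r
# Simulated data (illustrative only): the effect of d on y depends on X1.
set.seed(42)
N <- 1000
X <- matrix(runif(N * 2), ncol = 2, dimnames = list(NULL, c("X1", "X2")))
d <- matrix(rbinom(N, 1, 0.5), ncol = 1, dimnames = list(NULL, "d"))  # d needs colnames
y <- matrix(X[, "X2"] + 2 * d[, 1] * (X[, "X1"] > 0.5) + rnorm(N), ncol = 1)

if (requireNamespace("CausalGrid", quietly = TRUE)) {
  # Fit the partition on half the sample and estimate cell effects on the other half.
  est <- CausalGrid::fit_estimate_partition(y, X, d, tr_split = 0.5, cv_folds = 5,
                                            breaks_per_dim = c(10, 10))
  stopifnot(CausalGrid::is_estimated_partition(est))
  print(est$cell_stats)  # cell stats computed on the estimation sample
}
```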

Arguments

`y`	Nx1 matrix of outcome (label/target) data. With multiple core estimates, see Multiple estimates below.
`X`	NxK matrix of features (covariates). With multiple core estimates, see Multiple estimates below.
`d`	(Optional) NxP matrix (with colnames) of treatment data. If all treatments are equally important, they should be normalized to have the same variance. With multiple core estimates, see Multiple estimates below.
`tr_split`	Number between 0 and 1 (the share of the sample used for training) or a vector of training-sample indexes. With multiple core estimates of type DS.MULTI_SAMPLE, pass a list of index vectors (one per dataset) instead of a single vector.
`max_splits`	Maximum number of splits, even if further splits continue to improve out-of-sample (OOS) fit.
`max_cells`	Maximum number of cells, even if further splits continue to improve OOS fit.
`min_size`	Minimum cell size when building the full grid. CV training folds (cv_tr) use (F-1)/F*min_size, where F is the number of CV folds; CV test folds (cv_te) impose no minimum.
`cv_folds`	Number of CV folds or a vector of fold ids. If m_mode==DS.MULTI_SAMPLE, a list with fold ids per dataset. Each must be defined over the training sample.
`potential_lambdas`	Potential lambda values to search over in CV.
`partition_i`	Default NA. Set this to select a partition from the sequence directly and avoid CV.
`verbosity`	0 prints no messages. 1 prints a progress bar for high-level loops. 2 prints detailed output for high-level loops. Nested operations decrease verbosity by 1.
`breaks_per_dim`	NULL (for all possible breaks); a K-length vector with the number of breaks per dimension (chosen by quantiles); or a K-length list of vectors giving potential split points for non-categorical variables (use c(0) for categorical ones). Similar to 'discrete splitting' in CausalTree, though there they use separate split points for treated and control units.
`bucket_min_n`	Minimum number of observations needed between different split checks.
`bucket_min_d_var`	Ensure positive variance of d for the observations between different split checks.
`ctrl_method`	Method for determining additional control variables: empty ("") for none, "all", "LassoCV", or "RF".
`pr_cl`	Default NULL. Parallel cluster. Used for cross-validating the optimal lambda, fitting the full tree (going across dimensions at each split), fitting trees over the bump samples, and, for importance weights, estimating models over limited X domains.
`alpha`	Significance level for confidence intervals. Default 0.05.
`bump_samples`	Number of bump bootstraps (default 0), or a list of that length where each item is a bootstrap sample. If m_mode==DS.MULTI_SAMPLE, each item is a sublist with one bootstrap sample per dataset. Each bootstrap sample must be over the training split of the data.
`bump_ratio`	For bump bootstraps, the ratio of the bootstrap sample size to the size of the sample being resampled (between 0 and 1, default 1).
`importance_type`	Options: "single" - (smart) redo the full fitting, removing each possible dimension; "interaction" - (smart) redo the full fitting, removing each pair of dimensions; "" - nothing.
`...`	Additional params.
`x`	an R object
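The index-valued forms of tr_split, cv_folds and bump_samples can be built explicitly in base R. This sketch assumes that fold ids and bootstrap draws refer to positions within the training sample (the text above only says they must be "over" the training sample), and that each bootstrap draws floor(bump_ratio * N_tr) observations with replacement. Variable names are hypothetical.

```r
# Hypothetical construction of explicit sample-index arguments.
# ASSUMPTION: cv_folds and bump_samples index positions within the training sample.
set.seed(7)
N <- 200
n_folds <- 5  # number of CV folds
B <- 3        # number of bump bootstraps
ratio <- 0.8  # plays the role of bump_ratio

# tr_split as an explicit vector of training indexes (half the sample here).
tr_idx <- sort(sample.int(N, size = N / 2))
N_tr <- length(tr_idx)

# cv_folds as a vector of fold ids, one per training observation, balanced across folds.
foldids <- sample(rep_len(seq_len(n_folds), N_tr))

# bump_samples as a list of bootstrap samples drawn with replacement from the training sample.
bumps <- lapply(seq_len(B), function(b) {
  sample.int(N_tr, size = floor(ratio * N_tr), replace = TRUE)
})

# These could then be passed as, e.g.:
# fit_estimate_partition(y, X, d, tr_split = tr_idx, cv_folds = foldids, bump_samples = bumps)
```

Passing explicit indexes instead of counts makes runs reproducible and lets the same folds or bootstrap draws be reused across calls.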
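The list form of breaks_per_dim can be produced from quantiles of each feature, with c(0) as the documented placeholder for categorical columns. The helper `quantile_breaks` below is hypothetical, and the package's own quantile choice for the vector form may differ; it uses interior probabilities so that no break falls at the minimum or maximum of a feature.

```r
# Build an explicit breaks_per_dim list: n_breaks interior quantiles per continuous column.
# quantile_breaks is a hypothetical helper; CausalGrid's internal choice may differ.
quantile_breaks <- function(X, n_breaks, categorical = rep(FALSE, ncol(X))) {
  lapply(seq_len(ncol(X)), function(k) {
    if (categorical[k]) return(c(0))            # placeholder for categorical columns
    probs <- seq_len(n_breaks) / (n_breaks + 1)  # interior probabilities, excluding 0 and 1
    unique(unname(stats::quantile(X[, k], probs = probs)))
  })
}

set.seed(3)
X <- cbind(X1 = rnorm(500), X2 = sample(1:3, 500, replace = TRUE))
breaks <- quantile_breaks(X, n_breaks = 4, categorical = c(FALSE, TRUE))
str(breaks)
```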

Value

An object with class "estimated_partition".

`partition`	`grid_partition` obj defining cuts
`cell_stats`	Cell stats from `est_cell_stats$stats` on the est sample
`importance_weights`	Importance weights for each feature
`interaction_weights`	Interaction weights for each pair of features
`lambda`	Lambda used.
`is_obj_val_seq`	In-sample objective function values for the sequence of partitions.
`complexity_seq`	Complexity numbers (number of cells minus 1) for the sequence of partitions.
`partition_i`	Index of the partition selected in the sequence.
`split_seq`	Sequence of `partition_splits`s. Note that split i corresponds to partition i+1.
`index_tr`	Indexes of the training sample (possibly generated internally from tr_split). Order N.
`cv_foldid`	CV fold ids for the training sample (size N_tr).
`varnames`	varnames (or c("X1", "X2",...) if X doesn't have colnames)
`est_plan`	Fitted `EstimatorPlan` used.
`full_stat_df`	Full sample average stats from `est_full_stats`
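The sequence fields relate to each other through partition_i. The hypothetical helper below reads the selected partition's size from a list with these field names. It assumes complexity_seq is indexed by partition in the same way as partition_i, and the mock object uses made-up values purely to exercise the code.

```r
# Hypothetical helper summarizing the selected partition of an estimated_partition-like list.
# ASSUMPTION: complexity_seq and is_obj_val_seq are indexed by partition, like partition_i.
selected_summary <- function(est) {
  i <- est$partition_i
  list(
    n_cells = est$complexity_seq[i] + 1,  # complexity = number of cells - 1
    is_obj_val = est$is_obj_val_seq[i],   # in-sample objective of the selected partition
    n_splits_applied = i - 1              # split j produces partition j + 1
  )
}

# Mock object with made-up values (e.g. one split on each of two dimensions gives 4 cells).
mock <- list(partition_i = 3,
             complexity_seq = c(0, 1, 3, 5),
             is_obj_val_seq = c(10, 8, 7, 6.5))
s <- selected_summary(mock)
str(s)
```

Because one split of a grid adds a cut across every cell, the number of cells can grow faster than the number of splits, which is why the helper reads complexity_seq instead of counting splits.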

For is_estimated_partition: TRUE if x is an estimated_partition object, FALSE otherwise.

Functions

is_estimated_partition: Tests whether an object is an estimated_partition.
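Because the returned object carries the class "estimated_partition", a check like this can be written with base R's inherits(). This is a sketch of the idea under a hypothetical name, not necessarily the package's own code.

```r
# Sketch of an S3 class check; my_is_estimated_partition is a hypothetical name.
my_is_estimated_partition <- function(x) inherits(x, "estimated_partition")

obj <- structure(list(partition = NULL), class = "estimated_partition")
my_is_estimated_partition(obj)     # TRUE
my_is_estimated_partition(list())  # FALSE
```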

Multiple estimates

With multiple core estimates (M) there are 3 options (DS.MULTI_D and DS.MULTI_Y use the same sample across treatment effects).

DS.MULTI_SAMPLE: Multiple pairs of (Y_m, W_m). y, X, and d are then lists of length M, and each element has the usual size. The N_m may differ across m, but the number of columns of X must be the same across m.
DS.MULTI_D: Multiple treatments and a single outcome. d is then an NxM matrix.
DS.MULTI_Y: A single treatment and multiple outcomes. y is then an NxM matrix.
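The three modes differ only in the shapes of y, X and d. This base-R sketch builds inputs of each shape for M = 2 with random placeholder values. Only the dimensions are taken from the descriptions above; the variable names are hypothetical.

```r
# Illustrative input shapes for the three multiple-estimate modes (M = 2 here).
set.seed(11)
M <- 2; N <- 100; K <- 3

# DS.MULTI_D: one outcome and an N x M treatment matrix (with colnames).
X <- matrix(rnorm(N * K), N, K)
y <- matrix(rnorm(N), ncol = 1)
d_multi <- matrix(rbinom(N * M, 1, 0.5), N, M, dimnames = list(NULL, c("d1", "d2")))

# DS.MULTI_Y: one treatment and an N x M outcome matrix.
d <- matrix(rbinom(N, 1, 0.5), ncol = 1, dimnames = list(NULL, "d"))
y_multi <- matrix(rnorm(N * M), N, M)

# DS.MULTI_SAMPLE: lists of length M; N_m may differ, but every X_m has K columns.
N_m <- c(100, 150)
y_list <- lapply(N_m, function(n) matrix(rnorm(n), ncol = 1))
X_list <- lapply(N_m, function(n) matrix(rnorm(n * K), n, K))
d_list <- lapply(N_m, function(n) {
  matrix(rbinom(n, 1, 0.5), ncol = 1, dimnames = list(NULL, "d"))
})
```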

microsoft/CausalGrid documentation built on Aug. 25, 2022, 9:30 a.m.