data_helpers: Data helpers

data_helpers    R Documentation

Data helpers

Description

Various helpers to simulate data and to manipulate data types between compact and long forms.

collapse_data can be used to convert long form data (one row per observation) to compact form data (one row per data type).

expand_data can be used to convert compact form data (one row per data type) to long form data (one row per observation).

make_data generates a dataset with one row per observation.

make_events generates a dataset with one row for each data type. Draws full data only. To generate various types of incomplete data see make_data.
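The difference between the two forms can be illustrated with a minimal base-R sketch. It does not use the package and is not how collapse_data or expand_data are implemented; the toy data, the event labels such as "X0Y0", and the variable names are assumptions made for illustration, and only complete observations are handled.

```r
# Long form: one row per observation (hypothetical toy data, complete cases only)
long <- data.frame(X = c(0, 1, 1, 0, 1), Y = c(0, 1, 1, 0, 0))

# Label each observation with an event string (illustrative labelling)
event <- paste0("X", long$X, "Y", long$Y)

# Compact form: one row per data type, with a count of observations
tab <- table(event)
compact <- data.frame(event = names(tab), count = as.integer(tab))

# Back towards long form: repeat each event label 'count' times
relong_events <- rep(compact$event, times = compact$count)
```

The compact form carries the same information as the long form whenever the order of observations does not matter.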

Usage

collapse_data(
  data,
  model,
  drop_NA = TRUE,
  drop_family = FALSE,
  summary = FALSE
)

expand_data(data_events = NULL, model)

make_data(
  model,
  n = NULL,
  parameters = NULL,
  param_type = NULL,
  nodes = NULL,
  n_steps = NULL,
  probs = NULL,
  subsets = TRUE,
  complete_data = NULL,
  given = NULL,
  verbose = FALSE,
  ...
)

make_events(
  model,
  n = 1,
  w = NULL,
  P = NULL,
  A = NULL,
  parameters = NULL,
  param_type = NULL,
  include_strategy = FALSE,
  ...
)

Arguments

data

A data.frame. Data on nodes that can take three values: 0, 1, and NA. In long form (one row per observation) as generated by make_data.

model

A causal_model. A model object generated by make_model.

drop_NA

Logical. Whether to exclude strategy families that contain no observed data. Exceptionally, if no data is provided, minimal data on the first node is returned. Defaults to 'TRUE'.

drop_family

Logical. Whether to remove column strategy from the output. Defaults to 'FALSE'.

summary

Logical. Whether to return a summary of the data. See Value. Defaults to 'FALSE'.

data_events

A 'compact' data.frame with one row per data type. Must be compatible with nodes in model. The default columns are event, strategy and count.

n

An integer. Number of observations.

parameters

A vector of real numbers in [0,1]. Values of parameters to specify (optional). By default, parameters are drawn from the parameters dataframe. See inspect(model, "parameters_df").

param_type

A character. String specifying the type of parameters to make: 'flat', 'prior_mean', 'posterior_mean', 'prior_draw', 'posterior_draw', or 'define'. With param_type set to 'define', use arguments to be passed to make_priors; otherwise 'flat' sets equal probabilities on each nodal type in each parameter set; 'prior_mean', 'prior_draw', 'posterior_mean', and 'posterior_draw' take parameters as the means of, or as draws from, the prior or posterior.

nodes

A list. Which nodes to be observed at each step. If NULL all nodes are observed.

n_steps

A list. Number of observations to be observed at each step.

probs

A list. Observation probabilities at each step.

subsets

A list. Strata within which observations are to be observed at each step. TRUE for all, otherwise an expression that evaluates to a logical condition.

complete_data

A data.frame. Dataset with complete observations. Optional.

given

A string specifying known values on nodes, e.g. "X==1 & Y==1".

verbose

Logical. If TRUE prints step schedule.

...

Arguments to be passed to make_priors if param_type == 'define'.

w

A numeric matrix. A 'n_parameters x 1' matrix of event probabilities with named rows.

P

A data.frame. Parameter matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "parameter_matrix").

A

A data.frame. Ambiguities matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "ambiguities_matrix").

include_strategy

Logical. Whether to include a 'strategy' vector. Defaults to FALSE. The strategy vector does not vary with full data but is expected by some functions.
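As a rough illustration of what the 'flat' option of param_type describes (equal probabilities on each nodal type within each parameter set), here is a base-R sketch. The parameter sets and the variable names are hypothetical, and this is not the package's own code.

```r
# Hypothetical parameter sets: node X has 2 nodal types, node Y has 4
param_set <- c("X", "X", "Y", "Y", "Y", "Y")

# Equal probability on each nodal type within its parameter set
flat <- ave(rep(1, length(param_set)), param_set,
            FUN = function(x) x / sum(x))

# Probabilities sum to 1 within each parameter set
set_totals <- tapply(flat, param_set, sum)
```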

Details

Note that in make_data the default behavior does not take into account whether a node has already been observed when determining which units to select at a step. One can, however, specifically request observation of nodes that have not been previously observed, for example with a subsets condition such as "is.na(X)".
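The logic of a stepwise observation strategy can be sketched in base R. This is an illustration under assumed toy data and is not the package's implementation: X and Y are observed for every unit, and M is then revealed with probability 0.5 within the stratum X == 1 & Y == 0. All variable names are hypothetical.

```r
set.seed(1)

# Hypothetical complete data on three binary nodes
full <- data.frame(X = c(1, 1, 0, 1), M = c(0, 1, 1, 0), Y = c(0, 0, 1, 1))

# Step 1: observe X and Y for every unit; M starts unobserved
observed <- full
observed$M <- NA

# Step 2: within the stratum X == 1 & Y == 0, reveal M with probability 0.5
in_stratum <- with(observed, X == 1 & Y == 0)
reveal <- in_stratum & (runif(nrow(full)) < 0.5)
observed$M[reveal] <- full$M[reveal]
```

Because the stratum is defined on values observed in the first step, the second step can only select units that are already partly observed.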

Value

collapse_data returns a vector of data events.

If summary = TRUE, collapse_data returns a list containing the following components:

data_events

A compact data.frame of event types and strategies.

observed_events

A vector of character strings specifying the events observed in the data

unobserved_events

A vector of character strings specifying the events not observed in the data
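The split between observed and unobserved events amounts to a set comparison, sketched here in base R. The event labels and variable names are assumptions for illustration; the package derives the set of possible events from the model.

```r
# Hypothetical set of possible events for a two-node model
all_events <- c("X0Y0", "X1Y0", "X0Y1", "X1Y1")

# Hypothetical events seen in the data (one per observation)
seen <- c("X0Y0", "X1Y1", "X0Y0")

# Events that appear at least once, and events that never appear
observed_events <- intersect(all_events, seen)
unobserved_events <- setdiff(all_events, seen)
```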

expand_data returns a data.frame with one row per observation.

make_data returns a data.frame with simulated data.

make_events returns a data.frame of events.

See Also

Other data_generation: get_all_data_types(), make_data_single(), observe_data()

Examples



model <- make_model('X -> Y')

df <- data.frame(X = c(0,1,NA), Y = c(0,0,1))

df |> collapse_data(model)

# Illustrating options

df |> collapse_data(model, drop_NA = FALSE)

df |> collapse_data(model, drop_family = TRUE)

df |> collapse_data(model, summary = TRUE)

# Appropriate behavior given restricted models

model <- make_model('X -> Y') |>
  set_restrictions('X[]==1')
df <- make_data(model, n = 10)
df[1,1] <- ''
df |> collapse_data(model)

df <- data.frame(X = 0:1)
df |> collapse_data(model)




model <- make_model('X->M->Y')
make_events(model, n = 5) |>
  expand_data(model)
make_events(model, n = 0) |>
  expand_data(model)
 


# Simple draws
model <- make_model("X -> M -> Y")
make_data(model)
make_data(model, n = 3, nodes = c("X","Y"))
make_data(model, n = 3, param_type = "prior_draw")
make_data(model, n = 10, param_type = "define", parameters = 0:9)

# Data Strategies
# A strategy in which X, Y are observed for sure and M is observed
# with 50% probability for X=1, Y=0 cases

model <- make_model("X -> M -> Y")
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), "M"),
  probs = list(1, .5),
  subsets = list(TRUE, "X==1 & Y==0"))

# n not provided but inferred from largest n_step (not from sum of n_steps)
make_data(
  model,
  nodes = list(c("X", "Y"), "M"),
  n_steps = list(5, 2))

# Wide then deep
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), "M"),
  subsets = list(TRUE, "!is.na(X) & !is.na(Y)"),
  n_steps = list(6, 2))


make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), c("X", "M")),
  subsets = list(TRUE, "is.na(X)"),
  n_steps = list(3, 2))

# Example with probabilities at each step

make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), c("X", "M")),
  subsets = list(TRUE, "is.na(X)"),
  probs = list(.5, .2))

# Example with given data
make_data(model, given = "X==1 & Y==1", n = 5)

model <- make_model('X -> Y')
make_events(model = model)
make_events(model = model, param_type = 'prior_draw')
make_events(model = model, include_strategy = TRUE)



CausalQueries documentation built on April 3, 2025, 7:46 p.m.