format_data: Format the data in the appropriate way for the algorithm

View source: R/format_data.R

format_data R Documentation

Format the data in the appropriate way for the algorithm

Description

Formats the observed data, stored in an environment, into the structure the algorithm expects. The environment is modified in place; nothing is returned.

Usage

format_data(data, beta = NULL)

Arguments

data

An environment containing the observed data:

- data$Y is a matrix of size n x Tmax containing the values of the dependent variable Y.
- data$X is an array of size n x Tmax x dimX containing the values of the covariates X.
- data$clusterIndexes is a vector of size n x 1 that specifies the cluster each observation belongs to. If it does not exist, the function falls back to the default of i.i.d. observations: it is set to 1:n, so that each observation is its own cluster.
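As an illustration of the layout described above, here is a minimal sketch in base R that builds such an environment. The dimensions, the random values, and the cluster assignment are hypothetical and chosen only for the example; the final call is left commented out because it requires the package to be installed.

```r
# Toy dimensions (hypothetical values chosen for illustration).
n <- 4; Tmax <- 3; dimX <- 2
set.seed(1)

# The data is held in an environment rather than a list.
data <- new.env()

# Binary outcomes, n x Tmax. NA marks an unobserved individual-period pair.
data$Y <- matrix(rbinom(n * Tmax, size = 1, prob = 0.5), nrow = n, ncol = Tmax)
data$Y[4, 3] <- NA

# Covariates, n x Tmax x dimX, with the same pair left unobserved.
data$X <- array(rnorm(n * Tmax * dimX), dim = c(n, Tmax, dimX))
data$X[4, 3, ] <- NA

# Optional: two clusters of two individuals each.
# If this field is omitted, the default described above (1:n) applies.
data$clusterIndexes <- c(1, 1, 2, 2)

# With the package installed, the call documented under Usage would be:
# format_data(data)
```

An environment is used because R environments have reference semantics, which is what allows a function to modify data without returning it.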

beta

(default NULL) The slope parameter of the FE logit model. If specified, the function also computes the extra variables that depend on beta, such as data$V and data$C_S_tab.

Value

The function does not return anything. Instead, data is modified in place so that:

- data$Y is a matrix of size n x Tmax containing the values of the dependent variable Y.
- data$X is an array of size n x Tmax x dimX containing the values of the covariates X.
- data$S is a vector of size n x 1 that counts, for each row (individual), the number of columns (periods) for which Y is 1.
- data$Tobsd is a vector of size n x 1 that counts, for each row (individual), the number of columns (periods) that are not NA, i.e. that are observed.
- data$clusterIndexes is a vector of size n x 1 that specifies the cluster each observation belongs to. If it does not exist, the function falls back to the default of i.i.d. observations: it is set to 1:n, so that each observation is its own cluster.
- data$V is a matrix of size n x Tmax containing, for each individual-period pair, the value of X'beta. It is NA for unobserved individual-period pairs. This variable is undefined if beta is NULL (or not specified).
- data$C_S_tab is a matrix of size n x (Tmax + 1). Each row (individual) has, in its j-th column, the value of C_(j-1)(X, beta). It is NA when j - 1 is larger than Tobsd. This variable is undefined if beta is NULL (or not specified).
- data$C_S_minus_one_one_period_out_array is an array of size n x Tmax x Tmax. Each row (individual) has, in its t-th third-dimensional slice and j-th column (period), the value of C_(j-1)(Z, beta), where Z is the relevant row of X with its t-th period removed.
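The derived quantities listed above can be sketched in base R as follows. This is not the package's code. In particular, this page does not define C_k(X, beta); the sketch assumes it is the elementary symmetric polynomial of order k in exp(x_t'beta) over the observed periods, which is the usual normalising term in conditional logit likelihoods. The toy data, the value of beta, and the helper elem_sym are hypothetical.

```r
# Toy data (hypothetical): 2 individuals, 3 periods, 1 covariate.
# Individual 2 is unobserved in period 3.
Y <- matrix(c(1, 0, 1,
              0, 1, NA), nrow = 2, byrow = TRUE)
X <- array(c(0.5, -1, 0, 2, 1, NA), dim = c(2, 3, 1))  # filled column-major
beta <- 1
n <- nrow(Y); Tmax <- ncol(Y)

# S: number of periods with Y == 1; Tobsd: number of observed periods.
S <- rowSums(Y == 1, na.rm = TRUE)
Tobsd <- rowSums(!is.na(Y))

# V: index x_t' beta for each individual-period pair (NA where unobserved).
V <- apply(X, c(1, 2), function(x) sum(x * beta))

# ASSUMPTION: C_k is the elementary symmetric polynomial of order k in
# exp(v_t), computed over the observed periods with the standard recursion.
elem_sym <- function(v) {
  a <- exp(v[!is.na(v)])
  Tobs <- length(a)
  e <- c(1, numeric(Tobs))  # e[k + 1] holds C_k, with C_0 = 1
  for (a_t in a) {
    # The right-hand side is evaluated with the old e before assignment.
    e[-1] <- e[-1] + a_t * e[-(Tobs + 1)]
  }
  e
}

# C_S_tab: column j holds C_(j-1); entries with j - 1 > Tobsd stay NA.
C_S_tab <- matrix(NA_real_, nrow = n, ncol = Tmax + 1)
for (i in seq_len(n)) {
  e <- elem_sym(V[i, ])
  C_S_tab[i, seq_along(e)] <- e
}
```

Under the same assumption, the one-period-out array would be obtained by applying elem_sym to each row of V with its t-th entry dropped.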


cgaillac/MarginalFElogit documentation built on Dec. 24, 2024, 3:23 p.m.