tidymodl: Creates a model matrix style R6 class for modelling with long...

tidymodl    R Documentation

Creates a model matrix style R6 class for modelling with long tidy data

Description

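Creates an R6 object that pivots a tidy long data frame into a model matrix style (wide) data frame for modelling, while keeping the parent identifiers and a key table that links the two.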

Public fields

data

(data.frame())
The original tidy long data frame

parent

(data.frame())
The parent identifiers of the original data

child

(data.frame())
The model matrix version of the data

key

(data.frame())
A key-value table that links the parent and child data.frames.
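The relationship between data, parent and child can be sketched with base R alone. This is only an illustration under assumed column names (country, year, indicator, value are hypothetical), not the package's own implementation; the layout of key is not described here, so it is left out.

```r
# Toy long data: two identifier columns plus an indicator/value pair.
# All column names are hypothetical.
long <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2020, 2020, 2020, 2020),
  indicator = c("gni", "ppt", "gni", "ppt"),
  value     = c(1.5, 10, 2.5, 20)
)

# Wide form: one row per (country, year), one column per indicator.
wide <- reshape(long,
                idvar     = c("country", "year"),
                timevar   = "indicator",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))

parent <- wide[, c("country", "year")]  # identifiers only
child  <- wide[, c("gni", "ppt")]       # model matrix style columns
```

In this sketch parent and child have the same number of rows, so a result computed row by row on child can be bound back onto parent.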

Methods

Public methods


Method new()

Creates a new tidymodl object (an instance of this R6 class).

Usage
tidymodl$new(df, pivot_column, pivot_value)
Arguments
df

A tidy long data frame

pivot_column

The column name on which the pivot will occur

pivot_value

The column name of the values to be pivoted

Returns

A new tidymodl object.


Method assemble()

Adds a results vector or matrix to the data and returns the combined data frame

Usage
tidymodl$assemble(newdata, format = "long")
Arguments
newdata

A new data set to append. It must be either:

  • A vector whose length equals the number of rows in the model matrix, for example the output of predict() for an lm model. In this case the function returns a data.frame of dimensions c(nrow(parent), ncol(parent) + 1).

  • A data.frame or matrix with the same dimensions as the model matrix, for example the output of xgb_impute(). In this case the function returns a data.frame of dimensions c(nrow(data), ncol(data) + 1).

format

The desired format of the returned data frame, can either be "long" or "wide".

Details

Returns a completed data.frame. There are four use cases, determined by the type of "newdata" and the requested format.

  • Format "long":

    • Use Case 1 - "newdata" is a vector of length nrow(child): The function returns the parent data combined with "newdata" as a new column. Useful for appending, for example, the output of predict() for an lm regression model.

    • Use Case 2 - "newdata" is a matrix of dimensions dim(child): The function returns the original data in long format with "newdata" as a new column. Useful for appending, for example, the output of xgb_impute() to all of the original data.

  • Format "wide":

    • Use Case 3 - "newdata" is a vector of length nrow(child): The function returns the parent data combined with "newdata" as a new column. Useful for appending, for example, the output of predict() for an lm regression model.

    • Use Case 4 - "newdata" is a matrix of dimensions dim(child): The function returns the original data in wide format, with "newdata" replacing the child matrix. Useful when only the output of, for example, xgb_impute() is of interest.
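The shapes described in the use cases above can be sketched in base R. This is an illustration of the expected dimensions only, using hypothetical toy objects named after the public fields; it does not call the package and is not its implementation.

```r
# Hypothetical stand-ins for the parent, child and data fields.
parent <- data.frame(country = c("A", "B"), year = c(2020, 2020))
child  <- data.frame(gni = c(1.5, 2.5), ppt = c(10, 20))
long   <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = 2020,
  indicator = c("gni", "ppt", "gni", "ppt"),
  value     = c(1.5, 10, 2.5, 20)
)

# Use Cases 1 and 3: a vector with one value per row of child
# is bound onto the parent identifiers.
yhat <- c(0.1, 0.2)
case_vector <- cbind(parent, yhat = yhat)

# Use Case 2: a matrix shaped like child is mapped back onto the
# long data, one value per original row, in a new column.
newdata <- as.matrix(child) * 2
row_id  <- match(paste(long$country, long$year),
                 paste(parent$country, parent$year))
col_id  <- match(long$indicator, colnames(newdata))
case_long <- cbind(long, newdata = newdata[cbind(row_id, col_id)])

# Use Case 4: the matrix replaces child next to the parent identifiers.
case_wide <- cbind(parent, as.data.frame(newdata))
```

Here dim(case_vector) is c(nrow(parent), ncol(parent) + 1) and dim(case_long) is c(nrow(long), ncol(long) + 1), matching the dimensions stated for the newdata argument.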

Returns

df A Data Frame


Method print()

Prints the key and the head of the model matrix

Usage
tidymodl$print()

Method correlate()

Correlates the data and returns Pearson correlation values

Usage
tidymodl$correlate()
Returns

df A Correlation Matrix of class cor_df (see corrr)
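For orientation, a Pearson correlation matrix of a child-like data frame can be computed in base R as below. This is a stand-in sketch, not the method's own code: the method returns a cor_df from corrr rather than a plain matrix, the toy data are hypothetical, and the use of pairwise complete observations is an assumption about how missing values are handled.

```r
# Hypothetical model matrix with one missing value.
child <- data.frame(
  gni = c(1, 2, 3, 4),
  gcu = c(2, 4, 6, 8),
  ppt = c(4, 3, NA, 2)
)

# Pearson correlations, using every pair of rows where both values exist
# (assumed handling of NAs, not confirmed for tidymodl$correlate()).
r <- cor(child, method = "pearson", use = "pairwise.complete.obs")
round(r, 3)
```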


Method pca()

Provides a high-level principal components analysis

Usage
tidymodl$pca()
Returns

df A principal components analysis object of class PCA (see FactoMineR)
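As a rough idea of what such an analysis yields, the sketch below runs a principal components analysis with base R's prcomp(). Note that prcomp() is swapped in here purely for illustration; the method itself returns a FactoMineR PCA object, whose structure differs. The toy data are hypothetical.

```r
# Hypothetical complete model matrix (no missing values).
child <- data.frame(
  gni = c(1.0, 2.1, 2.9, 4.2, 5.1),
  gcu = c(2.0, 3.9, 6.2, 8.1, 9.8),
  ppt = c(5.0, 3.5, 4.2, 1.0, 2.2)
)

# Centre and scale each column, then extract the components.
fit <- prcomp(child, center = TRUE, scale. = TRUE)

# Share of total variance carried by each component.
var_explained <- fit$sdev^2 / sum(fit$sdev^2)

# Loadings: contribution of each variable to each component.
loadings <- fit$rotation
```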


Method clone()

The objects of this class are cloneable with this method.

Usage
tidymodl$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

Use Cases 1 and 3 return identical results.

Examples

library(tidymodlr)

# Load the example data and pivot it into model matrix form
data(wb)
mdl <- tidymodl$new(wb,
                    pivot_column = "indicator",
                    pivot_value = "value")

# Use mdl$child for modelling
fit <- lm(gni ~ gcu + ppt, data = mdl$child)

# assemble() can append results to the original data. Here a random
# matrix with the same dimensions as mdl$child stands in for model output.
nc <- ncol(mdl$child)
nr <- nrow(mdl$child)
dummy <- data.frame(matrix(runif(nr * nc), ncol = nc))
names(dummy) <- names(mdl$child)
tmp <- mdl$assemble(dummy)

# Built-in correlation method
mdl$correlate()

# Built-in PCA; plot the variables
tmp <- mdl$pca()
plot(tmp, choix = "var")


tidymodlr documentation built on Sept. 11, 2024, 9:18 p.m.