collectAsTrainingSet: Converts a tidy dataframe into a Matrix::sparseMatrix of...

View source: R/dbplyHelper.R

collectAsTrainingSet    R Documentation

Converts a tidy dataframe into a Matrix::sparseMatrix of features and an associated outcome vector

Description

Offloads the majority of the processing onto SQL when dbplyr tables are involved.
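To illustrate the conversion described above, here is a minimal in-memory sketch using base R and Matrix::sparseMatrix. It is not the package's implementation and does none of the dbplyr/SQL offloading. The helper tidyToSparse, its arguments, and the example column names are hypothetical, and taking each sample's outcome from its first record is an assumption.

```r
library(Matrix)

# Hypothetical helper (not part of the package): an in-memory, base-R sketch
# of turning a tidy (long) data.frame into a sparse feature matrix.
tidyToSparse <- function(df, sample, feature, value, outcome) {
  # Map each distinct sample / feature to a row / column index,
  # keeping the order in which they first appear
  rowF <- factor(df[[sample]], levels = unique(df[[sample]]))
  colF <- factor(df[[feature]], levels = unique(df[[feature]]))
  m <- Matrix::sparseMatrix(
    i = as.integer(rowF),
    j = as.integer(colF),
    x = as.double(df[[value]]),  # values of repeated (row, col) pairs are summed
    dims = c(nlevels(rowF), nlevels(colF)),
    dimnames = list(levels(rowF), levels(colF))
  )
  # Assumption: one outcome per sample, taken from that sample's first record
  firstRecord <- match(levels(rowF), as.character(rowF))
  list(
    rowLabels = levels(rowF),
    colLabels = levels(colF),
    matrix = m,
    outcome = as.factor(df[[outcome]][firstRecord])
  )
}

tidy <- data.frame(
  obs   = c(1, 1, 2, 3, 3),
  word  = c("a", "b", "b", "a", "c"),
  n     = c(2, 1, 3, 1, 4),
  label = c("x", "x", "y", "x", "x")
)
res <- tidyToSparse(tidy, "obs", "word", "n", "label")
print(res$matrix)
```

Using factors for the row and column indices gives the rowLabels and colLabels in the same order as the matrix dimensions, which is what keeps the matrix, labels and outcome vector aligned.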

Usage

collectAsTrainingSet(
  df,
  sampleVar,
  outcomeVar,
  featureVar,
  valueVar = NULL,
  featureNameVar = NULL,
  factorise = TRUE,
  ...
)

Arguments

df

- a dataframe in tidy (long) format, or a dbplyr table

sampleVar

- the dataframe column(s) which define the matrix rows, quoted by vars(...). Typically this is the observation id (see createSequentialIdentifier(...)).

outcomeVar

- the dataframe column(s) which define the outcome of each sample, quoted by vars(...). These provide the values of the returned outcome vector.

featureVar

- the dataframe column(s) which define the matrix columns, quoted by vars(...). Typically this is the feature id.

valueVar

- the name of the value variable, whose values fill the matrix. (#TODO: this could be missing, in which case binary values would be used.)

...

- other parameters passed to Matrix::sparseMatrix and as.factor (for outcomes)
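For the binary case mentioned in the valueVar TODO above, a presence/absence matrix could be built roughly as follows. This is a hypothetical sketch, not what the package currently does; the data and column names are made up. It de-duplicates (sample, feature) pairs first, because Matrix::sparseMatrix sums the values of repeated index pairs.

```r
library(Matrix)

# Hypothetical sketch of a binary (presence/absence) encoding:
# each observed (sample, feature) pair becomes a 1, however often it occurs.
tidy <- data.frame(
  obs  = c(1, 1, 2, 2, 2),
  word = c("a", "b", "a", "a", "c")
)
# Drop repeated pairs first; otherwise sparseMatrix would sum them (giving 2)
pairs <- unique(tidy[, c("obs", "word")])
rowF <- factor(pairs$obs)
colF <- factor(pairs$word)
binary <- Matrix::sparseMatrix(
  i = as.integer(rowF),
  j = as.integer(colF),
  x = rep(1, nrow(pairs)),  # explicit doubles, matching "values as doubles"
  dims = c(nlevels(rowF), nlevels(colF)),
  dimnames = list(levels(rowF), levels(colF))
)
print(binary)
```

Omitting x altogether would instead give a pattern (0/1) matrix; supplying explicit 1s keeps the numeric matrix type.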

Value

a list with the following elements:

* rowLabels - the labels for each row, in order
* colLabels - the feature labels, in order
* matrix - a Matrix::sparseMatrix of the data, with values as doubles
* outcome - a vector of outcomes (probably as a factor)
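The elements of the returned list line up by position: matrix rows correspond to rowLabels and outcome, and matrix columns to colLabels. The sketch below uses a hand-built stand-in with made-up values (it does not call collectAsTrainingSet) to show checking that alignment and then, as one example use, summing features per outcome class.

```r
library(Matrix)

# Stand-in for a collectAsTrainingSet() result, built by hand for illustration;
# the values are made up, only the list shape follows the Value section.
res <- list(
  rowLabels = c("1", "2", "3"),
  colLabels = c("a", "b"),
  matrix = Matrix::sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 1),
                                x = c(2, 3, 1), dims = c(3, 2)),
  outcome = factor(c("x", "y", "x"))
)

# Sanity checks on how the elements line up
stopifnot(nrow(res$matrix) == length(res$rowLabels),
          nrow(res$matrix) == length(res$outcome),
          ncol(res$matrix) == length(res$colLabels))

# Attach the labels so rows and columns can be addressed by name
dimnames(res$matrix) <- list(res$rowLabels, res$colLabels)

# Feature totals per outcome class (one column per class)
perClass <- sapply(levels(res$outcome), function(cls)
  Matrix::colSums(res$matrix[res$outcome == cls, , drop = FALSE]))
print(perClass)
```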


terminological/tidy-info-stats documentation built on Nov. 19, 2022, 11:23 p.m.