dataset-to-R: as.data.frame method for CrunchDataset

dataset-to-R    R Documentation

as.data.frame method for CrunchDataset

Description

This method is defined principally so that you can use a CrunchDataset as a data argument to other R functions (such as stats::lm()) without needing to download the whole dataset. You can, however, choose to download a true data.frame.

Usage

## S3 method for class 'CrunchDataset'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  force = FALSE,
  categorical.mode = "factor",
  row.order = NULL,
  include.hidden = TRUE,
  ...
)

## S3 method for class 'CrunchDataFrame'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  include.hidden = attr(x, "include.hidden"),
  array_strategy = c("alias", "qualified_alias", "packed"),
  verbose = TRUE,
  ...
)

Arguments

x

a CrunchDataset or CrunchDataFrame

row.names

part of as.data.frame signature. Ignored.

optional

part of as.data.frame signature. Ignored.

force

logical: if TRUE, download the data and coerce the dataset to a true data.frame; if FALSE (the default), leave the columns as unevaluated promises.

categorical.mode

the mode in which categorical variables are pulled: one of "factor", "numeric", or "id" (default: "factor")

row.order

vector of indices: which rows of the dataset to present, and in what order (default: NULL). If NULL, the Crunch dataset's own row order is used.

include.hidden

logical: should hidden variables be included? (default: TRUE)

...

additional arguments passed to as.data.frame (default method).

array_strategy

Strategy to import array variables: "alias" (the default) reads them as flat variables with the subvariable aliases, unless there are duplicate aliases in which case they are qualified in brackets after the array alias, like "array_alias[subvar_alias]". "qualified_alias" always uses the bracket notation. "packed" reads them in what the tidyverse calls "packed" data.frame columns, with the alias from the array variable, and subvariables as the columns of the data.frame.

verbose

logical: whether to print a message to the console when subvariable aliases are qualified under array_strategy = "alias" (default: TRUE)

Details

By default, the as.data.frame method for CrunchDataset does not return a data.frame but instead a CrunchDataFrame, which behaves like a data.frame without bringing the whole dataset into memory. When you access a variable of a CrunchDataFrame, you get an R vector rather than a CrunchVariable. This allows modeling functions that need only selected columns of a dataset to retrieve just those variables from the remote server, rather than pulling the entire dataset into local memory.
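As a rough sketch of the lazy behaviour described above, the function below fits a linear model directly against a CrunchDataset. The dataset name passed in and the variable aliases income and age are hypothetical, and the sketch assumes the crunch package is installed and the session is already authenticated with a Crunch server.

```r
# Sketch only: assumes crunch is installed and the session is authenticated.
# 'income' and 'age' are hypothetical variable aliases.
fit_remote_model <- function(dataset_name) {
  # loadDataset() returns a CrunchDataset; no row-level data is pulled here
  ds <- crunch::loadDataset(dataset_name)

  # lm() coerces `data` with as.data.frame(), which for a CrunchDataset
  # yields a CrunchDataFrame, so only the variables named in the formula
  # should need to be retrieved from the server
  stats::lm(income ~ age, data = ds)
}
```

Calling fit_remote_model("My survey") (again, a made-up name) would return an ordinary lm object.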

If you call as.data.frame() on a CrunchDataset with force = TRUE, you will instead get a true data.frame. You can also get this data.frame by calling as.data.frame() on a CrunchDataFrame (effectively calling as.data.frame() on the dataset twice).
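The two routes to a true data.frame might look like the following. The helper names are illustrative only, and ds is assumed to be a CrunchDataset already loaded in an authenticated session; each call downloads the data, so on a large dataset you would normally pick one route.

```r
# Sketch only: `ds` is assumed to be a CrunchDataset from an authenticated
# session. The helper names are hypothetical, not part of crunch.

# Route 1: ask for a true data.frame in a single call.
# categorical.mode may be "factor" (default), "numeric", or "id".
download_dataset <- function(ds, categorical.mode = "factor") {
  as.data.frame(ds, force = TRUE, categorical.mode = categorical.mode)
}

# Route 2: build the lazy CrunchDataFrame first, then coerce it.
download_in_two_steps <- function(ds) {
  cdf <- as.data.frame(ds)  # CrunchDataFrame; columns not yet evaluated
  as.data.frame(cdf)        # true data.frame
}
```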

When a data.frame is returned, the function coerces Crunch Variable values into their R equivalents using the following rules:

  • Numeric variables become numeric vectors

  • Text variables become character vectors

  • Datetime variables become either Date or POSIXt vectors

  • Categorical variables become either factors with levels matching the Crunch Variable's categories (the default), or, if categorical.mode is specified as "id" or "numeric", a numeric vector of category ids or numeric values, respectively

  • Array variables (Categorical Array, Multiple Response) can be decomposed into their constituent categorical subvariables or put in 'packed' data.frame columns; see the array_strategy argument.

Column names in the data.frame are the variable/subvariable aliases.
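To make the three array_strategy layouts concrete, here is a purely local illustration in base R. It does not call crunch; it only mimics the column naming described in the Arguments section. The array alias brands and the subvariable aliases brand_a and brand_b are made up.

```r
# Local stand-in for an array variable with (hypothetical) alias "brands"
# and subvariable aliases "brand_a" and "brand_b". No Crunch connection.
subvars <- data.frame(
  brand_a = factor(c("Yes", "No", "Yes")),
  brand_b = factor(c("No", "No", "Yes"))
)
respondent <- data.frame(id = 1:3)

# "alias"-style layout: subvariables appear as flat columns
flat <- cbind(respondent, subvars)

# "qualified_alias"-style layout: array_alias[subvar_alias]
qualified <- flat
names(qualified)[-1] <- paste0("brands[", names(subvars), "]")

# "packed"-style layout: one column that is itself a data.frame
packed <- respondent
packed$brands <- subvars

names(flat)           # "id" "brand_a" "brand_b"
names(qualified)      # "id" "brands[brand_a]" "brands[brand_b]"
names(packed)         # "id" "brands"
names(packed$brands)  # "brand_a" "brand_b"
```

With a real CrunchDataFrame cdf, the corresponding call would be as.data.frame(cdf, array_strategy = "packed"), per the Usage section.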

Value

When called on a CrunchDataset, the method returns an object of class CrunchDataFrame unless force = TRUE, in which case it returns a data.frame. When called on a CrunchDataFrame, the method returns a data.frame.

See Also

as.vector()


Crunch-io/rcrunch documentation built on Sept. 14, 2024, 11:13 p.m.