as.db.data.frame-methods: Convert other objects into a 'db.data.frame' object

as.db.data.frame    R Documentation

Convert other objects into a db.data.frame object

Description

Methods for function as.db.data.frame in package GreenplumR. When x is a file name or data.frame, the method puts the data into a table in the database. When x is a db.Rquery object, it is converted into a table. When x is a db.data.frame object, a copy of the table/view that x points to is created.

Usage

## S4 method for signature 'character'
as.db.data.frame(x, table.name = NULL, verbose = TRUE, conn.id = 1,
    add.row.names = FALSE, key = character(0), distributed.by = NULL,
    append = FALSE, is.temp = FALSE, ...)

## S4 method for signature 'data.frame'
as.db.data.frame(x, table.name = NULL, verbose = TRUE, conn.id = 1,
    add.row.names = FALSE, key = character(0), distributed.by = NULL,
    append = FALSE, is.temp = FALSE, ...)

## S4 method for signature 'db.Rquery'
as.db.data.frame(x, table.name = NULL, verbose = TRUE, is.view = FALSE,
    is.temp = FALSE, pivot = TRUE, distributed.by = NULL, nrow = NULL,
    field.types = NULL, na.as.level = FALSE,
    factor.full = rep(FALSE, length(names(x))))

## S4 method for signature 'db.data.frame'
as.db.data.frame(x, table.name = NULL, verbose = TRUE, is.view = FALSE,
    is.temp = FALSE, distributed.by = NULL, nrow = NULL,
    field.types = NULL)

as.db.Rview(x)

Arguments

x

The object to convert. Its class determines which method is used.

When it is of type character, it should be a file name.

When it is of type data.frame, it is a data.frame that already exists in the current R session.

When it is of type db.Rquery, it represents a series of operations on an existing db.data.frame object. See db.Rquery for more.

When it is of type db.data.frame, a copy of the table/view that it points to is created.

For as.db.Rview, x must be a db.Rquery object.

table.name

A string, the name of the table to be created. The returned db.data.frame object points to this table. When table.name is NULL, a random name is used, which also avoids name conflicts.

verbose

A logical, default is TRUE, whether to print some prompt messages.

conn.id

An integer, default is 1. The ID of the connection. See db.list for more information.

add.row.names

A logical, default is FALSE, whether to add a column named "row.names" to the newly created table as the first column. This column is just the row number of the original data.frame or file.

key

A string, default is character(0). The primary key column name. When it is not character(0), a primary key is created for this column.

distributed.by

A string, default is NULL. It is a column name, or multiple column names separated by commas. When creating a table in a Greenplum database [1], the user can choose to distribute the table across multiple segments according to the values of some columns. When this parameter is NULL, the data is distributed randomly. When this parameter is an empty string "", the Greenplum database automatically chooses a column and distributes the data according to that column.
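
As a small illustration of the string format described above, the following sketch builds a comma-separated distributed.by value from a vector of column names. The column "sex" is an assumption about the abalone example data, and the call itself is left commented out because it needs a live connection (cid is a hypothetical connection ID).

```r
# Column names to distribute by ("sex" is assumed to exist in abalone).
dist_cols <- c("id", "sex")

# Join them into the single comma-separated string that distributed.by expects.
distributed_by <- paste(dist_cols, collapse = ",")

# With a live Greenplum connection, the string would be passed like this:
# x <- as.db.data.frame(abalone, conn.id = cid,
#                       distributed.by = distributed_by)
```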

append

A logical, default is FALSE. Whether to append the content of a file or data.frame to an existing table in the database.

nrow

An integer, default is NULL. The number of rows extracted from a db.Rquery object to create the new table. NULL means that all rows are used.

is.temp

A logical, default is FALSE, whether the created table/view should be a temporary table/view.

...

Extra parameters used to create the table inside the database. We support the following parameters: header = FALSE, nrows = 50, sep = ",", eol = "\n", skip = 0.

header is a logical indicating whether the first data line (but see skip) is a header. If missing, its value is determined following the read.table convention: it is set to TRUE if and only if the first row has one fewer field than the number of columns.

nrows specifies how many rows are used for type inference. When creating a table from a file or data.frame, the function tries to infer the data type of each column from the first nrows rows of the data.

sep specifies the field separator, and its default is ",".

eol specifies the end-of-line delimiter, and its default is "\n".

skip specifies the number of lines to skip before reading the data, and it defaults to 0.

field.types is a list of key = value pairs, where each key is a column name and each value is a string naming a data type. It forces the new table to use that data type for the column.
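
The read.table convention mentioned for header can be seen in base R alone. The sketch below is only an illustration of that convention, not GreenplumR's own code; the file contents are made up for the example.

```r
# A small file whose first row has one fewer field than the data rows.
lines <- c("length,diameter",   # 2 fields: column names only
           "1,0.455,0.365",     # 3 fields: row label plus two values
           "2,0.350,0.265")
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Count the fields on each line, using "," as the separator.
n_fields <- count.fields(path, sep = ",")

# The convention: treat the first row as a header only if it has exactly
# one fewer field than the number of columns.
header_guess <- n_fields[1] == max(n_fields) - 1

# read.table applies the same rule when header is left missing.
df <- read.table(path, sep = ",")
```

Here read.table treats the first row as the header and the first field of each data row as a row name, so df ends up with the two columns named in the header.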

is.view

A logical, default is FALSE, whether to create a view instead of a table.

pivot

A logical, default is TRUE, whether to create dummy columns for a column that has been denoted as "factor". See as.factor for more details.

na.as.level

A logical value, default is FALSE. Whether to treat NA values as a level of a categorical variable or to ignore them.

field.types

A list of key = value pairs, where each key is a column name and each value is a string naming a data type. It forces the new table to use that data type for the column.

factor.full

A vector of logical values whose length equals the number of columns. All FALSE by default. When the function creates dummy variables for a factor (categorical) variable, this controls whether to create n dummies or n-1 dummies, where n is the number of levels of the factor. For some regression problems, we need to create dummy variables for all the distinct values of the categorical variable.
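
The difference between n and n-1 dummies can be sketched in base R with model.matrix. This is only an analogy for what the two settings mean, not how GreenplumR builds the columns in the database, and it assumes that TRUE corresponds to the full set of n dummies, as the argument name suggests. The factor values are made up.

```r
# A categorical variable with n = 3 levels (adult, old, young).
age_group <- factor(c("young", "adult", "old", "adult"))
n <- nlevels(age_group)

# n - 1 dummies: default treatment contrasts drop the first level,
# which is absorbed by the intercept (removed here to keep only dummies).
reduced <- model.matrix(~ age_group)[, -1, drop = FALSE]

# n dummies: dropping the intercept gives one indicator column per level.
full <- model.matrix(~ age_group - 1)
```

In the full encoding every row has exactly one indicator set, so the columns sum to one in each row. That is why the reduced encoding is normally used together with an intercept.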

Value

A db.data.frame object. It points to a table whose name is given by table.name in connection conn.id.

Note

All the as.db.data.frame methods accept the option field.types.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

References

[1] Greenplum database, http://www.greenplum.org

See Also

db.data.frame creates an object pointing to a table/view in the database.

lk looks at data from the table

db.Rquery this type of object represents operations on an existing db.data.frame object.

Examples

## Not run: 
## get the help for a method
## help("as.db.data.frame")
## help("as.db.data.frame,db.Rquery-method")



## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create a table from the example data.frame "abalone"
x <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)

## preview of a table
lk(x, nrows = 10) # extract 10 rows of data

## do some operations and preview the result
y <- (x[,-2] + 1.2) * 2
lk(y, 20, FALSE)

## table abalone has a column named "id"
lk(sort(x, INDICES = x$id), 20) # the preview is ordered by "id" value

## create a copied table
## x[,] converts x from db.data.frame object to db.Rquery object
z <- as.db.data.frame(x[,])

## Force the data type, use random table name

z1 <- as.db.data.frame(x$rings, field.types = list(rings="integer"))

db.disconnect(cid, verbose = FALSE)

## End(Not run) 

greenplum-db/GreenplumR documentation built on Sept. 2, 2023, 8:09 a.m.