scale-methods: Scaling and centering of tables

scale R Documentation

Scaling and centering of tables

Description

scale centers and/or scales the columns of a numeric table.

Usage

## S4 method for signature 'db.obj'
scale(x, center = TRUE, scale = TRUE)

Arguments

x

A db.obj object. It represents a table/view in the database if it is a db.data.frame object, or a series of operations applied to an existing db.data.frame object if it is a db.Rquery object.

center

Either a logical value or a numeric vector of length equal to the number of columns of 'x'.

scale

Either a logical value or a numeric vector of length equal to the number of columns of 'x'.

Details

The value of 'center' determines how column centering is performed. If 'center' is a numeric vector with length equal to the number of columns of 'x', then each column of 'x' has the corresponding value from 'center' subtracted from it. If 'center' is 'TRUE' then centering is done by subtracting the column means (omitting 'NA's) of 'x' from their corresponding columns, and if 'center' is 'FALSE', no centering is done.
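The three centering modes can be illustrated without a database using base R's scale on an ordinary matrix. This is only a sketch: it assumes the db.obj method follows the arithmetic described above, and the matrix m and its values are made up for illustration.

```r
# Base R illustration only; no database connection is involved.
m <- cbind(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40))

# center = TRUE: subtract the column means, computed with NAs omitted
centered_default <- scale(m, center = TRUE, scale = FALSE)
col_means <- colMeans(m, na.rm = TRUE)
centered_manual <- sweep(m, 2, col_means, "-")

# center = numeric vector: subtract one user-supplied value per column
centered_custom <- scale(m, center = c(1, 10), scale = FALSE)

# center = FALSE: the values are left unchanged
uncentered <- scale(m, center = FALSE, scale = FALSE)
```

Here centered_default and centered_manual hold the same values, and the means that were subtracted are available as attr(centered_default, "scaled:center").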

The value of 'scale' determines how column scaling is performed (after centering). If 'scale' is a numeric vector with length equal to the number of columns of 'x', then each column of 'x' is divided by the corresponding value from 'scale'. If 'scale' is 'TRUE' then scaling is done by dividing the (centered) columns of 'x' by their standard deviations if 'center' is 'TRUE', and the root mean square otherwise. If 'scale' is 'FALSE', no scaling is done.
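As a rough base R analogue of the scaling rules (again assuming the db.obj method behaves as described, with a made-up matrix m), a numeric 'scale' divides each column by its own value, while 'scale = TRUE' together with 'center = TRUE' standardizes each column:

```r
# Base R illustration only; no database connection is involved.
m <- cbind(a = c(2, 4, 6), b = c(10, 20, 30))

# scale = numeric vector: divide column a by 2 and column b by 10
scaled_custom <- scale(m, center = FALSE, scale = c(2, 10))

# center = TRUE, scale = TRUE: subtract the column mean, then divide
# by the column standard deviation
standardized <- scale(m, center = TRUE, scale = TRUE)

# After standardizing, every column has mean 0 and standard deviation 1
std_means <- unname(colMeans(standardized))
std_sds <- unname(apply(standardized, 2, sd))
```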

The root-mean-square for a (possibly centered) column is defined as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing values and n is the number of non-missing values. In the case 'center = TRUE', this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use 'scale(x, center = FALSE, scale = lookat(sd(x)))'.)
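The difference between the root mean square and the standard deviation can be checked numerically in base R. The sketch below uses a made-up single-column matrix and base R's scale, on the assumption that the db.obj method applies the same formula. It also shows the base R counterpart of scaling by the standard deviation without centering.

```r
# Base R illustration only; no database connection is involved.
v <- c(1, 2, 3, 4, NA)
m <- cbind(v = v)

# Root mean square over the non-missing values: sqrt(sum(x^2) / (n - 1))
obs <- v[!is.na(v)]
rms <- sqrt(sum(obs^2) / (length(obs) - 1))

# center = FALSE, scale = TRUE divides by the root mean square
no_center <- scale(m, center = FALSE, scale = TRUE)

# center = TRUE, scale = TRUE divides by the standard deviation
with_center <- scale(m, center = TRUE, scale = TRUE)

# Scale by the standard deviation without centering by passing it explicitly
sd_only <- scale(m, center = FALSE, scale = apply(m, 2, sd, na.rm = TRUE))
```

In this example attr(no_center, "scaled:scale") equals rms, while attr(with_center, "scaled:scale") and attr(sd_only, "scaled:scale") both equal sd(obs), which is a different number because the column mean is not zero.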

Value

A db.Rquery object. It computes the centering and/or scaling of x for each column, including array elements. The result can be viewed using lk or lookat.

The numeric centering and scaling values used (if any) are returned as the attributes "scaled:center" and "scaled:scale". The number of rows in the table is also returned as the attribute "row.number".

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

See Also

db.array creates an array column for a db.Rquery object.

Examples

## Not run: 
## help("scale,db.obj-method") # display this doc


## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname)

x <- as.db.data.frame(abalone, conn.id = cid)
lk(x, 10)

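## drop the first two columns, then scale the remaining numeric columns
s <- scale(x[-c(1,2)])

## retrieve the centering and scaling values that were used
centers <- attr(s, "scaled:center")
scales <- attr(s, "scaled:scale")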

## create the scaled table
delete("scaled_abalone")
y <- as.db.data.frame(s, "scaled_abalone")

lk(y, 10)

db.disconnect(cid, verbose = FALSE)

## End(Not run)

greenplum-db/GreenplumR documentation built on Sept. 2, 2023, 8:09 a.m.