sqlrepsurvey: Large survey with replicate weights, based on SQL database

Description

Specifies a survey design with replicate weights based on a connection to a relational database (currently MonetDB).

Usage

sqlrepsurvey(weights, repweights, scale, rscales, driver=MonetDB.R(), database, table.name, key = "row_names", mse = FALSE, check.factors = 10, degf=NULL,...)
## S3 method for class 'sqlrepsurvey'
close(con, ...)
## S3 method for class 'sqlrepsurvey'
open(con, driver,...)

Arguments

weights

Character string naming the weight variable

repweights

Vector of character strings naming the replicate-weight variables, or a regular expression that selects the correct variable names from those in the table.

scale

A single number for scaling the sum of squared deviations of the replicates.

rscales

A vector of the same length as repweights giving per-replicate scaling for squared deviations of the replicates

driver

A database driver object (e.g., as returned by MonetDB()), or NULL if the database argument is already a database connection.

database

Either a connection to a MonetDB database or a character string with the name (URL) of a database containing the data table

table.name

A character string with the name of the data table containing the data and replicate weights.

key

A character string with the name of a variable that uniquely identifies each row.

mse

If TRUE, compute standard errors based on deviations of the replicates from the point estimate; if FALSE, use deviations from the mean of the replicates.

check.factors

If this is a non-zero number, R will attempt to determine which variables in the database table are factors based on having at most this many distinct values, and will store information on the levels in the survey design object. This can be slow for a very large survey. check.factors can also be a zero-row data frame with the correct factor levels for the factor variables.

degf

Optional user-specified degrees of freedom for the design. Defaults to one less than the number of replicates.

...

Other arguments to dbConnect, such as user and password

con

An object of class sqlrepsurvey.

Details

For the American Community Survey, scale is 4/80 and rscales is rep(1,80).
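
As a rough illustration of how scale, rscales and mse enter the variance calculation, the sketch below computes a replicate-weight variance in base R. It assumes the usual form, scale times the sum of rscales times the squared deviations of the replicate estimates, which matches the argument descriptions above; it is not taken from the package source. The function rep_variance and the simulated replicate estimates are hypothetical.

```r
# Hypothetical helper: variance from replicate estimates.
# Assumed form: scale * sum(rscales * (replicate - center)^2)
rep_variance <- function(theta_reps, theta_hat, scale, rscales, mse = FALSE) {
  # mse = TRUE centers on the point estimate, mse = FALSE on the replicate mean
  center <- if (mse) theta_hat else mean(theta_reps)
  scale * sum(rscales * (theta_reps - center)^2)
}

# Simulated (made-up) point estimate and 80 replicate estimates
set.seed(1)
theta_hat  <- 100
theta_reps <- theta_hat + rnorm(80, sd = 2)

# American Community Survey settings from the text above
scale   <- 4 / 80
rscales <- rep(1, 80)

var_mse  <- rep_variance(theta_reps, theta_hat, scale, rscales, mse = TRUE)
var_mean <- rep_variance(theta_reps, theta_hat, scale, rscales, mse = FALSE)
se_mse   <- sqrt(var_mse)
```

With equal rscales, the mse = TRUE variance is never smaller than the mse = FALSE variance, because the sum of squared deviations is smallest when centered on the mean of the replicates.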

The check.factors operation can be slow (e.g., over an hour for an American Community Survey dataset with 9 million records and 300 variables). If the survey object is saved with save(), it can be reconnected to the database with open, so it only needs to be created once. The most flexible and fastest approach is usually to create the zero-row data frame manually from the data documentation: only the columns for factor variables need to be supplied, because the code assumes all other variables are numeric.
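
A minimal sketch of building such a zero-row data frame in base R follows. The column mar and all level codes are hypothetical placeholders; the real factor columns and levels should come from the survey's data documentation. Whether the levels should be stored as character codes in exactly this form is an assumption.

```r
# Zero-row data frame describing the factor variables only.
# Column "mar" and all level codes are hypothetical; use the data documentation.
factor_template <- data.frame(
  sex = factor(character(0), levels = c("1", "2")),
  mar = factor(character(0), levels = as.character(1:5))
)

nrow(factor_template)        # no rows, only column metadata
levels(factor_template$sex)  # level information is kept

# It would then be supplied in place of a number, e.g.
# sqlrepsurvey(..., check.factors = factor_template)
```

Variables left out of the data frame are treated as numeric, so only the factor columns need to be listed.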

close closes the database connection, first attempting to garbage-collect any tables corresponding to non-existent R objects.

open re-opens the database connection.

Value

An object of class sqlrepsurvey.

See Also

sqlsurvey, MonetDB

Examples

## Not run: 
## assumes data already in database
library(sqlsurvey)
monetdriver<-MonetDB(classPath="/usr/local/monetdb/share/monetdb/lib/monetdb-jdbc-2.4.jar")

alabama <- sqlrepsurvey("pwgtp", paste("pwgtp", 1:80, sep=""), key="idkey",
    scale=4/80, rscales=rep(1,80), mse=TRUE,
    database="jdbc:monetdb://localhost/ACS", driver=monetdriver,
    user="monetdb", password="monetdb",
    table.name="alabama3yr", check.factors=TRUE)

## verify against Census Bureau totals
svytotal(~sex,alabama)
svytotal(~I(agep %in% 0:4)+I(agep %in% 5:9)+I(agep %in% 10:14)+I(agep %in% 15:19),alabama)
svytotal(~I(agep %in% 20:24)+I(agep %in% 25:34)+I(agep %in% 35:44)+I(agep %in% 45:54),alabama)
svytotal(~I(agep %in% 55:59)+I(agep %in% 60:64)+I(agep %in% 65:74)+I(agep %in% 75:84)+I(agep>84),alabama)

## other analyses
svymean(~wagp, subset(alabama, !is.na(wagp)), byvar=~sex,se=TRUE)
svyquantile(~agep, alabama,quantiles=0.5,se=TRUE)

plot(svysmooth(wagp~wkhp,alabama,sample.bandwidth=5000))


## with regular expression
alabama <- sqlrepsurvey("pwgtp", repweights="pwgtp[1-9]", key="idkey",
    scale=4/80, rscales=rep(1,80), mse=TRUE,
    database="jdbc:monetdb://localhost/ACS", driver=monetdriver,
    user="monetdb", password="monetdb",
    table.name="alabama3yr", check.factors=TRUE)


## End(Not run)

sqlsurvey documentation built on May 2, 2019, 4:53 p.m.