dpSet: Create and manage datapuppy 'sets' and 'batches'

Description

Datapuppy batches are collections of records derived from a single data file (e.g., a data file downloaded from a data logger or a spreadsheet that contains field observations). Datapuppy sets are collections of data batches to be loaded into a single database.

Usage

dpBatch(batchRecord, set, dataValues, rawValues = data.frame())

is.dpBatch(x)

dpSet(setPath, connectionArgs, batchRowColumnName, datumValueColumnName,
  datumTypeColumnName, batchNameColumnName, batchesTableName, dataTableName,
  typesTableName)

dpLoadSet(setPath)

is.dpSet(x)

Arguments

batchRecord

A named list of values describing the batch. Values in the batchRecord are stored as a record in the batches table. The name of each element of batchRecord must be the name of the database table column where that value is to be stored.

set

A dpSet object describing the set to be operated on (or the set to which a batch will be added). Alternatively, a setPath (see 'setPath' argument in dpSet) from which a set will be loaded.

dataValues

A data.frame with the values that are to be loaded into the database.

x

An object to be tested.

setPath

A character string containing the path to the set (batch) directory. This directory contains the "dpSet.rData" ("dpBatch.rData") file and a subfolder for each batch included in the set. The folder must exist and be empty when a set is created using dpSet(). If passed to other functions as a 'set' argument, the folder specified by setPath must contain a "dpSet.rData" file. If a path is not fully specified, it is assumed to be relative to the R working directory (see getwd).

batchRowColumnName

A character string containing the name of a column in the dataTable of the database; the column is used to store the row number of the dataValues data.frame that was the source of the datum.

batchNameColumnName

A character string containing the name of a column in the batchesTable of the database that stores a unique name for the batch. This name should be meaningful to a human trying to identify the batch; it is not the serial number (primary key) that datapuppy assigns to the batch automatically.

batchesTableName

A character string containing the name of the batchesTable in the database.

dataTableName

A character string containing the name of the dataTable in the database.

typesTableName

A character string containing the name of the typesTable in the database.

batchPath

Same as setPath, above.

validate

A boolean determining whether names in the argument list are validated against formals(dpSet) or formals(dpBatch).

connectionArgs

A dpConnectionArgs object returned from dpConnectionArgs() describing the location and credentials for the database associated with a set.
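
The validate argument above refers to checking names in an argument list against formals(dpSet) or formals(dpBatch). The following is a minimal base R sketch of that kind of check, not datapuppy's own code; makeThing() and unknownArgNames() are hypothetical names introduced only for illustration.

```r
# Hypothetical constructor standing in for dpSet() or dpBatch().
makeThing <- function(setPath, dataTableName) NULL

# Return the names in an argument list that are not formal arguments of `fun`.
unknownArgNames <- function(argList, fun) {
  setdiff(names(argList), names(formals(fun)))
}

# Every name matches a formal argument, so nothing is reported.
okNames <- unknownArgNames(list(setPath = "mySet", dataTableName = "data"), makeThing)

# "tableName" is not a formal argument of makeThing(), so it is reported.
badNames <- unknownArgNames(list(setPath = "mySet", tableName = "data"), makeThing)
```

A check like this catches misspelled argument names before they are replayed through a constructor.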

Details

A batch is a collection of data points that have been imported from a single data file. For instance, when a data logger is downloaded, it creates a data file with many records, where each record may contain observations of several different metrics. All (or a subset) of those observations can be collected into a batch.

A set is a collection of data batches; specifically, it is the collection of all of the batches stored in a particular database. Thus, each set is always associated with a single database. dpSets and dpBatches are S3 objects built atop lists. dpSet objects contain information about a set and the database associated with the set. dpBatch objects contain information about the batch and the set to which the batch belongs.

Because dpSets (and dpBatches) are lists, information in a dpSet (dpBatch) can be accessed with the $ operator. For instance, given a set named mySet, mySet$db$keys$dataPrimaryKey would return the name of the primary key column for the data table in the database. Generally, though, the user should not have to investigate the contents of a dpSet (dpBatch) object. Instead, call dpSet() (dpBatch()) to create the S3 object, store the object in a variable, and pass the variable to other functions.
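
To illustrate what "an S3 object built atop a list" means, here is a minimal base R sketch. It does not call datapuppy; the object below is a hand-built stand-in whose nested element names mirror mySet$db$keys$dataPrimaryKey from the text, and the values and the helper isSetSketch() are assumptions made for the example.

```r
# Hypothetical stand-in for a dpSet: a plain list with a class attribute.
# The key name "dataIDX" is invented for illustration.
mySet <- structure(
  list(
    setPath = "mySet",
    db = list(
      keys = list(dataPrimaryKey = "dataIDX")
    )
  ),
  class = "dpSet"
)

# Because the object is still a list, `$` reaches nested elements.
primaryKey <- mySet$db$keys$dataPrimaryKey

# A class test in the style of is.dpSet() could be written with inherits().
isSetSketch <- function(x) inherits(x, "dpSet")
isSetSketch(mySet)
isSetSketch(list())
```

In normal use the object would come from dpSet() or dpLoadSet() rather than being assembled by hand.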

Note that datapuppy makes a few assumptions about the database into which batches are loaded. There must be a "batches" table where each record contains information about a batch loaded into the database. There must be a "data" table where each record is a single datum. And there must be a "types" table where each record describes a metric that can be associated with any datum (e.g., the datum represents a temperature reading, a wind speed, a stock price, or whatever metrics are tracked by the database). The columns in the batchesTable, dataTable, and typesTable of the database have some requirements:

1) Each table must contain an autonumber field that is designated as the primary key for the table.

2) The dataTable must contain at least two foreign keys: one that refers to, and is named the same as, the primary key column of the batchesTable, and one that refers to, and is named the same as, the primary key column of the typesTable. In this way, each datum is associated with a batch that describes the source of the datum and with a data type that describes what the number represents.
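
The naming rule in requirement 2 can be sketched in base R as follows. This is an illustrative check on column names only; it does not inspect a real database, verify foreign key constraints, or confirm autonumber fields. All table and column names, and the helper checkForeignKeys(), are hypothetical.

```r
# Hypothetical column layouts for the three tables; all names are invented.
batchesCols <- c("batchIDX", "batchName", "importDate")
typesCols   <- c("typeIDX", "typeName", "units")
dataCols    <- c("dataIDX", "batchIDX", "typeIDX", "batchRow", "value")

# Check that the data table has columns named the same as the primary keys
# of the batches and types tables; stop with an error if any are missing.
checkForeignKeys <- function(dataCols, batchesKey, typesKey) {
  missingKeys <- setdiff(c(batchesKey, typesKey), dataCols)
  if (length(missingKeys) > 0) {
    stop("data table lacks foreign key column(s): ",
         paste(missingKeys, collapse = ", "))
  }
  invisible(TRUE)
}

layoutOk <- checkForeignKeys(dataCols, batchesKey = "batchIDX", typesKey = "typeIDX")
```

With this layout, each row of the data table can be traced to one batch record and one type record through the shared column names.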

dpSet() (dpBatch()) creates a file called "dpSet.rData" ("dpBatch.rData") in the location specified by setPath (file.path(set$setPath, batchRecord$batchName)). The file contains a list of the arguments passed to dpSet() (dpBatch()). When a set (batch) is reloaded from disk using dpLoadSet() (dpLoadBatch()), the list of arguments is used to recreate the dpSet (dpBatch).
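
The save-and-replay pattern described above can be sketched in base R like this. It is a simplified illustration under stated assumptions, not datapuppy's implementation: newSetSketch() and loadSetSketch() are hypothetical, take only two arguments, and write to a temporary directory.

```r
# Hypothetical constructor: builds a classed list from its arguments and
# saves those arguments in "dpSet.rData" inside the set directory.
newSetSketch <- function(setPath, dataTableName) {
  setArgs <- list(setPath = setPath, dataTableName = dataTableName)
  save(setArgs, file = file.path(setPath, "dpSet.rData"))
  structure(setArgs, class = "dpSetSketch")
}

# Hypothetical loader: reads the saved argument list into a private
# environment and replays it through the constructor with do.call().
loadSetSketch <- function(setPath) {
  env <- new.env()
  load(file.path(setPath, "dpSet.rData"), envir = env)
  do.call(newSetSketch, env$setArgs)
}

# Create a set directory under tempdir(), build a set, then reload it.
setDir <- file.path(tempdir(), "sketchSet")
dir.create(setDir, showWarnings = FALSE)
original <- newSetSketch(setDir, dataTableName = "data")
reloaded <- loadSetSketch(setDir)
identical(original, reloaded)
```

Storing the arguments rather than the finished object means the object is rebuilt by the constructor each time it is loaded.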

Value

dpBatch() returns a dpBatch object and saves the arguments passed to dpBatch() in a subdirectory of the associated set's directory. The subdirectory is named according to the value in batchRecord that has a name equal to set$db$batchNameColumnName.
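
As a rough illustration of how the batch subdirectory name is derived, the base R sketch below looks up the batch name in batchRecord using the column name held in the set. The set and batchRecord objects are hand-built stand-ins with invented values; only the element names set$setPath and set$db$batchNameColumnName come from the text.

```r
# Hypothetical set and batch record; values are invented for illustration.
set <- list(
  setPath = file.path(tempdir(), "sketchSetForBatches"),
  db = list(batchNameColumnName = "batchName")
)
batchRecord <- list(batchName = "logger01_2019-05", importedBy = "field crew")

# Look up the batch name by the column name stored in the set, then build
# the batch directory path beneath the set directory.
batchName <- batchRecord[[set$db$batchNameColumnName]]
batchPath <- file.path(set$setPath, batchName)
basename(batchPath)
```

Because the batch name becomes a directory name, it should be unique within the set and valid as a folder name on the file system in use.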

is.dpBatch() returns TRUE if x is a dpBatch object.

dpSet() (dpBatch()) returns a dpSet (dpBatch) object that describes the set (batch). This object should be assigned to a variable so that it can be passed to other datapuppy functions. The arguments passed to dpSet() (dpBatch()) are also saved in a file on disk (see Details, above).

dpLoadSet() (dpLoadBatch()) creates a dpSet (dpBatch) object described by the arguments stored in the "dpSet.rData" ("dpBatch.rData") file.


FluvialLandscapeLab/datapuppy documentation built on May 6, 2019, 5:05 p.m.