dataSets: Data Set Objects

data.set    R Documentation

Data Set Objects


"data.set" objects are collections of "item" objects, with semantics similar to those of data frames. They are distinguished from data frames so that coercion by as.data.frame leads to a data frame that contains only vectors and factors. Nevertheless, most methods for data frames are inherited by data sets, except for the method for the within generic function. For the within method for data sets, see the details section.

Thus data preparation using data sets retains all information about item annotations, labels, missing values, etc., while the (mostly automatic) conversion of data sets into data frames makes the data amenable to R's statistical functions.
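The division of labour described above can be sketched as follows. This is a minimal illustration, not part of the original help page: the variable names and values are made up, and it assumes the memisc package is installed.

```r
library(memisc)

# A tiny data set with one item; the codes are invented for illustration
ds <- data.set(vote = c(1, 2, 9, 1, 2))

# Annotate the item while it is still part of a data set
ds <- within(ds, {
  labels(vote) <- c(Conservatives = 1, Labour = 2, "Answer refused" = 9)
  missing.values(vote) <- 9
})

# Coercion yields an ordinary data frame of vectors and factors
df <- as.data.frame(ds)
class(df)
str(df)
```

The annotations live on the data set; the data frame returned by as.data.frame is what is passed on to modelling functions.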

dsView is a function that displays data sets in a similar manner as View displays data frames. (View works with data sets as well, but changes them first into data frames.)


data.set(...,row.names = NULL, check.rows = FALSE, check.names = TRUE,
    stringsAsFactors = FALSE, document = NULL)
as.data.set(x, row.names=NULL, ...)
## S4 method for signature 'list'
as.data.set(x,row.names=NULL,...)
is.data.set(x)
## S3 method for class 'data.set'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S4 method for signature 'data.set'
within(data, expr, ...)

dsView(x)

## S4 method for signature 'data.set'
head(x, n, ...)
## S4 method for signature 'data.set'
tail(x, n, ...)



...

For the data.set function, several vectors or items; for within, further arguments, which are ignored.

row.names, check.rows, check.names, stringsAsFactors, optional

arguments as in data.frame or as.data.frame, respectively.

document

NULL or an optional character vector that contains documentation of the data.

x

for is.data.set(x), any object; for as.data.frame(x,...) and dsView(x), a "data.set" object.

data

a data set, that is, an object of class "data.set".

expr

an expression, or several expressions enclosed in curly braces.

n

integer; the number of rows to be shown by head or tail.


The as.data.frame method for data sets is just a copy of the method for lists. Consequently, all items in the data set are coerced in accordance with their measurement setting; see as.vector,item-method and measurement.

The within method for data sets has the same effect as the within method for data frames, apart from two differences: results of the computations are coerced into items if they have the appropriate length; otherwise, they are automatically dropped.
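The length rule can be illustrated with a small sketch. It is not taken from the original help page: the names kept and dropped are hypothetical, and memisc is assumed to be installed.

```r
library(memisc)

# A data set with four rows
ds <- data.set(a = 1:4, b = c(2.5, 3.5, 4.5, 5.5))

ds <- within(ds, {
  kept    <- c(10, 20, 30, 40)  # length equals the number of rows: becomes an item
  dropped <- 1:2                # wrong length: silently discarded
})

# Inspect the variable names via the data frame conversion
vars <- names(as.data.frame(ds))
vars
```

Because variables of the wrong length vanish without a warning, temporary helper variables can be used inside the expression without cleaning them up afterwards.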

Currently only one method for the generic function as.data.set is defined: a method for "importer" objects.


data.set and the within method for data sets return a "data.set" object, is.data.set returns a logical value, and as.data.frame returns a data frame.
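A quick way to check these return values, sketched under the assumption that memisc is installed (the object names are illustrative only):

```r
library(memisc)

ds <- data.set(x = 1:3)
df <- as.data.frame(ds)

is.data.set(ds)    # logical: ds is a "data.set" object
is.data.set(df)    # logical: df is a plain data frame, not a data set
is.data.frame(df)  # base R check on the converted object
```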


Data <- data.set(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )

Data <- within(Data,{
  description(vote) <- "Vote intention"
  description(region) <- "Region of residence"
  description(income) <- "Household income"
  wording(vote) <- "If a general election would take place next tuesday,
                    the candidate of which party would you vote for?"
  wording(income) <- "All things taken into account, how much do all
                    household members earn in sum?"
  foreach(x=c(vote,region),{
    measurement(x) <- "nominal"
    })
  measurement(income) <- "ratio"
  labels(vote) <- c(
                    Conservatives         =  1,
                    Labour                =  2,
                    "Liberal Democrats"   =  3,
                    "Don't know"          =  8,
                    "Answer refused"      =  9,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  labels(region) <- c(
                    England               =  1,
                    Scotland              =  2,
                    Wales                 =  3,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  foreach(x=c(vote,region,income),{
    annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
    })
  missing.values(vote) <- c(8,9,97,99)
  missing.values(region) <- c(97,99)

  # These two variables do not appear in
  # the resulting data set, since they have the wrong length.
  junk1 <- 1:5
  junk2 <- matrix(5,4,4)
})
# Since data sets may be huge, only a
# part of them is 'show'n
Data

## Not run: 

# If we insist on seeing all, we can use 'print' instead
print(Data)
## End(Not run)


## Not run: 
# If we want to 'View' a data set we can use 'dsView'
dsView(Data)
# Works also, but changes the data set into a data frame first:
View(Data)
## End(Not run)


EnglandData <- subset(Data,region == "England")

xtabs(~vote+region,data=within(Data, vote <- include.missings(vote)))

memisc documentation built on March 31, 2023, 7:29 p.m.