tablefreq: Create a table of frequencies

View source: R/tablefreq.R

Description

Create a table of frequencies

Usage

tablefreq(tbl, vars = NULL, freq = NULL)

## S3 method for class 'tablefreq'
update(object, ...)

Arguments

tbl

an object that can be coerced to a tbl. It must contain all the variables named in vars and in freq.

vars

variables to count unique values of. It may be a character vector

freq

the name of a variable in the tbl object that specifies the frequency weights. See Details.

object

a tablefreq object

...

additional data used to update the frequency table

Details

This function is based on the count function. It can also work with matrices or external databases, and the result may be updated with more data.

It creates a frequency table of the data, or just of the columns specified in vars.
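
As a rough illustration of what such a frequency table contains, the sketch below counts unique rows using only base R. It is not how tablefreq is implemented (the package builds on dplyr), and the helper name count_rows is hypothetical.

```r
# Hypothetical helper (not part of freqweights): count the unique rows of a
# data frame, optionally restricted to the columns named in `vars`.
count_rows <- function(d, vars = names(d)) {
  d <- d[, vars, drop = FALSE]
  d$freq <- 1  # every case counts once
  # Sum the unit counts within each unique combination of the other columns
  aggregate(freq ~ ., data = d, FUN = sum)
}

all_cols <- count_rows(iris)
two_cols <- count_rows(iris, c("Sepal.Length", "Species"))
head(two_cols)
```

The counts in either table add up to the number of rows in the input, since every case lands in exactly one group.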

If you provide a freq formula, the cases are weighted by the result of the formula. Any variables in the formula are removed from the data set. If the data set is a matrix, the freq formula is a classic R formula. Otherwise, the expression in freq is treated as a mathematical expression.
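
The weighting idea can be sketched in base R as follows. This is only an approximation of the behaviour described above, under the assumption that a single column (here Sepal.Length) supplies the weights. The real function accepts more general expressions.

```r
# Assumption for illustration: weight each case by Sepal.Length and drop
# that column from the grouping variables, as the text describes.
w <- iris$Sepal.Length
d <- iris[, setdiff(names(iris), "Sepal.Length")]
d$freq <- w

# Sum the weights within each unique combination of the remaining columns
weighted <- aggregate(freq ~ ., data = d, FUN = sum)
head(weighted)
```

Here the freq column holds summed weights rather than plain counts, so its total matches the total of the weighting variable.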

This function relies on dplyr to create frequency tables. Its main advantage is that it works with on-disk data stored in databases, whereas count only works with in-memory data sets.

In general, to use the functions of this package, the frequency table obtained by this function should fit in memory. Otherwise you must use the 'chunk' versions (clarachunk, biglmfreq).

The code of this function is adapted from a wish list on the devel page of dplyr (see references). Prof. Wickham also provides a nice introduction to using it with databases.

Value

A tbl object with label and freq columns. When possible, the last column is named freq and holds the frequency counts of the cases. This object, of class tablefreq, has two attributes:

freq

the weighting variable used to create the frequency table

colweights

Name of the column with the weighting counts

Note

The author would like to thank Prof. Hadley Wickham, who allowed the reuse of part of his code. When using the update function, be careful with non-integer weights: the final weights may lose precision because of the repeated sums.
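
The sketch below illustrates, in base R, why updating by chunks works and why non-integer weights need care. The helpers count_rows and merge_counts are hypothetical and only mimic the idea of update. They are not the package's code.

```r
# Hypothetical helpers (not part of freqweights)
count_rows <- function(d) {
  d$freq <- 1
  aggregate(freq ~ ., data = d, FUN = sum)
}
merge_counts <- function(a, b) {
  # Stack two frequency tables and add up the counts of matching rows
  aggregate(freq ~ ., data = rbind(a, b), FUN = sum)
}

d <- iris[, 1:2]
whole <- count_rows(d)
parts <- merge_counts(count_rows(d[1:50, ]), count_rows(d[51:150, ]))
same_counts <- isTRUE(all.equal(whole$freq, parts$freq))

# With non-integer weights, repeated sums are subject to floating-point
# rounding, so compare with a tolerance instead of `==`.
exact_match <- (0.1 + 0.2) == 0.3
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

With integer counts the chunked result agrees with the one-pass result. With fractional weights, a tolerance-based comparison such as all.equal is the safer check.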

See Also

count, tbl

Examples

tablefreq(iris)
tablefreq(iris, c("Sepal.Length","Species"))
a <- tablefreq(iris,freq="Sepal.Length")
tablefreq(a, freq="Sepal.Width")

library(dplyr)
iris %>% tablefreq("Species")

tfq <- tablefreq(iris[,c(1:2)])

chunk1 <- iris[1:10,c(1:2)]
chunk2 <- iris[c(11:20),]
chunk3 <- iris[-c(1:20),]
a <- tablefreq(chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
a

## Not run: 

## External databases
library(dplyr)
if(require(RSQLite)){
  hflights_sqlite <- tbl(hflights_sqlite(), "hflights")
  hflights_sqlite
  tbl_vars(hflights_sqlite)
  tablefreq(hflights_sqlite,vars=c("Year","Month"),freq="DayofMonth")
}

##
## Graphs
##
if(require(ggplot2) && require(hflights)){
  library(dplyr)

  ## One variable
  ## Bar plot
  tt <- as.data.frame(tablefreq(hflights[,"ArrDelay"]))
  p <- ggplot() + geom_bar(aes(x=x, y=freq), data=tt, stat="identity")
  print(p)

  ## Histogram
  p <- ggplot() + geom_histogram(aes(x=x, weight= freq), data = tt)
  print(p)

  ## Density
  tt <- tt[complete.cases(tt),] ## remove missing values
  tt$w <- tt$freq / sum(tt$freq) ## weights must sum 1
  p <- ggplot() + geom_density(aes(x=x, weight= w), data = tt)
  print(p)

  ##
  ## Two distributions
  ##
  ## A numeric and a factor variable
  td <- tablefreq(hflights[,c("TaxiIn","Origin")])
  td <- td[complete.cases(td),]

  ## Bar plot
  p <- ggplot() + geom_bar(aes(x=TaxiIn, weight= freq, colour = Origin),
                           data = td, position ="dodge")
  print(p)

  ## Density
  ## compute the relative frequencies for each group
  td <- td %>% group_by(Origin) %>%
               mutate( ngroup= sum(freq), wgroup= freq/ngroup)
  p <- ggplot() + geom_density(aes(x=TaxiIn, weight=wgroup, colour = Origin),
                               data = td)
  print(p)

  ## For each group, plot its values
  p <- ggplot() + geom_point(aes(x=Origin, y=TaxiIn, size=freq),
                             data = td, alpha= 0.6)
  print(p)

  ## Two numeric variables
  tc <- tablefreq(hflights[,c("TaxiIn","TaxiOut")])
  tc <- tc[complete.cases(tc),]
  p <- ggplot() + geom_point(aes(x=TaxiIn, y=TaxiOut, size=freq),
                             data = tc, color = "red", alpha=0.5)
  print(p)

  ## Two factors
  tf <- tablefreq(hflights[,c("UniqueCarrier","Origin")])
  tf <- tf[complete.cases(tf),]

  ## Bar plot
  p <- ggplot() + geom_bar(aes(x=Origin, fill=UniqueCarrier, weight= freq),
                           data = tf)
  print(p)
}

## End(Not run)

freqweights documentation built on May 29, 2017, 12:01 p.m.