Create a table of frequencies
tbl | an object that can be coerced to a
vars | variables to count unique values of. It may be a character vector
freq | a name of a variable of the tbl object specifying frequency weights. See Details
object | a
... | more data
Based on the count function, it can also work with matrices or external databases, and the result may be updated. It creates a frequency table of the data, or just of the columns specified in vars.
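As a rough illustration of what such a table contains, the following base R sketch counts the unique rows of two iris columns. It does not call tablefreq; aggregate is used here only to mimic the idea, and the actual output format of tablefreq may differ.

```r
# Base R sketch (not the package implementation): count unique combinations
# of the selected columns and store the counts in a column named freq.
vars <- c("Sepal.Length", "Species")
d <- iris[, vars]
d$freq <- 1L                       # every case counts once
ft <- aggregate(freq ~ Sepal.Length + Species, data = d, FUN = sum)
head(ft)
sum(ft$freq) == nrow(iris)         # the counts add up to the number of cases
```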
If you provide a freq formula, the cases are weighted by the result of the formula. Any variables in the formula are removed from the data set. If the data set is a matrix, the freq formula is a classic R formula. Otherwise, the expression of freq is treated as a mathematical expression.
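The weighting can be pictured as summing the freq variable within each group of identical rows, after which the weighting variable no longer appears as a data column. A minimal base R sketch, assuming a small made-up data frame (the names toy, g and w are hypothetical) and using aggregate rather than the package's own code:

```r
# Hypothetical toy data: g is the variable to tabulate, w the frequency weight.
toy <- data.frame(g = c("a", "a", "b", "b", "b"),
                  w = c(2, 3, 1, 1, 4))
# Sum the weights within each unique value of g; w is consumed as the weight
# and the result holds only g plus the weighted count.
wt <- aggregate(w ~ g, data = toy, FUN = sum)
names(wt)[names(wt) == "w"] <- "freq"
wt
```

Here the two "a" cases contribute 2 + 3 and the three "b" cases contribute 1 + 1 + 4, so the weighted counts differ from the plain case counts.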
This function uses dplyr to create frequency tables. Its main advantage is that it works with on-disk data stored in databases, whereas count only works with in-memory data sets.
In general, to use the functions of this package, the frequency table obtained by this function should fit in memory. Otherwise you must use the 'chunk' versions (clarachunk, biglmfreq).
The code of this function is adapted from a wish list on the devel page of dplyr (see references). Prof. Wickham also provides a nice introduction on how to use it with databases.
A tbl object with label and freq columns. When possible, the last column is named freq and represents the frequency counts of the cases. This object, of class tablefreq, has two attributes:
freq | the weighting variable used to create the frequency table
colweights | name of the column with the weighting counts
The author would like to thank Prof. Hadley Wickham, who allowed the reuse of part of his code. When using the update function, be careful with non-integer weights: the final weights may lose precision because of the repeated sums.
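The caveat about non-integer weights comes from ordinary floating-point arithmetic: each update adds new weights to the running totals, and sums of decimal fractions are generally not exact. The sketch below imitates an update step in base R by stacking two partial tables and re-summing. merge_freq is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: combine two frequency tables that share the columns
# x and freq by summing the weights of identical x values.
merge_freq <- function(a, b) {
  aggregate(freq ~ x, data = rbind(a, b), FUN = sum)
}

part1 <- data.frame(x = c(1, 2), freq = c(0.1, 1))
part2 <- data.frame(x = c(1, 3), freq = c(0.2, 1))
total <- merge_freq(part1, part2)

w <- total$freq[total$x == 1]
w == 0.3                    # exact comparison fails: 0.1 + 0.2 is not exactly 0.3
isTRUE(all.equal(w, 0.3))   # comparison with a tolerance succeeds
```

When accumulated weights have to be compared, a tolerance-based check such as all.equal is usually safer than ==.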
tablefreq(iris)
tablefreq(iris, c("Sepal.Length","Species"))
a <- tablefreq(iris,freq="Sepal.Length")
tablefreq(a, freq="Sepal.Width")
library(dplyr)
iris %>% tablefreq("Species")
tfq <- tablefreq(iris[,c(1:2)])
chunk1 <- iris[1:10,c(1:2)]
chunk2 <- iris[c(11:20),]
chunk3 <- iris[-c(1:20),]
a <- tablefreq(chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
a
## Not run:
## External databases
library(dplyr)
if(require(RSQLite)){
  hflights_sqlite <- tbl(hflights_sqlite(), "hflights")
  hflights_sqlite
  tbl_vars(hflights_sqlite)
  tablefreq(hflights_sqlite,vars=c("Year","Month"),freq="DayofMonth")
}
##
## Graphs
##
if(require(ggplot2) && require(hflights)){
  library(dplyr)

  ## One variable
  ## Bar plot
  tt <- as.data.frame(tablefreq(hflights[,"ArrDelay"]))
  p <- ggplot() + geom_bar(aes(x=x, y=freq), data=tt, stat="identity")
  print(p)

  ## Histogram
  p <- ggplot() + geom_histogram(aes(x=x, weight= freq), data = tt)
  print(p)

  ## Density
  tt <- tt[complete.cases(tt),] ## remove missing values
  tt$w <- tt$freq / sum(tt$freq) ## weights must sum 1
  p <- ggplot() + geom_density(aes(x=x, weight= w), data = tt)
  print(p)

  ##
  ## Two distributions
  ##
  ## A numeric and a factor variable
  td <- tablefreq(hflights[,c("TaxiIn","Origin")])
  td <- td[complete.cases(td),]

  ## Bar plot
  p <- ggplot() + geom_bar(aes(x=TaxiIn, weight= freq, colour = Origin),
                           data = td, position ="dodge")
  print(p)

  ## Density
  ## compute the relative frequencies for each group
  td <- td %>% group_by(Origin) %>%
    mutate( ngroup= sum(freq), wgroup= freq/ngroup)
  p <- ggplot() + geom_density(aes(x=TaxiIn, weight=wgroup, colour = Origin),
                               data = td)
  print(p)

  ## For each group, plot its values
  p <- ggplot() + geom_point(aes(x=Origin, y=TaxiIn, size=freq),
                             data = td, alpha= 0.6)
  print(p)

  ## Two numeric variables
  tc <- tablefreq(hflights[,c("TaxiIn","TaxiOut")])
  tc <- tc[complete.cases(tc),]
  p <- ggplot() + geom_point(aes(x=TaxiIn, y=TaxiOut, size=freq),
                             data = tc, color = "red", alpha=0.5)
  print(p)

  ## Two factors
  tf <- tablefreq(hflights[,c("UniqueCarrier","Origin")])
  tf <- tf[complete.cases(tf),]

  ## Bar plot
  p <- ggplot() + geom_bar(aes(x=Origin, fill=UniqueCarrier, weight= freq),
                           data = tf)
  print(p)
}
## End(Not run)