statsfreq: Descriptive statistics of a frequency table.

Description

Computes the descriptive statistics of a frequency table.

Usage

meanfreq(data, freq = NULL)

.meanfreq(tfq)

quantilefreq(data, probs = c(0, 0.25, 0.5, 0.75, 1), freq = NULL)

.quantilefreq(tfq, probs = c(0, 0.25, 0.5, 0.75, 1))

covfreq(data, freq = NULL)

.covfreq(tfq)

sdfreq(data, freq = NULL)

.sdfreq(tfq)

scalefreq(data, freq = NULL)

.scalefreq(tfq)

corfreq(data, freq = NULL)

.corfreq(tfq)

Arguments

data

any object that can be processed by tablefreq.

freq

a single name of the variable specifying frequency weights.

tfq

a tablefreq object, or a matrix or data frame whose last column contains the frequency weights.

probs

a numeric vector of probabilities at which to compute quantiles. The default is 0 (min), 0.25, 0.5, 0.75, 1 (max).

Details

These functions compute various weighted versions of standard estimators.

meanfreq, sdfreq, quantilefreq, covfreq and corfreq estimate the mean, standard deviation, quantiles, covariance matrix and correlation matrix, respectively. In the last two cases, the results are equal to those of cov and cor with the pairwise.complete.obs option applied to the disaggregated data.
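The equivalence with the disaggregated data can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: it assumes positive integer frequencies and the usual n - 1 denominator (with n the sum of the frequencies), and the variable names are illustrative only.

```r
# Frequency table: each row is a distinct (x, y) pair with its count
x    <- c(1, 2, 4)
y    <- c(2, 1, 5)
freq <- c(2, 1, 3)

n <- sum(freq)                      # number of disaggregated cases

# Frequency-weighted means
mx <- sum(freq * x) / n
my <- sum(freq * y) / n

# Frequency-weighted standard deviation and covariance (n - 1 denominator)
sx  <- sqrt(sum(freq * (x - mx)^2) / (n - 1))
cxy <- sum(freq * (x - mx) * (y - my)) / (n - 1)

# The same statistics computed on the disaggregated data
ex <- rep(x, times = freq)
ey <- rep(y, times = freq)

all.equal(mx, mean(ex))
all.equal(sx, sd(ex))
all.equal(cxy, cov(ex, ey))
```

Each all.equal call compares the weighted statistic with its counterpart on the expanded vectors.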

Missing values or cases with non-positive frequency weights are automatically removed.
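As an illustration of this rule for a single variable, the filtering step might look like the sketch below. It uses base R only; the data frame and the column names x and freq are assumptions made for the example, and the package's own handling of several columns (pairwise rather than listwise deletion for covfreq and corfreq) is not reproduced here.

```r
# Hypothetical frequency table with a missing value and a zero weight
tfq <- data.frame(x = c(1, NA, 3, 4), freq = c(2, 1, 0, 3))

# Keep only complete rows with a strictly positive frequency weight
keep  <- complete.cases(tfq) & tfq$freq > 0
clean <- tfq[keep, ]

# Weighted mean over the remaining rows
wmean <- sum(clean$freq * clean$x) / sum(clean$freq)
wmean
```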

If freq is not NULL, the data set must contain a column with that name. This variable is removed from the data set before the descriptive statistics are calculated.

The dot versions are intended to be used when programming. The tfq argument may be a tablefreq object, or a matrix or data frame whose last column contains the frequency weights.
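The "last column holds the weights" convention can be sketched with a small helper. The function weighted_col_means below is hypothetical and is not part of the package; it only shows how a matrix laid out this way might be split into values and weights.

```r
# Hypothetical helper (not a package function): column means of a matrix
# whose last column contains the frequency weights
weighted_col_means <- function(tfq) {
  tfq  <- as.matrix(tfq)
  p    <- ncol(tfq)
  freq <- tfq[, p]                     # last column: frequency weights
  vals <- tfq[, -p, drop = FALSE]      # remaining columns: the variables
  colSums(vals * freq) / sum(freq)     # freq is recycled down each column
}

tfq <- cbind(a = c(1, 2, 4), b = c(2, 1, 5), freq = c(2, 1, 3))
weighted_col_means(tfq)
```

Like the dot functions described above, this sketch performs no input checking.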

The algorithm of quantilefreq is based on wtd.quantile.
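To give an idea of how quantiles can be obtained from a frequency table without expanding it, the sketch below interpolates between order statistics in the same way as the default (type 7) method of quantile. It is neither the code of wtd.quantile nor that of quantilefreq; the function name freq_quantile is hypothetical, and positive integer frequencies are assumed.

```r
# Hypothetical sketch: type-7 quantiles computed from values and frequencies
freq_quantile <- function(x, freq, probs = c(0, 0.25, 0.5, 0.75, 1)) {
  ord <- order(x)
  x   <- x[ord]
  cum <- cumsum(freq[ord])           # cumulative frequencies of sorted values
  n   <- cum[length(cum)]            # size of the disaggregated sample

  # k-th order statistic of the disaggregated data:
  # the first sorted value whose cumulative frequency reaches k
  order_stat <- function(k) x[findInterval(k - 1, cum) + 1]

  h  <- 1 + (n - 1) * probs          # fractional position in the sorted sample
  lo <- floor(h)
  hi <- ceiling(h)
  g  <- h - lo                       # interpolation weight
  (1 - g) * order_stat(lo) + g * order_stat(hi)
}

x    <- c(5, 1, 3)
freq <- c(2, 3, 1)

# Compare with quantile() on the disaggregated data
all.equal(freq_quantile(x, freq), unname(quantile(rep(x, times = freq))))
```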

The internal (dot) functions are intended for programming purposes. They do not check the data.

Value

meanfreq and sdfreq return vectors. quantilefreq returns a vector or matrix. covfreq and corfreq return the estimated covariance matrix and correlation matrix, respectively. scalefreq returns a data frame or matrix.

Note

The author would like to thank Prof. Frank E. Harrell Jr., who allowed the reuse of part of his code.

References

Andrews, Chris, https://stat.ethz.ch/pipermail/r-help/2014-March/368350.html

See Also

tablefreq, wtd.quantile

Examples

if(require(hflights)) {
  meanfreq(hflights[,c("ArrDelay","DepDelay")])
  sdfreq(hflights[,c("ArrDelay","DepDelay")])
  corfreq(hflights[,c("ArrDelay","DepDelay")])
}

tfq <- tablefreq(iris$Sepal.Length)
tfq

meanfreq(iris$Sepal.Length)
meanfreq(tfq,freq="freq")
.meanfreq(tfq)

dat <- iris[,1:4]
quantilefreq(dat)
corfreq(dat)

tfq <- tablefreq(dat)
.meanfreq(tfq)
.quantilefreq(tfq)
.corfreq(tfq)

## dplyr integration
library(dplyr)
tfq  %>% 
  summarise( mean = .meanfreq(cbind(Sepal.Length,freq)),
            sd = .sdfreq(cbind(Sepal.Length,freq)))

tfq <- tablefreq(iris)
tfq %>% group_by(Species) %>% 
  summarise( mean = .meanfreq(cbind(Sepal.Length,freq)),
            sd = .sdfreq(cbind(Sepal.Length,freq)))

Example output

Loading required package: hflights
          ArrDelay  DepDelay
ArrDelay 1.0000000 0.9292181
DepDelay 0.9292181 1.0000000
# A tibble: 35 x 2
     tbl  freq
   <dbl> <int>
 1   4.3     1
 2   4.4     3
 3   4.5     1
 4   4.6     4
 5   4.7     2
 6   4.8     5
 7   4.9     6
 8   5      10
 9   5.1     9
10   5.2     4
11   5.3     1
12   5.4     6
13   5.5     7
14   5.6     6
15   5.7     8
16   5.8     7
17   5.9     3
18   6       6
19   6.1     6
20   6.2     4
21   6.3     9
22   6.4     7
23   6.5     5
24   6.6     2
25   6.7     8
26   6.8     3
27   6.9     4
28   7       1
29   7.1     1
30   7.2     3
31   7.3     1
32   7.4     1
33   7.6     1
34   7.7     4
35   7.9     1
[1] 5.843333
[1] 5.843333
[1] 5.843333
               0%   25%  50%   75% 100%
Sepal.Length  4.3 5.100  5.8 6.400  7.9
Sepal.Width   3.0 3.725  2.7 3.050  3.8
Petal.Length  1.1 1.500  4.0 5.200  6.4
Petal.Width   0.1 0.375  1.1 1.675  2.0
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
$Sepal.Length
[1] 5.843333

$Sepal.Width
[1] 3.057333

$Petal.Length
[1] 3.758

$Petal.Width
[1] 1.199333

               0%   25%  50%   75% 100%
Sepal.Length  4.3 5.100  5.8 6.400  7.9
Sepal.Width   3.0 3.725  2.7 3.050  3.8
Petal.Length  1.1 1.500  4.0 5.200  6.4
Petal.Width   0.1 0.375  1.1 1.675  2.0
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

# A tibble: 1 x 2
   mean    sd
  <dbl> <dbl>
1  5.84 0.828
# A tibble: 3 x 3
  Species     mean    sd
  <fct>      <dbl> <dbl>
1 setosa      5.01 0.352
2 versicolor  5.94 0.516
3 virginica   6.59 0.636

freqweights documentation built on May 29, 2017, 12:01 p.m.