plotCont: graphical comparison of the estimated distributions for the...
In StatMatch: Statistical Matching or Data Fusion

plotCont

R Documentation

Graphical comparison of the estimated distributions for the same continuous variable.

Description

Compares graphically the estimated distributions for the same continuous variable using data coming from two different data sources.

Usage

plotCont(data.A, data.B, xlab.A, xlab.B=NULL, w.A=NULL, w.B=NULL,
         type="density", ref=FALSE)

Arguments

`data.A`	A data frame or matrix containing the variable of interest `xlab.A` and, if available, the associated survey weights `w.A`.
`data.B`	A data frame or matrix containing the variable of interest `xlab.B` and, if available, the associated survey weights `w.B`.
`xlab.A`	Character string providing the name of the variable in `data.A` whose distribution should be represented graphically and compared with that estimated from `data.B`.
`xlab.B`	Character string providing the name of the variable in `data.B` whose distribution should be represented graphically and compared with that estimated from `data.A`. If `xlab.B=NULL` (default), it is assumed that `xlab.B=xlab.A`.
`w.A`	Character string providing the name of the optional weighting variable in `data.A`; when supplied, it is used to estimate the distribution of `xlab.A`.
`w.B`	Character string providing the name of the optional weighting variable in `data.B`; when supplied, it is used to estimate the distribution of `xlab.B`.
`type`	A character string indicating the type of graphical representation that should be used to compare the estimated distributions of `xlab.A` and `xlab.B`. By default (`type="density"`) density plots are used. Other possible options are “ecdf”, “qqplot”, “qqshift” and “hist”. See Details for more information.
`ref`	Logical, indicating whether the distribution estimated from `data.B` should be considered the reference. Default is `ref=FALSE`. When `ref=TRUE`, the estimation of the histograms, the density and the empirical cumulative distribution function is guided by the data in `data.B`.

Details

This function graphically compares the distribution of the same variable as estimated from two different data sources. The comparison can be performed in several ways. With type="hist", the continuous variable is categorized and the histograms estimated from data.A and data.B are compared. If present, the weights are used to estimate the relative frequencies. The breaks used to categorize the variable are chosen according to the Freedman-Diaconis rule (nclass); with ref=TRUE the IQR is estimated from data.B alone, while with ref=FALSE it is estimated from the two data sources pooled together.
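The binning step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside plotCont: the helpers `fd_breaks` and `wtd_freq` are hypothetical names, the number of bins is taken from `grDevices::nclass.FD` applied to the guiding sample, and the breaks are assumed to span the pooled range of both samples.

```r
# Hypothetical helper (not part of StatMatch): common histogram breaks
# chosen with the Freedman-Diaconis rule.
fd_breaks <- function(x.A, x.B, ref = FALSE) {
  pooled <- c(x.A, x.B)
  # ref = TRUE: the rule is applied to data.B alone; otherwise to the pooled data
  guide <- if (ref) x.B else pooled
  n.bins <- grDevices::nclass.FD(guide)
  # Assumption: breaks cover the pooled range so that no value is dropped
  seq(min(pooled), max(pooled), length.out = n.bins + 1)
}

# Hypothetical helper: (weighted) relative frequency of each bin
wtd_freq <- function(x, w, brk) {
  bins <- cut(x, breaks = brk, include.lowest = TRUE)
  s <- tapply(w, bins, sum)
  s[is.na(s)] <- 0          # empty bins carry zero weight
  s / sum(w)
}

set.seed(1)
x.A <- rnorm(200, mean = 50, sd = 10)
w.A <- runif(200, min = 1, max = 3)     # simulated survey weights
x.B <- rnorm(300, mean = 52, sd = 12)

brk    <- fd_breaks(x.A, x.B, ref = TRUE)
freq.A <- wtd_freq(x.A, w.A, brk)                    # weighted
freq.B <- wtd_freq(x.B, rep(1, length(x.B)), brk)    # unweighted
```

Because both samples share the same breaks, `freq.A` and `freq.B` can be compared bin by bin.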

With type="density", density plots are drawn; when available, the weights are used to estimate the density from the histograms (as suggested by Bellhouse and Stafford, 1999). With type="ecdf", the comparison relies on the empirical cumulative distribution function, which can be estimated taking the weights into account. Note that when ref=TRUE the estimation of the density and of the empirical cumulative distribution function is guided by the data in data.B.
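As a rough illustration of how weights enter the empirical cumulative distribution function, the following base R sketch computes a weighted ECDF. The helper `wtd_ecdf` is a hypothetical name introduced here and is not taken from StatMatch; how plotCont chooses the evaluation points is not documented above, so the default of evaluating at the distinct observed values is an assumption.

```r
# Hypothetical helper: weighted empirical CDF evaluated at the points `at`
wtd_ecdf <- function(x, w = rep(1, length(x)), at = sort(unique(x))) {
  # Share of the total weight carried by observations with x <= each point
  vapply(at, function(t) sum(w[x <= t]) / sum(w), numeric(1))
}

x <- c(1, 2, 2, 5)
w <- c(1, 1, 2, 4)

F.w <- wtd_ecdf(x, w)   # weighted, evaluated at 1, 2 and 5
F.u <- wtd_ecdf(x)      # unweighted, same points as stats::ecdf(x)
```

To compare two sources, both functions would be evaluated on a common grid passed through `at`.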

The comparison is based on percentiles with type="qqplot" and type="qqshift". In the first case, the function draws a scatterplot (red dots) of the estimated percentiles of xlab.A against those of xlab.B; the dashed line indicates the ideal situation of equal percentiles (points lying on the line). With type="qqshift", the scatterplot shows (percentiles.A - percentiles.B) against percentiles.B; in this case, points lying on the horizontal line through 0 indicate equality (difference equal to 0). The number of estimated percentiles depends on the smaller of the two sample sizes: only quartiles are calculated if min(n.A, n.B)<=50; quintiles are estimated if min(n.A, n.B)>50 and min(n.A, n.B)<=150; deciles are estimated if min(n.A, n.B)>150 and min(n.A, n.B)<=250; finally, quantiles for probs=seq(from=0.05, to=0.95, by=0.05) are estimated when min(n.A, n.B)>250. If survey weights are available (indicated by w.A and/or w.B), they are used to estimate the quantiles by calling the function wtd.quantile in the package Hmisc.
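The rule for choosing the probabilities, and the quantities plotted by type="qqshift", can be sketched as follows. The helper `pick_probs` is a hypothetical name, and restricting quartiles, quintiles and deciles to their interior cut points (excluding 0 and 1) is an assumption; the unweighted `stats::quantile` is used here to keep the sketch in base R.

```r
# Hypothetical helper mirroring the sample-size rule described above
pick_probs <- function(n.A, n.B) {
  n.min <- min(n.A, n.B)
  if (n.min <= 50) {
    c(0.25, 0.50, 0.75)                      # quartiles
  } else if (n.min <= 150) {
    seq(from = 0.2, to = 0.8, by = 0.2)      # quintiles
  } else if (n.min <= 250) {
    seq(from = 0.1, to = 0.9, by = 0.1)      # deciles
  } else {
    seq(from = 0.05, to = 0.95, by = 0.05)
  }
}

set.seed(42)
x.A <- rnorm(120, mean = 40, sd = 10)
x.B <- rnorm(300, mean = 42, sd = 10)

probs <- pick_probs(length(x.A), length(x.B))   # min is 120, so quintiles
q.A <- quantile(x.A, probs = probs)
q.B <- quantile(x.B, probs = probs)

# type = "qqplot" plots q.A against q.B; type = "qqshift" plots shift against q.B
qq <- data.frame(q.B = q.B, q.A = q.A, shift = q.A - q.B)
```

With survey weights, the two `quantile` calls would be replaced by `Hmisc::wtd.quantile(x, weights = w, probs = probs)`.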

Value

The required graphical representation is drawn using the ggplot2 facilities.

Author(s)

Marcello D'Orazio mdo.statmatch@gmail.com

References

Bellhouse D.R. and J. E. Stafford (1999). “Density Estimation from Complex Surveys”. Statistica Sinica, 9, 407–424.

Examples


# library(StatMatch)
# data(samp.A); data(samp.B)   # example data sets shipped with StatMatch
# plotCont(data.A = samp.A, data.B = samp.B, xlab.A = "age")
# plotCont(data.A = samp.A, data.B = samp.B, xlab.A = "age", w.A = "ww")

StatMatch documentation built on April 3, 2025, 10:03 p.m.
