simpsons_paradox: Create a Simpson's Paradox


Description

Create a Simpson's Paradox

Usage

simpsons_paradox(r_tot, r_sub, ngroups = NULL, nsubgroups = 50,
  means_subgroups = NULL, sd_subgroups = 1, sd_subgroups_y = NULL,
  scaling = 1, ymin = NULL)

Arguments

r_tot

Desired Pearson correlation of the overall data. This value will not always be achieved; see Details.

r_sub

Pearson correlation within the subgroups.

ngroups

Number of subgroups present in the data. Must not be provided when a vector of subgroup means is used instead (see means_subgroups and details).

nsubgroups

Number of cases in each subgroup. Defaults to 50.

means_subgroups

A vector providing the x-mean of each subgroup. Must not be used together with ngroups (see Details).

sd_subgroups

Standard deviation of the x-values within each subgroup. Either a vector with one standard deviation per subgroup or a single number, in which case every subgroup gets that standard deviation. Defaults to 1 for each subgroup.

sd_subgroups_y

Standard deviation of the y-values within each subgroup. Either a vector with one standard deviation per subgroup or a single number, in which case every subgroup gets that standard deviation. By default identical to sd_subgroups.

scaling

Determines how far the x- and y-coordinates of the subgroups are shifted in order to create the desired overall correlation. The larger scaling is, the stronger the overall correlation. If scaling is set to 0, the subgroup coordinates are not shifted at all. Note that the subgroups will be easier to spot when scaling is large. Defaults to 1.

ymin

Optional. The smallest y-coordinate present in the generated data (see Details).
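The effect described for scaling can be illustrated with a minimal base-R sketch. It does not call cmonCorr: the helper within_cor() and the shift rule (moving each group's y-values against its centred x-mean by a factor s) are hypothetical stand-ins chosen for illustration, not the package's internal formula. The correlation inside each group stays fixed, while the overall correlation becomes more strongly negative as s grows.

```r
set.seed(42)
ngroups <- 5
n <- 40
means <- seq_len(ngroups)                      # x-means 1:ngroups
group <- rep(means, each = n)
x <- rnorm(ngroups * n, mean = group, sd = 1)  # x scattered around its group mean

# Hypothetical helper: returns y with mean 0, SD 1 and an exact Pearson r with v
within_cor <- function(v, r) {
  z <- rnorm(length(v))
  e <- residuals(lm(z ~ v))                    # random noise made orthogonal to v
  r * as.numeric(scale(v)) + sqrt(1 - r^2) * as.numeric(scale(e))
}
y0 <- ave(x, group, FUN = function(v) within_cor(v, 0.4))  # r = .4 inside every group

# Hypothetical shift rule: move each group's y-values against its centred x-mean;
# s plays the role of `scaling`
shifted_cor <- function(s) cor(x, y0 - s * (group - mean(means)))
cors <- sapply(c(0, 1, 3), shifted_cor)        # overall r for s = 0, 1, 3
```

Whatever value s takes, the correlation inside each group remains .4, because the shift is constant within a group; only the overall correlation changes.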

Details

Creates a Simpson's Paradox in two steps: first, a correlation is created within each of a number of subgroups; then, the groups' y-coordinates are altered to create a correlation in the overall data. The first step uses the function sim_cor_param, the second step the function sim_cor_vec. If the two correlations (r_sub for the first step, r_tot for the second) have opposite signs, a Simpson's Paradox is created.
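The two steps can be sketched in base R. This is not the cmonCorr implementation: sim_cor_param and sim_cor_vec are replaced by a hypothetical helper make_cor(), which returns a vector with an exact Pearson correlation to a given vector, and simpson_sketch() is likewise hypothetical. The sketch assumes x-means 1:ngroups, a within-group x standard deviation of 1, and at least three groups.

```r
# Hypothetical helper standing in for sim_cor_param / sim_cor_vec: returns a
# vector with mean 0, SD 1 and a Pearson correlation of exactly r with x
# (requires |r| < 1 and length(x) >= 3)
make_cor <- function(x, r) {
  z <- rnorm(length(x))
  e <- residuals(lm(z ~ x))                  # random noise, orthogonal to x
  r * as.numeric(scale(x)) + sqrt(1 - r^2) * as.numeric(scale(e))
}

simpson_sketch <- function(r_tot, r_sub, ngroups, n = 50, scaling = 1) {
  means <- seq_len(ngroups)                  # x-means 1:ngroups
  # Step 2 target: group offsets on y whose correlation with the x-means is r_tot
  offsets <- scaling * sd(means) * make_cor(means, r_tot)
  parts <- lapply(seq_len(ngroups), function(g) {
    x <- rnorm(n, mean = means[g], sd = 1)
    # Step 1: correlation r_sub within the group; step 2: shift the group on y
    y <- make_cor(x, r_sub) + offsets[g]
    data.frame(x = x, y = y, group = g)
  })
  do.call(rbind, parts)
}

set.seed(1)
d <- simpson_sketch(r_tot = -0.8, r_sub = 0.4, ngroups = 5, n = 40, scaling = 2)
by_group <- sapply(split(d, d$group), function(s) cor(s$x, s$y))
overall <- cor(d$x, d$y)
```

Here the correlation inside every group equals r_sub exactly, while the overall correlation is negative but need not equal r_tot.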

Exactly one of ngroups or means_subgroups must be provided. If the number of subgroups is specified via ngroups, the x-means of the groups will be 1:ngroups. If the x-means are specified via the vector means_subgroups, the number of subgroups equals the length of means_subgroups. Note that you cannot specify the y-means of the groups, since they are altered in order to create the desired overall correlation. You can, however, specify the smallest y-coordinate present in your data via ymin. All data points are then shifted along the y-axis, preserving the correlations of the overall data and of the subgroups. That way, you have some control over the range of the y-axis.
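Why a ymin shift leaves the correlations intact can be checked with a minimal base-R sketch. It assumes the shift is one constant added to every y-value so that the minimum becomes ymin; the simulated data and the column name y_shifted are hypothetical and not produced by cmonCorr.

```r
set.seed(7)
d <- data.frame(group = rep(1:3, each = 30))
d$x <- rnorm(90, mean = d$group)
d$y <- -0.5 * d$x + 2 * d$group + rnorm(90)

ymin <- 10                                   # desired smallest y (hypothetical value)
d$y_shifted <- d$y - min(d$y) + ymin         # move every point by the same amount

# Pearson correlations are unchanged when a constant is added to y
overall_before <- cor(d$x, d$y)
overall_after  <- cor(d$x, d$y_shifted)
within_before  <- sapply(split(d, d$group), function(s) cor(s$x, s$y))
within_after   <- sapply(split(d, d$group), function(s) cor(s$x, s$y_shifted))
```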

All correlations are linear (Pearson) correlations.

Returns a data.frame that holds the x and y coordinates of the data, a group column containing the subgroup each case belongs to and the correlation within each subgroup.

Note that, because of the correlations within the subgroups, the overall correlation specified in r_tot will not always be achieved. You can experiment with the subgroup means (ngroups or means_subgroups) and with scaling to get closer to the target, but the more the subgroup correlation and the overall correlation differ, the harder this becomes.
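To see how close a generated data set comes to the requested values, the achieved correlations can be compared with r_tot and r_sub. The following sketch assumes a data.frame with columns named x, y and group (these names are an assumption, not confirmed for the package's output); check_correlations() is a hypothetical helper, and the data are simulated for illustration.

```r
# Hypothetical helper: compare achieved correlations with the requested targets
check_correlations <- function(d, r_tot, r_sub) {
  within <- sapply(split(d, d$group), function(s) cor(s$x, s$y))
  overall <- cor(d$x, d$y)
  list(
    overall        = overall,
    overall_gap    = overall - r_tot,          # how far r_tot was missed
    within         = within,
    max_within_gap = max(abs(within - r_sub))  # worst subgroup deviation
  )
}

# Simulated data: positive slopes inside groups, group centres on a negative trend
set.seed(3)
d <- data.frame(group = rep(1:4, each = 50))
d$x <- rnorm(200, mean = d$group)
d$y <- 0.6 * (d$x - d$group) - 3 * d$group + rnorm(200, sd = 0.8)
res <- check_correlations(d, r_tot = -0.8, r_sub = 0.4)
```

A large overall_gap suggests trying other subgroup means or a different scaling.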

Author(s)

Juli Tkotz juliane.tkotz@hhu.de

Examples

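# Positive correlation (.4) within 5 subgroups of 40 cases each,
# targeting an overall correlation of -.8
simpson <- simpsons_paradox(r_tot = -.8, r_sub = .4, ngroups = 5, nsubgroups = 40,
                            scaling = 2)

# Negative correlation (-.7) within 4 subgroups of 100 cases each,
# targeting an overall correlation of .4; the smallest y-value is set to 10
simpson2 <- simpsons_paradox(r_tot = .4, r_sub = -.7, ngroups = 4, nsubgroups = 100,
                             scaling = 3, ymin = 10)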

einGlasRotwein/cmonCorr documentation built on May 6, 2019, 8:29 p.m.