Create a Simpson's Paradox
simpsons_paradox(r_tot, r_sub, ngroups = NULL, nsubgroups = 50,
  means_subgroups = NULL, sd_subgroups = 1, sd_subgroups_y = NULL,
  scaling = 1, ymin = NULL)
r_tot: Desired Pearson correlation of the overall data. Will not always be met, see details.
r_sub: Pearson correlation within the subgroups.
ngroups: Number of subgroups present in the data. Must not be provided when a vector of subgroup means is used instead (see means_subgroups).
nsubgroups: Number of cases in each subgroup. Defaults to 50.
means_subgroups: A vector providing the x-mean for each subgroup. Must not be used when ngroups is provided.
sd_subgroups: Standard deviation for the x value of each subgroup. Can be either a vector containing one standard deviation per subgroup or a single number. In the latter case, every subgroup gets that standard deviation. Defaults to 1 for each group.
sd_subgroups_y: Standard deviation for the y value of each subgroup. Can be either a vector containing one standard deviation per subgroup or a single number. In the latter case, every subgroup gets that standard deviation. By default identical to sd_subgroups.
scaling: Argument determining how much the x- and y-coordinates of subgroups should be shifted in order to create the desired overall correlation. The larger the value, the larger the shift.
ymin: Optional argument determining the smallest y-coordinate present in your data.
Creates a Simpson's Paradox by creating a correlation within a number of subgroups and then altering the groups' y-coordinates to create a correlation of the overall data. For the first step, the function sim_cor_param is used. The second step relies on the function sim_cor_vec. If the two correlations specified for each step (r_sub and r_tot, respectively) have opposite directions, a Simpson's Paradox is created.
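The two-step idea can be illustrated in base R without the package. The following is only a minimal sketch under stated assumptions: make_group is a hypothetical helper written for this illustration, not sim_cor_param or sim_cor_vec, and the group means are picked by hand rather than derived from a target r_tot.

```r
set.seed(1)

# Hypothetical helper (not part of the package): n points whose sample
# correlation is exactly r, centred on (mx, my), with unit SD on both axes.
make_group <- function(n, r, mx, my) {
  x <- as.numeric(scale(rnorm(n)))        # standardised x
  z <- rnorm(n)
  e <- as.numeric(scale(resid(lm(z ~ x)))) # standardised noise, orthogonal to x
  y <- r * x + sqrt(1 - r^2) * e           # gives cor(x, y) == r
  data.frame(x = x + mx, y = y + my)
}

# Step 1: positive correlation inside every subgroup.
# Step 2: place the groups so that y-means fall as x-means rise.
ngroups <- 5
x_means <- (1:ngroups) * 3
y_means <- -(1:ngroups) * 3

groups <- lapply(seq_len(ngroups), function(g) {
  d <- make_group(n = 40, r = 0.4, mx = x_means[g], my = y_means[g])
  d$group <- g
  d
})
dat <- do.call(rbind, groups)

# Within-group correlations stay at 0.4; the pooled correlation is negative.
r_within  <- sapply(split(dat, dat$group), function(d) cor(d$x, d$y))
r_overall <- cor(dat$x, dat$y)
```

Because the group means run in the opposite direction to the within-group trend, the pooled correlation has the opposite sign, which is the paradox.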
Exactly one of ngroups or means_subgroups needs to be provided. If the number of subgroups is specified via ngroups, the x-means of the groups will be 1:ngroups. If the x-means are specified via the vector means_subgroups, the number of subgroups equals the number of means provided in means_subgroups. Note that you cannot specify the y-means of the groups, since they are altered in order to create the desired overall correlation. You can, however, specify the smallest y-coordinate present in your data via ymin. All data points will then be shifted along the y-axis, preserving the correlations of the overall data and of the subgroups. That way, you have some degree of control over the range of the y-axis.
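Why a shift along the y-axis leaves the correlations untouched can be checked in a few lines of base R. This sketch assumes the shift is a single constant added to every y value; the variable names are illustrative and not taken from the package.

```r
set.seed(2)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

ymin <- 10                      # desired smallest y value
y_shifted <- y - min(y) + ymin  # same constant added to every point

r_before <- cor(x, y)
r_after  <- cor(x, y_shifted)   # equal to r_before up to floating point error
```

Pearson correlation is invariant under adding a constant to either variable, so the same holds within each subgroup as long as all points are moved by the same amount.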
Based on linear (Pearson) correlations.
Returns a data.frame that holds the x and y coordinates of the data, a group column containing the subgroup each case belongs to, and the correlation within each subgroup.
Note that due to the correlations within subgroups, the overall correlation as specified in r_tot will not always be achieved. You can toy around with the group mean parameters and the scaling to get more satisfying results, but when subgroup correlations and overall correlation differ widely, this will be harder to achieve.
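Since r_tot is not guaranteed, it can help to compare the achieved correlations against the targets. The helper below is a hypothetical sketch, not part of the package; it assumes a data.frame with columns named x, y and group, which may differ from the column names the function actually returns. It is shown on a small hand-made data set so that it runs on its own.

```r
# Hypothetical helper: overall and per-group Pearson correlations.
# Assumes columns named x, y and group (an assumption about the layout).
check_correlations <- function(dat) {
  list(
    overall = cor(dat$x, dat$y),
    within  = sapply(split(dat, dat$group), function(d) cor(d$x, d$y))
  )
}

# Toy data: perfectly increasing inside each group, decreasing between groups.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(5, 6, 7, 1, 2, 3),
  group = rep(c("a", "b"), each = 3)
)
res <- check_correlations(toy)
```

If the overall value is far from the requested r_tot, adjusting scaling or the subgroup means and re-running the check is one way to iterate.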
Juli Tkotz juliane.tkotz@hhu.de
simpson <- simpsons_paradox(r_tot = -.8, r_sub = .4, ngroups = 5, nsubgroups = 40,
                            scaling = 2)
simpson2 <- simpsons_paradox(r_tot = .4, r_sub = -.7, ngroups = 4, nsubgroups = 100,
                             scaling = 3, ymin = 10)