CI.diffmean: Confidence intervals for the difference between means

View source: R/UBStats_Main_Visible_ALL_202406.R

CI.diffmean    R Documentation

Confidence intervals for the difference between means

Description

CI.diffmean() builds confidence intervals for the difference between the means of two populations, based on either independent or paired samples.

Usage

CI.diffmean(
  x,
  y,
  type = "independent",
  sigma.x = NULL,
  sigma.y = NULL,
  conf.level = 0.95,
  by,
  sigma.by = NULL,
  sigma.d = NULL,
  var.test = FALSE,
  digits = 2,
  force.digits = FALSE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

x, y

Unquoted strings identifying the two numeric variables whose means are to be compared; when type = "paired", x and y must have the same length. x and y can be the names of vectors in the workspace or the names of columns in the data frame specified in the data argument. A mixed specification is also possible (e.g., one vector and one column in data).

type

A length-one character vector specifying the type of samples. Allowed values are "independent" or "paired".

sigma.x, sigma.y

Optional numeric values specifying the population standard deviations, if known (used when x and y are specified). If NULL (default), the standard deviations are estimated from the data.
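When both population standard deviations are known, the usual textbook interval uses standard normal percentiles: (mean(x) - mean(y)) +/- z * sqrt(sigma.x^2/nx + sigma.y^2/ny), with z the (1 + conf.level)/2 quantile of the standard normal. The base-R sketch below computes it. ci_diff_known_sigma() is a hypothetical helper, not part of UBStats, and it is an assumption that CI.diffmean() applies exactly this formula.

```r
# Hypothetical helper (not part of UBStats): z-based CI for mean(x) - mean(y)
# when the population standard deviations sigma.x and sigma.y are known.
ci_diff_known_sigma <- function(x, y, sigma.x, sigma.y, conf.level = 0.95) {
  est <- mean(x) - mean(y)                                     # point estimate
  se  <- sqrt(sigma.x^2 / length(x) + sigma.y^2 / length(y))   # known-sigma SE
  z   <- qnorm(1 - (1 - conf.level) / 2)                       # normal percentile
  c(lower = est - z * se, upper = est + z * se)
}

x <- c(12, 15, 11, 14, 13)   # illustrative data, not from MktDATA
y <- c(10, 9, 12, 11)
ci_diff_known_sigma(x, y, sigma.x = 2, sigma.y = 3)
```

With independent samples the two vectors need not have the same length, as in this example.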

conf.level

Numeric value specifying the required confidence level; defaults to 0.95.

by

Optional unquoted string, available only when type = "independent", identifying a variable (of any type), specified in the same way as x, that takes exactly two values and is used to split x into two independent samples. Given the two ordered values taken by by (alphabetical or numerical order, or the order of the levels for factors), say by1 and by2, the confidence interval is built for the difference between the population means in the by1 group and in the by2 group. Note that only one of y and by can be specified.
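The ordering rule for by fixes the sign of the interval. The sketch below mimics it in base R: the two values of a character or numeric variable are sorted, while for a factor the order of its levels is used. by_groups() is a hypothetical helper written for illustration; the actual UBStats code may differ, and the data are made up.

```r
# Hypothetical helper (not part of UBStats): the two groups of `by`, in the
# order that determines which difference mean(by1) - mean(by2) is estimated.
by_groups <- function(by) {
  if (is.factor(by)) levels(droplevels(by)) else sort(unique(by))
}

gender <- c("M", "F", "F", "M", "F")   # illustrative data
amount <- c(30, 20, 25, 40, 15)

g <- by_groups(gender)                                      # "F" "M": sorted values
mean(amount[gender == g[1]]) - mean(amount[gender == g[2]]) # mean(F) - mean(M) = -15

g.r <- by_groups(factor(gender, levels = c("M", "F")))        # "M" "F": level order
mean(amount[gender == g.r[1]]) - mean(amount[gender == g.r[2]]) # mean(M) - mean(F) = 15
```

Reordering the factor levels only flips the sign of the estimated difference, which is what the Gender.R examples below rely on.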

sigma.by

Optional numeric value specifying the known standard deviations of the two populations identified via by (when x and by are specified). sigma.by can be a single value, indicating the same standard deviation in the two by-groups, or a vector of two values specifying the standard deviations in the two by-groups. To avoid errors, in the latter case the vector should be named, with names matching the two levels of by.
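Naming the two values of sigma.by allows them to be matched to the by-groups regardless of the order in which they are typed. A minimal sketch of such matching, assuming the groups are ordered as described for by; align_sigma_by() is hypothetical and not part of UBStats.

```r
# Hypothetical helper (not part of UBStats): return one standard deviation
# per by-group, in the order given by `groups`.
align_sigma_by <- function(sigma.by, groups) {
  if (length(sigma.by) == 1) {
    return(setNames(rep(sigma.by, 2), groups))   # same SD in both groups
  }
  if (is.null(names(sigma.by))) {
    # Unnamed pair: position is the only guide, which is error-prone
    warning("sigma.by is unnamed; values assumed to follow the group order")
    return(setNames(sigma.by, groups))
  }
  sigma.by[groups]                               # match by name, then reorder
}

align_sigma_by(c(M = 10, F = 20), groups = c("F", "M"))  # F = 20, M = 10
align_sigma_by(15, groups = c("F", "M"))                 # F = 15, M = 15
```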

sigma.d

Optional numeric value specifying the known standard deviation of the differences when samples are paired.
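With paired samples the interval is built on the within-pair differences d = x - y, which is why x and y must have the same length. The sketch below uses the standard formulas mean(d) +/- q * s / sqrt(n): a Student's t percentile with n - 1 degrees of freedom and the sample standard deviation when the variance is unknown, a normal percentile and sigma.d when it is known. ci_paired() is a hypothetical helper and the data are illustrative; that CI.diffmean() uses these exact formulas is an assumption.

```r
# Hypothetical helper (not part of UBStats): CI for the mean of paired
# differences, with unknown (t) or known (normal) standard deviation.
ci_paired <- function(x, y, sigma.d = NULL, conf.level = 0.95) {
  stopifnot(length(x) == length(y))   # pairing requires equal lengths
  d <- x - y                          # within-pair differences
  n <- length(d)
  alpha <- 1 - conf.level
  if (is.null(sigma.d)) {             # unknown SD: Student's t, n - 1 df
    q  <- qt(1 - alpha / 2, df = n - 1)
    se <- sd(d) / sqrt(n)
  } else {                            # known SD: standard normal
    q  <- qnorm(1 - alpha / 2)
    se <- sigma.d / sqrt(n)
  }
  mean(d) + c(lower = -1, upper = 1) * q * se
}

store <- c(5, 7, 6, 8, 4, 6)   # illustrative purchase counts
web   <- c(4, 5, 6, 5, 3, 4)
ci_paired(store, web, conf.level = 0.9)
ci_paired(store, web, sigma.d = 2, conf.level = 0.9)
```

The unknown-variance branch gives the same interval as base R's t.test(store, web, paired = TRUE, conf.level = 0.9).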

var.test

Logical value indicating whether to run a test on the equality of the variances of two independent samples (TRUE) or not (FALSE, default).
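Base R offers one standard test of this kind, the F test stats::var.test(), which compares the ratio of the two sample variances with an F distribution. Whether CI.diffmean() uses this particular test is an assumption; the sketch only illustrates the idea on simulated data.

```r
set.seed(1)
x <- rnorm(40, mean = 50, sd = 10)   # simulated, illustrative samples
y <- rnorm(35, mean = 45, sd = 20)

ft <- stats::var.test(x, y)          # H0: var(x) / var(y) = 1
ft$statistic                         # F = var(x) / var(y)
ft$parameter                         # numerator and denominator df: 39, 34
ft$p.value                           # small values cast doubt on equal variances
```

In practice a small p-value is usually read as a reason to prefer the interval built under unequal variances.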

digits

Integer value specifying the number of decimals used to round statistics; defaults to 2. If the chosen rounding turns some non-zero values into zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument force.digits is set to TRUE.
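The rule described for digits and force.digits can be illustrated with a hypothetical helper, choose_digits() (not part of UBStats): starting from digits, it adds one decimal at a time until no non-zero value rounds to zero, unless force.digits = TRUE. The upper bound max.digits is an extra safeguard and also an assumption.

```r
# Hypothetical helper (not part of UBStats) mimicking the documented rule.
choose_digits <- function(values, digits = 2, force.digits = FALSE,
                          max.digits = 15) {
  if (force.digits) return(digits)                 # keep the requested rounding
  nz <- values[is.finite(values) & values != 0]    # only non-zero values matter
  while (digits < max.digits && any(round(nz, digits) == 0)) {
    digits <- digits + 1                           # add one decimal, re-check
  }
  digits
}

vals <- c(0.0042, -0.0013, 1.5)
choose_digits(vals)                        # 3: with 2 decimals, 0.0042 -> 0
choose_digits(vals, force.digits = TRUE)   # 2
```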

force.digits

Logical value indicating whether reported values should be rounded to exactly the number of decimals specified in digits, even if non-zero values are rounded to zero; defaults to FALSE.

use.scientific

Logical value indicating whether numbers in tables should be displayed using scientific notation (TRUE); defaults to FALSE.

data

An optional data frame containing x and/or y (or by). If not found in data, the variables are taken from the environment from which CI.diffmean() is called.

...

Additional arguments to be passed to low-level functions.

Value

A table reporting the confidence intervals for the difference between the population means. For independent samples with unknown variances, the intervals are built both under the assumption that the variances are equal and under the assumption that they differ, using percentiles from both the normal and the Student's t distribution. If var.test = TRUE, the output is a list containing both the confidence intervals and the results of the test on the equality of variances.
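For independent samples with unknown variances, the four intervals described above (equal vs. unequal variances, normal vs. Student's t percentiles) can be sketched in base R. The formulas are textbook assumptions, not taken from the UBStats source: a pooled variance with nx + ny - 2 degrees of freedom under equal variances, and separate variances with Welch-Satterthwaite degrees of freedom under unequal variances. ci_diff_unknown() is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper (not part of UBStats): four CIs for mean(x) - mean(y)
# with unknown variances.
ci_diff_unknown <- function(x, y, conf.level = 0.95) {
  nx <- length(x); ny <- length(y)
  vx <- var(x);    vy <- var(y)
  est <- mean(x) - mean(y)
  p <- 1 - (1 - conf.level) / 2
  # Equal variances: pooled estimate, nx + ny - 2 df
  sp2   <- ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
  se.eq <- sqrt(sp2 * (1 / nx + 1 / ny))
  # Unequal variances: separate estimates, Welch-Satterthwaite df
  se.neq <- sqrt(vx / nx + vy / ny)
  df.w   <- se.neq^4 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  out <- rbind(
    equal.normal   = est + c(-1, 1) * qnorm(p) * se.eq,
    equal.t        = est + c(-1, 1) * qt(p, nx + ny - 2) * se.eq,
    unequal.normal = est + c(-1, 1) * qnorm(p) * se.neq,
    unequal.t      = est + c(-1, 1) * qt(p, df.w) * se.neq
  )
  colnames(out) <- c("lower", "upper")
  out
}

set.seed(42)
x <- rnorm(30, mean = 60, sd = 12)
y <- rnorm(25, mean = 55, sd = 18)
ci <- ci_diff_unknown(x, y)
ci
```

The two t-based rows match the intervals returned by base R's t.test(x, y, var.equal = TRUE) and t.test(x, y), which use the same formulas.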

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

TEST.diffmean() to test hypotheses on the difference between two populations' means.

Examples

data(MktDATA, package = "UBStats")

# Independent samples (default type), UNKNOWN variances
#  CI for the difference between means of males and females
#  - Using x,y: build vectors with data on the two groups
AOV_M <- MktDATA$AOV[MktDATA$Gender == "M"]
AOV_F <- MktDATA$AOV[MktDATA$Gender == "F"]
CI.diffmean(x = AOV_M, y = AOV_F)
#  - Change confidence level
CI.diffmean(x = AOV_M, y = AOV_F, conf.level = 0.99)
#  - Using x,by: groups identified by ordered levels of by
CI.diffmean(x = AOV, by = Gender, conf.level = 0.99, data = MktDATA)
#    Since order is F, M, CI is for mean(F) - mean(M)
#    To get the interval for mean(M) - mean(F)
Gender.R <- factor(MktDATA$Gender, levels = c("M", "F"))
CI.diffmean(x = AOV, by = Gender.R, conf.level = 0.99,  
            data = MktDATA)
#  - Testing hypotheses on equality of unknown variances
CI.diffmean(x = AOV_M, y = AOV_F, conf.level = 0.99, 
            var.test = TRUE)

#  - Output results: only information on the CI
out.ci_diffM <- CI.diffmean(x = AOV_M, y = AOV_F)
#  - Output results: list with information on CI and test on var
out.ci_diffM.V <- CI.diffmean(x = AOV_M, y = AOV_F, var.test = TRUE)

# Independent samples (default type), KNOWN variances
#  CI for the difference between means of males and females
#  - Using x,y: build vectors with data on the two groups
AOV_M <- MktDATA$AOV[MktDATA$Gender == "M"]
AOV_F <- MktDATA$AOV[MktDATA$Gender == "F"]
CI.diffmean(x = AOV_M, y = AOV_F, 
            sigma.x = 10, sigma.y = 20)
#  - Using x,by: groups identified by ordered levels of by
CI.diffmean(x = AOV, by = Gender, 
            sigma.by = c("M" = 10, "F" = 20), data = MktDATA)
#    To change the sign, order levels as desired
Gender.R <- factor(MktDATA$Gender, levels = c("M", "F"))
CI.diffmean(x = AOV, by = Gender.R, 
            sigma.by = c("M" = 10, "F" = 20), data = MktDATA)
#  - Output results
out.ci_diffM <- CI.diffmean(x = AOV_M, y = AOV_F,
                            sigma.x = 10, sigma.y = 20)

# Paired samples: UNKNOWN variances
# - Default settings
CI.diffmean(x = NStore_Purch, y = NWeb_Purch,
            type = "paired", data=MktDATA)
# - Change confidence level
CI.diffmean(x = NStore_Purch, y = NWeb_Purch,
            type = "paired", conf.level = 0.9, data = MktDATA)
# Paired: KNOWN variances
CI.diffmean(x = NStore_Purch, y = NWeb_Purch,
            type = "paired", conf.level = 0.9, 
            sigma.d = 2, data = MktDATA)
#  - Output results
out.ci_diffM <- CI.diffmean(x = NStore_Purch, y = NWeb_Purch,
                            type = "paired", conf.level = 0.9,
                            sigma.d = 2, data = MktDATA)

# Arguments force.digits and use.scientific
#  An input variable taking very low values
SmallX <- MktDATA$AOV / 5000
SmallX_M <- SmallX[MktDATA$Gender == "M"]
SmallX_F <- SmallX[MktDATA$Gender == "F"]
# - Default: avoids rounding non-zero values to zero
CI.diffmean(x = SmallX_M, y = SmallX_F)
# - Force rounding to the requested number of digits (default: 2)
CI.diffmean(x = SmallX_M, y = SmallX_F,
            force.digits = TRUE)
# - Allow scientific notation
CI.diffmean(x = SmallX_M, y = SmallX_F, 
            use.scientific = TRUE)


UBStats documentation built on Sept. 11, 2024, 6:52 p.m.