TEST.diffmean: Tests on the difference between means

View source: R/UBStats_Main_Visible_ALL_202406.R

TEST.diffmean    R Documentation

Tests on the difference between means

Description

TEST.diffmean() tests hypotheses on the difference between the means of two independent or paired populations.

Usage

TEST.diffmean(
  x,
  y,
  type = "independent",
  mdiff0 = 0,
  alternative = "two.sided",
  sigma.x = NULL,
  sigma.y = NULL,
  by,
  sigma.by = NULL,
  sigma.d = NULL,
  var.test = FALSE,
  digits = 2,
  force.digits = FALSE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

x, y

Unquoted strings identifying the numeric variables with the same length whose means are to be compared. x and y can be the names of vectors in the workspace or the names of columns in the data frame specified in the data argument. A mixed specification is also possible (e.g., one vector and one column in data).

type

A length-one character vector specifying the type of samples. Allowed values are "independent" or "paired".

mdiff0

Numeric value specifying the difference between the populations' means under the null hypothesis (default is 0).

alternative

A length-one character vector specifying the direction of the alternative hypothesis. Allowed values are "two.sided" (difference between populations' means differs from mdiff0; default), or "less" (difference between populations' means is lower than mdiff0), or "greater" (difference between populations' means is higher than mdiff0).

sigma.x, sigma.y

Optional numeric values specifying the possibly known populations' standard deviations (when x and y are specified). If NULL (default), standard deviations are estimated using the data.

by

Optional unquoted string, available only when type = "independent", identifying a variable (of any type), defined in the same way as x, taking only two values used to split x into two independent samples. Given the two ordered values taken by by (alphabetical or numerical order, or order of the levels for factors), say by1 and by2, hypotheses are tested on the difference between the populations' means in the by1-group and in the by2-group. Note that only one of y and by can be specified.

sigma.by

Optional numeric value specifying the possibly known standard deviations for the two independent samples identified via by (when x and by are specified). sigma.by can be a single value indicating the same standard deviation in the two by-groups, or a vector with two values, specifying the standard deviations in the two by-groups. To avoid errors, in the latter case the vector should be named, with names coinciding with the two levels of by.

sigma.d

Optional numeric value specifying the possibly known standard deviation of the difference when samples are paired.

var.test

Logical value indicating whether to also run a test on the equality of variances for two independent samples (TRUE) or not (FALSE, default).

digits

Integer value specifying the number of decimals used to round statistics; defaults to 2. If the chosen rounding formats some non-zero values as zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument force.digits is set to TRUE.

force.digits

Logical value indicating whether reported values should be rounded to exactly the number of decimals specified in digits, even if this rounds non-zero values to zero (defaults to FALSE).

use.scientific

Logical value indicating whether numbers in tables should be displayed using scientific notation (TRUE); defaults to FALSE.

data

An optional data frame containing x and/or y or by. If not found in data, the variables are taken from the environment from which TEST.diffmean() is called.

...

Additional arguments to be passed to low-level functions.
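
As a reading aid for mdiff0 and alternative, the hypotheses can be written out explicitly. The symbols below are notation introduced here, not taken from the package: \mu_x and \mu_y are the two populations' means and \delta_0 stands for mdiff0. The sign convention (x minus y, or the by1-group minus the by2-group) is the one suggested by the description of by and by the examples.

```latex
\begin{aligned}
H_0 &: \mu_x - \mu_y = \delta_0 \\
\text{alternative = "two.sided":}\quad H_1 &: \mu_x - \mu_y \neq \delta_0 \\
\text{alternative = "less":}\quad H_1 &: \mu_x - \mu_y < \delta_0 \\
\text{alternative = "greater":}\quad H_1 &: \mu_x - \mu_y > \delta_0
\end{aligned}
```

When by is used, \mu_x and \mu_y are replaced by the means in the by1-group and in the by2-group, so swapping the order of the levels changes the sign of the difference being tested.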

Value

A table reporting the results of the test on the difference between the populations' means. For independent samples with unknown variances, the test is run both under the assumption that the variances are equal and under the assumption that they differ, using percentiles from both the normal and the Student's t distributions.
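
To make the variants mentioned above concrete, the following is a minimal sketch in base R of the standard textbook statistics for two independent samples. It does not use UBStats and is not necessarily how TEST.diffmean() computes its results internally. The data are simulated, and all object names (x, y, stat_pooled, and so on) are hypothetical.

```r
# Simulated data (hypothetical; not taken from MktDATA)
set.seed(1)
x <- rnorm(40, mean = 52, sd = 10)
y <- rnorm(35, mean = 48, sd = 20)
mdiff0 <- 0                      # difference under the null hypothesis
nx <- length(x)
ny <- length(y)
d_hat <- mean(x) - mean(y)       # observed difference between sample means

# Unknown variances, assumed equal: pooled-variance statistic
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
stat_pooled <- (d_hat - mdiff0) / sqrt(sp2 * (1 / nx + 1 / ny))

# Unknown variances, not assumed equal: Welch statistic
stat_welch <- (d_hat - mdiff0) / sqrt(var(x) / nx + var(y) / ny)

# Two-sided p-values for the pooled statistic, using Student's t
# and normal percentiles
p_t <- 2 * pt(-abs(stat_pooled), df = nx + ny - 2)
p_z <- 2 * pnorm(-abs(stat_pooled))

# Cross-check against base R's t.test()
t_pooled <- t.test(x, y, mu = mdiff0, var.equal = TRUE)
t_welch  <- t.test(x, y, mu = mdiff0, var.equal = FALSE)

# Known standard deviations (values assumed for illustration):
# z statistic and one-sided "greater" p-value
sigma_x <- 10
sigma_y <- 20
stat_z <- (d_hat - mdiff0) / sqrt(sigma_x^2 / nx + sigma_y^2 / ny)
p_greater <- pnorm(stat_z, lower.tail = FALSE)
```

The hand-computed stat_pooled and stat_welch should agree with the statistic elements of t_pooled and t_welch, which gives a quick way to sanity-check a table of results against base R.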

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

CI.diffmean() to build confidence intervals for the difference between two populations' means.

Examples

data(MktDATA, package = "UBStats")

# Independent samples (default type), UNKNOWN variances
#  Bilateral test on difference between means of males and females
#  - Using x,y: build vectors with data on the two groups
AOV_M <- MktDATA$AOV[MktDATA$Gender == "M"]
AOV_F <- MktDATA$AOV[MktDATA$Gender == "F"]
TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 0)
#  - Using x,by: groups identified by ordered levels of by
TEST.diffmean(x = AOV, by = Gender, mdiff0 = 0, data = MktDATA)
#    Since order is F, M, hypotheses are on mean(F) - mean(M)
#    To test hypotheses on mean(M) - mean(F)
Gender.R <- factor(MktDATA$Gender, levels = c("M", "F"))
TEST.diffmean(x = AOV, by = Gender.R, mdiff0 = 0,
              data = MktDATA)
#  - Testing also hypotheses on equality of unknown variances
TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 0, 
              var.test = TRUE)

#  - Output results: test on differences
out.test_diffM <- TEST.diffmean(x = AOV_M, y = AOV_F)
#  - Output results: list with both test on means and variances
out.test_diffM.V <- TEST.diffmean(x = AOV_M, y = AOV_F, var.test = TRUE)

# Independent samples (default type), KNOWN variances
#  Test hypotheses on the difference between means of males and females
#  - Using x,y: build vectors with data on the two groups
AOV_M <- MktDATA$AOV[MktDATA$Gender == "M"]
AOV_F <- MktDATA$AOV[MktDATA$Gender == "F"]
TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 10, 
              alternative = "greater", sigma.x = 10, sigma.y = 20)
#  - Using x,by: groups identified by ordered levels of by
#    Adjust considering the ordering of levels
TEST.diffmean(x = AOV, by = Gender, mdiff0 = -10,
              alternative = "less",
              sigma.by = c("M" = 10, "F" = 20), data = MktDATA)
#    To change the sign, order levels as desired
Gender.R <- factor(MktDATA$Gender, levels = c("M", "F"))
TEST.diffmean(x = AOV, by = Gender.R, mdiff0 = 10,
              alternative = "greater",
              sigma.by = c("M" = 10, "F" = 20), data = MktDATA)
#  - Output results
out.test_diffM <- TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 10,
                                alternative = "greater",
                                sigma.x = 10, sigma.y = 20)

# Paired samples: UNKNOWN variances
# - Default settings
TEST.diffmean(x = NStore_Purch, y = NWeb_Purch,
              type = "paired",
              mdiff0 = 1.5, alternative = "greater", data = MktDATA)
# Paired samples: KNOWN standard deviation of the difference
TEST.diffmean(x = NStore_Purch, y = NWeb_Purch,
              type = "paired", mdiff0 = 1.5, alternative = "greater",
              sigma.d = 2, data = MktDATA)
#  - Output results
out.test_diffM <- TEST.diffmean(x = NStore_Purch,
                                y = NWeb_Purch,
                                type = "paired", mdiff0 = 1.5,
                                alternative = "greater",
                                sigma.d = 2, data = MktDATA)

# Arguments force.digits and use.scientific
#  An input variable taking very small values
SmallX <- MktDATA$AOV / 50000
SmallX_M <- SmallX[MktDATA$Gender == "M"]
SmallX_F <- SmallX[MktDATA$Gender == "F"]
#  - Default output
TEST.diffmean(x = SmallX_M, y = SmallX_F)
#  - Request to use the exact number of digits (default, 2)
TEST.diffmean(x = SmallX_M, y = SmallX_F,
              force.digits = TRUE)
#  - Request to allow scientific notation
TEST.diffmean(x = SmallX_M, y = SmallX_F, 
              use.scientific = TRUE)


UBStats documentation built on Sept. 11, 2024, 6:52 p.m.