TEST.diffmean: Tests on the difference between means
In UBStats: Basic Statistics

View source: R/UBStats_Main_Visible_ALL_202406.R

Description

TEST.diffmean() tests hypotheses on the difference between the means of two independent or paired populations.

Usage

TEST.diffmean(
  x,
  y,
  type = "independent",
  mdiff0 = 0,
  alternative = "two.sided",
  sigma.x = NULL,
  sigma.y = NULL,
  by,
  sigma.by = NULL,
  sigma.d = NULL,
  var.test = FALSE,
  digits = 2,
  force.digits = FALSE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

`x`, `y`	Unquoted strings identifying the numeric variables with the same length whose means have to be compared. `x` and `y` can be the names of vectors in the workspace or the names of columns in the data frame specified in the `data` argument. A mixed specification is also possible (e.g., one vector and one column in `data`).
`type`	A length-one character vector specifying the type of samples. Allowed values are `"independent"` or `"paired"`.
`mdiff0`	Numeric value specifying the difference between the populations' means under the null hypothesis (default is 0).
`alternative`	A length-one character vector specifying the direction of the alternative hypothesis. Allowed values are `"two.sided"` (difference between populations' means differs from `mdiff0`; default), or `"less"` (difference between populations' means is lower than `mdiff0`), or `"greater"` (difference between populations' means is higher than `mdiff0`).
`sigma.x`, `sigma.y`	Optional numeric values specifying the possibly known populations' standard deviations (when `x` and `y` are specified). If `NULL` (default) standard deviations are estimated using the data.
`by`	Optional unquoted string, available only when `type = "independent"`, identifying a variable (of any type), defined in the same way as `x`, that takes only two values and is used to split `x` into two independent samples. Given the two ordered values taken by `by` (alphabetical or numerical order, or order of the levels for factors), say by1 and by2, hypotheses are tested on the difference between the populations' means in the by1-group and in the by2-group. Note that only one of `y` and `by` can be specified.
`sigma.by`	Optional numeric value specifying the possibly known standard deviations for the two independent samples identified via `by` (when `x` and `by` are specified). `sigma.by` can be a single value indicating the same standard deviation in the two by-groups, or a vector with two values, specifying the standard deviations in the two by-groups. To avoid errors, in the latter case the vector should be named, with names coinciding with the two levels of `by`.
`sigma.d`	Optional numeric value specifying the possibly known standard deviation of the difference when samples are paired.
`var.test`	Logical value indicating whether to also run a test on the equality of the variances of the two (independent) samples (`TRUE`) or not (`FALSE`, default).
`digits`	Integer value specifying the number of decimals used to round statistics; defaults to 2. If the chosen rounding formats some non-zero values as zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument `force.digits` is set to `TRUE`.
`force.digits`	Logical value indicating whether reported values must be rounded to exactly the number of decimals specified in `digits`, even if some non-zero values are then rounded to zero (defaults to `FALSE`).
`use.scientific`	Logical value indicating whether numbers in tables should be displayed using scientific notation (`TRUE`); defaults to `FALSE`.
`data`	An optional data frame containing `x` and/or `y` or `by`. If not found in `data`, the variables are taken from the environment from which `TEST.diffmean()` is called.
`...`	Additional arguments to be passed to low-level functions.
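
As a rough illustration of how `mdiff0`, `alternative`, and known standard deviations interact, the sketch below computes textbook z statistics in base R on simulated data. It does not call UBStats: the data and all names (`sigma_x`, `z_xy`, `sigma_d`, and so on) are hypothetical, and the package's internal computations may differ.

```r
# Minimal base-R sketch (assumption: textbook z-tests, not UBStats internals)
set.seed(123)
x <- rnorm(30, mean = 60, sd = 10)   # hypothetical sample 1
y <- rnorm(45, mean = 45, sd = 20)   # hypothetical sample 2
sigma_x <- 10; sigma_y <- 20         # population SDs treated as known
mdiff0  <- 10

# Independent samples: H0 states mean(x) - mean(y) = mdiff0
se   <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
z_xy <- (mean(x) - mean(y) - mdiff0) / se
p_greater <- pnorm(z_xy, lower.tail = FALSE)  # alternative = "greater"
p_less    <- pnorm(z_xy)                      # alternative = "less"
p_two     <- 2 * pnorm(-abs(z_xy))            # alternative = "two.sided"

# Swapping the group order flips the sign of the difference, so the same
# hypothesis is expressed with -mdiff0 and the opposite one-sided alternative
z_yx <- (mean(y) - mean(x) - (-mdiff0)) / se
p_less_swapped <- pnorm(z_yx)
stopifnot(isTRUE(all.equal(p_less_swapped, p_greater)))

# Paired samples with a known SD of the differences
x_p <- rnorm(25, mean = 5, sd = 2)
y_p <- rnorm(25, mean = 3, sd = 2)
sigma_d <- 2
d   <- x_p - y_p
z_d <- (mean(d) - 1.5) / (sigma_d / sqrt(length(d)))
p_d <- pnorm(z_d, lower.tail = FALSE)         # H1: mean difference > 1.5
```

The swap check mirrors why, when groups are defined through `by`, the sign of `mdiff0` and the direction of `alternative` have to be chosen with the ordering of the two `by` values in mind.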

Value

A table reporting the results of the test on the difference between the populations' means. For independent samples in the case of unknown variances the test is run both under the assumption that the variances are equal and under the assumption that they differ, using percentiles from both the normal and the Student's t distribution.
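
To make the equal-variance and unequal-variance cases concrete, here is a minimal base-R sketch on simulated data. It assumes the usual pooled-variance statistic and the Welch statistic with Satterthwaite degrees of freedom; whether UBStats uses exactly these formulas is an assumption, and all object names are hypothetical. The results are cross-checked against `stats::t.test()`.

```r
# Sketch only: textbook formulas, not necessarily the UBStats implementation
set.seed(1)
x <- rnorm(40, mean = 52, sd = 10)   # hypothetical sample 1
y <- rnorm(55, mean = 48, sd = 14)   # hypothetical sample 2
mdiff0 <- 0
nx <- length(x); ny <- length(y)
d  <- mean(x) - mean(y) - mdiff0

# Equal variances: pooled variance, nx + ny - 2 degrees of freedom
s2_pooled <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
stat_eq   <- d / sqrt(s2_pooled * (1 / nx + 1 / ny))
df_eq     <- nx + ny - 2

# Unequal variances: Welch statistic, Satterthwaite degrees of freedom
vx <- var(x) / nx; vy <- var(y) / ny
stat_uneq <- d / sqrt(vx + vy)
df_uneq   <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

# Two-sided p-values from the Student's t and the normal distribution
p_values <- c(
  t_equal   = 2 * pt(-abs(stat_eq), df_eq),
  z_equal   = 2 * pnorm(-abs(stat_eq)),
  t_unequal = 2 * pt(-abs(stat_uneq), df_uneq),
  z_unequal = 2 * pnorm(-abs(stat_uneq))
)

# Cross-check the t-based results with stats::t.test()
tt_eq   <- t.test(x, y, mu = mdiff0, var.equal = TRUE)
tt_uneq <- t.test(x, y, mu = mdiff0, var.equal = FALSE)
stopifnot(
  isTRUE(all.equal(unname(tt_eq$statistic), stat_eq)),
  isTRUE(all.equal(unname(tt_uneq$statistic), stat_uneq)),
  isTRUE(all.equal(unname(tt_uneq$parameter), df_uneq))
)
```

The normal-based p-values are never larger than the corresponding t-based ones, since the t distribution has heavier tails; the gap shrinks as the sample sizes grow.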

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

Examples

data(MktDATA, package = "UBStats")

# Independent samples (default type), UNKNOWN variances
#  Bilateral test on difference between means of males and females
#  - Using x,y: build vectors with data on the two groups
AOV_M <- MktDATA$AOV[MktDATA$Gender == "M"]
AOV_F <- MktDATA$AOV[MktDATA$Gender == "F"]
TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 0)
#  - Using x,by: groups identified by ordered levels of by
TEST.diffmean(x = AOV, by = Gender, mdiff0 = 0, data = MktDATA)
#    Since order is F, M, hypotheses are on mean(F) - mean(M)
#    To test hypotheses on mean(M) - mean(F)
Gender.R <- factor(MktDATA$Gender, levels = c("M", "F"))
TEST.diffmean(x = AOV, by = Gender.R, mdiff0 = 0,
              data = MktDATA)
#  - Testing also hypotheses on equality of unknown variances
TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 0, 
              var.test = TRUE)

#  - Output results: test on differences
out.test_diffM <- TEST.diffmean(x = AOV_M, y = AOV_F)
#  - Output results: list with both test on means and variances
out.test_diffM.V <- TEST.diffmean(x = AOV_M, y = AOV_F, var.test = TRUE)

# Independent samples (default type), KNOWN variances
#  Test hypotheses on the difference between means of males and females
#  - Using x,y: build vectors with data on the two groups
AOV_M <- MktDATA$AOV[MktDATA$Gender == "M"]
AOV_F <- MktDATA$AOV[MktDATA$Gender == "F"]
TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 10, 
              alternative = "greater", sigma.x = 10, sigma.y = 20)
#  - Using x,by: groups identified by ordered levels of by
#    Adjust considering the ordering of levels
TEST.diffmean(x = AOV, by = Gender, mdiff0 = -10,
              alternative = "less",
              sigma.by = c("M" = 10, "F" = 20), data = MktDATA)
#    To change the sign, order levels as desired
Gender.R <- factor(MktDATA$Gender, levels = c("M", "F"))
TEST.diffmean(x = AOV, by = Gender.R, mdiff0 = 10,
              alternative = "greater",
              sigma.by = c("M" = 10, "F" = 20), data = MktDATA)
#  - Output results
out.test_diffM <- TEST.diffmean(x = AOV_M, y = AOV_F, mdiff0 = 10,
                                alternative = "greater",
                                sigma.x = 10, sigma.y = 20)

# Paired samples: UNKNOWN variances
# - Default settings
TEST.diffmean(x = NStore_Purch, y = NWeb_Purch,
              type = "paired",
              mdiff0 = 1.5, alternative = "greater", data = MktDATA)
# Paired samples: KNOWN standard deviation of the difference
TEST.diffmean(x = NStore_Purch, y = NWeb_Purch,
              type = "paired", mdiff0 = 1.5, alternative = "greater",
              sigma.d = 2, data = MktDATA)
#  - Output results
out.test_diffM <- TEST.diffmean(x = NStore_Purch,
                                y = NWeb_Purch,
                                type = "paired", mdiff0 = 1.5,
                                alternative = "greater",
                                sigma.d = 2, data = MktDATA)

# Arguments force.digits and use.scientific
#  An input variable taking very low values
SmallX <- MktDATA$AOV / 50000
SmallX_M <- SmallX[MktDATA$Gender == "M"]
SmallX_F <- SmallX[MktDATA$Gender == "F"]
#  - Default output
TEST.diffmean(x = SmallX_M, y = SmallX_F)
#  - Request to use the exact number of digits (default, 2)
TEST.diffmean(x = SmallX_M, y = SmallX_F,
              force.digits = TRUE)
#  - Request to allow scientific notation
TEST.diffmean(x = SmallX_M, y = SmallX_F, 
              use.scientific = TRUE)
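
The rounding rule described for `digits` and `force.digits` can be pictured with a small base-R helper. This is only one reading of the argument descriptions: `round_adaptive()` is a hypothetical function written for this illustration, not part of UBStats, and the package may implement the rule differently.

```r
# Hypothetical helper (not a UBStats function): add decimals until no
# non-zero value is displayed as zero, unless force.digits = TRUE
round_adaptive <- function(v, digits = 2, force.digits = FALSE) {
  if (force.digits) {
    return(round(v, digits))
  }
  d <- digits
  # Cap at 15 decimals to stay within double precision
  while (any(v != 0 & round(v, d) == 0, na.rm = TRUE) && d < 15) {
    d <- d + 1
  }
  round(v, d)
}

vals <- c(0.00123, 0.5)
adaptive <- round_adaptive(vals)                       # decimals grow to 3
forced   <- round_adaptive(vals, force.digits = TRUE)  # exactly 2 decimals

# Scientific notation is a display choice, available via base format()
sci <- format(vals, scientific = TRUE)
```

With these inputs, the adaptive version keeps the small value visible as 0.001, while the forced version reports it as 0.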

UBStats documentation built on Sept. 11, 2024, 6:52 p.m.