oneSample: One Sample Inferential Methods


Description

Produces point estimates, interval estimates, and p-values for an arbitrary functional (mean, geometric mean, proportion, median, quantile, odds, rate) of a variable of class integer, numeric, Surv, or Date. A variety of inferential methods are provided; the available choices depend on the functional and the data type.

Usage

oneSample(fnctl, y, null.hypothesis = NA, test.type = "two.sided", subset = rep(TRUE, N), 
          conf.level = 0.95, na.rm = TRUE, probs = 0.5, replaceZeroes = NULL, 
          restriction = Inf, subjTime = rep(1, N), method = NULL, above = NULL, 
          below = NULL, labove = NULL, rbelow = NULL, interval = NULL, linterval = NULL, 
          rinterval = NULL, lrinterval = NULL, g1 = 1, g2 = 0, dispersion = 1, 
          nbstrap = 10000, resample = "pairs", seed = 42, ..., version = FALSE)

Arguments

fnctl

a character string indicating the functional (summary measure of the distribution) for which inference is desired. Choices include "mean", "geometric mean", "proportion", "median", "quantile", "odds", "rate". The character string may be shortened to a unique substring. Hence "mea" will suffice for "mean".

y

a variable for which inference is desired. The variable may be of class numeric, Surv, or Date.

null.hypothesis

a numeric scalar indicating any null hypothesis to be tested.

test.type

a character string indicating whether the hypothesis test is a one sided test of a lesser alternative hypothesis ("less"), a one sided test of a greater alternative hypothesis ("greater"), or a test of a two sided alternative hypothesis ("two.sided"). The default value is "two.sided".

subset

a vector indicating a subset to be used for all inference.

conf.level

a numeric scalar indicating the level of confidence to be used in computing confidence intervals. The default is 0.95.

na.rm

an indicator that missing data is to be removed prior to computation of the descriptive statistics.

probs

a vector of probabilities between 0 and 1 indicating quantile estimates to be included in the descriptive statistics. The default is 0.5, the 50th percentile (median).

replaceZeroes

if not FALSE, this indicates a value to be used in place of zeroes when computing a geometric mean. If TRUE, a value equal to one-half the lowest nonzero value is used. If a numeric value is supplied, that value is used for all variables.

restriction

a value used for computing restricted means, standard deviations, and geometric means with censored time to event data. The default value of Inf will cause restrictions at the highest observation. Note that the same value is used for all variables of class Surv.

subjTime

a vector of values for use with rates.

method

a character string used to indicate the inferential method. Allowed choices depend on the variable type and the functional. Default values are "t.test" for means and geometric means, "exact" for proportions of uncensored data, and "KM" for censored survival data.

above

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values greater than each element of above.

below

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values less than each element of below.

labove

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values greater than or equal to each element of labove.

rbelow

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values less than or equal to each element of rbelow.

interval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with neither endpoint included in each interval.

linterval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with the left hand endpoint included in each interval.

rinterval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with the right hand endpoint included in each interval.

lrinterval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with both endpoints included in each interval.

g1

used in method="mean-variance".

g2

used in method="mean-variance".

dispersion

dispersion, used in method="mean-variance".

nbstrap

number of bootstrap iterations to perform, used with method="bootstrap".

resample

character string specifying how the bootstrap should resample, used with method="bootstrap".

seed

sets the seed (for random number generation), used with method="bootstrap".

...

other arguments.

version

if TRUE, the version of the function will be returned. No other computations will be performed.
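The dichotomizing arguments (above, below, labove, rbelow) and the interval arguments (interval, linterval, rinterval, lrinterval) differ only in whether each comparison is strict. The base R sketch below illustrates those conventions on a small made-up vector. The data and the variable names are invented for illustration, and the code is not taken from the package's internals.

```r
# Hypothetical measurements, invented for illustration only
y <- c(100, 110, 128, 128, 135, 150, 160)

# A single threshold, as one element of above / below / labove / rbelow
cutoff <- 128
p_above  <- mean(y >  cutoff)  # above:  strictly greater than
p_below  <- mean(y <  cutoff)  # below:  strictly less than
p_labove <- mean(y >= cutoff)  # labove: greater than or equal to
p_rbelow <- mean(y <= cutoff)  # rbelow: less than or equal to

# One row of a two column matrix: lower and upper endpoints of an interval
bounds <- matrix(c(110, 150), ncol = 2)
lo <- bounds[1, 1]
hi <- bounds[1, 2]
p_interval   <- mean(y >  lo & y <  hi)  # neither endpoint included
p_linterval  <- mean(y >= lo & y <  hi)  # left endpoint included
p_rinterval  <- mean(y >  lo & y <= hi)  # right endpoint included
p_lrinterval <- mean(y >= lo & y <= hi)  # both endpoints included

round(c(above = p_above, below = p_below,
        labove = p_labove, rbelow = p_rbelow,
        interval = p_interval, linterval = p_linterval,
        rinterval = p_rinterval, lrinterval = p_lrinterval), 3)
```

Because two observations sit exactly on the threshold of 128, the strict and non-strict proportions differ, which is the situation in which the choice among these arguments matters.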

Details

Default values for inference correspond to the most commonly implemented methods. Additional methods are provided more for educational purposes than for purposes of statistical analysis.
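For orientation, the default "t.test" method for means and geometric means can be approximated in base R with stats::t.test. The sketch below uses simulated data and assumes that geometric mean inference is a t test on the log scale with back-transformed estimates. That is a common approach, but whether oneSample computes it in exactly this way is an assumption here, not something this page states.

```r
set.seed(42)
# Simulated positive measurements, invented for illustration only
y <- rlnorm(50, meanlog = log(125), sdlog = 0.25)

# Mean: one sample t test against a null value of 125,
# once for each of the three test.type choices
fit_two     <- t.test(y, mu = 125, alternative = "two.sided", conf.level = 0.95)
fit_less    <- t.test(y, mu = 125, alternative = "less")
fit_greater <- t.test(y, mu = 125, alternative = "greater")

# Geometric mean (assumed approach): t test on log(y), then exponentiate
fit_log <- t.test(log(y), mu = log(125), conf.level = 0.95)
geo_est <- exp(unname(fit_log$estimate))  # point estimate of the geometric mean
geo_ci  <- exp(fit_log$conf.int[1:2])     # back-transformed confidence bounds

c(mean = unname(fit_two$estimate), p_two_sided = fit_two$p.value,
  p_less = fit_less$p.value, p_greater = fit_greater$p.value)
c(geometric_mean = geo_est, lower = geo_ci[1], upper = geo_ci[2])
```

The two one sided p-values sum to one, and the two sided p-value is twice the smaller of them, which is why the choice of test.type should be made before looking at the data.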

Value

An object of class uOneSample is returned. Inferential statistics are contained in a vector named $Inference that includes the sample size, the point estimate, the lower and upper bounds of a confidence interval, any null hypothesis that was specified, and the p-value. Also included is a vector named $Statistics that includes more technical information. There is a print method that will format the descriptive statistics for the Date and Surv objects.

Author(s)

Scott S. Emerson, M.D., Ph.D., Andrew J. Spieker, Brian D. Williamson, Travis Y. Hee Wai, Solomon Lim

Examples

# Load required libraries
library(survival)

# Reading in a dataset
mri <- read.table("http://www.emersonstatistics.com/datasets/mri.txt",header=TRUE)

# Creating a Surv object to reflect time to death
mri$ttodth <- Surv(mri$obstime,mri$death)

# Reformatting an integer MMDDYY representation of date to be a Date object
mri$mridate <- as.Date(paste(trunc(mri$mridate/10000),
                             trunc((mri$mridate %% 10000)/100),
                             mri$mridate %% 100, sep="/"), "%m/%d/%y")

# Inference about the mean LDL: a one sample t test that mean LDL is 125 mg/dl
oneSample ("mean", mri$ldl, null.hypothesis=125)

# Inference about the mean LDL: a one sample t test of a lesser alternative
# that mean LDL is 125 mg/dl
oneSample ("mean", mri$ldl, null.hypothesis=125, test.type="less")

# Inference about the mean LDL: a one sample t test of a greater alternative
# that mean LDL is 125 mg/dl
oneSample ("mean", mri$ldl, null.hypothesis=125, test.type="greater")

# Inference about the geometric mean LDL: a one sample t test of a greater
# alternative that geometric mean LDL is 125 mg/dl
oneSample ("geom", mri$ldl, null.hypothesis=125, test.type="greater")

# Inference about the proportion of subjects with LDL greater than 128: exact binomial
# inference that 50% of subjects have LDL greater than 128 mg/dl
oneSample ("prop", mri$ldl, null.hypothesis=0.5, above=128)
oneSample ("prop",mri$ldl>128, null.hypothesis=0.5)
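The last two calls describe the same analysis: a variable dichotomized at 128 and tested against a null proportion of 0.5. As a rough cross-check that does not require the package, the same kind of exact binomial inference can be sketched in base R with stats::binom.test. The data below are simulated rather than taken from the mri dataset, and binom.test's definitions of the two sided p-value and the confidence interval may differ from those used by oneSample's "exact" method.

```r
set.seed(1)
# Simulated LDL-like values, invented for illustration (not the mri data)
ldl <- round(rnorm(200, mean = 126, sd = 30))

# Dichotomize at 128, as above=128 or mri$ldl>128 does in the calls above
exceeds <- ldl > 128

# Exact binomial test that half of the values exceed 128
fit <- binom.test(sum(exceeds), length(exceeds), p = 0.5, conf.level = 0.95)

c(n = length(exceeds),
  estimate = unname(fit$estimate),
  lower = fit$conf.int[1],
  upper = fit$conf.int[2],
  p_value = fit$p.value)
```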

uwIntroStats documentation built on May 2, 2019, 4:34 a.m.