pcsslm: Approximate a linear model using PCSS

Description

pcsslm approximates a linear model of a combination of variables using precomputed summary statistics (PCSS).
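As an illustration of why this is possible, the coefficient table of an ordinary least-squares model with an intercept depends on the data only through n, the means, and the covariance matrix: the cross-product matrices X'X and X'y can be rebuilt from them exactly. The base R sketch below uses that standard identity; it is not the pcsstools implementation, and the helper name ols_from_pcss is hypothetical.

```r
# Hypothetical helper (not part of pcsstools): fit response ~ predictors
# from n, means, and covs alone, assuming an intercept is included.
ols_from_pcss <- function(n, means, covs, response, predictors) {
  xbar <- means[predictors]
  ybar <- means[[response]]
  Sxx <- covs[predictors, predictors, drop = FALSE]

  # Rebuild X'X, X'y, and y'y; cov() divides by n - 1, hence the rescaling
  XtX <- rbind(
    c(n, n * xbar),
    cbind(n * xbar, (n - 1) * Sxx + n * tcrossprod(xbar))
  )
  Xty <- c(n * ybar, (n - 1) * covs[predictors, response] + n * xbar * ybar)
  yty <- (n - 1) * covs[response, response] + n * ybar^2

  beta <- drop(solve(XtX, Xty))
  names(beta) <- c("(Intercept)", predictors)

  # Residual variance and standard errors, as in summary.lm()
  SSE <- yty - sum(beta * Xty)
  df <- n - length(beta)
  sigma2 <- SSE / df
  se <- sqrt(diag(solve(XtX)) * sigma2)
  tval <- beta / se
  pval <- 2 * pt(abs(tval), df, lower.tail = FALSE)
  cbind(Estimate = beta, `Std. Error` = se, `t value` = tval, `Pr(>|t|)` = pval)
}

# Usage on simulated data
set.seed(1)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(200)
fit <- ols_from_pcss(nrow(dat), colMeans(dat), cov(dat), "y", c("x1", "x2"))
fit  # compare with coef(summary(lm(y ~ x1 + x2, data = dat)))
```

Only second-order summaries are needed here because the response is a single observed variable; combinations of variables need the extra items described under Details.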

Usage

pcsslm(formula, pcss = list(), ...)

Arguments

formula

an object of class formula whose dependent variable is a combination of variables joined by operators such as +, *, |, or &. All model terms must have appropriate PCSS in pcss.

pcss

a list of precomputed summary statistics. In all cases, this should include n: the sample size, means: a named vector of predictor and response means, and covs: a named covariance matrix including all predictors and responses. See Details for more information.

...

additional arguments. See Details for more information.

Details

pcsslm parses the dependent variable of the input formula for functions such as sums (+), products (*), and logical operators (| and &). It then models the combination of variables using one of model_combo, model_product, model_or, model_and, or model_prcomp.

Different precomputed summary statistics are needed inside pcss depending on the function that combines the dependent variable.

  • For linear combinations (and principal component analysis), only n, means, and covs are required.

  • For products and logical combinations, the additional items predictors and responses are also required. These are named lists of objects of class predictor generated by new_predictor (or helpers such as new_predictor_snp, new_predictor_normal, and new_predictor_binary), with one predictor object for each independent variable in predictors and each dependent variable in responses. However, if the product or logical combination involves only two variables, responses can be NULL.
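For 0/1 variables, AND is a product (Y1 & Y2 = Y1 Y2) and OR follows from inclusion-exclusion (Y1 | Y2 = Y1 + Y2 - Y1 Y2), so logical combinations reduce to products. For two variables, the mean of the product can be recovered exactly from means and covs; its covariance with a predictor, however, involves third-order moments that n, means, and covs do not determine, which is presumably where the predictor distributions come in. The base R sketch below checks the identities and the mean recovery on simulated data; the data and variable names are illustrative assumptions.

```r
set.seed(2)
n <- 500
y1 <- rbinom(n, 1, 0.3)
y2 <- rbinom(n, 1, 0.4)

# For 0/1 variables, AND is a product and OR is inclusion-exclusion
stopifnot(all((y1 & y2) == y1 * y2), all((y1 | y2) == y1 + y2 - y1 * y2))

# PCSS for the two responses
dat <- cbind(y1 = y1, y2 = y2)
means <- colMeans(dat)
covs <- cov(dat)

# E[Y1 Y2] = Cov(Y1, Y2) + E[Y1] E[Y2]; cov() divides by n - 1, so rescale by (n - 1) / n
mean_and <- covs["y1", "y2"] * (n - 1) / n + means[["y1"]] * means[["y2"]]
mean_or <- means[["y1"]] + means[["y2"]] - mean_and
c(and = mean_and, or = mean_or)
```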

If modeling a principal component score of a set of variables, include the argument comp, an integer indicating which principal component score to analyze. Optional logical arguments center and standardize determine whether responses are centered and standardized before principal components are calculated.
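One common definition, assumed here rather than taken from pcsstools, uses the comp-th eigenvector of the response covariance matrix (or of the correlation matrix when responses are standardized) as the principal component weights. The score is then a linear combination of the responses, so its covariance with a predictor follows from covs. The base R sketch below computes such weights and a simple-regression slope from PCSS alone, on simulated data.

```r
set.seed(3)
n <- 300
x <- rnorm(n)
Y <- cbind(y1 = x + rnorm(n), y2 = 0.5 * x + rnorm(n), y3 = rnorm(n))
dat <- data.frame(x = x, Y)
covs <- cov(dat)
resp <- c("y1", "y2", "y3")

comp <- 1
# Assumed PC weights: eigenvectors of the response covariance block
# (apply cov2cor() to this block instead if responses are standardized)
phi <- eigen(covs[resp, resp], symmetric = TRUE)$vectors[, comp]
names(phi) <- resp

# Slope of the PC score on x from PCSS only: Cov(x, phi'Y) / Var(x)
slope_pcss <- sum(covs["x", resp] * phi) / covs["x", "x"]

# Individual-level check with a score built from the same weights
# (centering changes only the intercept, not the slope)
score <- drop(scale(as.matrix(dat[resp]), center = TRUE, scale = FALSE) %*% phi)
slope_lm <- coef(lm(score ~ x))[["x"]]
c(pcss = slope_pcss, lm = slope_lm)
```

The sign of a principal component is arbitrary, so weights from different software may differ by a factor of -1.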

If modeling a linear combination, include the argument phi, a named vector of linear weights with one entry for each variable in the formula's dependent variable.
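For weights phi, the combined response phi'Y has mean phi' mu_Y, variance phi' Sigma_YY phi, and covariance Sigma_XY phi with the predictors, all available from means and covs. The base R sketch below, on simulated data with illustrative names, derives these quantities and then the least-squares coefficients, mirroring the y1 - y2 comparison in the Examples; it is a sketch of the algebra, not the model_combo source.

```r
set.seed(4)
n <- 250
dat <- data.frame(g1 = rbinom(n, 2, 0.3), g2 = rbinom(n, 2, 0.2))
dat$y1 <- 0.5 * dat$g1 + rnorm(n)
dat$y2 <- 0.3 * dat$g2 + rnorm(n)
means <- colMeans(dat)
covs <- cov(dat)

phi <- c(y1 = 1, y2 = -1)  # weights, named by response
preds <- c("g1", "g2")
resp <- names(phi)

# Summary statistics of the combined response phi'Y, from PCSS only
mean_combo <- sum(phi * means[resp])
var_combo <- drop(t(phi) %*% covs[resp, resp] %*% phi)
cov_x_combo <- drop(covs[preds, resp] %*% phi)  # Cov(X, phi'Y)

# Least-squares slopes and intercept from the derived statistics
slopes <- solve(covs[preds, preds], cov_x_combo)
intercept <- mean_combo - sum(means[preds] * slopes)
est <- c(intercept, slopes)
est  # compare with coef(lm(y1 - y2 ~ g1 + g2, data = dat))
```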

If modeling a product, include the argument response, a character string equal to either "continuous" or "binary". If "binary", specialized approximations are used to estimate means and variances.

Value

an object of class "pcsslm".

An object of class "pcsslm" is a list containing at least the following components:

call

the matched call

terms

the terms object used

coefficients

a p x 4 matrix with columns for the estimated coefficient, its standard error, t-statistic and corresponding (two-sided) p-value.

sigma

the square root of the estimated variance of the random error.

df

degrees of freedom, a 3-vector p, n-p, p*, the first being the number of non-aliased coefficients, the last being the total number of coefficients.

fstatistic

a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom.

r.squared

R^2, the 'fraction of variance explained by the model'.

adj.r.squared

the above R^2 statistic 'adjusted', penalizing for higher p.

cov.unscaled

a p x p matrix of (unscaled) covariances of the coef[j], j = 1, ..., p.

Sum Sq

a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST).
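Several of these components follow from n and covs alone. Assuming an ordinary least-squares model with an intercept and p coefficients, SST = (n - 1) Var(y), SSR = (n - 1) S_xy' S_xx^{-1} S_xy, and SSE = SST - SSR; sigma, R^2, adjusted R^2, and the F-statistic then follow by the usual formulas. The base R sketch below (simulated data, not pcsstools code) computes them and checks them against summary(lm()).

```r
set.seed(5)
n <- 150
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 2 + dat$x1 + 3 * dat$x2 + rnorm(n)
covs <- cov(dat)
preds <- c("x1", "x2")

# Sums of squares from the covariance matrix
Sxy <- covs[preds, "y"]
SST <- (n - 1) * covs["y", "y"]
SSR <- (n - 1) * drop(crossprod(Sxy, solve(covs[preds, preds], Sxy)))
SSE <- SST - SSR

p <- length(preds) + 1  # number of coefficients, including the intercept
sigma <- sqrt(SSE / (n - p))
r.squared <- SSR / SST
adj.r.squared <- 1 - (1 - r.squared) * (n - 1) / (n - p)
fstatistic <- c(value = (SSR / (p - 1)) / (SSE / (n - p)), numdf = p - 1, dendf = n - p)
```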

See Also

model_combo, model_product, model_or, model_and, and model_prcomp.

Examples

## Principal Component Analysis
ex_data <- pcsstools_example[c("g1", "x1", "y1", "y2", "y3")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data)
)

pcsslm(y1 + y2 + y3 ~ g1 + x1, pcss = pcss, comp = 1)

## Linear combination of variables
ex_data <- pcsstools_example[c("g1", "g2", "y1", "y2")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data)
)

pcsslm(y1 + y2 ~ g1 + g2, pcss = pcss, phi = c(y1 = 1, y2 = -1))
summary(lm(y1 - y2 ~ g1 + g2, data = ex_data))

## Product of variables
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5", "y6")]

pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data),
  predictors = list(
    g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
    x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
  ),
  responses = lapply(
    colMeans(ex_data[c("y4", "y5", "y6")]),
    new_predictor_binary
  )
)

pcsslm(y4 * y5 * y6 ~ g1 + x1, pcss = pcss, response = "binary")
summary(lm(y4 * y5 * y6 ~ g1 + x1, data = ex_data))

## Disjunction (OR statement) of variables
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5")]

pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data),
  predictors = list(
    g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
    x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
  )
)
pcsslm(y4 | y5 ~ g1 + x1, pcss = pcss) 
summary(lm(y4 | y5 ~ g1 + x1, data = ex_data))


jackmwolf/pcsstools documentation built on July 7, 2024, 7:46 p.m.