phewas_ext: Function to perform a PheWAS analysis with multiple methods


phewas_ext R Documentation

Function to perform a PheWAS analysis with multiple methods

Description

This function will perform a PheWAS analysis, optionally adjusting for other variables. It is parallelized using the base package parallel.

Usage

phewas_ext(phenotypes, genotypes, data, covariates=NA, outcomes, predictors, 
  cores=1, additive.genotypes=T,
  method="glm", strata=NA, factor.contrasts=contr.phewas,
  return.models=F, min.records=20, MASS.confint.level=NA, quick.confint.level)
  

Arguments

phenotypes

The names of the outcome variables in data under study. These can be logical (for logistic regression) or continuous (for linear regression) columns.

genotypes

The names of the predictor variables in data under study.

data

Data frame containing all variables for the analysis.

covariates

The names of the covariates to appear in every analysis.

outcomes

An alternative to phenotypes. It is ignored if phenotypes is supplied.

predictors

An alternative to genotypes. It is ignored if genotypes is supplied.

cores

The number of cores to use in the parallel socket cluster implementation. If cores=1, lapply will be used instead.

additive.genotypes

Whether additive genotypes are being supplied. If so, the function attempts to calculate allele frequencies and HWE values. Default is TRUE.

method

Determines the statistical method to check associations. One of: 'glm', 'clogit', 'lrt', or 'logistf'.

If clogit, requires the strata parameter to be defined.

If lrt, an atomic vector of genotypes is tested all at once, while a list of vectors in genotypes runs each vector as a separate test (e.g., provide a list of single items to get glm results with LRT p-values).

strata

Name of the grouping / strata column required for clogit.

factor.contrasts

Contrasts used for factors to generate names used in clogit.

return.models

Return a list of the complete models, named by the string formula used to create them, as well as the results. Default is FALSE.

min.records

The minimum number of records required to perform a test. For logistic regression, there must be at least this number of both cases and controls; for linear regression, there must be at least this total number of records. Default is 20.

MASS.confint.level

Uses the MASS package and the confint function to calculate a confidence interval at the specified level. confint uses a profile likelihood method, which takes some time to compute. Output is stored in the lower and upper columns. Logistic models will report OR CIs and linear models will report beta CIs. Default is NA, which does not calculate confidence intervals.

quick.confint.level

Calculate a confidence interval based on beta + or - qnorm * SE. Output is stored in the lower.q and upper.q columns. Logistic models will report the exponentiated (OR) confidence intervals.
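The quick interval described for quick.confint.level is a Wald-type interval. The following is a minimal sketch of that calculation in base R on simulated data, assuming a two-sided interval at level 0.95. The variable names and the simulated data are illustrative assumptions; this is not the package's internal code.

```r
# Simulated data: additive genotype (0/1/2) and a logical phenotype
set.seed(1)
n <- 500
snp <- rbinom(n, size = 2, prob = 0.3)
pheno <- rbinom(n, size = 1, prob = plogis(-1 + 0.4 * snp)) == 1

# Logistic regression of the phenotype on the genotype
fit <- glm(pheno ~ snp, family = binomial)
est <- coef(summary(fit))["snp", ]
beta <- unname(est["Estimate"])
se <- unname(est["Std. Error"])

level <- 0.95                    # assumed value for quick.confint.level
z <- qnorm(1 - (1 - level) / 2)  # two-sided critical value

# Interval on the log-odds scale, then exponentiated to the OR scale
ci_beta <- c(lower = beta - z * se, upper = beta + z * se)
ci_or <- exp(ci_beta)
```

For a linear model the interval would stay on the beta scale, so the exponentiation step would be skipped.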

Details

These results can be directly plotted using the phewasManhattan function, assuming that models are not returned. If they are, the results item of the returned list needs to be used.
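Because the return type depends on return.models, downstream code may want a single way to reach the results data frame. The sketch below uses a hypothetical helper, get_phewas_results, which is not part of PheWAS, and stand-in objects shaped like the two documented return types.

```r
# Hypothetical helper (not part of PheWAS): return the results data frame
# whether the call returned a data frame or a list with a results item.
get_phewas_results <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  if (is.list(x) && is.data.frame(x$results)) {
    return(x$results)
  }
  stop("expected a data frame or a list with a 'results' data frame")
}

# Stand-in objects shaped like the two documented return types
res_df <- data.frame(phenotype = "335", snp = "rsEXAMPLE", p = 0.01)
res_list <- list(results = res_df, models = list())

same <- identical(get_phewas_results(res_df), get_phewas_results(res_list))
```

With a helper like this, the same data frame could be passed to phewasManhattan in either case.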

Value

The following are the default columns included in the returned data frame. The attributes of the returned data frame contain additional information about the analysis. If a model did not have sufficient cases or controls for analysis or failed to converge, NAs will be reported and a note will be added in the note field.

phenotype

The outcome under study

snp

The predictor under study

adjustment

The one-off adjustment used

beta

The beta coefficient for the predictor

SE

The standard error for the beta coefficient

lower.p

The lower bound of the quick confidence interval, if requested

upper.p

The upper bound of the quick confidence interval, if requested

lower

The lower bound of the confint confidence interval, if requested

upper

The upper bound of the confint confidence interval, if requested

OR

For logistic regression, the odds ratio for the predictor

p

The p-value for the predictor

type

The type of regression model used

n_total

The total number of records in the analysis

n_cases

The number of cases in the analysis (logical outcome only)

n_controls

The number of controls in the analysis (logical outcome only)

HWE_p

The Hardy-Weinberg equilibrium p-value for the predictor, assuming 0,1,2 allele coding

allele_freq

The allele frequency in the predictor for the coded allele

n_no_snp

The number of records with a missing predictor

note

Additional warning or error information

If there are any requested significance thresholds, boolean variables will be included reporting significance. If return.models=T, a list is returned. The named item results contains the above data frame. The named item models contains a list of the models generated in the analysis. To distinguish models, the list is named by the full formula used in generation.
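To illustrate what allele_freq, n_no_snp, and HWE_p describe under 0,1,2 allele coding, here is a minimal base R sketch on a made-up genotype vector. The Hardy-Weinberg test shown is a one-degree-of-freedom chi-square test chosen for illustration; the package may compute HWE_p with a different method (for example an exact test), so treat this as an assumption rather than its implementation.

```r
# Additive genotype coding: number of copies of the coded allele (0, 1, 2)
g <- c(0, 1, 2, 1, 0, 0, 1, NA, 2, 0)

n_no_snp <- sum(is.na(g))  # records with a missing predictor
g_obs <- g[!is.na(g)]
n <- length(g_obs)

# Coded-allele frequency: allele copies over total alleles (two per record)
allele_freq <- sum(g_obs) / (2 * n)

# Chi-square test of Hardy-Weinberg proportions (illustrative choice)
observed <- c(sum(g_obs == 0), sum(g_obs == 1), sum(g_obs == 2))
q <- allele_freq
expected <- n * c((1 - q)^2, 2 * q * (1 - q), q^2)
chi_sq <- sum((observed - expected)^2 / expected)
hwe_p <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
```

With so few records the chi-square approximation is rough; the vector is kept short only so the arithmetic is easy to follow.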

Author(s)

Robert Carroll

See Also

createPhewasTable

Examples

  
    # Load PheWAS, plus dplyr for inner_join
    library(PheWAS)
    library(dplyr)
    # Generate some example data
    ex=generateExample(hit="335")
    # Extract the two parts from the returned list
    id.icd9.count=ex$id.icd9.count
    genotypes=ex$genotypes
    # Create the PheWAS code table: translates the icd9s, adds exclusions,
    # and reshapes to a wide format
    phenotypes=createPhewasTable(id.icd9.count)
    # Join the data
    data=inner_join(phenotypes,genotypes)
    # Run the PheWAS
    results=phewas_ext(phenotypes=names(phenotypes)[-1],
      genotypes=names(genotypes)[-1],data=data,cores=4)
  

PheWAS/PheWAS documentation built on July 3, 2023, 3:40 p.m.