logisticRidgeGenotypes: Fits logistic ridge regression models for genome-wide SNP...


logisticRidgeGenotypes    R Documentation

Fits logistic ridge regression models for genome-wide SNP data.

Description

Fits logistic ridge regression models for genome-wide SNP data. The SNP genotypes are not read into R but file names are passed to the code directly, enabling the analysis of genome-wide SNP data sets which are too big to be read into R.

Usage

logisticRidgeGenotypes(genotypesfilename, phenotypesfilename, lambda = -1,
                       thinfilename = NULL, betafilename = NULL,
                       approxfilename = NULL, permfilename = NULL,
                       intercept = TRUE, verbose = FALSE)

Arguments

genotypesfilename

character string: path to file containing SNP genotypes coded 0, 1, 2. See Input file formats.

phenotypesfilename

character string: path to file containing phenotypes. See Input file formats.

lambda

(optional) shrinkage parameter. If not provided, the default value (-1) requests automatic choice of the shrinkage parameter using the method of Cule & De Iorio (2012).

thinfilename

(optional) character string: path to file containing three columns: SNP name, chromosome and SNP position. See Input file formats and Details.

betafilename

(optional) character string: path to file where the output will be written. See Output file formats.

approxfilename

(optional) character string: path to file where the approximate test p-values will be written. Approximate p-values are not computed unless this argument is given. Approximate p-values are computed using the method of Cule et al. (2011). See Output file formats.

permfilename

(optional) character string: path to file where the permutation test p-values will be written. Permutation test p-values are not computed unless this argument is given. See Warning and Output file formats.

intercept

Logical: Should the ridge regression model be fitted with an intercept? Defaults to TRUE.

verbose

Logical: If TRUE, additional information is printed to the R output as the code runs. Defaults to FALSE.

Details

If thinfilename is supplied and the shrinkage parameter lambda is being computed automatically from the data, the file is used to thin the SNP data by SNP position. If thinfilename is not supplied, SNPs are thinned automatically based on the number of SNPs.

Value

The vector of fitted ridge regression coefficients. If betafilename is given, the fitted coefficients are written to this file as well as being returned. If approxfilename and/or permfilename are given, results of approximate test p-values and/or permutation test p-values are written to the files given in their arguments.

Input file formats

genotypesfilename:

A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated. Invariant SNPs in the data cause an error; remove them from the file before calling the function.
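The constraints above can be checked before the file is written. The following is a minimal sketch in base R, not part of the package: the toy matrix, the SNP names and the space separator are assumptions made for illustration, so compare the result against a file known to work with the function.

```r
# Toy genotype matrix: 4 individuals (rows) by 3 SNPs (columns), coded 0/1/2.
# The SNP names and values are made up for illustration.
geno <- matrix(c(0, 1, 2, 1,    # snp1
                 1, 1, 1, 1,    # snp2 (invariant)
                 2, 0, 1, 0),   # snp3
               nrow = 4,
               dimnames = list(NULL, c("snp1", "snp2", "snp3")))

# Reject missing values and anything other than 0, 1 or 2.
stopifnot(!anyNA(geno), all(geno %in% c(0, 1, 2)))

# A SNP is invariant when every individual carries the same genotype.
invariant <- apply(geno, 2, function(g) length(unique(g)) == 1)
geno_clean <- geno[, !invariant, drop = FALSE]

# Write a header row of SNP names, then one row per individual.
# Assumption: values are separated by single spaces.
genotypes_path <- tempfile(fileext = ".txt")
write.table(geno_clean, file = genotypes_path, quote = FALSE,
            row.names = FALSE, col.names = TRUE, sep = " ")
```

Dropping invariant columns in memory first avoids editing a very large text file by hand; for data too big to load, the same check would have to be done in chunks or with an external tool.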

phenotypesfilename:

A single column of phenotypes with the individuals in the same order as those in the file genotypesfilename. Phenotypes must be coded as 0 or 1.
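A quick consistency check on the phenotypes can catch coding and ordering problems early. This is a minimal sketch in base R with made-up values; the variable names are hypothetical and nothing here is taken from the package itself.

```r
# Toy data: 4 individuals; values are made up for illustration.
n_individuals <- 4          # rows in the genotype file, excluding the header
pheno <- c(0, 1, 1, 0)      # same order as the rows of the genotype file

# Phenotypes must be 0 or 1, with exactly one value per individual.
stopifnot(!anyNA(pheno),
          all(pheno %in% c(0, 1)),
          length(pheno) == n_individuals)

# Case/control counts are a quick sanity check on the coding.
table(pheno)
```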

thinfilename:

(optional) Three columns and the same number of rows as there are SNPs in the file genotypesfilename, one row per SNP. First column: SNP names (must match names in genotypesfilename); second column: chromosome; third column: SNP position in base pairs (bp).
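A file with this three-column layout could be assembled as follows. This is a minimal sketch in base R: the SNP names, chromosomes and positions are invented, and the choice of a space separator with no header row is an assumption that is not confirmed here.

```r
# Hypothetical SNP annotation; names must match the genotype file header.
snp_names <- c("snp1", "snp3")
thin <- data.frame(snp = snp_names,
                   chr = c(1, 1),
                   pos = c(10500, 48200))   # positions in base pairs

# One row per SNP, in three columns: name, chromosome, position.
stopifnot(ncol(thin) == 3, nrow(thin) == length(snp_names))

# Assumption: space-separated values with no header row.
thin_path <- tempfile(fileext = ".txt")
write.table(thin, file = thin_path, quote = FALSE,
            row.names = FALSE, col.names = FALSE, sep = " ")
```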

Output file formats

All output files are optional. Whether or not betafilename is provided, fitted coefficients are returned to the R workspace. If betafilename is provided, the fitted coefficients are also written to the specified file.

betafilename:

Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If intercept = TRUE (the default) then the first row is the fitted intercept (with the name Intercept in the first column).

approxfilename:

Two columns: First column is SNP names in same order as in genotypesfilename, second column is approximate p-values.

permfilename:

Two columns: First column is SNP names in same order as in genotypesfilename, second column is permutation p-values.

Warning

When data are large, the permutation test p-values may take a very long time to compute. It is recommended not to request permutation test p-values (using the argument permfilename) when data are large.
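One way to follow this advice is to request only the approximate p-values and leave permfilename at its default of NULL. The sketch below assumes the ridge package is installed along with the bundled GenBin example files used in the Examples section. The temporary output paths are illustrative, and the call is wrapped in try() so that a failure on a given installation does not stop the script.

```r
# Temporary output paths (illustrative; any writable paths would do).
betafile <- tempfile(fileext = ".txt")
approxfile <- tempfile(fileext = ".txt")

if (requireNamespace("ridge", quietly = TRUE)) {
  genotypesfile <- system.file("extdata", "GenBin_genotypes.txt", package = "ridge")
  phenotypesfile <- system.file("extdata", "GenBin_phenotypes.txt", package = "ridge")

  # permfilename is left as NULL, so no permutation test is run.
  fit <- try(ridge::logisticRidgeGenotypes(
    genotypesfilename = genotypesfile,
    phenotypesfilename = phenotypesfile,
    betafilename = betafile,
    approxfilename = approxfile
  ))

  # Inspect the first lines of the coefficient file, if it was written.
  if (file.exists(betafile)) print(readLines(betafile, n = 3))
}
```

Reading the raw lines with readLines() avoids assuming a particular delimiter or header in the output file.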

Author(s)

Erika Cule

References

Significance testing in ridge regression for genetic data. Cule, E. et al. (2011) BMC Bioinformatics, 12:372

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

See Also

logisticRidge for fitting logistic ridge regression models when the data are small enough to be read into R. linearRidge and linearRidgeGenotypes for fitting linear ridge regression models.

Examples

## Not run: 
    genotypesfile <- system.file("extdata", "GenBin_genotypes.txt", package = "ridge")
    phenotypesfile <- system.file("extdata", "GenBin_phenotypes.txt", package = "ridge")
    beta_logisticRidgeGenotypes <- logisticRidgeGenotypes(
        genotypesfilename = genotypesfile,
        phenotypesfilename = phenotypesfile
    )
    ## Compare to the output of logisticRidge
    data(GenBin) ## Same data as in GenBin_genotypes.txt and GenBin_phenotypes.txt
    beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
    cbind(round(coef(beta_logisticRidge), 6), beta_logisticRidgeGenotypes)

## End(Not run)
  

SteffenMoritz/ridge documentation built on April 17, 2022, 3:14 a.m.