pgsfit: Penalized Generalized Estimation Equation with Grid Search
In YinanZheng/PGS: Penalized GEE with Grid Search



`pgsfit` fits the penalized GEE method over a grid of tuning parameters and determines the best result on that grid.

pgsfit(y.vect, id.vect, M, COV = NULL, sis.obj, lambda.n = 30,
  lambda.lim = c(2, 5), pm.n = 10, pm.max = NULL, fold = 10,
  nonzero.eps = 1e-05, eps = 1e-05, iter.n = 50, corstr = "ar1",
  parallel = TRUE, ncore = detectCores(), seed = NULL)

`y.vect`	a vector of the dependent variable.
`id.vect`	a vector of subject IDs.
`M`	a data frame or matrix of genomic dataset. Rows represent samples, columns represent variables.
`COV`	a data frame or matrix of covariates dataset.
`sis.obj`	a `sis` object. See `sis`.
`lambda.n`	an integer specifying the number of values of the tuning parameter lambda; the range of lambda is specified by `lambda.lim`. Default = 30.
`lambda.lim`	a vector of two numbers specifying the limits used to generate the lambda values that PGS tunes over. The lambda sequence is generated by `exp(-seq(lambda.lim[1], lambda.lim[2], length = lambda.n))`. Default = `c(2, 5)`.
`pm.n`	an integer specifying the number of Pm levels, which start at 10 and go up to `pm.max`. Default = 10. Pm is the number of top-ranking variables from `sis`.
`pm.max`	an integer specifying the maximum Pm. Default = `NULL`. If `NULL`, `n/log10(n)` is used, where n is the total number of observations.
`fold`	k-fold cross-validation in calculating grid error. Default = 10.
`nonzero.eps`	non-zero beta threshold. During iteration, a beta estimate that is shrunk below this threshold is set to zero. Default = `1e-5`.
`eps`	convergence threshold. Iteration stops when the sum of the beta estimation errors is less than this threshold. Default = `1e-5`.
`iter.n`	maximum number of iterations. If `eps` has not been met after `iter.n` iterations, iteration stops and a warning is thrown. Default = 50.
`corstr`	a character string specifying the working correlation structure. The following are permitted: independence (`"indep"`), exchangeable (`"exch"`), autoregressive(1) (`"ar1"`), and unstructured (`"un"`). Default = `"ar1"`.
`parallel`	logical. Enable parallel computing feature. Default = `TRUE`.
`ncore`	number of cores to run parallel computation. Effective when `parallel` = `TRUE`. By default, max number of cores will be used.
`seed`	an integer specifying the seed for cross-validation. If not specified, `pgsfit` will generate one.
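
The grid-related defaults above can be worked out in plain R. The following is a minimal sketch, not code taken from the package: it evaluates the lambda formula quoted under `lambda.lim` with the default values, and the `n/log10(n)` rule for `pm.max` with a hypothetical `n = 100`. The variable names `lambda.seq` and `pm.max.default` are illustrative only, and whether `pgsfit` rounds the `pm.max` value internally is not stated here.

```r
# Documented defaults for the lambda grid.
lambda.lim <- c(2, 5)
lambda.n   <- 30

# Formula quoted in the lambda.lim description.
lambda.seq <- exp(-seq(lambda.lim[1], lambda.lim[2], length = lambda.n))

length(lambda.seq)          # number of lambda values, equal to lambda.n
range(lambda.seq)           # smallest and largest lambda: exp(-5) and exp(-2)
all(diff(lambda.seq) < 0)   # TRUE: the sequence runs from large to small

# Default upper bound for Pm when pm.max = NULL.
n <- 100                    # hypothetical total number of observations
pm.max.default <- n / log10(n)
pm.max.default              # 100 / 2 = 50
```

With the defaults, the lambda values run from `exp(-2)` down to `exp(-5)`, so a larger upper limit in `lambda.lim` extends the grid toward weaker penalties.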

Returns the variable selection and model fitting results as a `pgsfit.obj` object.

See `sis` to obtain properly ranked variables; see `pgsfit.obj` for class methods.

### Dataset preview
BJdata()

### Convert binary variables into factor type
BJlung$gender = factor(BJlung$gender)
BJlung$heat = factor(BJlung$heat)
BJlung$cigwear = factor(BJlung$cigwear)

### Merge miRNA and lung function dataset
BJdata <- merge(BJmirna, BJlung, by=c("SID","WD"))

### Data must be sorted by study subject ID and multiple measurements indicator
BJdata <- BJdata[with(BJdata, order(SID, WD)), ]

### Extract dependent variable (lung function)
y.vect <- BJdata$FEV1

### Extract subject ID variable indicating repeated measures
id.vect <- BJdata$SID

### Extract microRNA data matrix
M <- BJdata[, 3:168]

### Extract covariate data matrix
COV <- BJdata[, 170:179]
           
### This example uses a linear mixed-effects model (the default) for sure independence screening, ranked by p-values
sis_LMM_par <- sis(y.vect, id.vect, M, COV)

### If your computer has multiple cores, enabling the parallel option (the default) is recommended
PGSfit <- pgsfit(y.vect, id.vect, M, COV, sis_LMM_par, lambda.lim = c(3, 5), pm.n = 12, pm.max = 120, seed = 1)

PGSfit        # print PGSfit summary
plot(PGSfit)  # plot cross-validation error grid
coef(PGSfit)  # return PGSfit coefficients

#For more information, please visit: https://github.com/YinanZheng/PGS/wiki/Example:-miRNA-expression-and-lung-function
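
The example requires the data to be sorted by subject ID and by the repeated-measurement indicator before the vectors are extracted. As a rough sketch of how that requirement could be checked, the base-R code below builds a small made-up data frame (the column names `SID`, `WD` and `FEV1` mirror the example, the values and the helper names `ids_contiguous` and `wd_ordered` are hypothetical), applies the same `order()` idiom, and verifies the result. It does not call any PGS function.

```r
# Toy long-format data: 3 subjects, 2 visits each, deliberately shuffled.
# Values are invented for illustration only.
toy <- data.frame(
  SID  = c(2, 1, 3, 1, 2, 3),
  WD   = c(2, 1, 1, 2, 1, 2),
  FEV1 = c(3.1, 2.8, 3.5, 2.9, 3.0, 3.4)
)

# Same ordering idiom as in the example above.
toy <- toy[with(toy, order(SID, WD)), ]

# Each subject's rows should form one contiguous run ...
ids_contiguous <- !anyDuplicated(rle(toy$SID)$values)

# ... and WD should be non-decreasing within each subject.
wd_ordered <- all(tapply(toy$WD, toy$SID, function(w) !is.unsorted(w)))

# Stop early if either condition fails.
stopifnot(ids_contiguous, wd_ordered)
```

Running the same two checks on `BJdata$SID` and `BJdata$WD` before extracting `y.vect` and `id.vect` would catch an unsorted input early.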