summary.sgee: Coefficient Path summary
In sgee: Stagewise Generalized Estimating Equations

Description Usage Arguments Details Value Author(s) Examples

Function to analyze and summarize a path of coefficent values by comparing them using prediction error on a \"new\" data set (or fold in CV), or the original data set if no comparison data is provided. The best point along the path in terms of the prediction error is identified. All of the prediction errors for each point along the path, the minimum prediction error, and the index of the minimum are returned.

## S3 method for class 'sgee'
summary(object, newX = NULL, newY = NULL, newOffset = NULL,
  trueBeta = NULL, trueIntercept = NULL, scale = NULL,
  classification = 0.5, averaged = TRUE, ...)

`object`	Object of class `sgee`, from which various path information is pulled.
`newX`	Design matrix to be used for model testing. It is assumed that `newX` does not contain an intercept column. An intercept column is appended by`sgee.summary` if an intercept was used to make `object`.
`newY`	Response vector to be used for model testing.
`newOffset`	Vector of offsets to be used for model testing. Must be same length as newY.
`trueBeta`	For simulation use; true coefficient values can be provided to get certain metrics.
`trueIntercept`	For simulation use; true intercept value to be used in conjunction with trueBeta.
`scale`	Scale value can be passed to allow for standardized error measurements (poisson case only).
`classification`	A numeric parameter from 0 to 1 indicating cutoff to be used to determine classification rate in Binomial setting. Default is 0.5. Values below 0 indicate that the squared error, in either the observation or the true linear predictor is the trueBeta is given, is to be used instead of the classification rate.
`averaged`	Logical parameter indicating whether the mean of the total error is to be used; assumed TRUE.
`...`	Currently not used.

The prediction error used is dependent on the input. If the true Beta is not given, then the sum squared error (or MSE; see parameter averaged) in the response is used for gaussian (or non-poisson); for poisson if the scale (or an estimate) is also given, then the sum squared Pearson residuals are used, otherwise the deviance is used. If the true Beta is provided then the sum squared error in the linear predictor is used instead.

Furthermore, when true Beta is supplied, additional model selection metrics are produced, including: False Positive Rate, False Discovery Rate, False Negative Rate.

The function is provided to allow for model selection; given a path generated by a sgee function, the path can be fed into this function with a testing data set to identify an optimal point along the path. Cross validation can be performed by dividing the original data set into k folds before hand and generating multiple coefficient paths and applying this function to each path generated.

A list containing 1) a vector of prediction errors with testing data set, 2) the smallest prediction error found along path, 3) the index of the smallest error, and if the trueBeta parameter is provided the False Positive, False Discovery, and false negative rates, and True positive and False Positive counts at the index of the smallest error, along with the minimum mis-classification and corresponding index, where the mis-classification is the total of the coefficients incorrectly marked as important/unimportant.

Gregory Vaughan

## Initialize covariate values
p <- 50 
beta <- c(rep(2.4,5),
          c(1.3, 0, 1.7, 0, .5),
          rep(0.5,5),
          rep(0,p-15))
groupSize <- 1
numGroups <- length(beta)/groupSize



trainingData <- genData(numClusters = 50,
                        clusterSize = 4,
                        clusterRho = 0.6,
                        clusterCorstr = "exchangeable",
                        yVariance = 1,
                        xVariance = 1,
                        numGroups = numGroups,
                        groupSize = groupSize,
                        groupRho = 0.3,
                        beta = beta,
                        family = gaussian(),
                        intercept = 1)

testingData <- genData(numClusters = 50,
                       clusterSize = 4,
                       clusterRho = 0.6,
                       clusterCorstr = "exchangeable",
                       yVariance = 1,
                       xVariance = 1,
                       numGroups = numGroups,
                       groupSize = groupSize,
                       groupRho = 0.3,
                       beta = beta,
                       family = gaussian(),
                       intercept = 1)

coefMat <- see(y = trainingData$y,
               x = trainingData$x,
                    family = gaussian(),
                    clusterID = trainingData$clusterID, 
                    corstr="exchangeable", 
                    maxIt = 200,
                    epsilon = .1)

analysisResults <- summary(coefMat,
                           newX = testingData$x,
                           newY = testingData$y)