summary.sgee: Coefficient Path summary


View source: R/summary.sgee.R

Description

Function to analyze and summarize a path of coefficient values by comparing them using prediction error on a "new" data set (or fold in CV), or on the original data set if no comparison data is provided. The best point along the path in terms of prediction error is identified. The prediction errors for each point along the path, the minimum prediction error, and the index of the minimum are returned.

Usage

## S3 method for class 'sgee'
summary(object, newX = NULL, newY = NULL, newOffset = NULL,
  trueBeta = NULL, trueIntercept = NULL, scale = NULL,
  classification = 0.5, averaged = TRUE, ...)

Arguments

object

Object of class sgee, from which various path information is pulled.

newX

Design matrix to be used for model testing. It is assumed that newX does not contain an intercept column. An intercept column is appended by sgee.summary if an intercept was used to make object.

newY

Response vector to be used for model testing.

newOffset

Vector of offsets to be used for model testing. Must be same length as newY.

trueBeta

For simulation use; true coefficient values can be provided to get certain metrics.

trueIntercept

For simulation use; true intercept value to be used in conjunction with trueBeta.

scale

Scale value can be passed to allow for standardized error measurements (poisson case only).

classification

A numeric parameter from 0 to 1 indicating the cutoff used to determine the classification rate in the binomial setting. Default is 0.5. Values below 0 indicate that the squared error, in either the observation or the true linear predictor if trueBeta is given, is to be used instead of the classification rate.

averaged

Logical parameter indicating whether the mean of the total error is to be used; assumed TRUE.

...

Currently not used.

Details

The prediction error used depends on the input. If trueBeta is not given, the sum of squared errors (or MSE; see parameter averaged) in the response is used for gaussian (or non-poisson) families; for poisson, the sum of squared Pearson residuals is used if the scale (or an estimate) is also given, otherwise the deviance is used. If trueBeta is provided, the sum of squared errors in the linear predictor is used instead.
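The gaussian case described above can be sketched in base R. This is a minimal illustration, not the package's implementation: pathError is a hypothetical helper, and it assumes the path is a matrix with one row per step and one column per coefficient, with a separate vector of intercepts.

```r
## Hypothetical helper (not part of sgee): squared-error prediction
## error at every step of a coefficient path, gaussian response.
## path:       numSteps x p matrix (assumed layout: one row per step)
## intercepts: numeric vector of length numSteps
pathError <- function(path, intercepts, newX, newY, averaged = TRUE) {
  ## Linear predictor for every step: n x numSteps matrix
  eta <- newX %*% t(path)
  eta <- sweep(eta, 2, intercepts, "+")
  ## Sum of squared errors in the response at each step
  sse <- colSums((newY - eta)^2)
  ## averaged = TRUE turns the sum into a mean (MSE)
  if (averaged) sse / length(newY) else sse
}

set.seed(1)
newX <- matrix(rnorm(40), nrow = 20, ncol = 2)
newY <- drop(1 + newX %*% c(2, 0) + rnorm(20, sd = 0.1))

## Toy three-step path that moves toward the true coefficients
path <- rbind(c(0, 0), c(1, 0), c(2, 0))
intercepts <- c(1, 1, 1)

errors    <- pathError(path, intercepts, newX, newY)
bestIndex <- which.min(errors)   # index of the best step on the path
minError  <- errors[bestIndex]   # smallest prediction error
```

The three quantities errors, minError, and bestIndex mirror the first three items described under Value.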

Furthermore, when trueBeta is supplied, additional model selection metrics are produced, including the False Positive Rate, False Discovery Rate, and False Negative Rate.
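As a rough illustration of these metrics, the sketch below uses the standard textbook definitions, treating a coefficient as selected when its estimate is nonzero. selectionRates is a hypothetical helper; the package may handle edge cases (such as empty denominators) differently.

```r
## Hypothetical helper (not part of sgee): selection metrics for one
## point on the path, using the usual confusion-matrix definitions.
selectionRates <- function(estBeta, trueBeta) {
  selected  <- estBeta != 0    # covariates the model marks as important
  important <- trueBeta != 0   # covariates that truly matter
  tp <- sum(selected & important)
  fp <- sum(selected & !important)
  fn <- sum(!selected & important)
  tn <- sum(!selected & !important)
  ## Assumption: a rate with an empty denominator is reported as 0
  c(falsePositiveRate  = if (fp + tn > 0) fp / (fp + tn) else 0,
    falseDiscoveryRate = if (fp + tp > 0) fp / (fp + tp) else 0,
    falseNegativeRate  = if (fn + tp > 0) fn / (fn + tp) else 0)
}

trueBeta <- c(2, 1, 0, 0, 0)
estBeta  <- c(1.8, 0, 0.3, 0, 0)
rates <- selectionRates(estBeta, trueBeta)
```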

The function is provided to allow for model selection; given a path generated by an sgee function, the path can be fed into this function with a testing data set to identify an optimal point along the path. Cross validation can be performed by dividing the original data set into k folds beforehand, generating a coefficient path for each fold, and applying this function to each path generated.
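One way to divide the data into folds is sketched below. It rests on an assumption the text above does not state: that whole clusters, rather than individual rows, should be assigned to folds so correlated observations stay together. makeClusterFolds is a hypothetical helper, and the loop body is left as comments because the fitting call depends on the sgee function being used.

```r
## Hypothetical helper (not part of sgee): assign whole clusters to folds
makeClusterFolds <- function(clusterID, k) {
  ids <- unique(clusterID)
  ## Shuffle fold labels across clusters; fold sizes differ by at most one
  foldOfCluster <- sample(rep(seq_len(k), length.out = length(ids)))
  ## Every row inherits the fold of its cluster
  foldOfCluster[match(clusterID, ids)]
}

set.seed(42)
clusterID <- rep(1:50, each = 4)   # 50 clusters of size 4
fold <- makeClusterFolds(clusterID, k = 5)

for (j in 1:5) {
  train <- fold != j
  test  <- fold == j
  ## Fit a coefficient path on the rows in `train`, then pass the rows
  ## in `test` to summary() as newX and newY and keep the error vector.
}
```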

Value

A list containing 1) a vector of prediction errors on the testing data set, 2) the smallest prediction error found along the path, and 3) the index of the smallest error. If the trueBeta parameter is provided, the list also contains the False Positive, False Discovery, and False Negative rates and the True Positive and False Positive counts at the index of the smallest error, along with the minimum misclassification and its corresponding index, where the misclassification is the total number of coefficients incorrectly marked as important or unimportant.

Author(s)

Gregory Vaughan

Examples

## Load the package that provides genData, see, and the summary method
library(sgee)

## Initialize covariate values
p <- 50 
beta <- c(rep(2.4,5),
          c(1.3, 0, 1.7, 0, .5),
          rep(0.5,5),
          rep(0,p-15))
groupSize <- 1
numGroups <- length(beta)/groupSize

## Generate training and testing data sets from the same design
trainingData <- genData(numClusters = 50,
                        clusterSize = 4,
                        clusterRho = 0.6,
                        clusterCorstr = "exchangeable",
                        yVariance = 1,
                        xVariance = 1,
                        numGroups = numGroups,
                        groupSize = groupSize,
                        groupRho = 0.3,
                        beta = beta,
                        family = gaussian(),
                        intercept = 1)

testingData <- genData(numClusters = 50,
                       clusterSize = 4,
                       clusterRho = 0.6,
                       clusterCorstr = "exchangeable",
                       yVariance = 1,
                       xVariance = 1,
                       numGroups = numGroups,
                       groupSize = groupSize,
                       groupRho = 0.3,
                       beta = beta,
                       family = gaussian(),
                       intercept = 1)

## Fit a stagewise coefficient path on the training data
coefMat <- see(y = trainingData$y,
               x = trainingData$x,
               family = gaussian(),
               clusterID = trainingData$clusterID,
               corstr = "exchangeable",
               maxIt = 200,
               epsilon = .1)

## Evaluate every point on the path against the testing data
analysisResults <- summary(coefMat,
                           newX = testingData$x,
                           newY = testingData$y)

sgee documentation built on May 1, 2019, 7:10 p.m.