q2: Model prediction power calculation.

View source: R/q2.R

Description

Determines the prediction power of model M. To do so, M is applied to an external data set and the observations in that set are compared with the model predictions. If no external data set is available, the prediction power is calculated by performing a cross-validation on the model data set.

Usage

  looq2( modelData, formula = NULL, nu = 1, round = 4, 
  extOut = FALSE, extOutFile = NULL )

  cvq2( modelData, formula = NULL, nFold = N, nRun = 1, nu = 1, 
  round = 4, extOut = FALSE, extOutFile = NULL )

  q2( modelData, predictData, formula = NULL, nu = 0, round = 4, 
  extOut = FALSE, extOutFile = NULL )

Arguments

modelData

The model data set consists of parameters x_1, x_2, ..., x_n and an observation y

predictData

The prediction data set consists of parameters x_1, x_2, ..., x_n and an observation y

formula

The formula used to predict the observation: y ~ x_1 + x_2 + … + x_n. DEFAULT: NULL
If NULL, a generic formula is derived from the data set, assuming that the last column contains the observations and all other columns contain the parameters x_1, x_2, …, x_n.

nFold

During each run, the data set modelData (N elements) is randomly partitioned into nFold equal-sized subsets (test sets). DEFAULT: N, with 2 <= nFold <= N

nRun

Number of times the cross-validation is repeated for this data set. This corresponds to the number of individual predictions per observation. DEFAULT: 1, with 1 <= nRun. Must be 1 if nFold = N.

nu

The degrees of freedom used in the rmse calculation for the prediction power. DEFAULT: 1 (looq2(), cvq2()), 0 (otherwise)

round

The number of decimal places used in the output, DEFAULT: 4

extOut

Extended output, DEFAULT: FALSE
If extOutFile is not specified, the extended output is written to stdout()

extOutFile

Write extended output into file (implies extOut = TRUE), DEFAULT: NULL
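
The effect of nu on the rmse can be illustrated with a small sketch. It assumes the usual definition rmse = sqrt( sum((predicted - observed)^2) / (N - nu) ); this page does not state the formula, and the helper name rmse_nu is hypothetical, not part of cvq2.

```r
# Hypothetical helper (not a cvq2 function): RMSE with nu degrees of freedom
# subtracted from the number of observations N. The formula is an assumption.
rmse_nu <- function(observed, predicted, nu = 0) {
  stopifnot(length(observed) == length(predicted), length(observed) > nu)
  sqrt(sum((predicted - observed)^2) / (length(observed) - nu))
}

obs  <- c(1, 2, 3, 4)
pred <- c(1.5, 1.5, 3.5, 3.5)   # every residual is +/- 0.5, sum of squares = 1

rmse_nu(obs, pred, nu = 0)      # sqrt(1 / 4) = 0.5
rmse_nu(obs, pred, nu = 1)      # sqrt(1 / 3), about 0.5774
```

With nu = 1 the same residuals yield a larger rmse, which matters most for small data sets.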

Details

The calibration of model M with modelData is done with a linear regression.
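
As a rough illustration of the calibration step, the following sketch derives a generic formula when none is given (last column as observation, all others as parameters) and fits a linear regression with lm() from the stats package. The helpers generic_formula and calibrate are hypothetical names for this sketch and do not reflect how cvq2 is implemented internally.

```r
# Hypothetical helper: build y ~ x_1 + ... + x_n from the column names,
# treating the last column as the observation.
generic_formula <- function(data) {
  cols <- colnames(data)
  stopifnot(length(cols) >= 2)
  reformulate(cols[-length(cols)], response = cols[length(cols)])
}

# Hypothetical helper: calibrate a linear model, falling back to the
# generic formula when formula is NULL.
calibrate <- function(model_data, formula = NULL) {
  if (is.null(formula)) formula <- generic_formula(model_data)
  lm(formula, data = model_data)
}

d <- data.frame(x1 = c(1, 2, 3, 4, 5),
                x2 = c(2, 1, 4, 3, 6),
                y  = c(3.1, 2.9, 7.2, 6.8, 11.1))

fit <- calibrate(d)
deparse(generic_formula(d))   # "y ~ x1 + x2"
summary(fit)$r.squared        # r^2 of the calibration
```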

q2()


Alias: qsq(), qsquare()

The model described by modelData is used to predict the observations of predictData. These predictions are used to calculate the predictive squared correlation coefficient, q^2.
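
A minimal sketch of this external validation, under stated assumptions: q^2 is taken as 1 - sum((y_pred - y_obs)^2) / sum((y_obs - y_mean)^2), with y_mean the mean of the observations in the prediction set. This page does not say which mean is used, so that choice is an assumption. The function name q2_external and the toy data are hypothetical.

```r
# Hypothetical re-implementation for illustration, not the cvq2 code.
q2_external <- function(model_data, predict_data, formula) {
  fit    <- lm(formula, data = model_data)                  # calibrate M
  y_obs  <- model.response(model.frame(formula, data = predict_data))
  y_pred <- predict(fit, newdata = predict_data)            # apply M
  # Assumption: reference mean is taken over the prediction set observations
  1 - sum((y_pred - y_obs)^2) / sum((y_obs - mean(y_obs))^2)
}

model_set   <- data.frame(x = 1:6, y = c(1.1, 2.3, 2.9, 4.2, 4.8, 6.1))
predict_set <- data.frame(x = c(7, 8, 9), y = c(7.0, 8.2, 8.9))

q2_external(model_set, predict_set, y ~ x)
# Using the model set as prediction set reproduces r^2 of the calibration
q2_external(model_set, model_set, y ~ x)
```

The second call mirrors the last example on this page, where modelData is passed as predictData and q^2 equals r^2.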

cvq2()


Alias: cvqsq(), cvqsquare()

A cross-validation is performed for modelData, in which modelData (N elements) is split into nFold disjoint, equal-sized test sets. Each test set consists of k elements:

k=ceil(N/nFold)

If N/nFold is not an integer, some test sets consist of k-1 elements. The remaining N-k elements are merged into the training set for this test set and define the model M'. This model is used to predict the observations in the test set. Note that M' differs slightly from model M, because the k values of the test set are missing from its training set.
Each observation from modelData is predicted once. The differences between the predictions and the observations within the test sets are used to calculate the PREdictive residual Sum of Squares (PRESS). In addition, for each training set the mean of its observed values, y_mean^N-k,i, is calculated. PRESS and y_mean^N-k,i are required to calculate the predictive squared correlation coefficient, q^2_cv.
If k>1, the cross-validation can be repeated to reduce the bias introduced by one particular partition. In each iteration (nRun = 1, 2, …, x) the test sets are then compiled anew at random. Within one iteration, each observation is predicted once. If nFold = N, only one iteration is necessary.
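
The procedure described above can be sketched as follows. This is an illustrative re-implementation, not the package code. It assumes q^2_cv = 1 - PRESS / sum((y_i - y_mean^N-k,i)^2), summed over all predictions of all runs; the page names the ingredients but not the formula. The function name cv_q2 and the sample data are hypothetical.

```r
# Hypothetical k-fold cross-validation sketch (not the cvq2 implementation).
cv_q2 <- function(data, formula, n_fold = nrow(data), n_run = 1) {
  n <- nrow(data)
  stopifnot(n_fold >= 2, n_fold <= n, n_run >= 1)
  y <- model.response(model.frame(formula, data = data))
  press  <- 0   # PREdictive residual Sum of Squares
  ss_ref <- 0   # squared deviations from the training-set means
  for (run in seq_len(n_run)) {
    # Random fold labels; the largest fold has ceiling(n / n_fold) elements,
    # the others may have one element fewer.
    fold <- sample(rep_len(seq_len(n_fold), n))
    for (i in seq_len(n_fold)) {
      test   <- fold == i
      fit    <- lm(formula, data = data[!test, , drop = FALSE])   # model M'
      y_pred <- predict(fit, newdata = data[test, , drop = FALSE])
      press  <- press  + sum((y_pred - y[test])^2)
      ss_ref <- ss_ref + sum((y[test] - mean(y[!test]))^2)
    }
  }
  list(press = press, q2 = 1 - press / ss_ref)
}

d <- data.frame(x = 1:9,
                y = c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.0))
set.seed(42)                       # partitions are random, so fix the seed
cv_q2(d, y ~ x, n_fold = 3, n_run = 5)
```

For N = 10 and nFold = 3, the labelling used here gives test sets of 4, 3 and 3 elements, in line with k = ceil(N/nFold) = 4 and some test sets holding k-1 elements.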

looq2()


Same procedure as cvq2() (see above), but with nFold = N set implicitly to perform a Leave-One-Out cross-validation. Leave-One-Out cross-validation requires only one iteration (nRun = 1).
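
Why a single iteration suffices can be seen in a short sketch: with nFold = N every observation forms its own test set, so the partition involves no randomness and repeating it cannot change the result. The function loo_q2 is a hypothetical illustration that assumes q^2_cv = 1 - PRESS / sum((y_i - y_mean^N-1,i)^2); it is not the package implementation.

```r
# Hypothetical Leave-One-Out sketch (not the cvq2 implementation).
loo_q2 <- function(data, formula) {
  y <- model.response(model.frame(formula, data = data))
  n <- nrow(data)
  y_pred <- numeric(n)   # prediction for the left-out observation
  y_ref  <- numeric(n)   # mean of the observations in the training set
  for (i in seq_len(n)) {
    fit       <- lm(formula, data = data[-i, , drop = FALSE])
    y_pred[i] <- predict(fit, newdata = data[i, , drop = FALSE])
    y_ref[i]  <- mean(y[-i])
  }
  1 - sum((y_pred - y)^2) / sum((y - y_ref)^2)
}

d <- data.frame(x = 1:7, y = c(0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1))
loo_q2(d, y ~ x)
# A second call gives the identical value: no random partition is involved
identical(loo_q2(d, y ~ x), loo_q2(d, y ~ x))
```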

Value

q2()


The method q2 returns an object of class "q2". It contains information about the model calibration and its prediction performance on the external data set, predictData.

cvq2(), looq2()


The methods cvq2 and looq2 return an object of class "cvq2". It contains information about the model calibration and its prediction performance as well as data about the cross-validation applied to modelData.

Author(s)

Torsten Thalheim <torstenthalheim@gmx.de>

Examples

  require(methods)
  require(stats)
  library(cvq2)
  
  data(cvq2.sample.A)
  result <- cvq2( cvq2.sample.A )
  result
  
  data(cvq2.sample.B)
  result <- cvq2( cvq2.sample.B, y ~ x, nFold = 3 )
  result
  
  data(cvq2.sample.B)
  result <- cvq2( cvq2.sample.B, y ~ x, nFold = 3, nRun = 5 )
  result
  
  data(cvq2.sample.A)
  result <- looq2( cvq2.sample.A, y ~ x1 + x2 )
  result
  
  data(cvq2.sample.A)
  data(cvq2.sample.A_pred)
  result <- q2( cvq2.sample.A, cvq2.sample.A, y ~ x1 + x2 )
  result 

Example output

---- CALL ----
cvq2(modelData = cvq2.sample.A)

---- RESULTS ----

-- MODEL CALIBRATION (linear regression)
#Elements: 	4

mean (observed): 	3.0900
mean (predicted): 	3.0900
rmse (nu = 0): 		0.2441
r^2: 			0.9726

-- PREDICTION PERFORMANCE (cross validation)
#Runs: 				1
#Groups: 			4
#Elements Training Set: 	3
#Elements Test Set: 		1

mean (observed): 	3.0900
mean (predicted): 	3.1619
rmse (nu = 1): 		1.3286
q^2: 			0.6571

---- CALL ----
cvq2(modelData = cvq2.sample.B, formula = y ~ x, nFold = 3)

---- RESULTS ----

-- MODEL CALIBRATION (linear regression)
#Elements: 	6

mean (observed): 	5.4600
mean (predicted): 	5.4600
rmse (nu = 0): 		1.4989
r^2: 			0.8179

-- PREDICTION PERFORMANCE (cross validation)
#Runs: 				1
#Groups: 			3
#Elements Training Set: 	4
#Elements Test Set: 		2

mean (observed): 	5.4600
mean (predicted): 	5.5392
rmse (nu = 1): 		2.7163
q^2: 			0.5776

---- CALL ----
cvq2(modelData = cvq2.sample.B, formula = y ~ x, nFold = 3, nRun = 5)

---- RESULTS ----

-- MODEL CALIBRATION (linear regression)
#Elements: 	6

mean (observed): 	5.4600
mean (predicted): 	5.4600
rmse (nu = 0): 		1.4989
r^2: 			0.8179

-- PREDICTION PERFORMANCE (cross validation)
#Runs: 				5
#Groups: 			3
#Elements Training Set: 	4
#Elements Test Set: 		2

mean (observed): 	5.4600
mean (predicted): 	5.0761
rmse (nu = 1): 		2.8511
q^2: 			0.5482

---- CALL ----
looq2(modelData = cvq2.sample.A, formula = y ~ x1 + x2)

---- RESULTS ----

-- MODEL CALIBRATION (linear regression)
#Elements: 	4

mean (observed): 	3.0900
mean (predicted): 	3.0900
rmse (nu = 0): 		0.2441
r^2: 			0.9726

-- PREDICTION PERFORMANCE (cross validation)
#Runs: 				1
#Groups: 			4
#Elements Training Set: 	3
#Elements Test Set: 		1

mean (observed): 	3.0900
mean (predicted): 	3.1619
rmse (nu = 1): 		1.3286
q^2: 			0.6571

---- CALL ----
q2(modelData = cvq2.sample.A, predictData = cvq2.sample.A, formula = y ~ 
    x1 + x2)

---- RESULTS ----

-- MODEL CALIBRATION (linear regression)
#Elements: 	4

mean (observed): 	3.0900
mean (predicted): 	3.0900
rmse (nu = 0): 		0.2441
r^2: 			0.9726

-- PREDICTION PERFORMANCE (model and prediction set available)
#Elements Model Set: 		4
#Elements Prediction Set: 	4

mean (observed): 	3.0900
mean (predicted): 	3.0900
rmse (nu = 0): 		0.2441
q^2: 			0.9726

cvq2 documentation built on May 2, 2019, 8:29 a.m.