cv: Cross Validation RMSE for BHPMF


Description

Calculate the average and the standard deviation of the RMSE across cross validation folds.

Usage

CalculateCvRmse(X, hierarchy.info, num.folds=10, prediction.level,
        used.num.hierarchy.levels, num.samples=1000, burn=100, gaps=2,
        num.latent.feats=15, tuning=FALSE, num.folds.tuning, tmp.dir,
        verbose=FALSE)

Arguments

X

The data matrix (including missing values). Rows are the observations, columns are the features. Missing values must be filled by NA.

hierarchy.info

A matrix containing the hierarchical information in the same order as matrix X. Data should be ordered in descending order (the lowest level being in the last column). The first column is the observation index.

num.folds

Number of cross validation folds. The default value is 10.

prediction.level

The level at which gaps are filled. For example if set to 4, gaps are filled at the 3rd highest hierarchy level. The default value is at the observation level (filling gaps/missing values at the observation level).

used.num.hierarchy.levels

Number of hierarchy levels used for gap filling. For example, if set to 2, only information from the first and second levels of the hierarchy is used. The default value equals the total number of hierarchy levels.

num.samples

This is a parameter for Gibbs sampling: the total number of samples generated in each fold. Note: this is not the effective number of samples. The default value is 1000.

burn

This is a parameter for Gibbs sampling: the burn-in period, i.e. the number of initial samples that are discarded. The default value is 100. Must be smaller than num.samples.

gaps

This is a parameter for Gibbs sampling: the gap between the sampled parameters that are kept. The default value is 2.
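The interaction of num.samples, burn and gaps can be illustrated with a short sketch. It assumes that every gaps-th draw after the burn-in period is kept; the package may count retained draws differently, so the result is only indicative. The values are the defaults shown in the Usage signature; kept.idx and num.kept are names introduced here for illustration.

```r
# Gibbs sampling settings, taken from the defaults in the Usage signature
num.samples <- 1000
burn <- 100
gaps <- 2

# Assumption: after discarding the burn-in draws, every `gaps`-th draw is kept
kept.idx <- seq(from = burn + 1, to = num.samples, by = gaps)
num.kept <- length(kept.idx)

# Number of retained draws under this assumption
print(num.kept)
```

Under this assumption far fewer than num.samples draws contribute to the estimate, which is why num.samples is not the effective number of samples.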

num.latent.feats

Size of the latent vectors in BHPMF. Used as given if tuning is FALSE. The default value is 15.

tuning

If set to TRUE, BHPMF is tuned to determine the best value of num.latent.feats. The default value is FALSE.

num.folds.tuning

Number of cross validation folds for tuning. The default value is 10 if tuning is set to TRUE.

tmp.dir

A temporary directory used to save preprocessing files. If not provided, a tmp directory in /R/tmp will be created. If provided, each time a function from this package is called, the preprocessing files saved in this directory will be used. Helps to avoid having to re-run preprocessing functions for the same dataset. WARNING: This directory should be empty for the first time call on a new dataset!
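One way to satisfy the empty-directory requirement is to create a fresh directory per dataset with base R. This is a minimal sketch; the "bhpmf_" prefix and the is.empty variable are hypothetical names, not part of the package.

```r
# Create a fresh directory for the preprocessing files of one dataset
tmp.dir <- tempfile(pattern = "bhpmf_")  # hypothetical prefix
dir.create(tmp.dir)

# Guard against reusing a directory that already holds files
is.empty <- length(list.files(tmp.dir, all.files = TRUE, no.. = TRUE)) == 0
stopifnot(is.empty)
```

The resulting path can then be passed as tmp.dir and reused in later calls on the same dataset.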

verbose

If TRUE, details of the sampler's progress are printed to the screen. Otherwise, nothing is printed. The default value is FALSE.

Value

Return a list with the following tags:

avg.rmse

Average of cross validation RMSE among all folds.

std.rmse

Standard deviation of the cross validation RMSE among all folds.

best.number.latent.features

The best value for latent vectors length if tuning is TRUE

min.rmse

The minimum RMSE for the latent vectors of size best.number.latent.features if tuning is TRUE
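How avg.rmse and std.rmse summarize the folds can be sketched with base R alone. The example below is not the BHPMF sampler: it uses a column-mean predictor as a stand-in, simulated data, and the sample standard deviation from sd(), all of which are assumptions made for illustration.

```r
set.seed(1)

# Toy data: 40 observations, 3 features, 24 entries set to missing
X <- matrix(rnorm(120), nrow = 40, ncol = 3)
X[sample(length(X), 24)] <- NA

num.folds <- 5
observed <- which(!is.na(X))  # linear indices of the known entries
fold.id <- sample(rep_len(seq_len(num.folds), length(observed)))

fold.rmse <- numeric(num.folds)
for (k in seq_len(num.folds)) {
  held.out <- observed[fold.id == k]
  X.train <- X
  X.train[held.out] <- NA  # hide the test entries of this fold

  # Stand-in predictor (NOT BHPMF): predict each hidden entry by its column mean
  col.means <- colMeans(X.train, na.rm = TRUE)
  col.of.entry <- (held.out - 1) %/% nrow(X) + 1  # column of each linear index
  pred <- col.means[col.of.entry]

  fold.rmse[k] <- sqrt(mean((X[held.out] - pred)^2))
}

# Summaries analogous to the avg.rmse and std.rmse tags
out <- list(avg.rmse = mean(fold.rmse), std.rmse = sd(fold.rmse))
```

A small std.rmse relative to avg.rmse indicates that the error estimate is stable across folds.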

Author(s)

Farideh Fazayeli farideh@cs.umn.edu

References

F. Fazayeli, A. Banerjee, J. Kattge, F. Schrodt, P.B. Reich (2014) Uncertainty quantified matrix completion using Bayesian Hierarchical Matrix factorization. International Conference on Machine Learning and Applications (ICMLA)

F. Schrodt, J. Kattge, H. Shan, F. Fazayeli, A. Karpatne, A. Banerjee, M. Reichstein, M. Boenisch, S. Diaz, J. Dickie, A. Gillison, V. Kumar, S. Lavorel, P.W. Leadley, C. Wirth, I. Wright, S.J. Wright, P.B. Reich (2015) HPMF - a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography. Global Ecology and Biogeography 24, 1510–1521

H. Shan, J. Kattge, P.B. Reich, A. Banerjee, F. Schrodt, M. Reichstein (2012) Gap Filling in the Plant Kingdom – Trait Prediction Using Hierarchical Probabilistic Matrix Factorization. International Conference on Machine Learning (ICML)

See Also

GapFilling, TuneBhpmf

Examples

## Not run: 
    # Read the input data
    data(X)  # class matrix
    data(hierarchy.info) # Read the hierarchy information; class data.frame

    # Calculate average RMSE with the default values
    out1 <- CalculateCvRmse(X, hierarchy.info)
    avg.rmse <- out1$avg.rmse
    std.rmse <- out1$std.rmse

    # Calculate average RMSE using a partial hierarchy: only the first 2 levels
    out2 <- CalculateCvRmse(X, hierarchy.info, used.num.hierarchy.levels=2,
                            num.samples=500, burn=20, gaps=3, tuning=TRUE)
    avg.rmse <- out2$avg.rmse
    std.rmse <- out2$std.rmse

    # Calculate average RMSE with fewer samples and no tuning
    out3 <- CalculateCvRmse(X, hierarchy.info,
                            num.samples=100, burn=20, gaps=3, tuning=FALSE)
    avg.rmse <- out3$avg.rmse
    std.rmse <- out3$std.rmse
    
## End(Not run)

BHPMF documentation built on June 20, 2017, 9:10 a.m.