BHPMF-package: An R Package to fill missing values in a data matrix.

Description Details References See Also Examples

Description

An R package to filling gaps in a data matrix with hierarchical information (e.g. taxonomy) using Gibbs sampling (BHPMF).

Details

Package: BHPMF
Type: Package
Version: 1.0
Date: 2016-11-23
License: GPL-2

Inputs to the GapFilling function are a matrix X containing the missing values (e.g., plant trait information), and a matrix containing the hierarchy information. Output is a gap filled matrix (mean of the predicted distribution) in the same format as the matrix X along with another matrix containing the standard deviation informaiton in the same format as X, both are saved into given file paths. The standard deviation will be used for uncertainity quantification. Two parameters "used.num.hierarchy.levels" and "prediction.level" control the usage of hierarchy information and the level to predict missing data. In below, we explain how to use these two parameters effectively.

Figure 1 shows the inputs and output of the GapFilling function. Here the hierarchy has 4 levels where the observation level (e.g. plants) is the 4th level. In this example, the GapFilling function will predict the missing values at the observation level i.e., filling the missing values in matrix X by setting the parameter "prediction.level" to 4 (observation level - default value). Also, here we use all the hierarchy information (information in levels 1, 2, and 3) by setting the parameter "used.num.hierarchy.levels" to 3 (the default value). The usage is shown in the examples.

Figure: Figure1.jpg

Figure 2 shows the inputs and output of the GapFilling function where we use part of the hierarchy information (information in levels 1, 2) by setting the parameter "used.num.hierarchy.levels" to 2. In this example, the GapFilling function will predict the missing values at the observation level i.e., filling the missing values in matrix X since we set the parameter "prediction.level" to 4 (observation level - default value). The usage is shown in the examples.

Figure: Figure2.jpg

Figure 3 shows the inputs and output of the GapFilling function where we use part of the hierarchy information (information in levels 1, 2) by setting the parameter "used.num.hierarchy.levels" to 2. In this example, the GapFilling function will predict the missing values at level 3 i.e., filling the missing values in matrix at level 3 (smaller dimension) by setting the parameter "prediction.level" to 3. The usage is shown in the examples.

Figure: Figure3.jpg

The paramter "used.num.hierarchy.levels" is useful to generate results similart to Figure 4 in F. Schrodt et al. 2015 and Figure 4 and Table 2 in H. Shan et al. 2012.

References

F. Fazayeli, A. Banerjee, J. Kattge, F. Schrodt, P.B. Reich (2014) Uncertainty quantified matrix completion using Bayesian Hierarchical Matrix factorization. International Conference on Machine Learning and Applications (ICMLA)

F. Schrodt, J. Kattge, H. Shan, F. Fazayeli, A. Karpatne, A. Banerjee, M. Reichstein, M. Boenisch, S. Diaz, J. Dickie, A. Gillison, V. Kumar, S. Lavorel, P.W. Leadley, C. Wirth, I. Wright, S.J. Wright, P.B. Reich (2015) BHPMF - a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography. Global Ecology and Biogeography 24, 1510–1521

H. Shan, J. Kattge, P. B. Reich, A. Banerjee, F. Schrodt, and M. Reichstein. (2012) Gap Filling in the Plant Kingdom – Trait Prediction Using Hierarchical Probabilistic Matrix Factorization . International Conference on Machine Learning (ICML)

See Also

GapFilling, CalculateCvRmse, TuneBhpmf

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
## Not run: 
    # Read the input data
    library(BHPMF)
    data(trait.info)  # Read the matrix X
    data(hierarchy.info) # Read the hierarchy information
    
    ###################
    # Usage 1: Figure 1
    ###################
    GapFilling(trait.info, hierarchy.info,
  	          mean.gap.filled.output.path = "/tmp/mean_gap_filled.txt",
    	       	std.gap.filled.output.path="/tmp/std_gap_filled.txt")
    
    ###################
    # Usage 2: Figure 2
    ###################
    GapFilling(trait.info, hierarchy.info,
              prediction.level = 4,
              used.num.hierarchy.levels = 2,
  	          mean.gap.filled.output.path = "/tmp/mean_gap_filled.txt",
    	       	std.gap.filled.output.path="/tmp/std_gap_filled.txt")
    
    ###################
    # Usage 3: Figure 3
    ###################
    GapFilling(trait.info, hierarchy.info,
              prediction.level = 3,
              used.num.hierarchy.levels = 2,
  	          mean.gap.filled.output.path = "/tmp/mean_gap_filled.txt",
    	       	std.gap.filled.output.path="/tmp/std_gap_filled.txt")  
    	       	
    ######################
    # Usage 4: Running PMF
    # not using hierarchy
    #####################
    GapFilling(trait.info, hierarchy.info,
              used.num.hierarchy.levels = 0,
  	          mean.gap.filled.output.path = "/tmp/mean_gap_filled.txt",
    	       	std.gap.filled.output.path="/tmp/std_gap_filled.txt")  
	

    ##########################################
    # Usage 4: Calculate cross validation RMSE
    ##########################################
    # Calculate average RMSE with the default values
    out1 <- CalculateCvRmse(trait.info, hierarchy.info)
    avg.rmse <- out1$avg.rmse
    std.rmse <- out1$std.rmse

    # Calculate average RMSE using partial hierarch: using only the first 2 level of hierarchy
    out2 <- CalculateCvRmse(X, hierarchy.info, used.num.hiearchy.levels=2,
    	                    num.samples=500, burn=20, gaps=3, tuning=TRUE)
    avg.rmse <- out2$avg.rmse
    std.rmse <-	out2$std.rmse

## End(Not run)

BHPMF documentation built on June 20, 2017, 9:10 a.m.