h2o.prcomp: Principal component analysis of an H2O data frame
In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

h2o.prcomp

R Documentation

Description

Principal components analysis of an H2O data frame using the power method to calculate the singular value decomposition of the Gram matrix.
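
As a rough illustration of the idea in the description, the following base R sketch runs power iteration on the Gram matrix of a demeaned data set and compares the result with `stats::prcomp`. It is not H2O's implementation: the helper `power_iteration`, its `tol` argument, and the use of the built-in `iris` data are assumptions made for this example only.

```r
# Base R sketch (not H2O code): leading principal component via
# power iteration on the Gram matrix t(X) %*% X.
set.seed(42)

# Demean the numeric columns of a built-in data set.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# Gram matrix of the columns (p x p); stays small even when X has many rows.
G <- crossprod(X)

# Hypothetical helper: repeatedly multiply by G and renormalise until the
# direction stops changing.
power_iteration <- function(G, max_iterations = 1000, tol = 1e-12) {
  v <- rnorm(ncol(G))
  v <- v / sqrt(sum(v^2))
  for (i in seq_len(max_iterations)) {
    w <- as.vector(G %*% v)
    w <- w / sqrt(sum(w^2))
    converged <- (1 - abs(sum(w * v))) < tol
    v <- w
    if (converged) break
  }
  # Rayleigh quotient gives the matching eigenvalue of G.
  list(vector = v, value = sum(v * as.vector(G %*% v)))
}

top <- power_iteration(G)

# Standard deviation of the first component (n - 1 divisor, as in prcomp).
sdev1 <- sqrt(top$value / (nrow(X) - 1))

# Reference result from base R for comparison.
ref <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
```

Here `sdev1` should agree with `ref$sdev[1]`, and `top$vector` should match `ref$rotation[, 1]` up to sign. Further components would require deflation or a block method, which this sketch leaves out.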

Usage

h2o.prcomp(
  training_frame,
  x,
  model_id = NULL,
  validation_frame = NULL,
  ignore_const_cols = TRUE,
  score_each_iteration = FALSE,
  transform = c("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"),
  pca_method = c("GramSVD", "Power", "Randomized", "GLRM"),
  pca_impl = c("MTJ_EVD_DENSEMATRIX", "MTJ_EVD_SYMMMATRIX", "MTJ_SVD_DENSEMATRIX",
    "JAMA"),
  k = 1,
  max_iterations = 1000,
  use_all_factor_levels = FALSE,
  compute_metrics = TRUE,
  impute_missing = FALSE,
  seed = -1,
  max_runtime_secs = 0,
  export_checkpoints_dir = NULL
)

Arguments

`training_frame`	Id of the training data frame.
`x`	A vector containing the `character` names of the predictors in the model.
`model_id`	Destination id for this model; auto-generated if not specified.
`validation_frame`	Id of the validation data frame.
`ignore_const_cols`	`Logical`. Ignore constant columns. Defaults to TRUE.
`score_each_iteration`	`Logical`. Whether to score during each iteration of model training. Defaults to FALSE.
`transform`	Transformation of the training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE.
`pca_method`	Specify the algorithm to use for computing the principal components: GramSVD - uses a distributed computation of the Gram matrix, followed by a local SVD; Power - computes the SVD using the power iteration method (experimental); Randomized - uses the randomized subspace iteration method; GLRM - fits a generalized low-rank model with L2 loss function and no regularization, and solves for the SVD using local matrix algebra (experimental). Must be one of: "GramSVD", "Power", "Randomized", "GLRM". Defaults to GramSVD.
`pca_impl`	Specify the implementation to use for computing PCA (via SVD or EVD): MTJ_EVD_DENSEMATRIX - eigenvalue decomposition of a dense matrix using MTJ; MTJ_EVD_SYMMMATRIX - eigenvalue decomposition of a symmetric matrix using MTJ; MTJ_SVD_DENSEMATRIX - singular value decomposition of a dense matrix using MTJ; JAMA - eigenvalue decomposition of a dense matrix using JAMA. References: JAMA - http://math.nist.gov/javanumerics/jama/; MTJ - https://github.com/fommil/matrix-toolkits-java/. Must be one of: "MTJ_EVD_DENSEMATRIX", "MTJ_EVD_SYMMMATRIX", "MTJ_SVD_DENSEMATRIX", "JAMA".
`k`	Rank of the matrix approximation. Defaults to 1.
`max_iterations`	Maximum number of training iterations. Defaults to 1000.
`use_all_factor_levels`	`Logical`. Whether the first factor level is included in each categorical expansion. Defaults to FALSE.
`compute_metrics`	`Logical`. Whether to compute metrics on the training data. Defaults to TRUE.
`impute_missing`	`Logical`. Whether to impute missing entries with the column mean. Defaults to FALSE.
`seed`	Seed for random numbers (affects the stochastic parts of the algorithm, which may or may not be enabled by default). Defaults to -1 (time-based random number).
`max_runtime_secs`	Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
`export_checkpoints_dir`	Automatically export generated models to this directory.
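
To make the `transform` options more concrete, here is a base R sketch of what each option name conventionally means. The definitions are an assumption based on common usage (in particular, NORMALIZE is taken to mean subtracting the mean and dividing by the column range); they are not taken from the H2O source. The helper `apply_transform` is hypothetical and is not part of the h2o package.

```r
# Base R sketch (not H2O code) of the conventional meaning of each option.
# Assumption: NORMALIZE = subtract the mean, divide by (max - min).
apply_transform <- function(X,
                            transform = c("NONE", "STANDARDIZE", "NORMALIZE",
                                          "DEMEAN", "DESCALE")) {
  transform <- match.arg(transform)  # first choice, "NONE", is the default
  X <- as.matrix(X)
  mu <- colMeans(X)
  sigma <- apply(X, 2, sd)
  rng <- apply(X, 2, function(col) max(col) - min(col))
  switch(transform,
    NONE = X,
    STANDARDIZE = sweep(sweep(X, 2, mu, "-"), 2, sigma, "/"),
    NORMALIZE = sweep(sweep(X, 2, mu, "-"), 2, rng, "/"),
    DEMEAN = sweep(X, 2, mu, "-"),
    DESCALE = sweep(X, 2, sigma, "/")
  )
}

# Example on a few numeric columns of a built-in data set.
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
Z <- apply_transform(X, "STANDARDIZE")
```

After standardizing, every column of `Z` has mean zero and unit standard deviation, so columns measured on very different scales (such as `hp` and `wt`) contribute comparably to the components.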

Value

An object of class `H2ODimReductionModel`.

References

N. Halko, P. G. Martinsson, J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, Survey and Review section, Vol. 53, No. 2, pp. 217-288, June 2011. http://arxiv.org/abs/0909.4061

Examples

## Not run: 
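library(h2o)
# Start (or connect to) a local H2O cluster.
h2o.init()
# Locate the sample data set shipped with the h2o package and upload it.
australia_path <- system.file("extdata", "australia.csv", package = "h2o")
australia <- h2o.uploadFile(path = australia_path)
# Fit a PCA model with 8 components on standardized columns.
h2o.prcomp(training_frame = australia, k = 8, transform = "STANDARDIZE")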

## End(Not run)
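
A slightly fuller sketch of the same workflow is given below, passing `x`, `pca_method`, and `seed` explicitly and then projecting the training rows onto the fitted components with `h2o.predict`. The wrapper `run_pca_example` and its default values are hypothetical, introduced only for this illustration; running it requires the h2o package and a working Java installation.

```r
# Hypothetical helper wrapping the documented calls; nothing runs until
# the function is called, so defining it does not need a live cluster.
run_pca_example <- function(k = 3, pca_method = "GramSVD", seed = 1234) {
  # Start (or connect to) a local H2O cluster.
  h2o::h2o.init()

  # Upload the sample data set shipped with the h2o package.
  path <- system.file("extdata", "australia.csv", package = "h2o")
  australia <- h2o::h2o.uploadFile(path = path)

  # Use every column of the frame as a predictor.
  predictors <- h2o::h2o.colnames(australia)

  model <- h2o::h2o.prcomp(
    training_frame = australia,
    x = predictors,
    k = k,
    transform = "STANDARDIZE",
    pca_method = pca_method,
    seed = seed
  )

  # Project the training rows onto the k fitted components.
  scores <- h2o::h2o.predict(model, australia)

  list(model = model, scores = as.data.frame(scores))
}
```

Calling `res <- run_pca_example(k = 3)` and then `head(res$scores)` would show the first few projected rows. Fixing `seed` only matters for the methods that involve randomness.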

h2o documentation built on May 29, 2024, 4:26 a.m.