The SPEAR package is written using the R6 package. This means the syntax for using SPEAR is closer to that of object-oriented programming (OOP) languages (e.g. Python, C++…) than to typical R programming.
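For readers new to R6, the sketch below shows the calling style in isolation. The Counter class is a toy written for this illustration and is not part of SPEAR; only R6::R6Class itself is a real API here.

```r
library(R6)

# Toy class for illustration only; it is not part of SPEAR.
Counter <- R6Class("Counter",
  public = list(
    count = 0,
    # Returning invisible(self) lets calls be chained.
    add = function(n = 1) {
      self$count <- self$count + n
      invisible(self)
    }
  )
)

counter <- Counter$new()   # objects are created with $new()
counter$add(2)             # methods are called with $ and modify the object in place
counter$add(3)$add(1)      # chaining works because add() returns the object

alias <- counter           # reference semantics: this does not copy the object
alias$add(10)
counter$count              # the change made through alias is visible here too
```

The same pattern applies to calls such as SPEARobject$run.spear(): the method acts on the object itself, so there is typically no need to reassign the result.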
To use the SPEAR package, first generate a SPEARobject with the make.SPEARobject function…
SPEARobject <- SPEAR::make.SPEARobject(X = ...,
                                       Y = ...,
                                       family = ...,
                                       seed = ...,
                                       ...
)
This SPEARobject will be essential to run SPEAR and analyze the results. Below are some key parameters…
X - a list of matrices of explanatory data (see the Preparing Data vignette for more information on setting up X)
Y - a matrix of response data (see the Preparing Data vignette for more information on setting up Y)
Z - a complete matrix of explanatory data (defaults to do.call("cbind", X)). Usually, this can be left as NULL (default). (see the Preparing Data vignette for more information on setting up Z)
family - the type of response data. Accepts gaussian (default), binomial, ordinal, and multinomial (see the Preparing Data vignette for more information on the family parameter)
num_factors - how many factors to reconstruct? Defaults to 5, but it is recommended to use the estimate_num_factors function to ensure SPEAR has sufficient factors for prediction.
inits_type - how should factors be initialized? Defaults to pca, but can be None, pca, or sparsepca.
sparsity_upper - the maximum fraction of features allowed to have nonzero magnitude. Used to implement sparsity among the regression.coefficients. Defaults to 0.1 (can be between 0 and 1)
thres_elbo and thres_count - threshold for the ELBO gain between iterations. If thres_count iterations occur without the ELBO increasing by thres_elbo, SPEAR stops and returns the results. thres_elbo defaults to 0.01 (with thres_count = 5)
warm_up - how many warm-up iterations should the Bayesian model be allowed before iterations start counting towards thres_count? Defaults to 100 iterations
max_iter - how many iterations maximum is the Bayesian model allowed to run if not yet converged? Defaults to 1000 iterations
Many coefficients (a0, b0, … a2, b2) - usually leave these as NULL unless you know what you’re doing, as they typically don’t require tuning
Loss (L0, L1, L2) - Again, usually no need to tune. Defaults are L0 = 1, L1 = N (N = num.samples), and L2 = N/log(P) (P = num.features)
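To make the defaults above concrete, here is a small base R sketch with made-up data. It only reproduces the arithmetic described in the list (the default for Z and the loss defaults) and does not call SPEAR. The block names, the dimensions, the use of the natural logarithm, and counting P over all blocks combined are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the toy data is reproducible

# Two hypothetical data blocks measured on the same 20 samples.
X <- list(
  omics1 = matrix(rnorm(20 * 30), nrow = 20, ncol = 30),
  omics2 = matrix(rnorm(20 * 50), nrow = 20, ncol = 50)
)
Y <- matrix(rnorm(20), ncol = 1)  # single gaussian response

# Default for Z as stated in the text: all blocks bound column-wise.
Z <- do.call("cbind", X)

N <- nrow(Z)  # number of samples
P <- ncol(Z)  # assumed: total number of features across all blocks
stopifnot(nrow(Y) == N)  # X and Y must describe the same samples

# Loss defaults as described in the text (natural log assumed).
L0 <- 1
L1 <- N
L2 <- N / log(P)

dim(Z)
c(L0 = L0, L1 = L1, L2 = L2)
```

Checking dim(Z) before building the SPEARobject is a cheap way to confirm that every block in X has the same number of rows, since cbind would otherwise fail.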
SPEARobjects have the following structure:
data - stored data for SPEAR. data$train will always be required for the training data, but other datasets can be added via add.data(...) and accessed here. Specifically, the user may be interested in Xlist (list of matrices), X (concatenated Xlist), and Y (response).
fit - generated after the SPEARobject is trained. Stores relevant information for all trained values of wx, so accessing the coefficients directly may prove difficult without knowing the current.weight.idx (see options below or the set.weights(...) function)
params - parameters used to initialize SPEAR (includes fold.ids, weights, family, and num_factors, as well as other parameters)
inits - initial coefficients used in run.cv.spear()
options - Options for SPEAR, including:
  remove.formatting - If TRUE, will remove coloring from SPEAR output (useful in HTML documents where it isn’t supported)
  quiet - If TRUE, will not output any messages/updates
  current.response.idx - Which column in Y (see data above) should be shown in plotting functions? Only relevant for "gaussian", "binomial", and "ordinal" cases where more than one response is being predicted ("multinomial" needs to be one-hot encoded, and thus will always have more than one column)
  current.weight.idx - Which weight index is being used? See params$weights to get the full matrix, where each row corresponds to a weight index. More easily, this can be controlled via the set.weights(...) function (type ?set.weights for help)
  color.scheme - Color scheme for SPEARobject plots, split into color.scheme$X and color.scheme$Y. Use SPEARobject$set.color.scheme, or change them directly here. Each needs to be a named list, where names correspond to datasets (for X) and response columns (for Y). When using a "multinomial" response, color.scheme$Y will need one color for each of the classes (each one-hot encoded column).
  parallel.method - Which method for parallelization should be used? Options include "parLapply", "mclapply", or "lapply". Windows users cannot use "mclapply"; they must use "parLapply".
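As an illustration of the one-hot encoding and the per-class colors mentioned above, the base R sketch below builds a response matrix with one column per class and a matching named list of colors. The labels, sample names, and hex colors are invented, and the exact format SPEAR expects should be checked against the Preparing Data vignette; this only shows the general shape.

```r
# Hypothetical class labels for six samples.
labels <- factor(c("low", "high", "mid", "low", "mid", "high"),
                 levels = c("low", "mid", "high"))

# One-hot encode: one column per class, exactly one 1 in each row.
Y <- sapply(levels(labels), function(cl) as.integer(labels == cl))
rownames(Y) <- paste0("sample", seq_along(labels))

# Named list with one color per response column, mirroring the
# description of color.scheme$Y (the colors are arbitrary choices).
y_colors <- setNames(as.list(c("#1b9e77", "#d95f02", "#7570b3")),
                     colnames(Y))

Y
names(y_colors)
```

Deriving the color names from colnames(Y) keeps the two in sync if the class levels are later renamed or reordered.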
Because SPEAR can take time to run with large datasets, it is wise to save the SPEARobject in a script after running $run.cv.spear() or $run.spear()…
SPEARobject$save.model("_name_to_save_object_.rds")
# or more simply, just use
saveRDS(SPEARobject, "_name_to_save_object_.rds")
Then, when you would like to analyze the results, load the trained SPEARobject…
SPEARobject <- SPEAR::load.SPEARobject("_name_to_save_object_.rds")
# or more simply, just use
SPEARobject <- readRDS("_name_to_save_object_.rds")
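The saveRDS()/readRDS() round trip can be tried without SPEAR installed. In the sketch below a plain list stands in for a trained SPEARobject, and a temporary file stands in for the real output path; both are placeholders chosen for illustration.

```r
# Stand-in for a trained SPEARobject; a plain list is used so the
# sketch runs without SPEAR installed.
trained <- list(
  params = list(family = "gaussian", num_factors = 5),
  fit    = list(note = "placeholder for fitted values")
)

# Hypothetical location; use a persistent path in a real analysis.
path <- tempfile(fileext = ".rds")
saveRDS(trained, path)

# Later, possibly in a different session or script:
restored <- readRDS(path)
stopifnot(identical(restored, trained))  # contents survive the round trip
file.exists(path)
```

Saving right after training means the analysis script can start from readRDS() and skip the slow fitting step.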
To return to the main SPEAR vignette, click here