sibp_param_search: Search Parameter Configurations for Supervised Indian Buffet...



Description

sibp_param_search runs sibp for a variety of parameter configurations, so that the user can then test the effects of the most interesting treatments.

Usage

sibp_param_search(X, Y, K, alphas, sigmasq.ns, iters,
                  a = 0.1, b = 0.1, sigmasq.A = 5, train.ind = train.ind,
                  G = NULL, seed = 0)

Arguments

X

The covariates for the full data set. The division between the training and test set is handled inside the function.

Y

The outcomes for the full data set. The division between the training and test set is handled inside the function.

K

The number of treatments to be discovered.

alphas

A vector of values of alpha to try.

sigmasq.ns

A vector of values of sigmasq.n to try.

iters

The number of starting values to attempt for each combination of alpha and sigmasq.n.

a

A hyperparameter of the sIBP model (default 0.1). See Fong and Grimmer (2016) for its role in the model.

b

A hyperparameter of the sIBP model (default 0.1). See Fong and Grimmer (2016) for its role in the model.

sigmasq.A

A hyperparameter of the sIBP model (default 5). See Fong and Grimmer (2016) for its role in the model.

train.ind

The indices of the observations in the training set, usually obtained from get_training_set().

G

An optional group membership matrix. If it is supplied, the AMCE (average marginal component effect) of a given treatment is allowed to vary with the individual's group.

seed

The seed to be used, so the result can be replicated.
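As a rough illustration of what train.ind does, the base-R sketch below partitions a small made-up data set the same way the Examples section does: positive indices pick out the training rows and negative indices pick out the remaining (test) rows. The toy objects X.toy and Y.toy are hypothetical, and sibp_param_search performs its own split from train.ind, so this is only a sketch of the idea rather than a step you need to run.

```r
# Hypothetical toy data standing in for a document-term matrix and an outcome
set.seed(1)
X.toy <- matrix(rpois(40, lambda = 2), nrow = 10,
                dimnames = list(NULL, paste0("word", 1:4)))
Y.toy <- rnorm(10)

# Draw half of the rows for training, as in the Examples section
train.ind <- sample(1:nrow(X.toy), size = 0.5 * nrow(X.toy), replace = FALSE)

# Positive indices keep the training rows; negative indices drop them,
# leaving the test rows
X.train <- X.toy[train.ind, , drop = FALSE]
Y.train <- Y.toy[train.ind]
X.test  <- X.toy[-train.ind, , drop = FALSE]
Y.test  <- Y.toy[-train.ind]

# Every observation lands in exactly one of the two sets
stopifnot(nrow(X.train) + nrow(X.test) == nrow(X.toy))
```

Keeping the test rows out of the parameter search is what lets the analyst compare many configurations without touching the data later used for inference.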

Details

Fits a supervised Indian Buffet Process using variational inference for combinations of alpha and sigmasq.n. alpha influences how common the treatments are (where larger alphas imply more common treatments) and sigmasq.n influences how much of the variation of the outcome must be explained by the treatments. These parameters are the most important for determining the quality of the treatments discovered, so it is usually a good idea to experiment with many combinations. Because the treatments discovered can be sensitive to starting values, it is also usually a good idea to try each combination of alpha and sigmasq.n several times by setting iters > 1.
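Because every combination of alpha and sigmasq.n is tried iters times, the number of fits grows multiplicatively with the search grid. The base-R sketch below enumerates the configurations for a hypothetical grid (the alpha and sigmasq.n values from the Examples, but with iters = 3) using expand.grid; it only counts the runs and does not call sibp.

```r
# Hypothetical search grid: Examples values for alpha and sigmasq.n, but iters = 3
alphas     <- c(2, 4)
sigmasq.ns <- c(0.8, 1)
iters      <- 3

# One row per fit: each (alpha, sigmasq.n) pair gets iters starting values
grid <- expand.grid(iter = seq_len(iters),
                    sigmasq.n = sigmasq.ns,
                    alpha = alphas)
nrow(grid)  # 2 * 2 * 3 = 12 fits
```

Since each row corresponds to a separate fit, runtime grows with length(alphas) * length(sigmasq.ns) * iters, so widening the grid and raising iters at the same time can become expensive quickly.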

Because this function uses only the training data, the user can experiment with many parameter configurations without corrupting the inferences made with the test set. The choice of parameters is equivalent to the choice of hypotheses to test, so the analyst should choose the parameter configuration that leads to the most substantively interesting treatments. sibp_top_words can be applied to each fitted model in the list returned by this function to see which parameter configurations lead to interesting treatments. When there are too many configurations to inspect by hand, sibp_rank_runs can be used to automatically identify some of the most promising candidates.
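The indexing in the Examples (sibp.search[["4"]][["0.8"]][[1]]) suggests the returned list is keyed by alpha, then sigmasq.n (both as character strings), then run number. Assuming that layout, the base-R sketch below builds a mock result, flattens it into one row per fit, and visits each fit in turn. The mock list and its placeholder strings are hypothetical stand-ins for fitted sibp objects; in a real analysis the loop body is where a call such as sibp_top_words would go.

```r
# Mock search result with the assumed nesting:
# alpha (character) -> sigmasq.n (character) -> run number (integer).
# Each leaf is a placeholder string, not a real fitted model.
alphas <- c(2, 4); sigmasq.ns <- c(0.8, 1); iters <- 2
mock.search <- setNames(lapply(alphas, function(alpha) {
  setNames(lapply(sigmasq.ns, function(s) {
    lapply(seq_len(iters), function(i)
      sprintf("fit(alpha=%s, sigmasq.n=%s, iter=%d)", alpha, s, i))
  }), as.character(sigmasq.ns))
}), as.character(alphas))

# Flatten into one row per fit, keeping the keys needed to index back in
index <- do.call(rbind, lapply(names(mock.search), function(a.key) {
  do.call(rbind, lapply(names(mock.search[[a.key]]), function(s.key) {
    data.frame(alpha = a.key, sigmasq.n = s.key,
               iter = seq_along(mock.search[[a.key]][[s.key]]),
               stringsAsFactors = FALSE)
  }))
}))
nrow(index)  # 2 * 2 * 2 = 8 fits

# Visit each fit by its keys; replace cat() with real inspection code
for (r in seq_len(nrow(index))) {
  fit <- mock.search[[index$alpha[r]]][[index$sigmasq.n[r]]][[index$iter[r]]]
  cat(fit, "\n", sep = "")
}
```

Flattening first makes it easy to record notes or scores per configuration in the same data frame, rather than hard-coding one index path per candidate.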

Value

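paramslist

A nested list of fitted sIBP models, one for each combination of alpha, sigmasq.n, and starting value. As the Examples illustrate, an individual run is retrieved by indexing with alpha and sigmasq.n as character strings and then the run number, e.g. sibp.search[["4"]][["0.8"]][[1]].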

Author(s)

Christian Fong

References

Fong, Christian and Justin Grimmer. 2016. “Discovery of Treatments from Text Corpora.” Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. https://aclweb.org/anthology/P/P16/P16-1151.pdf

See Also

sibp_rank_runs, sibp_top_words, sibp_amce

Examples

library(texteffect)

## Load the sample of Wikipedia biography data
data(BioSample)

# Divide into training and test sets
Y <- BioSample[,1]
X <- BioSample[,-1]
set.seed(1)
train.ind <- sample(1:nrow(X), size = 0.5*nrow(X), replace = FALSE)

# Search sIBP for several parameter configurations; fit each to the training set
sibp.search <- sibp_param_search(X, Y, K = 2, alphas = c(2, 4),
                                 sigmasq.ns = c(0.8, 1), iters = 1,
                                 train.ind = train.ind)

## Not run: 
# Get metric for evaluating most promising parameter configurations
sibp_rank_runs(sibp.search, X, 10)

# Qualitatively look at the top candidates
sibp_top_words(sibp.search[["4"]][["0.8"]][[1]], colnames(X), 10, verbose = TRUE)
sibp_top_words(sibp.search[["4"]][["1"]][[1]], colnames(X), 10, verbose = TRUE)

## End(Not run)

texteffect documentation built on May 2, 2019, 12:05 p.m.