sfa: Sparse factor analysis for mixed binary and count data.


Description

Scaling mixed binary and count data while estimating the underlying latent dimensionality.

Usage

sfa(M, missing.mat=NULL, gibbs=100, burnin=100, max.optim=50, 
  thin=1, save.curr="UDV_curr", save.each=FALSE, thin.save=25, 
  maxdim=NULL)

Arguments

M

Matrix to be scaled.

missing.mat

Matrix indicating missing data. Should be the same size as M, with a 1 denoting a missing observation and a 0 otherwise. Defaults to all zeroes.

gibbs

Number of posterior samples to draw.

burnin

Number of burnin samples.

max.optim

Number of iterations to fit the cutpoints using optim. This is generally faster than the Hamiltonian Monte Carlo estimates, and is useful for the first part of the burnin phase.

thin

Extent of thinning of the MCMC chain. Only every thin-th draw is saved to the output.

save.curr

Name of file in which to save object.

save.each

Whether to save with a new name at each thinned draw.

thin.save

How many thinned draws to wait between saving output.

maxdim

Number of latent dimensions to fit. Should be larger than the number of dimensions the model ends up estimating.
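
As a minimal sketch of how the missing.mat and thin arguments described above might be prepared, the base R snippet below builds a 0/1 missingness indicator from a matrix containing NA entries and lists which draws would be kept under thinning. The toy matrix M.raw, the placeholder value 0 written into the missing cells, and the reading of gibbs as the number of draws before thinning are all assumptions made for illustration; none of them is documented behavior of sfa.

```r
## Toy data with two missing cells (M.raw is a hypothetical name).
M.raw <- matrix(c(1, 0, NA, 1,
                  0, 1, 1, NA,
                  3, 7, 2, 5), nrow = 3, byrow = TRUE)

## 1 marks a missing observation, 0 otherwise; same size as M.
missing.mat <- 1 * is.na(M.raw)

## Assumption: fill missing cells with a placeholder so M has no NAs.
M <- M.raw
M[is.na(M)] <- 0

## Assumption: gibbs counts draws before thinning, so every thin-th
## draw is kept.
gibbs <- 100
thin <- 5
kept.draws <- seq(thin, gibbs, by = thin)
length(kept.draws)

## The call itself would then look like this (not run here):
## fit <- sfa(M, missing.mat = missing.mat, gibbs = gibbs, thin = thin)
```

Keeping the indicator as a separate matrix means the placeholder values in M carry no information on their own; only the cells flagged with a 1 are treated as unobserved.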

Details

The function sfa is the main function in the package SparseFactorAnalysis. It takes a matrix in which all entries within a row share the same data type, either binary or count. For example, each row may consist of roll call votes or word counts, and the columns may correspond to legislators. The method combines the two data types, scales both, and selects the underlying latent dimensionality.
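
The row-wise layout described above can be checked before fitting. The sketch below stacks a small binary block on a small count block and labels each row; row.type is a hypothetical helper written for this illustration and is not part of the package, and how sfa itself distinguishes the two data types is not stated here.

```r
## Toy mixed matrix: two binary rows stacked on two count rows.
votes.mat <- matrix(c(1, 0, 1, 1,
                      0, 0, 1, 0), nrow = 2, byrow = TRUE)
count.mat <- matrix(c(12, 0, 7, 3,
                      5, 9, 1, 20), nrow = 2, byrow = TRUE)
M <- rbind(votes.mat, count.mat)

## row.type is a hypothetical helper, not a package function.
## Rows holding only 0/1 are labeled binary; other rows of
## non-negative whole numbers are labeled count.
row.type <- function(x) {
  if (all(x %in% c(0, 1))) return("binary")
  if (all(x >= 0 & x == round(x))) return("count")
  "other"
}
types <- apply(M, 1, row.type)
types
```

This is only a heuristic: a count row that happens to contain nothing but zeroes and ones would be labeled binary, so the labels are worth checking against how the data were collected.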

Value

dim.sparse

Output for sparse estimates of dimensionality.

dim.mean

Non-sparse estimates of posterior mean of dimensionality.

rowdim1

Posterior samples of first dimension of spatial locations for row unit of observation.

rowdim2

Posterior samples of second dimension of spatial locations for row unit of observation.

coldim1

Posterior samples of first dimension of spatial locations for column unit of observation.

coldim2

Posterior samples of second dimension of spatial locations for column unit of observation.

lambda.lasso

Posterior samples for tuning parameter used for dimension selection.

Z

Posterior mean of fitted values, on a z-scale.

rowdims.all

Posterior mean of all row spatial locations.

coldims.all

Posterior mean of all column spatial locations.

Author(s)

Marc Ratkovic and Yuki Shiraito

References

In Song Kim, John Londregan, and Marc Ratkovic. 2015. "Voting, Speechmaking, and the Dimensions of Conflict in the US Senate." Working paper.

See Also

plot.sfa, summary.sfa

Examples

## Not run: 
##Sample size and dimensions.
 set.seed(1)
 n.sim<-50
 k.sim<-500
 
##True vector of dimension weights.
 d.sim<-rep(0,n.sim)
 d.sim[1:3]<-c(2, 1.5, 1)*3

##Formulate true latent dimensions.
 U.sim<-matrix(rnorm(n.sim^2,sd=.5), nr=n.sim, nc=n.sim)
 V.sim<-matrix(rnorm(n.sim*k.sim,sd=.5), nr=k.sim, nc=n.sim)
 Theta.sim<-U.sim%*%diag(d.sim)%*%t(V.sim)

##Generate binary outcome and count data.
 probs.sim<-pnorm((-1+Theta.sim+rep(1,n.sim)%*%t(rnorm(k.sim,sd=.5)) + 
   rnorm(n.sim,sd=.5)%*%t(rep(1,k.sim))   ))
 votes.mat<- 
    apply(probs.sim[1:25,],c(1,2),FUN=function(x) rbinom(1,1,x))
 count.mat<- 
    apply(probs.sim[26:50, ],c(1,2),FUN=function(x) rpois(1,20*x))
 M<-rbind(votes.mat,count.mat)
 
## Run sfa
 sparse1<-sfa(M, maxdim=10)
 
##Analyze results.
 summary(sparse1)
 plot(sparse1,type="dim")
 plot(sparse1,type="scatter")

##Compare to true data generating process

plot(sparse1$Z,Theta.sim)
abline(c(0,1))


## End(Not run)

SparseFactorAnalysis documentation built on May 2, 2019, 6 a.m.