sfa: Sparse factor analysis for mixed binary and count data.
In SparseFactorAnalysis: Scaling Count and Binary Data with Sparse Factor Analysis

Scaling mixed binary and count data while estimating the underlying latent dimensionality.

sfa(M, missing.mat=NULL, gibbs=100, burnin=100, max.optim=50, 
  thin=1, save.curr="UDV_curr", save.each=FALSE, thin.save=25, 
  maxdim=NULL)

`M`	Matrix to be scaled.
`missing.mat`	Matrix indicating missing data. Should be the same size as M, with a 1 denoting a missing observation and a 0 otherwise. Defaults to all zeroes.
`gibbs`	Number of posterior samples to draw.
`burnin`	Number of burn-in samples.
`max.optim`	Number of iterations to fit the cutpoints using optim. This is generally faster than the Hamiltonian Monte Carlo estimates, and is useful for the first part of the burn-in phase.
`thin`	Extent of thinning of the MCMC chain. Only every `thin`-th draw is saved to the output.
`save.curr`	Name of file in which to save object.
`save.each`	Whether to save with a new name at each thinned draw.
`thin.save`	How many thinned draws to wait between saving output.
`maxdim`	Number of latent dimensions to fit. Should be greater than the number of estimated dimensions.
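
As an illustration of the `missing.mat` convention described above, the following sketch builds the indicator matrix from `NA` entries using base R only. The object names (`M_raw`, `missing_mat`, `M_filled`) and the choice of 0 as a placeholder for unobserved cells are assumptions made for this example, not something the package documentation prescribes.

```r
# Toy data matrix with two unobserved cells (NA); values are illustrative only.
M_raw <- matrix(c(1, 0, NA, 1,
                  3, NA, 7, 2),
                nrow = 2, byrow = TRUE)

# 1 marks a missing observation and 0 an observed one, matching the
# convention described for `missing.mat`.
missing_mat <- 1 * is.na(M_raw)

# Assumption: missing cells get a placeholder value before the call;
# 0 is used here purely for illustration.
M_filled <- M_raw
M_filled[is.na(M_filled)] <- 0

# The indicator must have the same dimensions as the data matrix.
stopifnot(identical(dim(missing_mat), dim(M_filled)))

# Hypothetical call (not run here):
# fit <- sfa(M_filled, missing.mat = missing_mat)
```

Deriving the indicator from `is.na()` keeps it aligned with the data by construction, which avoids dimension mismatches when rows or columns are later dropped.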

The function sfa is the main function in the SparseFactorAnalysis package. It takes a matrix in which all entries of a given row share the same data type, either binary or count. For example, each row may consist of roll call votes or of word counts, while the columns correspond to legislators. The method combines the two data types, scales both, and selects the underlying latent dimensionality.
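
To make the input layout concrete, here is a minimal sketch of a mixed matrix with binary rows stacked on count rows, together with a quick check of each row's type. The helper `row_type` is hypothetical and not part of SparseFactorAnalysis; it only illustrates the row-wise data-type requirement described above.

```r
# Hypothetical helper (not part of SparseFactorAnalysis): label each row
# of a matrix as "binary" if it contains only 0/1, otherwise "count".
row_type <- function(M) {
  apply(M, 1, function(r) {
    r <- r[!is.na(r)]
    if (all(r %in% c(0, 1))) "binary" else "count"
  })
}

set.seed(1)
votes  <- matrix(rbinom(3 * 6, 1, 0.5), nrow = 3)  # 3 binary rows
counts <- matrix(rpois(2 * 6, 20), nrow = 2)       # 2 count rows
M_toy  <- rbind(votes, counts)                     # rows share a type, columns are units

row_type(M_toy)
```

A count row that happens to contain only zeroes and ones would be labeled "binary" by this check, so it is a sanity check on the layout rather than a reliable classifier.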

`dim.sparse`	Output for sparse estimates of dimensionality.
`dim.mean`	Non-sparse estimates of posterior mean of dimensionality.
`rowdim1`	Posterior samples of first dimension of spatial locations for row unit of observation.
`rowdim2`	Posterior samples of second dimension of spatial locations for row unit of observation.
`coldim1`	Posterior samples of first dimension of spatial locations for column unit of observation.
`coldim2`	Posterior samples of second dimension of spatial locations for column unit of observation.
`lambda.lasso`	Posterior samples for tuning parameter used for dimension selection.
`Z`	Posterior mean of fitted values, on a z-scale.
`rowdims.all`	Posterior mean of all row spatial locations.
`coldims.all`	Posterior mean of all column spatial locations.

Marc Ratkovic and Yuki Shiraito

In Song Kim, John Londregan, and Marc Ratkovic. 2015. "Voting, Speechmaking, and the Dimensions of Conflict in the US Senate." Working paper.

plot.sfa, summary.sfa

## Not run: 
##Sample size and dimensions.
 set.seed(1)
 n.sim<-50
 k.sim<-500
 
##True vector of dimension weights.
 d.sim<-rep(0,n.sim)
 d.sim[1:3]<-c(2, 1.5, 1)*3

##Formulate true latent dimensions.
 U.sim<-matrix(rnorm(n.sim^2,sd=.5), nrow=n.sim, ncol=n.sim)
 V.sim<-matrix(rnorm(n.sim*k.sim,sd=.5), nrow=k.sim, ncol=n.sim)
 Theta.sim<-U.sim%*%diag(d.sim)%*%t(V.sim)

##Generate binary outcome and count data.
 probs.sim<-pnorm((-1+Theta.sim+rep(1,n.sim)%*%t(rnorm(k.sim,sd=.5)) + 
   rnorm(n.sim,sd=.5)%*%t(rep(1,k.sim))   ))
 votes.mat<- 
    apply(probs.sim[1:25,],c(1,2),FUN=function(x) rbinom(1,1,x))
 count.mat<- 
    apply(probs.sim[26:50, ],c(1,2),FUN=function(x) rpois(1,20*x))
 M<-rbind(votes.mat,count.mat)
 
## Run sfa
 sparse1<-sfa(M, maxdim=10)
 
##Analyze results.
 summary(sparse1)
 plot(sparse1,type="dim")
 plot(sparse1,type="scatter")

##Compare to true data generating process

 plot(sparse1$Z,Theta.sim)
 abline(a=0,b=1)


## End(Not run)
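
The scatter plot in the example can be complemented by a numeric summary of how closely the fitted values track the truth. The sketch below uses stand-in objects so that it runs without fitting a model: `Theta_true` plays the role of `Theta.sim` and `Z_hat` the role of `sparse1$Z`. Both names, the noise level, and the matrix size are assumptions made for illustration only.

```r
set.seed(2)

# Stand-in objects (assumptions): a "true" latent matrix and a noisy
# version of it that mimics a fitted posterior mean.
Theta_true <- matrix(rnorm(20 * 30), nrow = 20)
Z_hat <- Theta_true + matrix(rnorm(20 * 30, sd = 0.3), nrow = 20)

# Correlation and root-mean-squared error between fitted and true values.
agreement <- cor(as.vector(Z_hat), as.vector(Theta_true))
rmse <- sqrt(mean((Z_hat - Theta_true)^2))

c(correlation = agreement, rmse = rmse)
```

With a real fit, the same two lines could be applied to `sparse1$Z` and `Theta.sim` after running the example.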