modelsearch2: Data-driven Extension of a Latent Variable Model

modelsearch2R Documentation

Data-driven Extension of a Latent Variable Model

Description

Procedure adding relationship between variables that are supported by the data.

Usage

modelsearch2(
  object,
  link,
  data,
  method.p.adjust,
  method.maxdist,
  n.sample,
  na.omit,
  alpha,
  nStep,
  trace,
  cpus
)

## S3 method for class 'lvmfit'
modelsearch2(
  object,
  link = NULL,
  data = NULL,
  method.p.adjust = "fastmax",
  method.maxdist = "approximate",
  n.sample = 1e+05,
  na.omit = TRUE,
  alpha = 0.05,
  nStep = NULL,
  trace = TRUE,
  cpus = 1
)

Arguments

object

a lvmfit object.

link

[character, optional for lvmfit objects] the name of the additional relationships to consider when expanding the model. Should be a vector containing strings like "Y~X". See the details section.

data

[data.frame, optional] the dataset used to identify the model

method.p.adjust

[character] the method used to adjust the p.values for multiple comparisons. Can be any method that is valid for the stats::p.adjust function (e.g. "fdr"). Can also be "max", "fastmax", or "gof".

method.maxdist

[character] the method used to estimate the distribution of the max statistic. "resampling" resample the score under the null to estimate the null distribution. "bootstrap" performs a wild bootstrap of the iid decomposition of the score to estimate the null distribution. "approximate" attemps to identify the latent gaussian variable corresponding to each score statistic (that is chi-2 distributed). It approximates the correlation matrix between these latent gaussian variables and uses numerical integration to compute the distribution of the max.

n.sample

[integer, >0] number of samples used in the resampling approach.

na.omit

should tests leading to NA for the test statistic be ignored. Otherwise this will stop the selection process.

alpha

[numeric 0-1] the significance cutoff for the p-values. When the p-value is below, the corresponding link will be added to the model and the search will continue. Otherwise the search will stop.

nStep

the maximum number of links that can be added to the model.

trace

[logical] should the execution of the function be traced?

cpus

the number of cpus that can be used for the computations.

Details

method.p.adjust = "max" computes the p-values based on the distribution of the max statistic. This max statistic is the max of the square root of the score statistic. The p-value are computed integrating the multivariate normal distribution.

method.p.adjust = "fastmax" only compute the p-value for the largest statistic. It is faster than "max" and lead to identical results.

method.p.adjust = "gof" keep adding links until the chi-squared test (of correct specification of the covariance matrix) is no longer significant.

Value

A list containing:

  • sequenceTest: the sequence of test that has been performed.

  • sequenceModel: the sequence of models that has been obtained.

  • sequenceQuantile: the sequence of rejection threshold. Optional.

  • sequenceIID: the influence functions relative to each test. Optional.

  • sequenceSigma: the covariance matrix relative to each test. Optional.

  • initialModel: the model before the sequential search.

  • statistic: the argument statistic.

  • method.p.adjust: the argument method.p.adjust.

  • alpha: [numeric 0-1] the significance cutoff for the p-values.

  • cv: whether the procedure has converged.

Examples


## simulate data 
mSim <- lvm()
regression(mSim) <- c(y1,y2,y3,y4)~u
regression(mSim) <- u~x1+x2
categorical(mSim,labels=c("A","B","C")) <- "x2"
latent(mSim) <- ~u
covariance(mSim) <- y1~y2
transform(mSim, Id~u) <- function(x){1:NROW(x)}

set.seed(10)
df.data <- lava::sim(mSim, n = 1e2, latent = FALSE)

## only identifiable extensions
m <- lvm(c(y1,y2,y3,y4)~u)
latent(m) <- ~u
addvar(m) <- ~x1+x2

e <- estimate(m, df.data)

## Not run: 
resSearch <- modelsearch(e)
resSearch

resSearch2 <- modelsearch2(e, nStep = 2)
resSearch2

## End(Not run)


## some extensions are not identifiable
m <- lvm(c(y1,y2,y3)~u)
latent(m) <- ~u
addvar(m) <- ~x1+x2 

e <- estimate(m, df.data)

## Not run: 
resSearch <- modelsearch(e)
resSearch
resSearch2 <- modelsearch2(e)
resSearch2

## End(Not run)

## for instance
mNI <- lvm(c(y1,y2,y3)~u)
latent(mNI) <- ~u
covariance(mNI) <- y1~y2
## estimate(mNI, data = df.data)
## does not converge




bozenne/lavaSearch2 documentation built on Feb. 13, 2024, 10:18 p.m.