# sgSEMp1: Semi-supervised Generalized Structural Equation Modelling...

In package gSEM: Semi-Supervised Generalized Structural Equation Modeling

## Description

This function carries out gSEM principle 1. Principle 1 determines the univariate relationships in the spirit of a Markov process. The relationship between each pair of system elements, including predictors and the system-level response, is determined under the Markov property: the value of the current predictor is assumed to be sufficient for relating it to the next-level variable. In other words, given the current value of the predictor, the relationship does not depend on the specific value of the preceding-level variable that led to it.

## Usage

```r
sgSEMp1(x, predictor = NULL, response = NULL,
        nlsInits = data.frame(a1 = 1, a2 = 1, a3 = 1))
```

## Arguments

• x: A data frame with at least 2 columns. By default, its first column stores the main (primary) influencing predictor, i.e. the exogenous variable, e.g. time. The second column stores the response variable, and the remaining columns store intermediate variables.

• predictor: A character string giving the column name of the main predictor, OR a number indexing the column of the main predictor.

• response: A character string giving the column name of the main response, OR a number indexing the column of the main response.

• nlsInits: A data frame of initial vectors for the nonlinear least squares procedure, nls(). Each column holds a sequence of initial values for one coefficient, and each row is one initial vector for all coefficients. The data frame can be generated by the genInit() function. Currently the only nls function included is y = a + b * exp(c * x).
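As a rough illustration of the shape described for nlsInits (one column per coefficient, one row per starting vector), such a data frame could also be assembled by hand in base R. This is only a sketch: the ranges below are arbitrary assumptions, and the column names a1, a2, a3 simply follow the default shown in Usage.

```r
# Hypothetical grid of starting values for y = a + b * exp(c * x).
# Column names follow the default in Usage; the ranges are made up.
inits <- expand.grid(a1 = c(-1, 1),
                     a2 = c(-1, 1),
                     a3 = c(-0.5, 0.5))

# Keep the documented default starting vector as the first row.
inits <- rbind(data.frame(a1 = 1, a2 = 1, a3 = 1), inits)

nrow(inits)  # 9 starting vectors: the default plus a 2 x 2 x 2 grid
```

A data frame like this would then be supplied as `nlsInits = inits`.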

## Details

sgSEMp1 builds a network model of multiple interacting continuous variables. Each pair of variables is fitted with the best of 6 pre-determined functional forms, which represent the sensible models commonly used in (energy) degradation science. They are:

• 1. Simple Linear(SL): y = a + b * x

• 2. Quadratic(Quad): y = a + b * x + c * x^2

• 3. Simple Quadratic(SQuad): y = a + b * x^2

• 4. Exponential(Exp): y = a + b * exp(x)

• 5. Logarithm(Log): y = a + b * log(x)

• 6. Nonlinearizable(nls): y = a + b * exp(c * x)

Adjusted R-squared is used for model selection for every pair.
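The selection rule above can be sketched in base R for a single pair of variables. This is an illustration under stated assumptions, not the package's internal code: the data are simulated, the names `fits`, `adj_r2` and `best` are hypothetical, the SQuad form is taken to be y = a + b * x^2, and the nls form is left out because it needs starting values and has no adjusted R-squared reported by `summary()`.

```r
# Simulated data; the true relationship is invented for illustration.
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 + 0.5 * x^2 + rnorm(50, sd = 1)

# The five forms that are linear in their coefficients, fitted with lm().
fits <- list(
  SL    = lm(y ~ x),
  Quad  = lm(y ~ x + I(x^2)),
  SQuad = lm(y ~ I(x^2)),
  Exp   = lm(y ~ exp(x)),
  Log   = lm(y ~ log(x))
)

# Adjusted R-squared of each candidate, then the name of the largest.
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
best <- names(which.max(adj_r2))

round(adj_r2, 4)
best
```

Adjusted R-squared penalizes extra coefficients, so a three-coefficient form such as Quad is only preferred over a two-coefficient form when the additional term improves the fit enough to offset that penalty.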

P-values reported in the "res.print" field of the return list are associated with the tests of the coefficients (a, b) and c, as appropriate for the model chosen from the 6 candidates. For polynomial models, the p-values are arranged in order of increasing exponent. For example, in the quadratic functional form y ~ a + bx + cx^2, the three P-values correspond to the estimates of a, b and c, respectively. If there are fewer than 3 coefficients to estimate, the extra P-value field is filled with NA.
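The padding of the P-value fields can be illustrated with a plain `lm()` fit. The following is a minimal sketch on simulated data; the variable names `p` and `p3` are hypothetical and the package may assemble its table differently.

```r
# Simulated data for a two-coefficient (logarithmic) model.
set.seed(2)
x <- runif(40, 1, 5)
y <- 1 + 3 * log(x) + rnorm(40, sd = 0.2)

fit <- lm(y ~ log(x))  # coefficients a (intercept) and b

# P-values of the coefficient t-tests, in coefficient order.
p <- summary(fit)$coefficients[, "Pr(>|t|)"]

# Pad to three fields so every model fills the same three columns.
p3 <- c(unname(p), rep(NA_real_, 3 - length(p)))
p3
```

For a quadratic fit written as `lm(y ~ x + I(x^2))`, the same extraction yields three P-values in the order intercept, x, x^2, which matches the increasing-exponent ordering described above.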

## Value

An object of class sgSEMp1, which is a list of the following items:

• "Graph": A network graph that contains the univariate relationships between response and predictors determined by principle 1.

• "table": A matrix. In each row, the first column is the response variable, the second column is the predictor, and the remaining columns show the corresponding summary information: the optimal functional form, R-squared, adj-R-squared, P-value1, P-value2 and P-value3. See Details.

• "bestModels": A matrix. The first dimension indexes predictors and the second dimension indexes response variables. The (i, j)-th cell of the matrix stores the name of the best functional form for the j-th response variable regressed on the i-th predictor.

• "allModels": A three-dimensional array, indexed by [I, J, K], of all the models fitted to the n by p data set. The first dimension "I" indexes the predictor included in the model and accepts integers 1 to p, one for each of the p variables; a value of "I = i" means the i-th variable in the data is used as the predictor. The second dimension "J" indexes the variable used as the response. The third dimension "K" selects the fitting result of one of the 6 functional forms: 1=SL, 2=Quad, 3=SQuad, 4=Exp, 5=Log, 6=nls. The (i, j, k)-th cell of the array stores an "lm" object, corresponding to the j-th response, i-th predictor and k-th functional form.
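To make the [I, J, K] indexing concrete, the sketch below builds a small array whose cells hold model objects and reads one back with `[[i, j, k]]`. It only mimics the layout described for "allModels"; the data, the variable names v1 and v2, and the object `models` are all hypothetical.

```r
# Two simulated variables; v2 depends linearly on v1.
set.seed(3)
dat <- data.frame(v1 = runif(30, 1, 5))
dat$v2 <- 1 + 2 * dat$v1 + rnorm(30, sd = 0.1)

vars  <- names(dat)
forms <- c("SL", "Quad", "SQuad", "Exp", "Log", "nls")

# A list with a dim attribute acts as an array whose cells can hold any object.
models <- array(vector("list", length(vars)^2 * length(forms)),
                dim = c(length(vars), length(vars), length(forms)),
                dimnames = list(predictor = vars, response = vars, form = forms))

# Store one fit: predictor v1 (i = 1), response v2 (j = 2), form SL (k = 1).
models[[1, 2, 1]] <- lm(v2 ~ v1, data = dat)

class(models[[1, 2, 1]])    # the stored fit is an "lm" object
is.null(models[[2, 1, 1]])  # cells that were never filled remain NULL
```

Double brackets are needed to pull out the model object itself; single brackets would return a one-cell list array.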

The object has two added attributes:

• "attr(res.best, "Step")": A vector. For each variable, it shows in which step it is chosen to be significantly related to the response variable.

• "attr(res.best, "diag.Step")": A matrix. First dimension is for predictors; second dimension is for response variables. Each cell shows in which step the pairwise relation is being fitted.

## Examples

```r
library(gSEM)

## Load the built-in sample acrylic data set
data(acrylic)

## Run semi-gSEM principle one
ans <- sgSEMp1(acrylic, predictor = "IrradTot", response = "YI")

## Plot the result
## Default cutoff value for a solid path in the resulting graph is 0.2.
plot(ans)

## Plot result with different R-sqr cutoff
plot(ans, cutoff = 0.4)

## Summary
summary(ans)

## Extract relations between IrradTot and YI
cf <- path(ans, from = "IrradTot", to = "YI")
print(cf)

## Print three components of the result
ans$table
ans$bestModels
ans$allModels

## Checking fitting result of YI by IrradTot using the exponential model
summary(ans$allModels[[1, 2, 4]])
```