Nothing
##' This function carries out gSEM principle 1. Principle 1 determines the univariate relationships in the spirit of the Markovian process. The relationship between each pair of system elements, including predictors and the system level response, is determined with the Markovian property that assumes the value of the current predictor is sufficient in relating to the next level variable, i.e., the relationship is independent of the specific value of the preceding-level variable to the current predictor, given the current value.
##'
##' sgSEMp1 builds a network model of interfacing multiple continuous variables. Each pair of variables is fitted by one of the optimal relationships selected from 6 pre-determined functional forms, representing the sensible models commonly used in (energy) degradation science. They are:
##' \itemize{
##' \item 1. Simple Linear(SL): y = a + b * x
##' \item 2. Quadratic(Quad): y = a + b * x + c * x^2
##' \item 3. Simple Quadratic(SQuad): y = a + b * x^2
##' \item 4. Exponential(Exp): y = a + b * exp^x
##' \item 5. Logarithm(Log): y = a + b * log(x)
##' \item 6. Nonlinearizable(nls): y = a + b * exp(c * x)
##' }
##' Adjusted R-squared is used for model selection for every pair.
##'
##' P-values reported in the "res.print" field of the return list are associated with the tests of the coefficients (a,b) and c as appropriate in the chosen model from the 6 candidates. In the case of polynomial model, the p-values are arranged in the order of increasing exponents. For example, in the quadratic functional form y ~ a + bx + cx^2, the three P-values correspond to those of \\hat_a, \\hat_b and \\hat_c, respectively. If there are less than 3 coefficients to estimate, the extra P-value field is filled with NA's.
##'
##' @title Semi-supervised Generalized Structural Equation Modelling (gSEM) - Principle 1
##' @param x A dataframe, requiring at least 2 columns. By default, its first column stores the main or primary influencing predictor, or exogenous variable, e.g. time, or a main predictor. The second column stores the response variable, and other columns store intermediate variables.
##' @param predictor A character string of the column name of the main predictor OR a numeric number indexing the column of the main predictor.
##' @param response A character string of the column name of the main response OR a numeric number indexing the column of the main response.
##' @param nlsInits A data frame of initial vectors for the nonlinear least square procedure, nls(). Each column corresponds to a sequence of initial values for one coefficient. The data frame can be generated by the genInit() function. Each row is one initial vector for all coefficients. Currently the only nls function included is y = a + b * exp(c * x).
##' @return An object of class sgSEMp1, which is a list of the following items:
##'
##' \itemize{
##' \item "Graph": A network graph that contains the univariate relationships between response and predictors determined by principle 1.
##' \item "table": A matrix. For each row, first column is the response variable, second column is the predictor, the other columns show corresponding summary information: The optimal functional form, R-squared, adj-R-squared, P-value1, P-value2 and P-value3. See details.
##' \item "bestModels": A matrix. First dimension indicates predictors. The second dimension indicates response variables. The i-jth cell of the matrix stores the name of the best functional form corresponding to the j-th response variable regressed on the i-th predictor.
##' \item "allModels": A three dimensional array, indexed by [I, J, K], for all the models fitted to the n by p data set. The first dimension "I" indexes the predictor included in the model, and accepts integers 1 to p for one of the p variables; thus a value of "I=i" indicates using the ith variable in the data as the predictor. The second dimension "J" indexes the variable used as the response variable. The third dimension "K" specifies the fitting result of one of the 6 functional forms: 1=SL, 2=Quad, 3=SQuad, 4=Exp, 5=Log, 6=nls. The i-j-k-th cell of the list stores a "lm" object, corresponding to the j-th response, i-th predictor and the k-th functional form.
##' }
##'
##' The object has two added attributes:
##'
##' \itemize{
##'
##' \item "attr(res.best, "Step")": A vector. For each variable, it shows in which step it is chosen to be significantly related to the response variable.
##'
##' \item "attr(res.best, "diag.Step")": A matrix. First dimension is for predictors; second dimension is for response variables. Each cell shows in which step the pairwise relation is being fitted.
##' }
##' @seealso sgSEMp2() and plot.sgSEMp1()
##'
##' @export
##' @importFrom 'stats' 'residuals'
##' @examples
##' ## Load the built-in sample acrylic data set
##' data(acrylic)
##'
##' ## Run semi-gSEM principle one
##' ans <- sgSEMp1(acrylic, predictor = "IrradTot", response = "YI")
##'
##' ## Plot the result
##' plot(ans) #Default cutoff value for a solid path in the resulting graph is 0.2.
##'
##' ## Plot result with different R-sqr cutoff
##' plot(ans, cutoff = 0.4)
##'
##' ## Summary
##' summary(ans)
##'
##' ## Extract relations between IrradTot and YI
##' cf <- path(ans, from = "IrradTot", to = "YI")
##' print(cf)
##'
##' ## Print three components of the result
##' ans$table
##' ans$bestModels
##' ans$allModels
##'
##' ## Checking fitting result of YI by IrradTot using the exponential model
##' summary(ans$allModel[[1,2,4]])
sgSEMp1 <- function(x,
predictor = NULL,
response = NULL,
nlsInits = data.frame(a1 = 1, a2 = 1, a3 = 1)){
###############################
### Checking predictor and response
###############################
if(!missing(predictor)){
if(!is.character(predictor)){
if(is.wholenumber(predictor)){
if(predictor < 1 | predictor > length(colnames(x)))
stop("predictor location out of range!")
predictor.loc <- predictor
}
}else{ if(!(predictor %in% colnames(x))){
stop(paste0("predictor '", predictor, "' does not exist!"))
}
predictor.loc <- which(colnames(x) == predictor)
}
neworder <- 1:length(colnames(x))
neworder[1] <- predictor.loc
neworder[predictor.loc] <- 1
x <- x[neworder]
}else{
predictor <- names(x)[1]
}
if(!missing(response)){
response.loc <- which(colnames(x) == response)
if(!is.character(response)){
if(is.wholenumber(response)){
if(response < 1 | response > length(colnames(x)))
stop("Response location out of range!")
response.loc <- response
}
}else{
if(!(response %in% colnames(x))){
stop(paste0("Response '", response, "' does not exist!"))
}
response.loc <- which(colnames(x) == response)
}
neworder <- 1:length(colnames(x))
neworder[2] <- response.loc
neworder[response.loc] <- 2
x <- x[neworder]
} else{
response <- names(x)[2]
}
###############################
### The following function fits paired relations
###############################
find.relation <- function(iVar, iResp, criterion = "adj.R2"){
## saves all adjusted R2 for all 8 functional forms
adj.R2 <- rep(0, length(modelNames))
## Save "lm" object for all 8 functional forms
model.res <- vector("list", length(modelNames))
Resp.v <- x[iResp] # data of response variable #Resp.v <- x[[iResp]]
Var.v <- x[iVar] # data of predictor # Var.v <- x[[iVar]]
Resp <- colnames(x)[iResp] # name of response variable
Var <- colnames(x)[iVar] # name of predictor
cat("Working on ", Resp, "~", Var, "\n")
lm4.flag <- TRUE
lm5.flag <- TRUE
##-------------------------
## Functional Form 1: Linear
##-------------------------
Rel1 <- paste0(Resp, "~", Var)
lm1 <- do.call("lm", list(Rel1, data=as.name("x")))
lm1_step <- stepAIC(lm1,trace=F)
if (length(lm1$coef)!=length(lm1_step$coef)) {
if ( length(which(lm1_step$coef==lm1$coef[1]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[1] <- summary(lm1)$adj.r.squared
model.res[[1]] <- lm1
Res.all[[iVar, iResp, "SL"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- summary(lm1_step)$adj.r.squared
model.res[[7]] <- lm1_step
Res.all[[iVar, iResp, "Con"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[1] <- summary(lm1_step)$adj.r.squared
model.res[[1]] <- lm1_step
Res.all[[iVar, iResp, "SL"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- NA
model.res[[7]] <- NA
Res.all[[iVar, iResp, "Con"]] <<- NA # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
} else {
adj.R2[1] <- summary(lm1)$adj.r.squared
model.res[[1]] <- lm1
Res.all[[iVar, iResp, "SL"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
##-------------------------
## Functional Form 2: Quadratic
##-------------------------
Rel2 <- paste0(Resp,"~",Var,"+","I(", Var,"^2)")
lm1 <- do.call("lm", list(Rel2, data = as.name("x")))
lm1_step <- stepAIC(lm1,trace=F)
if (length(lm1$coef)!=length(lm1_step$coef)) {
if ( length(which(lm1_step$coef==lm1$coef[1]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[3]))>0 ) {
adj.R2[2] <- summary(lm1)$adj.r.squared
model.res[[2]] <- lm1
Res.all[[iVar, iResp, "Quad"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[1] <- summary(lm1_step)$adj.r.squared
model.res[[1]] <- lm1_step
Res.all[[iVar, iResp, "SL"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[3]))>0 ) {
adj.R2[3] <- summary(lm1_step)$adj.r.squared
model.res[[3]] <- lm1_step
Res.all[[iVar, iResp, "SQuad"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- summary(lm1_step)$adj.r.squared
model.res[[7]] <- lm1_step
Res.all[[iVar, iResp, "Con"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[3]))>0 ) {
adj.R2[2] <- summary(lm1_step)$adj.r.squared
model.res[[2]] <- lm1_step
Res.all[[iVar, iResp, "Quad"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[1] <- summary(lm1_step)$adj.r.squared
model.res[[1]] <- lm1_step
Res.all[[iVar, iResp, "SL"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[3]))>0 ) {
adj.R2[3] <- summary(lm1_step)$adj.r.squared
model.res[[3]] <- lm1_step
Res.all[[iVar, iResp, "SQuad"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- NA
model.res[[7]] <- NA
Res.all[[iVar, iResp, "Con"]] <<- NA # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
}
} else {
adj.R2[2] <- summary(lm1)$adj.r.squared
model.res[[2]] <- lm1
Res.all[[iVar, iResp, "Quad"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
##-------------------------
## Functional Form 3: Quadratic (no linear term)
##-------------------------
Rel3 <- paste0(Resp, "~", "I(", Var,"^2)")
lm1 <- do.call("lm", list(Rel3, data = as.name("x")))
lm1_step <- stepAIC(lm1,trace=F)
if (length(lm1$coef)!=length(lm1_step$coef)) {
if ( length(which(lm1_step$coef==lm1$coef[1]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[3] <- summary(lm1)$adj.r.squared
model.res[[3]] <- lm1
Res.all[[iVar, iResp, "SQuad"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- summary(lm1_step)$adj.r.squared
model.res[[7]] <- lm1_step
Res.all[[iVar, iResp, "Con"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[3] <- summary(lm1_step)$adj.r.squared
model.res[[3]] <- lm1_step
Res.all[[iVar, iResp, "SQuad"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[1] <- NA
model.res[[1]] <- NA
Res.all[[iVar, iResp, "SL"]] <<- NA # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
} else {
adj.R2[3] <- summary(lm1)$adj.r.squared
model.res[[3]] <- lm1
Res.all[[iVar, iResp, "SQuad"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
##-------------------------
## Functional Form 4: Exponential
##-------------------------
lm4.flag <- all(exp(Var.v[!is.na(Var.v)]) != Inf)
if(lm4.flag){
Rel4 <- paste0(Resp, "~", "exp(", Var,")")
lm1 <- do.call("lm", list(Rel4, data = as.name("x")))
lm1_step <- stepAIC(lm1,trace=F)
if (length(lm1$coef)!=length(lm1_step$coef)) {
if ( length(which(lm1_step$coef==lm1$coef[1]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[4] <- summary(lm1)$adj.r.squared
model.res[[4]] <- lm1
Res.all[[iVar, iResp, "Exp"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- summary(lm1_step)$adj.r.squared
model.res[[7]] <- lm1_step
Res.all[[iVar, iResp, "Con"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[4] <- summary(lm1_step)$adj.r.squared
model.res[[4]] <- lm1_step
Res.all[[iVar, iResp, "Exp"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[1] <- NA
model.res[[1]] <- NA
Res.all[[iVar, iResp, "SL"]] <<- NA # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
} else {
adj.R2[4] <- summary(lm1)$adj.r.squared
model.res[[4]] <- lm1
Res.all[[iVar, iResp, "Exp"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
##-------------------------
## Functional Form 5: Log
##-------------------------
lm5.flag <- all(Var.v > 0 & Var.v < Inf, na.rm = TRUE)
if(lm5.flag){
Rel5 <- paste0(Resp, "~", "log(", Var,")")
lm1 <- do.call("lm", list(Rel5, data=as.name("x")))
lm1_step <- stepAIC(lm1,trace=F)
if (length(lm1$coef)!=length(lm1_step$coef)) {
if ( length(which(lm1_step$coef==lm1$coef[1]))>0 ) {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[5] <- summary(lm1)$adj.r.squared
model.res[[5]] <- lm1
Res.all[[iVar, iResp, "Log"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[7] <- summary(lm1_step)$adj.r.squared
model.res[[7]] <- lm1_step
Res.all[[iVar, iResp, "Con"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
} else {
if ( length(which(lm1_step$coef==lm1$coef[2]))>0 ) {
adj.R2[5] <- summary(lm1_step)$adj.r.squared
model.res[[5]] <- lm1_step
Res.all[[iVar, iResp, "Log"]] <<- lm1_step # Res.all[[iVar, iResp, "SL"]] <<- lm1
} else {
adj.R2[1] <- NA
model.res[[1]] <- NA
Res.all[[iVar, iResp, "SL"]] <<- NA # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
} else {
adj.R2[5] <- summary(lm1)$adj.r.squared
model.res[[5]] <- lm1
Res.all[[iVar, iResp, "Log"]] <<- lm1 # Res.all[[iVar, iResp, "SL"]] <<- lm1
}
}
##-------------------------
## Functional Form 6: nls
##-------------------------
## if(Resp == "G" & Var == "time") browser()
Rel6 <- paste0(Resp, " ~ ", "a1 + a2 * exp(a3 * ",Var, ")")
model6 <- NA
SSE <- NA
## Find the best (if any) nls models from all initial values
for(i in 1:nrow(nlsInits)){
nls1 <- tryCatch({
do.call("nls", list(Rel6, data = as.name("x"),
start = list(a1 = nlsInits[i, 1],
a2 = nlsInits[i,2], a3 = nlsInits[i,3])))
}, error = function(e) e)
if(inherits(nls1, "nls")){
## if model6 is still NA, initialize it
if(!inherits(model6, "nls")){
model6 <- nls1
SSE <- sum(residuals(nls1) ^ 2)
init <- nlsInits[i,]
}else{
newSSE <- sum(residuals(nls1) ^ 2)
if(newSSE < SSE){
model6 <- nls1
SSE <- newSSE
init <- nlsInits[i,]
}else{}
}
}
}
## Assign result
if(!inherits(model6, "nls")){
adj.R2[6] <- -Inf
model.res[[6]] <- NA
Res.all[[iVar, iResp, "nls"]] <<- NA # Res.all[[iVar, iResp, "nls"]] <<- NA
}else{
R2s <- nlsR2(model6, y = x[[Resp]], p = 3)
adj.R2[6] <- R2s$adjR2
model.res[[6]] <- model6
Res.all[[iVar, iResp, "nls"]] <<- model6 #Res.all[[iVar, iResp, "nls"]] <<- model6
}
##-------------------------
## Functional Form 7: Change Point
##-------------------------
## change <- 0.63
## change <- min(Var.v) + (change * (max(Var.v) - min(Var.v)))
## Var.new <- as.numeric(Var.v > change)
## x$Var.plus <- (Var.v - change) * Var.new
## Rel6 <- paste0(Resp,"~", Var, "+", "Var.plus")
## browser()
## lm6 <- do.call("lm", list(Rel6, data = as.name("x")))
## adj.R2[6] <- summary(lm6)$adj.r.squared
## model.res[[6]] <- lm6
## Res.all[[iVar, iResp, "CPSLSL"]] <<- lm6
## fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
## data = DNase1,
## start = list(Asym = 3, xmid = 0, scal = 1))
##Functional Form 8:
## Choose the best model, if no model works, write -1.
## Populate the 'table' item in function return.
## Populate the 'bestModels' item in function return.
if(max(adj.R2, na.rm = TRUE) >= 0.001){
step <- attributes(Res.best)$Step[iResp] + 1
attributes(Res.best)$Step[iVar] <<- min(step, attributes(Res.best)$Step[iVar]) ## <<
attributes(Res.best)$diag.Step[iVar, iResp] <<- step ## <<
nbest <- which.max(adj.R2) # location of best model
Res.best[iVar, iResp] <<- modelNames[nbest] ## <<
## A row for the print table
table1r <- matrix(NA, nrow = 1, ncol = nRes)
colnames(table1r) <- c("Response", "Variable",
"Model", "R-Sqr", "adj-R-Sqr",
"Pval1", "Pval2", "Pval3")
table1r <- as.data.frame(table1r)
table1r[1, c("Response", "Variable")] = c(Resp, Var)
table1r[1, c("Model")] <- modelNames[nbest]
## if(modelNames[nbest] == "nls") browser()
## Best model from Resp ~ Var
finalModel <- model.res[[nbest]]
if(inherits(finalModel, "lm")){
model <- summary(finalModel)
table1r[1,c("R-Sqr", "adj-R-Sqr", "Pval1", "Pval2")] <-
c(model$r.sq, model$adj.r.sq, model$coef[1,4], model$coef[2,4])
if(modelNames[nbest] == "Quad"){
table1r[1, "Pval3"] <- model$coeff[3, 4]
}else{
table1r[1, "Pval3"] <- NA
}
}else if(inherits(finalModel, "nls")){
table1r[1,c("R-Sqr", "adj-R-Sqr", "Pval1", "Pval2", "Pval3")] <-
c(R2s$R2, R2s$adjR2, NA, NA, NA)
}
Res.print <<- rbind(Res.print, table1r)
}else{
Res.best[iVar, iResp] <<- -1
}
}
###############################
### Main scripts; Above function is called
###############################
nVar <- ncol(x) #total numb of variables
nRVar <- nVar - 1 #total numb of possible response variables
## Current considered functional forms (Order matters!!!)
modelNames <- c("SL", "Quad", "SQuad", "Exp", "Log", "nls","Con")
nModel <- length(modelNames)
## Initiate Res.all
Res.all <- vector( "list", nVar * nVar * nModel) #list stores the final results
dim(Res.all) <- c(nVar, nVar, nModel)
Res.all[,,] <- NA
dimnames(Res.all) <- list(colnames(x), colnames(x), modelNames)
## Initiate a matrix storing final results
Res.best <- matrix(rep(NA, nVar * nVar), nrow = nVar,
dimnames = list(colnames(x), colnames(x)) )
attr(Res.best, "Step") <- c(Inf, 0, rep(Inf, nVar-2))
names(attributes(Res.best)$Step) <- colnames(x)
attr(Res.best, "diag.Step") <- matrix(rep(Inf, nVar*nVar), nrow = nVar)
dimnames(attributes(Res.best)$diag.Step) <- list(colnames(x),colnames(x))
nRes <- 8 # Number of cells in print variable; see below
## matrix stores fianl restults, including information of best model name,
## R-Sqr, adj-R-Sqr, Pval1, Pval2, Pval3
Res.print <- matrix(NA, nrow = 0, ncol = nRes)
colnames(Res.print) <- c("Response", "Variable", "Model", "R-Sqr", "adj-R-Sqr",
"Pval1", "Pval2", "Pval3")
## indicator of possible response variables have already been tested
fitted.flag <- rep(FALSE, nRVar)
## indicator of possible reponse varibals need to be tested
tofit.flag <- c(TRUE, rep(FALSE, nRVar-1))
while(sum(tofit.flag - fitted.flag) != 0){
iResp <- which(tofit.flag != fitted.flag)[1] + 1
iVar <- (1 : nVar)[c(-2, -iResp)]
# lapply(iVar, function(x, y) find.relation(x,y), y = iResp)
for ( iVar_index in 1:length(iVar) ) {
find.relation(iVar[iVar_index],iResp)
}
tofit.flag <- tofit.flag | (!(Res.best[-1, iResp] %in% c(NA, -1)))
fitted.flag[iResp - 1] <- TRUE
}
## Outputs are three tables
res <- list(data = x, predictor = predictor, response = response,
table = Res.print, bestModels = Res.best, allModels = Res.all)
class(res) <- c("sgSEMp1","list")
res$bestModels <- res$bestModels[-2,-1]
res$allModels <- res$allModels[-2,-1,]
invisible(res)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.