R/sjSDM.R

Defines functions residuals.sjSDM update.sjSDM logLik.sjSDM simulate.sjSDM summary.sjSDM getSe coef.sjSDM predict.sjSDM print.sjSDM sjSDM

Documented in coef.sjSDM getSe logLik.sjSDM predict.sjSDM print.sjSDM residuals.sjSDM simulate.sjSDM sjSDM summary.sjSDM update.sjSDM

#' @title Fitting scalable joint Species Distribution Models (sjSDM)
#'
#' @description 
#' 
#' \code{sjSDM} is used to fit joint Species Distribution models (jSDMs) using the central processing unit (CPU) or the graphical processing unit (GPU). 
#' The default is a multivariate probit model based on a Monte-Carlo approximation of the joint likelihood. 
#' \code{sjSDM} can be used to fit linear but also deep neural networks and supports the well known formula syntax. 
#' 
#' @param Y matrix of species occurrences/responses in range
#' @param env matrix of environmental predictors, object of type \code{\link{linear}} or \code{\link{DNN}}
#' @param biotic defines biotic (species-species associations) structure, object of type \code{\link{bioticStruct}}
#' @param spatial defines spatial structure, object of type \code{\link{linear}} or \code{\link{DNN}}
#' @param family error distribution with link function, see details for supported distributions
#' @param iter number of fitting iterations
#' @param step_size batch size for stochastic gradient descent, if \code{NULL} then step_size is set to: \code{step_size = 0.1*nrow(X)}
#' @param learning_rate learning rate for Adamax optimizer
#' @param se calculate standard errors for environmental coefficients
#' @param sampling number of sampling steps for Monte Carlo integration
#' @param parallel number of cpu cores for the data loader, only necessary for large datasets
#' @param control control parameters for optimizer, see \code{\link{sjSDMControl}}
#' @param device which device to be used, "cpu" or "gpu"
#' @param dtype which data type, most GPUs support only 32 bit floats.
#' @param seed seed for random operations
#' @param verbose `TRUE` or `FALSE`, indicating whether progress should be printed or not
#' 
#' @details 
#' \loadmathjax
#' The function fits per default a multivariate probit model via Monte-Carlo integration (see Chen et al., 2018) of the joint likelihood for all species.
#' 
#' \subsection{Model description}{
#' 
#' The most common jSDM structure describes the site (\mjseqn{i = 1, ..., I})  by species (\mjseqn{j = 1, ..., J}) matrix \mjseqn{Y_{ij}} as a function of
#' environmental covariates \mjseqn{X_{in}}(\mjseqn{n=1,...,N} covariates), and the species-species covariance matrix
#' \mjseqn{\Sigma} accounts for correlations in \mjseqn{e_{ij}}:
#' 
#' \mjsdeqn{g(Z_{ij}) = \beta_{j0} + \Sigma^{N}_{n=1}X_{in}\beta_{nj} + e_{ij}}
#' 
#' with \mjseqn{g(.)} as link function. For the multivariate probit model, the link function is:
#' 
#' \mjsdeqn{Y_{ij}=1(Z_{ij} > 0)}
#' 
#' The probability to observe the occurrence vector \mjseqn{\bf{Y_i}} is:
#' 
#' \mjsdeqn{Pr(\bf{Y}_i|\bf{X}_i\beta, \Sigma) = \int_{A_{iJ}}...\int_{A_{i1}} \phi_J(\bf{Y}_i^{\ast};\bf{X}_i\beta, \Sigma)  dY_{i1}^{\ast}... dY_{iJ}^{\ast}}
#' 
#' in the interval \mjseqn{A_{ij}} with \mjseqn{(-\inf, 0]} if \mjseqn{Y_{ij}=0} and \mjseqn{ [0, +\inf) }  if \mjseqn{Y_{ij}=1}.
#' 
#' and \mjseqn{\phi} being the density function of the multivariate normal distribution. 
#' 
#' The probability of \mjseqn{\bf{Y_i}} requires to integrate over \mjseqn{\bf{Y_i^{\ast}}} which has no closed analytical expression for more than two species
#' which makes the evaluation of the likelihood computationally costly and needs a numerical approximation.
#' The previous equation can be expressed more generally as:
#' 
#' \mjsdeqn{ \mathcal{L}(\beta, \Sigma; \bf{Y}_i, \bf{X}_i) = \int_{\Omega} \prod_{j=1}^J Pr(Y_{ij}|\bf{X}_i\beta+\zeta) Pr(\zeta|\Sigma) d\zeta  }
#' 
#' \code{sjSDM} approximates this integral by \mjseqn{M} Monte-Carlo samples from the multivariate normal species-species covariance. 
#' After integrating out the covariance term, the remaining part of the likelihood can be calculated as in an univariate case and the average 
#' of the \mjseqn{M} samples are used to get an approximation of the integral:
#' 
#' \mjsdeqn{ \mathcal{L}(\beta, \Sigma; \bf{Y}_i, \bf{X}_i) \approx \frac{1}{M} \Sigma_{m=1}^M \prod_{j=1}^J Pr(Y_{ij}|\bf{X}_i\beta+\zeta_m)}
#' 
#' with \mjseqn{ \zeta_m \sim MVN(0, \Sigma)}. 
#' 
#' \code{sjSDM} uses 'PyTorch' to run optionally the model on the graphical processing unit (GPU). Python dependencies needs to be  
#' installed before being able to use the \code{sjSDM} function. We provide a function which installs automatically python and the python dependencies.
#' See \code{\link{install_sjSDM}}, \code{vignette("Dependencies", package = "sjSDM")}
#' 
#' See Pichler and Hartig, 2020 for benchmark results.
#' }
#' 
#' \subsection{Supported distributions}{
#' 
#' Currently supported distributions and link functions, which are :
#' \itemize{
#' \item \code{\link{binomial}}: \code{"probit"} or \code{"logit"}
#' \item \code{\link{poisson}}: \code{"log"} 
#' \item \code{"nbinom"}: \code{"log"} 
#' \item \code{\link{gaussian}}: \code{"identity"} 
#' }
#' }
#' 
#' \subsection{Space}{
#' 
#' We can extend the model to account for spatial auto-correlation between the sites by:
#' 
#' \mjsdeqn{g(Z_{ij}) = \beta_{j0} + \Sigma^{N}_{n=1}X_{in}\beta_{nj} + \Sigma^{M}_{m=1}S_{im}\alpha_{mj} + e_{ij}}
#' 
#' There are two ways to generate spatial predictors \mjseqn{S}:
#' 
#' \itemize{
#' \item trend surface model - using spatial coordinates in a polynomial:
#' 
#'  \code{linear(data=Coords, ~0+poly(X, Y, degree = 2))} 
#'  
#' \item eigenvector spatial filtering - using spatial eigenvectors. 
#'   Spatial eigenvectors can be generated by the \code{\link{generateSpatialEV}} function:
#'   
#'   \code{SPV = generateSpatialEV(Coords)}
#'   
#'   Then we use, for example, the first 20 spatial eigenvectors:
#' 
#'   \code{linear(data=SPV[ ,1:20], ~0+.)}
#' }
#' 
#' It is important to set the intercept to 0 in the spatial term (e.g. via \code{~0+.}) because the intercept is already set in the environmental object.
#' 
#' }
#' 
#' \subsection{Installation}{
#' 
#' \code{\link{install_sjSDM}} should be theoretically able to install conda and 'PyTorch' automatically. If \code{\link{sjSDM}} still does not work after reloading RStudio, you can try to solve this on your following our trouble shooting guide \code{\link{installation_help}}.
#' If the problem remains, please create an issue on \href{https://github.com/TheoreticalEcology/s-jSDM/issues}{issue tracker} with a copy of the \code{\link{install_diagnostic}} output as a quote. 
#' }
#' 
#' @return 
#' An S3 class of type 'sjSDM' including the following components:
#' 
#' \item{cl}{Model call}
#' \item{formula}{Formula object for environmental covariates.}
#' \item{names}{Names of environmental covariates.}
#' \item{species}{Names of species (can be \code{NULL} if columns of Y are not named).}
#' \item{get_model}{Method which builds and returns the underlying 'python' model.}
#' \item{logLik}{negative log-Likelihood of the model and the regularization loss.}
#' \item{model}{The actual model.}
#' \item{settings}{List of model settings, see arguments of \code{\link{sjSDM}}.}
#' \item{family}{Response family.}
#' \item{time}{Runtime.}
#' \item{data}{List of Y, X (and spatial) model matrices.}
#' \item{sessionInfo}{Output of \code{\link{sessionInfo}}.}
#' \item{weights}{List of model coefficients (environmental (and spatial)).}
#' \item{sigma}{Lower triangular weight matrix for the covariance matrix.}
#' \item{history}{History of iteration losses.}
#' \item{se}{Matrix of standard errors, if \code{se = FALSE} the field 'se' is \code{NULL}.}
#' 
#' Implemented S3 methods include \code{\link{summary.sjSDM}}, \code{\link{plot.sjSDM}}, \code{\link{print.sjSDM}}, \code{\link{predict.sjSDM}}, and \code{\link{coef.sjSDM}}. For other methods, see section 'See Also'.
#' 
#' 
#' @references 
#' Chen, D., Xue, Y., & Gomes, C. P. (2018). End-to-end learning for the deep multivariate probit model. arXiv preprint arXiv:1803.08591.
#' 
#' Pichler, M., & Hartig, F. (2021). A new joint species distribution model for faster and more accurate inference of species associations from big community data. Methods in Ecology and Evolution, 12(11), 2159-2173. 
#' 
#' @example /inst/examples/sjSDM-example.R
#' @seealso  \code{\link{getCor}},  \code{\link{getCov}}, \code{\link{update.sjSDM}}, \code{\link{sjSDM_cv}}, \code{\link{DNN}}, \code{\link{plot.sjSDM}}, \code{\link{print.sjSDM}}, \code{\link{predict.sjSDM}}, \code{\link{coef.sjSDM}}, \code{\link{summary.sjSDM}}, \code{\link{simulate.sjSDM}}, \code{\link{getSe}}, \code{\link{anova.sjSDM}}, \code{\link{importance}}
#' 
#' @import checkmate mathjaxr
#' @author Maximilian Pichler
#' @export
sjSDM = function(Y = NULL, 
                 env = NULL,
                 biotic = bioticStruct(),
                 spatial = NULL,
                 family = stats::binomial("probit"),
                 iter = 100L, 
                 step_size = NULL,
                 learning_rate = 0.01, 
                 se = FALSE, 
                 sampling = 100L,
                 parallel = 0L, 
                 control = sjSDMControl(),
                 device = "cpu", 
                 dtype = "float32",
                 seed = 758341678,
                 verbose = TRUE) {
  
  assertMatrix(Y)
  assert(checkMatrix(env), checkDataFrame(env), checkClass(env, "DNN"), checkClass(env, "linear"))
  assert(checkClass(spatial, "DNN"), checkClass(spatial, "linear"), checkNull(spatial))
  assert_class(biotic, "bioticStruct")
  qassert(iter, c("X1[1,)"))
  qassert(step_size, c("X1[1,)", "0"))
  qassert(learning_rate, c("R1(0,)"))
  qassert(sampling, c("X1(1,)"))
  qassert(parallel, c("X1[0,)"))
  qassert(control, c("L"))
  qassert(device, c("S1", "X1[0,)", "I1[0,)"))
  qassert(dtype, "S1")
  
  seed = as.integer(seed)
  
  if( any(apply(Y, 2, var) < .Machine$double.eps) ) warning("No variation in at least one of Y columns detected!")
  
  if(inherits(family, "character")) {
    if(family == "nbinom") {
      family = stats::poisson()
      family$family = "nbinom"
    }
  }
  
  family = check_family(family)
  
  check_module()
  
  out = list()
  
  if(is.numeric(device)) device = as.integer(device)
  
  if(device == "gpu") device = 0L
  
  if(is.matrix(env) || is.data.frame(env)) env = linear(data = env)
  
  out$cl = match.call()
  out$formula = env$formula
  out$names = colnames(env$X)
  out$species = colnames(Y)
  
  ### settings ##
  if(is.null(biotic$df)) {
    
    biotic$df = max(5L, as.integer(floor(ncol(Y) / 2)))
    
  }
  if(is.null(step_size)) step_size = as.integer(floor(nrow(env$X) * 0.1))
  else step_size = as.integer(step_size)
  
  output = as.integer(ncol(Y))
  input = as.integer(ncol(env$X))
  intercept = "(Intercept)" %in% colnames(env$X)
  
  out$get_model = function(){
    model = pkg.env$fa$Model_sjSDM( device = device, dtype = dtype, seed = seed)
    
    if(inherits(env, "DNN")) {
      activation=env$activation
      hidden = as.integer(env$hidden)
      bias = env$bias
    } else {
      hidden = list()
      activation = c("linear")
      bias = list(FALSE)
    }
    
    if(!is.null(env$dropout)) dropout_env = env$dropout
    else dropout_env = -99
    
    model$add_env(input, output, hidden = hidden, activation = activation,bias = bias, l1 = env$l1, l2=env$l2, dropout=dropout_env, intercept=intercept)
    
    if(!is.null(spatial)) {
      
      if(!is.null(spatial$dropout)) dropout_sp = spatial$dropout
      else dropout_sp = -99
      
      if(inherits(spatial, "DNN")) {
        activation_spatial=spatial$activation
        hidden_spatial = spatial$hidden
        bias_spatial = spatial$bias
        model$add_spatial(as.integer(ncol(spatial$X)), output_shape = output, hidden = hidden_spatial, activation = activation_spatial, bias = bias_spatial, l1 = spatial$l1, l2= spatial$l2, dropout=dropout_sp)
      } 
      if(inherits(spatial, "linear")) {
        model$add_spatial(as.integer(ncol(spatial$X)), output_shape = output, l1 = spatial$l1, l2= spatial$l2)
      }
    }
    
    control$optimizer$params$lr = learning_rate
    optimizer = do.call(control$optimizer$ff(), control$optimizer$params)
    alpha = 1.0
    link = family$link
    if(link == "probit") alpha = 1.70169
    
    model$build(df = biotic$df, 
                l1 = biotic$l1_cov, 
                l2 = biotic$l2_cov, 
                reg_on_Diag = biotic$on_diag,
                inverse = biotic$inverse,
                reg_on_Cov = biotic$reg_on_Cov,
                optimizer = optimizer, 
                link = link,
                alpha = alpha,
                diag=biotic$diag,
                scheduler=control$scheduler_boolean,
                patience=control$scheduler_patience,
                mixed = control$mixed,
                factor=control$lr_reduce_factor)
    
    return(model)
  }
  model = out$get_model()
  
  if(is.null(spatial)) {
    time = system.time({model$fit(env$X, Y, batch_size = step_size, 
                                  epochs = as.integer(iter), parallel = parallel, 
                                  sampling = as.integer(sampling),
                                  early_stopping_training=control$early_stopping_training,
                                  verbose = verbose)})[3]
    out$logLik = force_r( model$logLik(env$X, Y,batch_size = step_size,parallel = parallel) )
    if(se && !inherits(env, "DNN")) try({ out$se = t(abind::abind(force_r(model$se(env$X, Y, batch_size = step_size, parallel = parallel)),along=0L)) })
  
  } else {
    time = system.time({model$fit(env$X, Y=Y,SP=spatial$X, batch_size = step_size, 
                                  epochs = as.integer(iter), parallel = parallel, 
                                  sampling = as.integer(sampling),
                                  early_stopping_training=control$early_stopping_training,
                                  verbose = verbose)})[3]
    out$logLik = force_r( model$logLik(env$X, Y, SP=spatial$X, batch_size = step_size,parallel = parallel) )
    if(se && !inherits(env, "DNN")) try({ out$se = t(abind::abind(force_r(model$se(env$X, Y, SP=spatial$X,batch_size = step_size, parallel = parallel, verbose = verbose)),along=0L)) })
    
  }

  if(inherits(env, "linear")) class(out) = c("sjSDM", "linear")
  if(inherits(env, "DNN")) {
    out$env_architecture = parse_nn(model$env)
    class(out) = c("sjSDM", "DNN")
  }
  
  if(inherits(spatial, "DNN")) out$spatial_architecture = parse_nn(model$spatial)
  
  if(!is.null(spatial)) class(out) = c(class(out), "spatial")
    
  out$model = model
  out$settings = list(biotic = biotic, env = env, spatial = spatial,iter = iter, 
                      step_size = step_size,learning_rate = learning_rate, control = control, 
                      parallel = parallel,device = device, dtype = dtype, sampling = sampling,
                      se = se)
  out$family = family
  out$time = time
  out$data = list(X = env$X, Y = Y)
  out$sessionInfo = utils::sessionInfo()
  out$weights = force_r(model$env_weights)
  out$sigma = force_r(model$get_sigma)
  if(out$family$family$family== "nbinom") out$theta = force_r(model$get_theta)
  out$history = force_r(model$history)
  out$spatial_weights = force_r(model$spatial_weights)
  out$spatial = spatial
  out$Null = NULL # ?????
  out$seed = seed
  .n = pkg.env$torch$cuda$empty_cache()
  return(out)
}


#' Print a fitted sjSDM model
#' 
#' @param x a model fitted by \code{\link{sjSDM}}
#' @param ... optional arguments for compatibility with the generic function, no function implemented
#' 
#' @return No return value
#' 
#' @export
print.sjSDM = function(x, ...) {
  cat("sjSDM model, see summary(model) for details \n")
}


#' Predict from a fitted sjSDM model
#' 
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param newdata newdata for predictions
#' @param SP spatial predictors (e.g. X and Y coordinates)
#' @param Y Known occurrences of species, must be a matrix of the original size, species to be predicted must consist of NAs
#' @param type raw or link
#' @param dropout use dropout for predictions or not, only supported for DNNs
#' @param ... optional arguments for compatibility with the generic function, no function implemented
#' 
#' @return Matrix of predictions (sites by species)
#' 
#' @example /inst/examples/predict-example.R
#' 
#' @import checkmate
#' @export
predict.sjSDM = function(object, newdata = NULL, SP = NULL, Y = NULL, type = c("link", "raw"), dropout = FALSE,...) {
  object = checkModel(object)
  
  assert( checkNull(newdata), checkMatrix(newdata), checkDataFrame(newdata) )
  assert( checkNull(SP), checkMatrix(newdata), checkDataFrame(newdata) )
  qassert( dropout, "B1")
  
  if(!is.null(Y)) {
    if(object$family$link == "count") warning("Conditional predictions are available for binomial response only")
    type = "raw"
  }
  
  if(inherits(object, "spatial")) assert_class(object, "spatial")
  
  type = match.arg(type)
  
  if(type == "raw") link = FALSE
  else link = TRUE
  

  
  if(inherits(object, "spatial")) {
    
    
    if(is.null(newdata)) {
      return(force_r( object$model$predict(newdata = object$data$X, SP = object$spatial$X, link=link, dropout = dropout, ...)))
    } else {
      
      if(is.data.frame(newdata)) {
        newdata = stats::model.matrix(object$formula, newdata)
      } else {
        newdata = stats::model.matrix(object$formula, data.frame(newdata))
      }
      
      if(is.data.frame(SP)) {
        sp = stats::model.matrix(object$spatial$formula, SP)
      } else {
        sp = stats::model.matrix(object$spatial$formula, data.frame(SP))
      }
      
    }
    pred = force_r(object$model$predict(newdata = newdata, SP = sp, link=link, dropout = dropout, ...))
    
    
  } else {
    
    if(is.null(newdata)) {
      return(force_r(object$model$predict(newdata = object$data$X, link=link, ...)))
    } else {
      if(is.data.frame(newdata)) {
        newdata = stats::model.matrix(object$formula, newdata)
      } else {
        newdata = stats::model.matrix(object$formula, data.frame(newdata))
      }
    }
    pred = force_r(object$model$predict(newdata = newdata, link=link, dropout = dropout, ...))
  }
  
  if(!is.null(Y)) {
    predictions = pred
    to_predict = which(apply(Y,2,  function(i) any(is.na(i))))
    focal = which(apply(Y,2,  function(i) any(!is.na(i))))
    Y_copy = matrix(NA, nrow(Y), length(to_predict))
    counter = 1
    for(K in to_predict) {
      joint_ll = reticulate::py_to_r(
        pkg.env$fa$MVP_logLik(cbind(1, Y[, focal]), 
                              predictions[,c(K, focal)], 
                              reticulate::py_to_r(object$model$get_sigma)[c(K, focal),],
                              device = object$model$device,
                              individual = TRUE,
                              dtype = object$model$dtype,
                              batch_size = as.integer(object$settings$step_size),
                              alpha = object$model$alpha,
                              link = object$family$link,
                              theta = object$theta[c(K, focal)], ...
        )
      )
      raw_ll = 
        reticulate::py_to_r(
          pkg.env$fa$MVP_logLik(Y[,focal], 
                                predictions[,focal], 
                                reticulate::py_to_r(object$model$get_sigma)[focal,],
                                device = object$model$device,
                                individual = TRUE,
                                dtype = object$model$dtype,
                                batch_size = as.integer(object$settings$step_size),
                                alpha = object$model$alpha,
                                link = object$family$link,
                                theta = object$theta[focal], ...
          )
        ) 
      raw_conditional_ll = -( (-joint_ll) - (-raw_ll ))
      pred_prob = exp(-raw_conditional_ll)
      pred_prob[pred_prob> 1] = 1.0
      pred_prob[pred_prob<0] = 0
      Y_copy[,counter] = pred_prob
      counter = counter + 1
      
    }
    
    pred = Y_copy
  }
  
  
  return(pred)
}



#' Return coefficients from a fitted sjSDM model
#' 
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param ... optional arguments for compatibility with the generic function, no function implemented
#' 
#' @return Matrix of environmental coefficients or list of environmental and spatial coefficients for spatial models. 
#' 
#' @export
coef.sjSDM = function(object, ...) {
  if(inherits(object, "spatial")) {
    return(list(env=object$weights, spatial = object$spatial_weights))
  } else {
    return(object$weights)
  }
}


#' Post hoc calculation of standard errors
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param step_size batch size for stochastic gradient descent
#' @param parallel number of cpu cores for the data loader, only necessary for large datasets
#' 
#' @return The object passed to this function but the \code{object$se} field contains the standard errors now
#' 
#' @export
getSe = function(object, step_size = NULL, parallel = 0L){
  if(!inherits(object, "sjSDM")) stop("object must be of class sjSDM")
  object = checkModel(object)
  if(is.null(step_size)) step_size = object$settings$step_size
  else step_size = as.integer(step_size)
  if(!inherits(object, "spatialRE")) try({ object$se = t(abind::abind(force_r(object$model$se(object$data$X, object$data$Y, batch_size = step_size, parallel = parallel)),along=0L)) })
  else try({ object$se = t(abind::abind(force_r(object$model$se(object$data$X, object$data$Y, object$spatial$re, batch_size = step_size, parallel = parallel)),along=0L)) })
  return(object)
}

#' Return summary of a fitted sjSDM model
#' 
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param ... optional arguments for compatibility with the generic function, no functionality implemented
#' 
#' @return The above matrix is silently returned.
#' 
#' @export
summary.sjSDM = function(object, ...) {

  out = list()
  
  cat("Family: ", object$family$family$family, "\n\n")
  cat("LogLik: ", -object$logLik[[1]], "\n")
  cat("Regularization loss: ", object$logLik[[2]], "\n\n")
  
  if(object$family$family$family == "nbinom") {
    disps = 1+(softplus(object$theta)+0.00001)
    out$disperion = disps
    cat("Dispersion parameters for nbinom", disps, "\n\n")
  }
  
  cov_m = getCov(object)
  cor_m = stats::cov2cor(cov_m)
  
  p_cor = round(cor_m, 3)
  # p_cor[upper.tri(p_cor)] = NULL # TODO - find out what can bet set that is numeric and doesn't show
  colnames(p_cor) = paste0("sp", 1:ncol(p_cor))
  rownames(p_cor) = colnames(p_cor)
  

  if(dim(p_cor)[1] < 50) {
    cat("Species-species correlation matrix: \n\n")
    kk = format(p_cor,nsmall = 4)
    kk[upper.tri(kk)] = ""
    kk = cbind(colnames(p_cor), kk)
    kk = (apply(kk, 1:2, function(i) paste0("\t", i)))
    kk[, ncol(kk)] = paste0(kk[, ncol(kk)], "\n")
    cat(paste0(t(kk), collapse = ""))
    cat("\n\n\n")
  }
  
  if(inherits(object, "spatial")) {
    if(inherits(object$spatial, "linear")) {
      cat("Spatial: \n")
      sp = t(object$spatial_weights[[1]])
      rownames(sp) = colnames(object$spatial$X)
      colnames(sp) = paste0("sp", 1:ncol(sp))
      print(sp)
    } 
    
    if(inherits(object$spatial, "DNN")) {
      cat("Spatial architecture:\n")
      cat(object$spatial_architecture)
    }
    
    cat("\n\n\n")
  }
  
  
  if(inherits(object, "linear")) {
  
      coefs = coef.sjSDM(object)[[1]]
      if(inherits(coefs, "list")) coefs = coefs[[1]]
      env2 = t(coefs)
    
      env = data.frame(env2)
      if(is.null(object$species)) colnames(env) = paste0("sp", 1:ncol(env))
      else colnames(env) = object$species
      rownames(env) = object$names
    
      # TO DO: p-value parsing:
      if(!is.null(object$se)) {
        out$z = env2 / object$se
        out$P = 2*stats::pnorm(abs(out$z),lower.tail = FALSE)
        out$se = object$se
        
        coefmat = cbind(
          as.vector(as.matrix(env)),
          as.vector(as.matrix(out$se)),
          as.vector(as.matrix(out$z)),
          as.vector(as.matrix(out$P))
          )
        colnames(coefmat) = c("Estimate", "Std.Err", "Z value", "Pr(>|z|)")
        rownames(coefmat) = apply(expand.grid( rownames(env), colnames(env)), 1, function(n) paste0(n[2]," ", n[1]))
        stats::printCoefmat(coefmat, signif.stars = getOption("show.signif.stars"), digits = 3)
        out$coefmat = coefmat
      } else {
      
      cat("Coefficients (beta): \n\n")
      if(dim(env)[2] > 50) utils::head(env)
      else print(env)
      } 
    }else {
    
    cat("Env architecture:\n")
    cat(object$env_architecture)
    env = coef.sjSDM(object)
    
  }

  out$coefs = env
  out$logLik = object$logLik
  out$sigma = object$sigma
  out$cov = cov_m
  out$names = list(species = if(is.null(object$species)) paste0("sp", 1:ncol(object$data$Y)) else object$species, env = object$names)
  return(invisible(out))
}


#' Generates simulations from sjSDM model
#'
#' Simulate nsim responses from the fitted model following a multivariate probit model. 
#' So currently only supported for \code{family = stats::binomial("probit")}
#'
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param nsim number of simulations
#' @param seed seed for random number generator
#' @param ... optional arguments for compatibility with the generic function, no functionality implemented
#' 
#' @return Array of simulated species occurrences of dimension order (nsim, sites, species)
#' 
#' @importFrom stats simulate
#' @export
simulate.sjSDM = function(object, nsim = 1, seed = NULL, ...) {
  object = checkModel(object)
  if(!is.null(seed)) {
    set.seed(seed)
    pkg.env$torch$cuda$manual_seed(seed)
    pkg.env$torch$manual_seed(seed)
  }
  pred = predict(object)
  
  if(object$family$family$family == "binomial") {
    sim = apply(pred, 1:2, function(p) stats::rbinom(nsim, 1, p))
  } else if(object$family$family$family == "poisson") {
    sim = apply(pred, 1:2, function(p) stats::rpois(nsim, p))
  } else if(object$family$family$family == "nbinom") {
    theta = 1/(softplus(object$theta)+0.00001)
    
    sim = 
      lapply(1:nsim, function(n) {
          t(sapply(1:nrow(pred), function(i) {
              sapply(1:ncol(pred), function(j) {
                prob = theta[j] / (theta[j] +pred[i,j]) +0.00001
                stats::rnbinom(1, size = theta[j], prob = prob)
                })
            })
          )
        })
    sim = abind::abind(sim, along = 0L)
  }
  return(sim)
}



#' Extract negative-log-Likelihood from a fitted sjSDM model
#'
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param individual returns internal ll structure, mostly for internal useage
#' @param ... optional arguments passed to internal logLik function (only used if \code{individual=TRUE})
#' 
#' @return Numeric value or numeric matrix if individual is true.
#' 
#' @importFrom stats simulate
#' @export
logLik.sjSDM <- function(object, individual=FALSE,...){
  if(!individual) return(object$logLik[[1]])
  else {
    object = checkModel(object)
    if(!inherits(object, "spatial")) return(force_r(object$model$logLik(object$data$X, object$data$Y, individual = TRUE, ...)))
    else return(force_r(object$model$logLik(object$data$X, object$data$Y, object$spatial$X, individual = TRUE, ...)))
  }
}




#' Update and re-fit a model call
#' 
#' @param object of class 'sjSDM'
#' @param env_formula new environmental formula
#' @param spatial_formula new spatial formula
#' @param biotic new biotic config
#' @param ... additional arguments
#' 
#' @return An S3 class of type 'sjSDM'. See \code{\link{sjSDM}} for more information.
#' @export
update.sjSDM = function(object, env_formula = NULL, spatial_formula = NULL, biotic = NULL, ...) {
  
  mf = match.call()
  if(!is.null(env_formula)){
    m = match("env_formula", names(mf))
    if(inherits(mf[m]$env_formula,"name")) mf[m]$env_formula = eval(mf[m]$env_formula, envir = parent.env(environment()))
    env_formula = stats::as.formula(mf[m]$env_formula)
  } else {
    env_formula = object$settings$env$formula
  }
  
  if(!is.null(spatial_formula)){
    m = match("spatial_formula", names(mf))
    if(inherits(mf[m]$spatial_formula, "name")) mf[m]$spatial_formula = eval(mf[m]$spatial_formula, envir = parent.env(environment()))
    spatial_formula = stats::as.formula(mf[m]$spatial_formula)
  } else {
    spatial_formula = object$settings$spatial$formula
  }
  
  if(is.null(biotic)) {
    biotic = object$settings$biotic
  } 
  
  env = object$settings$env
  env$formula = env_formula
  env$X = stats::model.matrix(env_formula, env$data)
  
  if(inherits(object, "spatial")) {
    spatial = object$settings$spatial
    spatial$formula = spatial_formula
    spatial$X = stats::model.matrix(spatial_formula, spatial$data)
  } else {
    spatial = NULL
  }
  
  new_model = sjSDM(object$data$Y, 
                    env = env, 
                    spatial = spatial,
                    biotic = biotic,
                    family = object$family$family, 
                    iter = object$settings$iter, 
                    step_size = object$settings$step_size, 
                    learning_rate = object$settings$learning_rate,
                    sampling = object$settings$sampling,
                    control = object$settings$control,
                    device = object$settings$device, 
                    dtype = object$settings$dtype,
                    se = object$settings$se,
                    ...
  )
  return(new_model)
}


#' Residuals for a sjSDM model
#'
#' Returns residuals for a fitted sjSDM model
#'
#' @param object a model fitted by \code{\link{sjSDM}}
#' @param type residual type. Currently only supports raw
#' @param ... further arguments, not supported yet.
#' @return residuals in the format of the provided community matrix
#' 
#' @export
residuals.sjSDM <- function(object, type = "raw", ...){
  raw = object$data$Y - predict(object, type = "raw")
  return(raw)
}

Try the sjSDM package in your browser

Any scripts or data that you put into this service are public.

sjSDM documentation built on Sept. 11, 2024, 7:18 p.m.