R/spatialfuncs.r

Defines functions get.tau.typed.permute get.tau.permute get.tau.typed.bootstrap get.tau.bootstrap get.tau.ci get.tau.typed get.tau get.theta.typed.permute get.pi.typed.permute get.theta.permute get.pi.permute get.theta.typed.bootstrap get.pi.typed.bootstrap get.theta.bootstrap get.pi.bootstrap get.theta.ci get.pi.ci get.theta.typed get.pi.typed get.theta get.pi

Documented in get.pi get.pi.bootstrap get.pi.ci get.pi.permute get.pi.typed get.pi.typed.bootstrap get.pi.typed.permute get.tau get.tau.bootstrap get.tau.ci get.tau.permute get.tau.typed get.tau.typed.bootstrap get.tau.typed.permute get.theta get.theta.bootstrap get.theta.ci get.theta.permute get.theta.typed get.theta.typed.bootstrap get.theta.typed.permute

##' Generalized version of \code{get.pi}
##'
##' Generalized version of the \code{get.pi} function that takes in an arbitrary function and
##' returns the probability that a point within a particular range of a point of interest shares the relationship
##' specified by the passed in function with that point.
##'
##' @param posmat a matrix with columns x, y and any other named columns
##'    columns needed by \code{fun}
##' @param fun a function that takes in two rows of \code{posmat} and returns:
##' \enumerate{
##'      \item for pairs included in the numerator and denominator
##'      \item for pairs that should only be included in the denominator
##'      \item for pairs that should be ignored all together}
##' Note that names from \code{posmat} are not preserved in calls to \code{fun}, so the columns of the matrix should be
##' referenced numerically
##' so this is not available to the \code{fun}
##' @param r the series of spatial distances (or there maximums) we are
##'          interested in
##' @param r.low the low end of each range, 0 by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return  pi value for each distance range that we look at. Where:
##'
##'\deqn{ \pi(d_1, d_2) = \frac{\sum \boldsymbol{1} (d_{ij} \in [d_1,d_2)) \boldsymbol{1} (f(i,j)=1) }{\sum \sum \boldsymbol{1} [d_{ij} \in (d_1,d_2)) \boldsymbol{1} (f(i,j) \in \{1,2\}) }}
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.pi
##' @family spatialtau
##'
##' @example R/examples/get_pi.R
##'

get.pi <- function(posmat,
                   fun,
                   r = 1,
                   r.low=rep(0,length(r)),
                   data.frame=TRUE) {

  xcol <-  which(colnames(posmat) == "x")
  ycol <- which(colnames(posmat) == "y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  rc <- .Call("get_pi",
              posmat,
              fun,
              r,
              r.low,
              1:nrow(posmat),
              xcol,
              ycol)
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, pi=rc))
  }
}


##'
##' Generalized version of \code{get.theta}
##'
##'
##' Generalized version of the \code{get.theta} function that takes in an arbitrary function and
##' returns the odds that a point within a particular range of a point of interest shares the relationship
##' specified by the passed in function with that point.
##'
##' @param posmat a matrix with columns x, y and any other named columns
##'    columns needed by fun
##' @param fun a function that takes in two rows of posmat and returns:
##' \enumerate{
##'      \item  for pairs that are (potentially) related
##'      \item for pairs that are unrelated
##'      \item for pairs that should be ignored all together}
##' Note that names from \code{posmat} are not preserved in calls to \code{fun}, so the columns of the matrix should be
##' referenced numerically
##' so this is not available to the fun
##' @param r the series of spatial distances (or there maximums) we are
##'          interested in
##' @param r.low the low end of each range, 0 by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return  theta value for each distance range that we look at. Where:
##'
##' \deqn{ \theta(d_1,d_2) = \frac{\sum \boldsymbol{1} d_{ij} \in [d_1,d_2)) \boldsymbol{1} (f(i,j)=1) }{\sum \sum \boldsymbol{1} d_{ij} \in [d_1,d_2)) \boldsymbol{1} (f(i,j)=2) }}
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.theta
##' @family spatialtau
##'
##' @example R/examples/get_theta.R
##'

get.theta <- function(posmat,
                      fun,
                      r = 1,
                      r.low=rep(0,length(r)),
                      data.frame=TRUE) {

  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  rc <- .Call("get_theta",
              posmat,
              fun,
              r,
              r.low,
              1:nrow(posmat),
              xcol,
              ycol)
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, theta=rc))
  }
}


##'
##' Optimized version of \code{get.pi} for typed data.
##'
##' Version of the \code{get.pi} function that is optimized for statically typed data. That is
##' data where we are interested in the probability of points within some distance of points of
##' typeA are of typeB.
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances wer are interested in
##' @param r.low the low end of each range....0  by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return pi values for all the distances we looked at
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.pi
##'
##' @example  R/examples/get_pi_typed.R
##'

get.pi.typed <- function(posmat,
                         typeA = -1,
                         typeB = -1,
                         r=1,
                         r.low=rep(0,length(r)),
                         data.frame=TRUE) {

  rc <- .C("get_pi_typed",
            as.integer(posmat[,"type"]),
            as.double(posmat[,"x"]),
            as.double(posmat[,"y"]),
            as.integer(nrow(posmat)),
            as.integer(typeA),
            as.integer(typeB),
            as.double(r.low),
            as.double(r),
            as.integer(length(r)),
            as.integer(1:nrow(posmat)),
            rc=double(length(r)))
     
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, pi=rc$rc))
  }
}


##'
##' Optimized version of \code{get.theta} for typed data.
##'
##' Version of the \code{get.theta} function that is optimized for statically typed data. That is
##' data where we are interested in the odds that points within some distance of points of
##' typeA are of typeB.
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances wer are interested in
##' @param r.low the low end of each range....0  by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return theta values for all the distances we looked at
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.theta
##'
##' @example R/examples/get_theta_typed.R
##'

get.theta.typed <- function(posmat,
                            typeA = -1,
                            typeB = -1,
                            r=1,
                            r.low=rep(0,length(r)),
                            data.frame=TRUE) {
     
     rc <- .C("get_theta_typed",
              as.integer(posmat[,"type"]),
              as.double(posmat[,"x"]),
              as.double(posmat[,"y"]),
              as.integer(nrow(posmat)),
              as.integer(typeA),
              as.integer(typeB),
              as.double(r.low),
              as.double(r),
              as.integer(length(r)),
              as.integer(1:nrow(posmat)),
              rc=double(length(r)))
     
     if (data.frame == FALSE) {
          return(rc)
     } else if (data.frame == TRUE) {
          return(data.frame(r.low=r.low, r=r, theta=rc$rc))
     }
}


##' Calculate bootstrapped confidence intervals for \code{get.pi} values.
##'
##' Wrapper to \code{get.pi.bootstrap} that takes care of calculating the
##' confidence intervals based on the bootstrapped values..
##'
##'
##' @param posmat a matrix with columns type, x and y
##' @param fun the function to decide relationships
##' @param r the series of spatial distances wer are interested in
##' @param r.low the low end of each range. 0 by default
##' @param boot.iter the number of bootstrap iterations
##' @param ci.low the low end of the ci...0.025 by default
##' @param ci.high the high end of the ci...0.975 by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return a matrix with a row for the high and low values and
##'     a column per distance
##'
##' @author Justin Lessler
##'
##' @family get.pi
##'
##' @example R/examples/get_pi_ci.R
##'

get.pi.ci <- function(posmat,
                      fun,
                      r=1,
                      r.low=rep(0,length(r)),
                      boot.iter = 1000,
                      ci.low=0.025,
                      ci.high=0.975,
                      data.frame=TRUE) {
     
  boots <- get.pi.bootstrap(posmat, fun, r, r.low, boot.iter)

  rc <- apply(boots[,-(1:2)], 1, quantile, probs=c(ci.low, ci.high))
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, 
                         r=r, 
                         pt.est=get.pi(posmat, fun, r, r.low)$pi, 
                         ci.low=rc[1,], 
                         ci.high=rc[2,]))
  }
}
 

##' Calculate bootstrapped confidence intervals for \code{get.theta} values.
##'
##' Wrapper to \code{get.theta.bootstrap} that takes care of calculating the
##' confience intervals based on the bootstrapped values.
##'
##'
##' @param posmat a matrix with columns type, x and y
##' @param fun the function to decide relationships
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range. 0 by default
##' @param boot.iter the number of bootstrap iterations
##' @param ci.low the low end of the ci...0.025 by default
##' @param ci.high the high end of the ci...0.975 by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return a matrix with a row for the high and low values and
##'     a column per distance
##'
##' @author Justin Lessler
##'
##' @family get.theta
##'
##' @example R/examples/get_theta_ci.R
##'

get.theta.ci <- function(posmat,
                         fun,
                         r=1,
                         r.low=rep(0,length(r)),
                         boot.iter = 1000,
                         ci.low=0.025,
                         ci.high=0.975,
                         data.frame=TRUE) {
     
  boots <- get.theta.bootstrap(posmat, fun, r, r.low, boot.iter)

  rc <- apply(boots[,-(1:2)], 1, quantile, probs=c(ci.low, ci.high))
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, 
                         r=r, 
                         pt.est=get.theta(posmat, fun, r, r.low)$theta, 
                         ci.low=rc[1,], 
                         ci.high=rc[2,]))
  }
}


##' Bootstrap \code{get.pi} values.
##'
##' Runs \code{get.pi} on multiple bootstraps of the data. Is formulated
##' such that the relationships between
##' points and themselves will not be calculated.
##'
##' @param posmat a matrix with columns type, x and y
##' @param fun the function to decide relationships
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range. 0 by default
##' @param boot.iter the number of bootstrap iterations
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return pi values for all the distances we looked at
##'
##' @note In each bootstrap iteration N observations are drawn from the existing data with replacement. To avoid errors in
##' inference resulting from the same observatin being compared with itself in the bootstrapped data set, original indices
##' are perserved, and pairs of points in the bootstrapped dataset with the same original index are ignored.
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.pi
##'
##' @example R/examples/get_pi_bootstrap.R
##'

get.pi.bootstrap <- function(posmat,
                             fun,
                             r=1,
                             r.low=rep(0,length(r)),
                             boot.iter=500,
                             data.frame=TRUE) {


  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  rc <- matrix(nrow=boot.iter, ncol=length(r))
  for (i in 1:boot.iter) {
    inds <- sample(nrow(posmat), replace=T)
    rc[i,] <- .Call("get_pi",
                    posmat[inds,],
                    fun,
                    r,
                    r.low,
                    inds,
                    xcol,
                    ycol)
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' Bootstrap \code{get.theta} values.
##'
##' Runs \code{get.theta} on multiple bootstraps of the data. Is formulated
##' such that the relationships between
##' points and themselves will not be calculated.
##'
##' @param posmat a matrix with columns type, x and y
##' @param fun the function to decide relationships
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range. 0 by default
##' @param boot.iter the number of bootstrap iterations
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return theta values for all the distances we looked at
##'
##' @note In each bootstrap iteration N observations are drawn from the existing data with replacement. To avoid errors in
##' inference resulting from the same observatin being compared with itself in the bootstrapped data set, original indices
##' are perserved, and pairs of points in the bootstrapped dataset with the same original index are ignored.
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.theta
##'
##' @example R/examples/get_theta_bootstrap.R
##'

get.theta.bootstrap <- function(posmat,
                                fun,
                                r=1,
                                r.low=rep(0,length(r)),
                                boot.iter=500,
                                data.frame=TRUE) {


  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  rc <- matrix(nrow=boot.iter, ncol=length(r))
  for (i in 1:boot.iter) {
    inds <- sample(nrow(posmat), replace=T)
    rc[i,] <- .Call("get_theta",
                    posmat[inds,],
                    fun,
                    r,
                    r.low,
                    inds,
                    xcol,
                    ycol)
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' runs bootstrapping on \code{get.pi.typed}
##'
##' Bootstraps typed pi values. Makes sure distances between a sample and
##' another draw of itself are left out
##'
##' @param boot.iter the number of bootstrap iterations
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return pi values for all the distances we looked at
##'
##' @family get.pi
##'
##' @example R/examples/get_pi_typed_bootstrap.R
##'

get.pi.typed.bootstrap <- function(posmat,
                                   typeA = -1,
                                   typeB = -1,
                                   r=1,
                                   r.low=rep(0,length(r)),
                                   boot.iter,
                                   data.frame=TRUE) {


  rc <- matrix(nrow=boot.iter, ncol=length(r))
  for (i in 1:boot.iter) {
    inds <- sample(nrow(posmat), replace=T)
    rc[i,] <- .C("get_pi_typed",
                 as.integer(posmat[inds,"type"]),
                 as.double(posmat[inds,"x"]),
                 as.double(posmat[inds,"y"]),
                 as.integer(nrow(posmat)),
                 as.integer(typeA),
                 as.integer(typeB),
                 as.double(r.low),
                 as.double(r),
                 as.integer(length(r)),
                 as.integer(inds),
                 rc=double(length(r))
    )$rc
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' runs bootstrapping on \code{get.theta.typed}
##'
##' Bootstraps typed pi values. Makes sure distances between a sample and
##' another draw of itself are left out
##'
##' @param boot.iter the number of bootstrap iterations
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return theta values for all the distances we looked at
##'
##' @family get.theta
##'
##' @example R/examples/get_theta_typed_bootstrap.R
##'

get.theta.typed.bootstrap <- function(posmat,
                                      typeA = -1,
                                      typeB = -1,
                                      r=1,
                                      r.low=rep(0,length(r)),
                                      boot.iter,
                                      data.frame=TRUE) {


  rc <- matrix(nrow=boot.iter, ncol=length(r))
  for (i in 1:boot.iter) {
    inds <- sample(nrow(posmat), replace=T)
    rc[i,] <- .C("get_theta_typed",
                 as.integer(posmat[inds,"type"]),
                 as.double(posmat[inds,"x"]),
                 as.double(posmat[inds,"y"]),
                 as.integer(nrow(posmat)),
                 as.integer(typeA),
                 as.integer(typeB),
                 as.double(r.low),
                 as.double(r),
                 as.integer(length(r)),
                 as.integer(inds),
                 rc=double(length(r))
    )$rc
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' get the null distribution of the \code{get.pi} function
##'
##' Does permutations to calculate the null distribution of get pi
##' if there were no spatial dependence. Randomly reassigns coordinates
##' to each observation permutations times
##'
##' @param posmat a matrix with columns type, x and y
##' @param fun the function to evaluate
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param permutations the number of permute iterations
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return pi values for all the distances we looked at
##'
##' @family get.pi
##'
##' @example R/examples/get_pi_permute.R
##'

get.pi.permute <- function(posmat,
                           fun,
                           r=1,
                           r.low=rep(0,length(r)),
                           permutations,
                           data.frame=TRUE) {


  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  rc <- matrix(nrow=permutations, ncol=length(r))
  for (i in 1:permutations) {
    inds <- sample(nrow(posmat))#, replace=T)
    tmp.posmat <- posmat
    tmp.posmat[,"x"] <- posmat[inds,"x"]
    tmp.posmat[,"y"] <- posmat[inds,"y"]
    rc[i,] <- .Call("get_pi",
                    tmp.posmat,
                    fun,
                    r,
                    r.low,
                    1:nrow(posmat),
                    xcol,
                    ycol)
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' get the null distribution of the \code{get.theta} function
##'
##' Does permutations to calculate the null distribution of get theta
##' if there were no spatial dependence. Randomly reassigns coordinates
##' to each observation permutations times
##'
##' @param posmat a matrix with columns type, x and y
##' @param fun the function to evaluate
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param permutations the number of permute iterations
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return theta values for all the distances we looked at
##'
##' @family get.theta
##'
##' @example R/examples/get_theta_permute.R
##'

get.theta.permute <- function(posmat,
                              fun,
                              r=1,
                              r.low=rep(0,length(r)),
                              permutations,
                              data.frame=TRUE) {


  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  rc <- matrix(nrow=permutations, ncol=length(r))
  for (i in 1:permutations) {
    inds <- sample(nrow(posmat))#, replace=T)
    tmp.posmat <- posmat
    tmp.posmat[,"x"] <- posmat[inds,"x"]
    tmp.posmat[,"y"] <- posmat[inds,"y"]
    rc[i,] <- .Call("get_theta",
                    tmp.posmat,
                    fun,
                    r,
                    r.low,
                    1:nrow(posmat),
                    xcol,
                    ycol)
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' get the null distribution of the get.pi.typed function
##'
##' Does permutations to calculate the null distribution of get pi
##' if there were no spatial dependence. Randomly reassigns coordinates
##' to each observation permutations times
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param permutations the number of permute iterations
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return pi values for all the distances we looked at
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.pi
##'
##' @example R/examples/get_pi_typed_permute.R
##'

get.pi.typed.permute <- function(posmat,
                                 typeA = -1,
                                 typeB = -1,
                                 r=1,
                                 r.low=rep(0,length(r)),
                                 permutations,
                                 data.frame=TRUE) {

  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }


  rc <- matrix(nrow=permutations, ncol=length(r))
  for (i in 1:permutations) {
    inds <- sample(nrow(posmat))#, replace=T)
    rc[i,] <- .C("get_pi_typed",
                 as.integer(posmat[,"type"]),
                 as.double(posmat[inds,"x"]),
                 as.double(posmat[inds,"y"]),
                 as.integer(nrow(posmat)),
                 as.integer(typeA),
                 as.integer(typeB),
                 as.double(r.low),
                 as.double(r),
                 as.integer(length(r)),
                 as.integer(1:nrow(posmat)),
                 rc=double(length(r))
    )$rc
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' get the null distribution of the get.theta.typed function
##'
##' Does permutations to calculate the null distribution of get theta
##' if there were no spatial dependence. Randomly reassigns coordinates
##' to each observation permutations times
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param permutations the number of permute iterations
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return theta values for all the distances we looked at
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.theta
##'
##' @example R/examples/get_theta_typed_permute.R
##'

get.theta.typed.permute <- function(posmat,
                                    typeA = -1,
                                    typeB = -1,
                                    r=1,
                                    r.low=rep(0,length(r)),
                                    permutations,
                                    data.frame=TRUE) {

  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }


  rc <- matrix(nrow=permutations, ncol=length(r))
  for (i in 1:permutations) {
    inds <- sample(nrow(posmat))#, replace=T)
    rc[i,] <- .C("get_theta_typed",
                 as.integer(posmat[,"type"]),
                 as.double(posmat[inds,"x"]),
                 as.double(posmat[inds,"y"]),
                 as.integer(nrow(posmat)),
                 as.integer(typeA),
                 as.integer(typeB),
                 as.double(r.low),
                 as.double(r),
                 as.integer(length(r)),
                 as.integer(1:nrow(posmat)),
                 rc=double(length(r))
    )$rc
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' generalized version of \code{get.tau}
##'
##'
##' returns the relative probability (or odds) that points at some distance
##' from an index point share some relationship with that point versus
##' the probability (or odds) any point shares that relationship with that point.
##'
##' @param posmat a matrix with columns x, y and any other named columns
##'    columns needed by fun
##' @param fun a function that takes in two rows of posmat and returns:
##' \enumerate{
##'      \item for pairs included in the numerator (and the denominator for independent data)
##'      \item for pairs that should only be included in the denominator
##'      \item for pairs that should be ignored all together}
##' Note that names from \code{posmat} are not preserved in calls to
##' \code{fun}, so the columns of the matrix should be referenced numerically
##' so this is not available to fun
##' @param r the series of spatial distances (or there maximums) we are
##'          interested in
##' @param r.low the low end of each range, 0 by default
##' @param comparison.type what type of points are included in the comparison set.
##' \itemize{
##'   \item "representative" if comparison set is representative of the underlying population
##'   \item "independent" if comparison set is cases/events coming from an indepedent process
##' }
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return The tau value for each distance we look at. If \code{comparison.type} is "representative", this is:
##'
##' \code{tau = get.pi(posmat, fun, r, r.low)/get.pi(posmat,fun,infinity,0)}
##'
##' If \code{comparison.type} is "independent", this is:
##'
##' \code{tau = get.theta(posmat, fun, r, r.low)/get.theta(posmat,fun,infinity,0)}
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##' @family spatialtau
##'
##' @example R/examples/get_tau.R
##'

get.tau <- function(posmat,
                    fun,
                    r = 1,
                    r.low=rep(0,length(r)),
                    comparison.type = "representative",
                    data.frame=TRUE) {

  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  if (comparison.type == "representative") {
    comp.type.int <- 0
  } else if (comparison.type == "independent") {
    comp.type.int <- 1
  } else {
    stop("unkown comparison type specified")
  }

  rc <- .Call("get_tau",
              posmat,
              fun,
              r,
              r.low,
              comp.type.int,
              1:nrow(posmat),
              xcol,
              ycol)
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, tau=rc))
  }
}


##' Optimized version of \code{get.tau} for typed data
##'
##' Version of th e \code{get.tau} function that is optimized for
##' statically typed data. That is data where we want the relationship between
##' points of type A and points of type B
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param comparison.type what type of points are included in the comparison set.
##' \itemize{
##'   \item "representative" if comparison set is representative of the underlying population
##'   \item "independent" if comparison set is cases/events coming from an indepedent process
##' }
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return data frame of tau values for all the distances
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##'
##' @example R/examples/get_tau_typed.R
##'

get.tau.typed <- function(posmat,
                          typeA = -1,
                          typeB = -1,
                          r=1,
                          r.low=rep(0,length(r)),
                          comparison.type = "representative",
                          data.frame=TRUE) {
     
     if (comparison.type == "representative") {
          comp.type.int <- 0
     } else if (comparison.type == "independent") {
          comp.type.int <- 1
     } else {
          stop("unkown comparison type specified")
     }
     
     rc <- .C("get_tau_typed",
              as.integer(posmat[,"type"]),
              as.double(posmat[,"x"]),
              as.double(posmat[,"y"]),
              as.integer(nrow(posmat)),
              as.integer(typeA),
              as.integer(typeB),
              as.double(r.low),
              as.double(r),
              as.integer(length(r)),
              as.integer(1:nrow(posmat)),
              as.integer(comp.type.int),
              rc=double(length(r)))
     
     if (data.frame == FALSE) {
          return(rc)
     } else if (data.frame == TRUE) {
          return(data.frame(r.low=r.low, r=r, tau=rc$rc))
     }
}


##' Bootstrap confidence interval for the \code{get.tau} values
##'
##' Wrapper to \code{get.tau.bootstrap} that takes care of calulating
##' the confidence intervals based on the bootstrapped values
##'
##' @param posmat a matrix appropriate for input to \code{get.tau}
##' @param fun a function appropriate as input to \code{get.pi}
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param boot.iter the number of bootstrap iterations
##' @param comparison.type the comparison type to pass to get.tau
##' @param ci.low the low end of the ci...0.025 by default
##' @param ci.high the high end of the ci...0.975 by default
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return a data frame with the point estimate of tau and its low and high confidence interval at each distance
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##'
##' @example R/examples/get_tau_ci.R
##'

get.tau.ci <- function(posmat,
                       fun,
                       r=1,
                       r.low=rep(0,length(r)),
                       boot.iter = 1000,
                       comparison.type = "representative",
                       ci.low=0.025,
                       ci.high=0.975,
                       data.frame=TRUE) {
     
     boots <- get.tau.bootstrap(posmat, fun,
                                r, r.low, boot.iter,
                                comparison.type)
     
     rc <- apply(boots[,-(1:2)], 1, quantile, probs=c(ci.low, ci.high))
     
     if (data.frame == FALSE) {
          return(rc)
     } else if (data.frame == TRUE) {
          return(data.frame(r.low=r.low, 
                            r=r, 
                            pt.est=get.tau(posmat, fun, r, r.low)$tau, 
                            ci.low=rc[1,], 
                            ci.high=rc[2,]))
     }
}


##' Bootstrap \code{get.tau} values.
##'
##' Runs \code{get.tau} on multiple bootstraps of the data. Is formulated
##' such that the relationship between points and themselves will not be
##' calculated
##'
##'
##' @param posmat a matrix appropriate for input to \code{get.tau}
##' @param fun a function appropriate as input to \code{get.pi}
##' @param r the series of spatial distances wer are interested in
##' @param r.low the low end of each range....0  by default
##' @param boot.iter the number of bootstrap iterations
##' @param comparison.type the comparison type to pass as input to \code{get.pi}
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return a matrix containing all bootstrapped values of tau for each distance interval
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##'
##' @example R/examples/get_tau_bootstrap.R
##'

get.tau.bootstrap <- function(posmat,
                              fun,
                              r=1,
                              r.low=rep(0,length(r)),
                              boot.iter,
                              comparison.type = "representative",
                              data.frame=TRUE) {


  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  if (comparison.type == "representative") {
    comp.type.int <- 0
  } else if (comparison.type == "independent") {
    comp.type.int <- 1
  } else {
    stop("unkown comparison type specified")
  }

  rc <- matrix(nrow=boot.iter, ncol=length(r))
  for (i in 1:boot.iter) {
    inds <- sample(nrow(posmat), replace=T)
    rc[i,] <- .Call("get_tau",
                    posmat[inds,],
                    fun,
                    r,
                    r.low,
                    comp.type.int,
                    inds,
                    xcol,
                    ycol)
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' runs bootstrapping for \code{get.tau.typed}
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param boot.iter the number of bootstrap iterations
##' @param comparison.type what type of points are included in the comparison set.
##' \itemize{
##'   \item "representative" if comparison set is representative of the underlying population
##'   \item "independent" if comparison set is cases/events coming from an independent process
##' }
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return tau values for all the distances we looked at
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##'
##' @example R/examples/get_tau_typed_bootstrap.R
##'

get.tau.typed.bootstrap <- function(posmat,
                                    typeA = -1,
                                    typeB = -1,
                                    r=1,
                                    r.low=rep(0,length(r)),
                                    boot.iter,
                                    comparison.type = "representative",
                                    data.frame=TRUE) {


  if (comparison.type == "representative") {
    comp.type.int <- 0
  } else if (comparison.type == "independent") {
    comp.type.int <- 1
  } else {
    stop("unkown comparison type specified")
  }

  rc <- matrix(nrow=boot.iter, ncol=length(r))
  for (i in 1:boot.iter) {
    inds <- sample(nrow(posmat), replace=T)
    rc[i,] <- .C("get_tau_typed",
                 as.integer(posmat[inds,"type"]),
                 as.double(posmat[inds,"x"]),
                 as.double(posmat[inds,"y"]),
                 as.integer(nrow(posmat)),
                 as.integer(typeA),
                 as.integer(typeB),
                 as.double(r.low),
                 as.double(r),
                 as.integer(length(r)),
                 as.integer(inds),
                 as.integer(comp.type.int),
                 rc=double(length(r))
    )$rc
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))  
  }
}


##' get the null distribution of the \code{get.tau} function
##'
##' Does permutations to calculate the null distribution of get pi
##' if there were no spatial dependence. Randomly reassigns coordinates
##' to each observation permutations times
##'
##' @param posmat a matrix appropriate for input to \code{get.tau}
##' @param fun a function appropriate for input to \code{get.tau}
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param permutations the number of permute iterations
##' @param comparison.type the comparison type to pass as input to \code{get.pi}
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return tau values for all the distances we looked at
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##'
##' @example R/examples/get_tau_permute.R
##'

get.tau.permute <- function(posmat,
                            fun,
                            r=1,
                            r.low=rep(0,length(r)),
                            permutations,
                            comparison.type = "representative",
                            data.frame=TRUE) {


  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  if (comparison.type == "representative") {
    comp.type.int <- 0
  } else if (comparison.type == "independent") {
    comp.type.int <- 1
  } else {
    stop("unknown comparison type specified")
  }

  rc <- matrix(nrow=permutations, ncol=length(r))
  for (i in 1:permutations) {
    inds <- sample(nrow(posmat))#, replace=T)
    tmp.posmat <- posmat
    tmp.posmat[,"x"] <- posmat[inds,"x"]
    tmp.posmat[,"y"] <- posmat[inds,"y"]
    rc[i,] <- .Call("get_tau",
                    tmp.posmat,
                    fun,
                    r,
                    r.low,
                    comp.type.int,
                    1:nrow(posmat),
                    xcol,
                    ycol)
  }

  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}


##' get the null distribution for the \code{get.tau.typed} function
##'
##'
##' @param posmat a matrix with columns type, x and y
##' @param typeA the "from" type that we are interested in, -1 is wildcard
##' @param typeB the "to" type that we are interested i, -1 is wildcard
##' @param r the series of spatial distances we are interested in
##' @param r.low the low end of each range....0  by default
##' @param permutations the number of permute iterations
##' @param comparison.type what type of points are included in the comparison set.
##' \itemize{
##'   \item "representative" if comparison set is representative of the underlying population
##'   \item "independent" if comparison set is cases/events coming from an indepedent process
##' }
##' @param data.frame logical indicating whether to return results as a data frame (default = TRUE)
##'
##' @return a matrix with permutation tau values for each distance specified
##'
##' @author Justin Lessler and Henrik Salje
##'
##' @family get.tau
##'
##' @example R/examples/get_tau_typed_permute.R
##'

get.tau.typed.permute <- function(posmat,
                                  typeA = -1,
                                  typeB = -1,
                                  r=1,
                                  r.low=rep(0,length(r)),
                                  permutations,
                                  comparison.type = "representative",
                                  data.frame=TRUE) {

  xcol <-  which(colnames(posmat)=="x")
  ycol <- which(colnames(posmat)=="y")

  #check that both columns exist
  if (length(xcol)!=1 & length(ycol)!=1) {
    stop("unique x and y columns must be defined")
  }

  if (comparison.type == "representative") {
    comp.type.int <- 0
  } else if (comparison.type == "independent") {
    comp.type.int <- 1
  } else {
    stop("unkown comparison type specified")
  }

  rc <- matrix(nrow=permutations, ncol=length(r))
  for (i in 1:permutations) {
    inds <- sample(nrow(posmat))#, replace=T)
    rc[i,] <- .C("get_tau_typed",
                 as.integer(posmat[,"type"]),
                 as.double(posmat[inds,"x"]),
                 as.double(posmat[inds,"y"]),
                 as.integer(nrow(posmat)),
                 as.integer(typeA),
                 as.integer(typeB),
                 as.double(r.low),
                 as.double(r),
                 as.integer(length(r)),
                 as.integer(1:nrow(posmat)),
                 as.integer(comp.type.int),
                 rc=double(length(r))
    )$rc
  }
  
  if (data.frame == FALSE) {
       return(rc)
  } else if (data.frame == TRUE) {
       return(data.frame(r.low=r.low, r=r, t(rc)))
  }
}



##' Simulated dataset of dengue transmission with basic reproductive number of 1
##' 
##' Dataset simulated using an agent based model with a spatially heterogeneous population structure. Infectious agents 
##' were introduced resulting in agent to agent transmission. The distance between successive cases in a transmission chain 
##' were randomly drawn from a uniform distribution U(0,100). Each infectious agent resulted in a single transmission to 
##' another agent after a delay of 15 days, reflecting the generation time of dengue. There are 11 transmission chains, 
##' each with a different genotype. The genotypes are subdivided into four serotypes.
##' 
##' @format Matrix with five columns representing the X and Y coordinates of infected individuals, the time of infection, 
##' the genotype of the infecting pathogen and the serotype of the infecting pathogen.
##' 
##' @author Justin Lessler and Henrik Salje
##' 
"DengueSimR01"


##' Simulated dataset of dengue cases with basic reproductive number of 2
##' 
##' Dataset simulated using an agent based model with a spatially heterogeneous population structure. Infectious agents 
##' were introduced resulting in agent to agent transmission. The distance between successive cases in a transmission chain 
##' were randomly drawn from a uniform distribution U(0,100). Each infectious agent resulted in transmissions to two other 
##' agents after a delay of 15 days, reflecting the generation time of dengue. There are 11 transmission chains, each with 
##' a different genotype. The genotypes are subdivided into four serotypes.
##' 
##' @format Matrix with five columns representing the X and Y coordinates of infected individuals, the time of infection, 
##' the genotype of the infecting pathogen and the serotype of the infecting pathogen.
##' 
##' @author Justin Lessler and Henrik Salje
##' 
"DengueSimR02"


##' Simulated dataset of dengue cases with representative underlying population
##' 
##' Dataset simulated using an agent based model with a spatially heterogeneous population structure. Infectious agents were 
##' introduced resulting in agent to agent transmission. The distance between successive cases in a transmission chain were 
##' randomly drawn from a uniform distribution U(0,100). Each infectious agent resulted in transmissions to two other agents 
##' after a delay of 15 days, reflecting the generation time of dengue. There are 11 transmission chains, each with a different 
##' genotype. The genotypes are subdivided into four serotypes. 500 randomly selected individuals from the underlying population 
##' also included.
##' 
##' @format Matrix with five columns representing the X and Y coordinates of infected individuals, the time of infection, the genotype 
##' of the infecting pathogen and the serotype of the infecting pathogen. Individuals representative from the underlying population 
##' have '-999'for time, genotype and serotype.
##' 
##' @author Justin Lessler and Henrik Salje
##' 
"DengueSimRepresentative"

Try the IDSpatialStats package in your browser

Any scripts or data that you put into this service are public.

IDSpatialStats documentation built on Aug. 9, 2021, 9:08 a.m.