# R/dffit.R

#' Fit a generative distribution function, such as a galaxy mass function
#'
#' This function finds the most likely P-dimensional model parameters of a D-dimensional distribution function (DF) generating an observed set of N objects with D-dimensional observables x, accounting for measurement uncertainties and a user-defined selection function. For instance, if the objects are galaxies, \code{dffit} can fit a mass function (D=1), a mass-size distribution (D=2) or the mass-spin-morphology distribution (D=3). A full description of the algorithm can be found in Obreschkow et al. (2017).
#'
#' @importFrom akima interp
#' @importFrom stats optim rpois quantile approxfun cov var integrate rnorm
#' @importFrom pracma jacobian
#'
#' @param x is an N-by-D matrix (or N-element vector if D=1) containing the observed D-dimensional properties of N objects. For instance, x can be an N-element vector containing the logarithmic masses of N galaxies, to which a mass function is to be fitted.
#' 
#' @param selection Specifies the effective volume \code{V(xval)} in which an object of (D-dimensional) true property (e.g. true log-mass) \code{xval} can be observed. This volume can account for complex selection criteria, as long as they can be expressed as a function of the \emph{true} properties, i.e. before measurement errors occur. This volume can be specified in five ways:\cr\cr
#' 
#' (1) \code{selection} can be a single positive number. This number will be interpreted as a constant volume, \code{V(xval)=selection}, in which all objects are fully observable. \code{V(xval)=0} is assumed outside the "observed domain". This domain is defined as \code{min(x)<=xval<=max(x)} for a scalar observable (D=1), or as \code{min(x[,j])<=xval[j]<=max(x[,j])} for all j=1,...,D if D>1. This mode can be used for volume-complete surveys or for simulated galaxies in a box.\cr\cr
#' 
#' (2) \code{selection} can be an N-element vector. The elements will be interpreted as the effective volumes of the N individual objects. \code{V(xval)} is interpolated (linearly in \code{1/V}) for other values \code{xval}. \code{V(xval)=0} is assumed outside the observed domain, except if D=1 (as in the case of fitting mass functions), where \code{V(xval)=0} is assumed only if \code{xval<min(x)}, whereas for \code{xval>max(x)}, \code{V(xval)} is set equal to the maximum effective volume, i.e. the maximum of \code{selection}. \cr\cr
#' 
#' (3) \code{selection} can be a function of D variables, which directly specifies the effective volume for any \code{xval}, i.e. \code{Veff(xval)=selection(xval)}.\cr\cr
#' 
#' (4) \code{selection} can be a list (\code{selection = list(veff.values, veff.userfct)}) of an \code{N}-element vector \code{veff.values} and a \code{D}-dimensional function \code{veff.userfct}. In this case, the effective volume is computed using a hybrid scheme of modes (2) and (3): \code{V(xval)} will be interpolated from the N values of \code{veff.values} inside the observed domain, but set equal to \code{veff.userfct} outside this domain.\cr\cr
#' 
#' (5) \code{selection} can be a list of two functions and one 2-element vector: \code{selection = list(f, dVdr, rmin, rmax)}, where \code{f = function(xval,r)} is the isotropic selection function and \code{dVdr = function(r)} is the derivative of the total survey volume as a function of comoving distance \code{r}. The scalars \code{rmin} and \code{rmax} (can be \code{0} and \code{Inf}) are the minimum and maximum comoving distance limits of the survey. Outside these limits \code{V(xval)=0} will be assumed.\cr\cr
#' 
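#' As an illustrative sketch of mode (5): for a survey selecting objects above an assumed distance-dependent cut in \code{xval}, with an assumed Euclidean geometry (both choices are hypothetical and for illustration only), one could write\cr
#' \preformatted{
#' f = function(xval, r) as.numeric(xval > 8+2*log10(r)) # selection function
#' dVdr = function(r) 4*pi*r^2 # Euclidean volume derivative
#' selection = list(f, dVdr, 0, 100)
#' }
#' 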
#' @param x.err specifies the observational uncertainties of the N data points. These uncertainties can be specified in four ways:\cr\cr
#' 
#' (1) If \code{x.err = NULL}, the measurements \code{x} will be considered exact.\cr\cr
#' 
#' (2) If \code{x.err} is a \code{N-by-D} matrix, the scalars \code{x.err[i,j]} are interpreted as the standard deviations of Gaussian uncertainties on \code{x[i,j]}.\cr\cr
#' 
#' (3) If \code{x.err} is a \code{N-by-D-by-D} array, the \code{D-by-D} matrices \code{x.err[i,,]} are interpreted as the covariance matrices of the D observed values \code{x[i,]}.\cr\cr
#' 
#' (4) \code{x.err} can also be a function of a D-vector \code{x} and an integer \code{i}, such that \code{x.err(x,i)} is the prior probability density of data point \code{i} having the true value \code{x}. This function must be vectorized in the first argument, such that calling \code{x.err(x,i)} with an N-by-D matrix \code{x} returns an N-element vector.\cr\cr
#' 
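#' As a minimal sketch of option (4): for D=1, a Gaussian error model equivalent to option (2) could be written as follows, where \code{x.obs} and \code{sigma} are assumed N-vectors of observed values and standard deviations:\cr
#' \preformatted{
#' x.err = function(xval, i) dnorm(xval, mean = x.obs[i], sd = sigma[i])
#' }
#' 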
#' @param r Optional N-element vector specifying the comoving distances of the N objects (e.g. galaxies). This vector is only needed if \code{correct.lss.bias = TRUE}.
#' @param gdf Either a string or a function specifying the DF to be fitted. A string is interpreted as the name of a predefined mass function (i.e. functions of one observable, \code{D=1}). Available options are \code{'Schechter'} for a Schechter function (3 parameters), \code{'PL'} for a power law (2 parameters), or \code{'MRP'} for an MRP function (4 parameters). Alternatively, \code{gdf = function(xval,p)} can be any function of the D observable(s) \code{xval} and a list of parameters \code{p}. The function \code{gdf(xval,p)} must be fully vectorized in \code{xval}, i.e. it must output a vector of \code{N} elements if \code{xval} is an \code{N-by-D} array (such as the input argument \code{x}). Note that if \code{gdf} is given as a function, the argument \code{p.initial} is mandatory.
#' @param p.initial is a P-vector specifying the initial model parameters for fitting the DF.
#' @param prior is an optional function specifying the priors on the P model parameters. This function must take a P-dimensional vector \code{p} as input argument and return a scalar proportional to the natural logarithm of the prior probability of the P-dimensional model parameter \code{p}. In other words, \code{prior(p)} is an additive term for the log-likelihood. Note that \code{prior(p)} must be smooth and finite over the full parameter space. If \code{prior=NULL} (default), uniform priors are assumed.
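#' For example, a weak Gaussian prior on the second model parameter, centred on an assumed value of 11 with standard deviation 0.5, could be written as \code{prior = function(p) -0.5*((p[2]-11)/0.5)^2}; this is smooth, finite and proportional to the log of a Gaussian prior probability.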
#' @param obs.selection is an optional selection function of a D-vector \code{x}, which specifies the fraction (between 0 and 1) of objects with \emph{observed} values \code{x} that can be observed, in addition to the selection on true values specified by \code{selection}. Normally this fraction is assumed to be 1 and all selections are assumed to act on the true values via \code{selection}. However, if the data were selected, for example, with a sharp cut in the observed values, this can be specified in \code{obs.selection}. The function \code{obs.selection(x)} must be vectorized and return a vector of \code{N} elements if \code{x} is an \code{N-by-D} matrix (such as the input argument \code{x}).
#' @param obs.sel.cov is an optional \code{D-by-D} matrix, only used if \code{obs.selection} is set. It specifies the mean covariance matrix of the data, used to estimate how much data has been scattered outside the observing range. If \code{obs.selection} is set, but \code{obs.sel.cov} is not specified, the code estimates \code{obs.sel.cov} from the mean errors of the data (this only works for 1D data).
#' @param n.iterations Maximum number of iterations in the repeated fit-and-debias algorithm to evaluate the maximum likelihood.
#' @param method optimization method argument of \code{\link{optim}} routine
#' @param reltol relative tolerance parameter of \code{\link{optim}} routine
#' @param correct.lss.bias If \code{TRUE}, the distance values \code{r} are used to correct for the observational bias due to galaxy clustering (large-scale structure). The overall normalization of the effective volume is chosen such that the expected mass contained in the survey volume is the same as for the uncorrected effective volume.
#' @param lss.weight If \code{correct.lss.bias==TRUE}, this optional function of a \code{D}-vector is the weight function used for the mass normalization of the effective volume. For instance, to preserve the number of galaxies, choose \code{lss.weight = function(x) 1}, or to preserve the total mass, choose \code{lss.weight = function(x) 10^x} (if the data \code{x} are log10-masses).
#' @param lss.errors is a logical flag specifying whether uncertainties computed via resampling should include errors due to the uncertainty of large-scale structure (LSS). If \code{TRUE}, the parameter uncertainties are estimated by refitting the LSS correction at each resampling iteration. This argument is only considered if \code{correct.lss.bias=TRUE} and \code{n.bootstrap>0}.
#' @param n.bootstrap If \code{n.bootstrap} is an integer larger than one, the data is resampled \code{n.bootstrap} times using a non-parametric bootstrapping method to produce more accurate covariances. These covariances are returned as a covariance matrix and as parameter quantiles in the output list. If \code{n.bootstrap = NULL}, no resampling is performed.
#' @param n.jackknife If \code{n.jackknife} is an integer larger than one, the data is jackknife-resampled \code{n.jackknife} times, removing exactly one data point from the observed set at each iteration. This resampling adds maximum likelihood estimator (MLE) bias-corrected parameter estimates (corrected to order 1/N) to the output. If \code{n.jackknife} is larger than the number of data points N, it is automatically reduced to N. If \code{n.jackknife = NULL}, no such parameters are determined.
#' @param xmin,xmax,dx are \code{D}-element vectors (i.e. scalars for a one-dimensional DF) specifying the grid points (\code{seq(xmin[i],xmax[i],by=dx[i])}) used for some numerical integrations.
#' @param keep.eddington.bias If \code{TRUE}, the data is not corrected for Eddington bias. In this case no fit-and-debias iterations are performed and the argument \code{n.iterations} will be ignored.
#' @param write.fit If \code{TRUE}, the best-fitting parameters are displayed in the console.
#' @param add.gaussian.errors If \code{TRUE}, Gaussian estimates of the 16th and 84th percentiles of the fitted generative distribution function (gdf) are included in the sublist \code{grid} of the output.
#' @param make.posteriors If \code{TRUE}, posterior probability distributions of the observed data are evaluated from the best fitting model.
#' 
#' @details
#' For a detailed description of the method, please refer to the peer-reviewed publication by Obreschkow et al. (2017).
#'
#' @return The routine \code{dffit} returns a structured list, which can be interpreted by other routines, such as \code{\link{dfwrite}}, \code{\link{dfplot}}, \code{\link{dfplotcov}}, \code{\link{dfplotveff}}. The list contains the following sublists:
#' 
#' \item{data}{contains the input arguments \code{x}, \code{x.err}, \code{r} and \code{lss.weight}, as well as the integers \code{n.data} and \code{n.dim} specifying the number of objects (N) to be fitted and the dimension (D) of the observables. In other words, \code{n.data} and \code{n.dim} are the number of rows and columns of \code{x}, respectively.}
#' 
#' \item{selection}{is a list describing the selection function of the data used when fitting the generative DF. The most important entries in this list are:\cr\cr
#' \code{veff} is a function of a D-dimensional vector, specifying the effective volume associated with an object of properties \code{xval}. If LSS is corrected for, i.e. if the argument \code{correct.lss.bias} is set to \code{TRUE}, this function is the final effective volume, including the effect of LSS.\cr\cr
#' \code{veff.no.lss} is a function of a D-dimensional vector, specifying the effective volume associated with an object if LSS were not accounted for. This function is identical to \code{veff} if LSS is not accounted for in the fit, i.e. if the argument \code{correct.lss.bias} is set to \code{FALSE}.\cr\cr
#' \code{mode} is an integer specifying the format of the input argument \code{selection}.}
#' 
#' \item{fit}{is a list describing the fitted generative distribution function. Its most important entries are:\cr\cr
#' \code{p.best} is a P-vector giving the most likely model parameters according to the MML method.\cr\cr
#' \code{p.sigma} and \code{p.covariance} are the standard deviations and covariance matrices of the best-fitting parameters in the Gaussian approximation from the Hessian matrix of the modified likelihood function.\cr\cr
#'  \code{gdf} is a function of a D-dimensional vector, which is the generative DF, evaluated at the parameters \code{p.best}.\cr\cr
#'  \code{scd} is a function of a D-dimensional vector, which gives the predicted source counts of the most likely model, i.e. scd(x)=gdf(x)*veff(x).}
#' 
#' \item{posterior}{is a list specifying the posterior PDFs of the observed data, given the best-fitting model. It contains the following entries:\cr\cr
#' \code{x.mean} is an N-by-D dimensional array giving the D-dimensional means of the posterior PDFs of the N objects.\cr\cr
#' \code{x.mode} is an N-by-D dimensional array giving the D-dimensional modes of the posterior PDFs of the N objects.\cr\cr
#' \code{x.stdev} is an N-by-D dimensional array giving the D-dimensional standard deviations of the posterior PDFs of the N objects.\cr\cr
#' \code{x.random} is an N-by-D dimensional array giving one random D-dimensional value drawn from the posterior PDFs of each of the N objects.}
#' 
#' \item{model}{is a list describing the generative DF used to model the data. The main entries of this list are:\cr\cr
#' \code{gdf(xval,p)} is the generative DF to be fitted, written as phi(x|theta) in the reference publication.\cr\cr
#' \code{gdf.equation} is a text-string representing the analytical equation of \code{gdf}.\cr\cr
#' \code{parameter.names} is a P-vector of expressions (see \code{\link{expression}}), specifying the names of the parameters.\cr\cr
#' \code{n.para} is an integer specifying the number P of model parameters.}
#' 
#' \item{grid}{is a list of arrays with numerical evaluations of different functions on a grid in the D-dimensional observable space. This grid is used for numerical integrations and graphical representations. The most important list entries are:\cr\cr
#' \code{x} is a M-by-D array of M points, defining a regular Cartesian grid in the D-dimensional observable space.\cr\cr
#' \code{xmin} and \code{xmax} are D-vectors specifying the lower and upper boundary of the grid in the D-dimensional observable space.\cr\cr
#' \code{dx} is a D-vector specifying the steps between grid points.\cr\cr
#' \code{dvolume} is a number specifying the D-dimensional volume associated with each grid point.\cr\cr
#' \code{n.points} is the number M of grid points.\cr\cr
#' \code{gdf} is an M-vector of values of the best-fitting generative DF at each grid point. The additional entries \code{gdf.error.neg} and \code{gdf.error.pos} define the 68\%-confidence range in the Hessian approximation of the parameter covariances. The optional entries \code{gdf.quantile.#} are different quantiles, generated if the input argument \code{n.bootstrap} is set.\cr\cr
#' \code{veff} is an M-vector of effective volumes at each grid point.\cr\cr
#' \code{scd} is an M-vector of predicted source counts according to the best-fitting model.\cr\cr
#' \code{scd.posterior} is an M-vector of observed source counts derived from the posterior PDFs of each object.\cr\cr
#' \code{effective.counts} is an M-vector giving the effective (possibly fractional) number of sources per grid point, derived from the posterior PDFs of the objects.}
#' 
#' \item{options}{is a list of various optional input arguments of \code{dffit}.}
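#'
#' For example, after \code{survey = dffit(x, selection)}, the most likely parameters and their Gaussian uncertainties can be accessed as \code{survey$fit$p.best} and \code{survey$fit$p.sigma}.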
#'
#' @keywords schechter function
#' @keywords mass function
#' @keywords fit
#'
#' @examples
#' # For a quick overview of some key functionalities run
#' dfexample()
#' # with varying integer arguments 1, 2, 3, 4.
#' 
#' # The following examples introduce the basics of dftools step by step.
#' # First, generate a mock sample of 1000 galaxies with 0.5 dex mass errors, drawn from
#' # a Schechter function with the default parameters (-2,11,-1.3):
#' dat = dfmockdata(n=1000, sigma=0.5)
#' 
#' # show the observed and true log-masses (x and x.true) as a function of true distance r
#' plot(dat$r,dat$x,col='grey'); points(dat$r,dat$x.true,pch=20)
#' 
#' # fit a Schechter function to the mock sample without accounting for errors
#' survey1 = dffit(dat$x, dat$veff)
#' 
#' # plot fit and add a black dashed line showing the input MF
#' ptrue = dfmodel(output='initial')
#' mfplot(survey1, xlim=c(1e6,2e12), ylim=c(2e-4,2),
#' show.data.histogram = TRUE, p = ptrue, col.fit = 'purple')
#' 
#' # now, do the same again, while accounting for measurement errors in the fit
#' # this time, the posterior data, corrected for Eddington bias, is shown as black points
#' survey2 = dffit(dat$x, dat$veff, dat$x.err)
#' mfplot(survey2, show.data.histogram = NA, add = TRUE)
#'
#' # show fitted parameter PDFs and covariances with true input parameters as black points
#' dfplotcov(list(survey2,survey1,ptrue),pch=c(20,20,3),col=c('blue','purple','black'),nstd=15)
#'
#' # show effective volume function
#' dfplotveff(survey2)
#'
#' # now create a smaller survey of only 30 galaxies with 0.5 dex mass errors
#' dat = dfmockdata(n=30, sigma=0.5)
#' 
#' # fit a Schechter function and determine uncertainties by resampling the data
#' survey = dffit(dat$x, dat$veff, dat$x.err, n.bootstrap = 30)
#' 
#' # show best fit with 68% Gaussian uncertainties from Hessian and posterior data as black points
#' mfplot(survey, show.data.histogram = TRUE, uncertainty.type = 1)
#' 
#' # show best fit with 68% and 95% resampling uncertainties and posterior data as black points
#' mfplot(survey, show.data.histogram = TRUE, uncertainty.type = 3)
#' 
#' # add input model as dashed lines
#' lines(10^survey$grid$x, survey$model$gdf(survey$grid$x,ptrue), lty=2)
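#'
#' # fit a user-defined model instead of a predefined one; here an illustrative
#' # two-parameter power law in the log-observable (user-defined models require p.initial)
#' gdf.pl = function(xval, p) 10^(p[1]+p[2]*(xval-9))
#' survey.pl = dffit(dat$x, dat$veff, dat$x.err, gdf = gdf.pl, p.initial = c(-2,-0.5))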
#'
#' @author Danail Obreschkow
#'
#' @export

dffit <- function(x,
                  selection = NULL,
                  x.err = NULL,
                  r = NULL,
                  gdf = 'Schechter',
                  p.initial = NULL,
                  prior = NULL,
                  obs.selection = NULL,
                  obs.sel.cov = NULL,
                  n.iterations = 100,
                  method = 'Nelder-Mead',
                  reltol = 1e-10,
                  correct.lss.bias = FALSE,
                  lss.weight = NULL,
                  lss.errors = TRUE,
                  n.bootstrap = NULL,
                  n.jackknife = NULL,
                  xmin = NULL,
                  xmax = NULL,
                  dx = 0.01,
                  keep.eddington.bias = FALSE,
                  write.fit = FALSE,
                  add.gaussian.errors = TRUE,
                  make.posteriors = TRUE) {
  # Set timer
  tStart = Sys.time()
  
  # Initialize main dataframe
  survey = list(data = list(x = x, x.err = x.err, r = r, lss.weight = lss.weight),
                selection = list(),
                model = list(),
                grid = list(xmin = xmin, xmax = xmax, dx = dx),
                fit = list(),
                options = list(p.initial = p.initial, prior = prior,
                               n.iterations = n.iterations,
                               n.bootstrap = n.bootstrap, n.jackknife = n.jackknife,
                               keep.eddington.bias = keep.eddington.bias,
                               correct.lss.bias = correct.lss.bias,
                               lss.weight = lss.weight, lss.errors = lss.errors,
                               method = method, reltol = reltol),
                tmp = list(selection = selection, gdf = gdf, obs.selection = obs.selection, obs.sel.cov = obs.sel.cov))
  
  # Check and pre-process input arguments
  survey = .handle.input(survey)
  survey = .make.veff(survey)
  survey = .make.grid(survey)
  survey = .make.obs.filter(survey)
  survey$tmp$rho.observed = .make.prior.pdfs(survey)
  
  # Find most likely generative model
  fit = .corefit(survey)
  if (survey$options$correct.lss.bias) survey$selection$veff = .get.veff.lss(survey,fit$p.best)
  survey$fit = list(p.best = fit$p.best, p.sigma = sqrt(diag(fit$p.covariance)), p.covariance = fit$p.covariance,
                    gdf = function(x) survey$model$gdf(x,fit$p.best),
                    scd = function(x) survey$model$gdf(x,fit$p.best)*survey$selection$veff(x),
                    lnL = fit$lnL,
                    ln.evidence = fit$ln.evidence,
                    status = fit$status)
  survey$grid$gdf = c(survey$fit$gdf(survey$grid$x))
  survey$grid$veff = c(survey$selection$veff(survey$grid$x))
  survey$grid$scd = survey$grid$gdf*survey$grid$veff
  
  # Determine Gaussian uncertainties
  if (survey$fit$status$converged & add.gaussian.errors) survey = .add.Gaussian.errors(survey)
  
  # Resample to determine more accurate uncertainties with quantiles
  if (survey$fit$status$converged & !is.null(survey$options$n.bootstrap)) survey = .resample(survey)
  
  # Jackknife to determine MLE bias-corrected parameter estimates
  if (survey$fit$status$converged & !is.null(survey$options$n.jackknife)) survey = .jackknife(survey)
  
  # Make posteriors
  if (survey$fit$status$converged & !is.null(survey$data$x.err) & make.posteriors) survey = .dfposteriors(survey)
  
  # Write best fitting parameters
  if (write.fit) dfwrite(survey)

  # Finalize
  survey$fit$status$walltime = as.double(Sys.time())-as.double(tStart)
  survey[which(names(survey)=='tmp')] = NULL
  invisible(survey)

}

.handle.input <- function(survey) {
  
  # Handle x and gridding
  if (length(survey$data$x)<1) stop('Give at least one data point.')
  if (is.null(dim(survey$data$x))) {
    survey$data$x = cbind(as.vector(survey$data$x)) # make col-vector
  } else if (length(dim(survey$data$x))==1) {
    survey$data$x = cbind(survey$data$x)
  } else if (length(dim(survey$data$x))!=2) {
    stop('x cannot have more than two dimensions.')
  }
  survey$data$n.data = dim(survey$data$x)[1]
  survey$data$n.dim = dim(survey$data$x)[2]
  if (length(survey$grid$dx)!=survey$data$n.dim) {
    if (length(survey$grid$dx)==1) {
      survey$grid$dx = rep(survey$grid$dx,survey$data$n.dim)
    } else {
      stop('dx must be a D-element vector, where D is the number of columns of x.')
    }
  }
  
  # initialize xmin xmax
  xmin.auto = xmax.auto = rep(NA,survey$data$n.dim)
  for (i in seq(survey$data$n.dim)) {
    dx = max(survey$data$x[,i])-min(survey$data$x[,i])
    xmin.auto[i] = min(survey$data$x[,i])-dx
    xmax.auto[i] = max(survey$data$x[,i])+dx
  }
  
  # Handle x.err
  if (!is.null(survey$data$x.err)) {
    if (is.function(survey$data$x.err)) {
      # ok
    } else {
      if (all(survey$data$x.err==0)) {
        survey$data$x.err = NULL
      } else {
        if (is.null(dim(survey$data$x.err))) {
          survey$data$x.err = cbind(as.vector(survey$data$x.err)) # make col-vector
        } else if (length(dim(survey$data$x.err))==1) {
          survey$data$x.err = cbind(survey$data$x.err)
        } else  if (length(dim(survey$data$x.err))==2) {
          if (any(dim(survey$data$x)!=dim(survey$data$x.err))) stop('Size of x.err not compatible with size of x.')
        } else if (length(dim(survey$data$x.err))==3) {
          if (survey$data$n.dim==1) stop('For one-dimensional distribution function x.err cannot have 3 dimensions.')
          if (!(dim(survey$data$x.err)[1]==survey$data$n.data & dim(survey$data$x.err)[2]==survey$data$n.dim & dim(survey$data$x.err)[3]==survey$data$n.dim)) {
            stop('Size of x.err not compatible with size of x.')
          }
        } else {
          stop('x.err cannot have more than three dimensions.')
        }
        if (min(survey$data$x.err)<0) stop('All values of x.err must be non-negative.')
        if (length(dim(survey$data$x.err))==2) {
          for (i in seq(survey$data$n.dim)) {
            derr = max(survey$data$x.err[,i])
            xmin.auto[i] = xmin.auto[i]-derr
            xmax.auto[i] = xmax.auto[i]+derr
          }
        } else if (length(dim(survey$data$x.err))==3) {
          for (i in seq(survey$data$n.dim)) {
            derr = sqrt(max(survey$data$x.err[,i,i]))
            xmin.auto[i] = xmin.auto[i]-derr
            xmax.auto[i] = xmax.auto[i]+derr
          }
        }
      }
    }
  }
  
  # finalize xmin & xmax
  if (is.null(survey$grid$xmin)) {
    survey$grid$xmin = xmin.auto
  } else {
    if (length(survey$grid$xmin)!=survey$data$n.dim) stop('xmin must be a D-element vector, where D is the number of columns of x.')
    for (i in seq(survey$data$n.dim)) {
      if (min(survey$data$x[,i])<survey$grid$xmin[i]) stop('xmin cannot be larger than smallest observed value x.')
    }
  }
  if (is.null(survey$grid$xmax)) {
    survey$grid$xmax = xmax.auto
  } else {
    if (length(survey$grid$xmax)!=survey$data$n.dim) stop('xmax must be a D-element vector, where D is the number of columns of x.')
    for (i in seq(survey$data$n.dim)) {
      if (max(survey$data$x[,i])>survey$grid$xmax[i]) stop('xmax cannot be smaller than the largest observed value x.')
    }
  }
  
  # Handle distance
  if (!is.null(survey$data$r)) {
    survey$data$r = as.vector(survey$data$r)
    if (length(survey$data$r)!=survey$data$n.data) stop('The number of distance values must be equal to the number of data points (= number of rows of x).')
    if (min(survey$data$r)<=0) stop('All distance values must be positive.')
  }
  
  # Handle correct.lss.bias
  if (survey$options$correct.lss.bias) {
    if (is.null(survey$data$r)) stop('Distances r must be given if correct.lss.bias = TRUE.')
  }
  
  # Handle priors
  if (is.null(survey$options$prior)) {
    survey$options$prior = function(p) 0
  }
  
  # Handle gdf
  if (is.function(survey$tmp$gdf)) {
    if (is.null(survey$options$p.initial)) stop('For user-defined distribution functions initial parameters must be given.')
    survey$model$gdf = survey$tmp$gdf
    survey$model$gdf.equation = NULL
    survey$model$parameter.names = NULL
  } else {
    if (is.null(survey$options$p.initial)) survey$options$p.initial = dfmodel(output = 'initial', type = survey$tmp$gdf)
    survey$model$gdf = function(x,p) {dfmodel(x, p, type = survey$tmp$gdf)}
    survey$model$gdf.equation = dfmodel(output = 'equation', type = survey$tmp$gdf)
    survey$model$parameter.names = dfmodel(output = 'names', type = survey$tmp$gdf)
  }
  survey$model$n.para = length(survey$options$p.initial)
  
  # Handle n.iterations
  if (is.null(survey$options$n.iterations)) stop('n.iterations must be a positive integer.')
  if (survey$options$n.iterations<1) stop('n.iterations must be a positive integer.')
  
  # Handle n.bootstrap
  if (!is.null(survey$options$n.bootstrap)) {
    if (survey$options$n.bootstrap<2) stop('n.bootstrap must be 2 or larger.')
  }
  
  # Handle n.jackknife
  if (!is.null(survey$options$n.jackknife)) {
    if (survey$options$n.jackknife<2) stop('n.jackknife must be 2 or larger.')
  }
  
  invisible(survey)
}

#' @export
.make.veff = function(survey) {
  
  # Generates the function veff.function(xval) from various selection function types.
  
  # Handle selection
  if (is.null(survey$tmp$selection)) stop('A selection function must be given.')
  s = survey$tmp$selection
  x = survey$data$x
  r = survey$data$r
  n.dim = survey$data$n.dim
  n.data = survey$data$n.data
  xmin = apply(x,2,min)
  xmax = apply(x,2,max)
  mode = NULL
  veff.values = NULL
  veff.userfct = NULL
  veff.function = NULL
  f.function = NULL
  dVdr = NULL
  rmin = NULL
  rmax = NULL
  
  # Mode 1: Constant effective volume inside observed domain
  if (is.double(s)) {
    if (length(s)==1) {
      if (s<=0) stop('A constant selection (= Veff) must be positive.')
      mode = 1
      veff.function.elemental = function(xval) {
        if (any(xval<xmin) | any(xval>xmax)) {
          return(0)
        } else {
          return(s)
        }
      }
    }
  }
  
  # Mode 2: Interpolated effective volume inside observed domain
  if (is.double(s)) {
    if (length(s)==n.data) {
      if (min(s)<=0) stop('All values of selection (=Veff) must be positive.')
      mode = 2
      veff.values = s
      if (n.dim==1) {
        fapprox = approxfun(x[,1],1/veff.values,rule=2)
        veff.max = max(veff.values)
        veff.function.elemental = function(xval) {
          if (xval<xmin) {
            return(0)
          } else if (xval>xmax) {
            return(veff.max)
          } else {
            z = 1/fapprox(xval)
            if (is.na(z)) {return(0)} else {return(z)}
          }
        }
      } else if (n.dim==2) {
        vapprox = function(xval) {
          z = 1/(akima::interp(x[,1],x[,2],1/veff.values,xval[1],xval[2],duplicate='mean'))$z
          if (is.na(z)) {return(0)} else {return(z)}
        }
        veff.function.elemental = function(xval) {
          if (any(xval<xmin) | any(xval>xmax)) {
            return(0)
          } else {
            return(vapprox(xval))
          }
        }
      } else {
        stop('Linear interpolation of Veff not implemented for DF with more than 2 dimensions. Use a different selection type.')
      }
    }
  }
  
  # Mode 3: Effective volume given directly
  if (is.function(s)) {
    mode = 3
    test = try(s(rbind(x)),silent=TRUE)
    if (is.double(test) & length(test)==n.data) {
      veff.function = s
    } else {
      veff.function.elemental = s
    }
    veff.userfct = s
  }
  
  # Mode 4: Hybrid of 2 and 3
  if (is.list(s)) {
    if (length(s)==2) {
      if (is.double(s[[1]]) & is.function(s[[2]])) {
        veff.values = s[[1]]
        test = try(s[[2]](rbind(x)),silent=TRUE)
        if (is.double(test) & length(test)==n.data) {
          veff.userfct = function(xval) s[[2]](rbind(xval))
        } else {
          veff.userfct = s[[2]]
        }
        if (min(veff.values)<=0) stop('All values of selection (=Veff) must be positive.')
        mode = 4
        if (n.dim==1) {
          vapprox = function(xval) {
            f = approxfun(x[,1],1/veff.values,rule=2)
            return(1/f(xval))
          }
        } else if (n.dim==2) {
          vapprox = function(xval) {
            z = 1/(akima::interp(x[,1],x[,2],1/veff.values,xval[1],xval[2],duplicate='mean'))$z
            if (is.na(z)) {return(veff.userfct(xval))} else {return(z)}
          }
        } else {
          stop('Linear interpolation of Veff not implemented for DF with more than 2 dimensions. Use a different selection type.')
        }
        veff.function.elemental = function(xval) {
          if (any(xval<xmin) | any(xval>xmax)) {
            return(veff.userfct(xval))
          } else {
            return(vapprox(xval))
          }
        }
      }
    }
  }
  
  # Mode 5: Selection given as {f(x,r), dVdr(r), rmin, rmax}
  if (is.list(s)) {
    if (length(s)==4) {
      if (is.function(s[[1]]) & is.function(s[[2]]) & is.double(s[[3]]) & is.double(s[[4]])) {
        mode = 5
        rmin = s[[3]]
        rmax = s[[4]]
        if (!is.null(r)) {
          if (rmin>min(r)) stop('rmin in selection = list(f, dVdr, rmin, rmax) cannot be larger than the minimal observed r.')
          if (rmax<max(r)) stop('rmax in selection = list(f, dVdr, rmin, rmax) cannot be smaller than the maximal observed r.')
        }
        test = try(s[[1]](NA,NA)*s[[2]](NA),silent=TRUE)
        if (!is.double(test)) stop('In the argument selection = list(f, dVdr, rmin, rmax), the functions f(xval,r) and dVdr(r) must return numeric values and be vectorized in r.')
        f.function = s[[1]]
        dVdr = s[[2]]
        veff.function.elemental = function(xval) {
          f = function(r) {s[[1]](xval,r)*s[[2]](r)}
          return(integrate(f,rmin,rmax,stop.on.error=FALSE)$value)
        }
      }
    }
  }
  
  if (is.null(mode)) stop('Unknown selection format.')
  
  # apply to all elements
  if (is.null(veff.function)) {
    veff.function = function(xval) {
      if (length(dim(xval))==2) {
        return(apply(xval,1,veff.function.elemental))
      } else if (length(dim(xval))==0) {
        if (n.dim==1) {
          return(sapply(xval,veff.function.elemental))
        } else {
          if (length(xval)!=n.dim) stop('Incorrect argument.')
          return(veff.function.elemental(xval))
        }
      } else {
        stop('Incorrect argument.')
      }
    }
  }
  
  survey$selection$veff = veff.function
  survey$selection$veff.no.lss = veff.function
  survey$selection$veff.input.values = veff.values
  survey$selection$veff.input.function = veff.userfct
  survey$selection$f = f.function
  survey$selection$dVdr = dVdr
  survey$selection$rmin = rmin
  survey$selection$rmax = rmax
  survey$selection$mode = mode
  
  invisible(survey)
}

.get.veff.lss = function(survey,p) {
  
  if (survey$selection$mode!=5) {
    stop('To correct LSS bias, the selection must be specified in the format selection = list(f, dVdr, rmin, rmax).')
  }
  
  # initialize input
  x = survey$data$x
  r = survey$data$r
  n.dim = survey$data$n.dim
  n.data = survey$data$n.data
  rmin = survey$selection$rmin
  rmax = survey$selection$rmax
  simpson.integration = n.dim==1
  
  # evaluate integrals
  integrand.lss = function(x,r) {survey$selection$f(x,r)*survey$model$gdf(x,p)}
  integral = array(NA,n.data)
  if (simpson.integration) {
    for (i in seq(n.data)) {
      integral[i] = integrate(integrand.lss,survey$grid$xmin,survey$grid$xmax,r[i],stop.on.error=FALSE)$value
    }
  } else {
    for (i in seq(n.data)) {
      integral[i] = sum(integrand.lss(survey$grid$x,r[i]))*survey$grid$dvolume
    }
  }
  
  # make veff.lss function
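  # each observed object i contributes f(xval,r[i])/integral[i], i.e. the selection
  # function at its distance weighted by the inverse of the model-predicted source
  # counts at that distance, so the effective volume traces the actual radial
  # density structure of the survey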
  veff.lss.function.elemental = function(xval) {
    list = survey$selection$f(xval,r)>0
    return(sum(survey$selection$f(xval,r[list])/integral[list]))
  }
  veff.lss.scale = Vectorize(veff.lss.function.elemental)
  
  # renormalize veff to conserve total
  if (is.null(survey$options$lss.weight)) {
    weight = function(x) 1
  } else {
    weight = survey$options$lss.weight
  }
  int.ref = function(x) survey$selection$veff.no.lss(x)*survey$model$gdf(x,p)*weight(x)
  int.exp = function(x) veff.lss.scale(x)*survey$model$gdf(x,p)*weight(x)
  if (simpson.integration) {
    reference = integrate(int.ref,survey$grid$xmin,survey$grid$xmax,stop.on.error=FALSE)$value
    expectation = integrate(int.exp,survey$grid$xmin,survey$grid$xmax,stop.on.error=FALSE)$value
  } else {
    reference = sum(int.ref(survey$grid$x))*survey$grid$dvolume
    expectation = sum(int.exp(survey$grid$x))*survey$grid$dvolume
  }
  normalization.factor = reference/expectation
  veff.lss = function(x) veff.lss.scale(x)*normalization.factor
  
  # return output
  return(veff.lss)
}

.make.grid = function(survey) {
  
  n.dim = survey$data$n.dim
  x.grid = list()
  nx = array(NA,n.dim)
  for (i in seq(n.dim)) {
    if (survey$grid$xmax[i]<survey$grid$xmin[i]+survey$grid$dx[i]) stop('xmax cannot be smaller than xmin+dx.')
    x.grid[[i]] = seq(survey$grid$xmin[i],survey$grid$xmax[i],survey$grid$dx[i])
    nx[i] = length(x.grid[[i]])
  }
  x.mesh.dv = prod(survey$grid$dx)
  x.mesh = array(NA,c(prod(nx),n.dim))
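  # fill the mesh column by column using rep(): values of dimension i are repeated in
  # blocks of length k1 (the product of the grid sizes of all previous dimensions),
  # so the rows of x.mesh enumerate the full Cartesian product of the 1D grids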
  k1 = 1
  k2 = prod(nx)
  for (i in seq(n.dim)) {
    k2 = k2/nx[i]
    x.mesh[,i] = rep(rep(x.grid[[i]],each=k1),k2)
    k1 = k1*nx[i]
  }
  survey$grid$x = x.mesh
  survey$grid$dvolume = x.mesh.dv
  survey$grid$veff = survey$selection$veff(x.mesh)
  survey$grid$n.points = dim(x.mesh)[1]
  invisible(survey)
}

.make.obs.filter = function(survey) {
  
  # Make filter function that corrects the density of true x-values for the fact that a fraction of these values have been scattered outside the observed range of x
  # (important, for example, if there is a sharp sample cut in the observed input values of x)
  
  # Handle observational selection
  survey$selection$obs.selection = survey$tmp$obs.selection
  if (!is.null(survey$tmp$obs.selection)) {
    if (!is.function(survey$tmp$obs.selection)) {
      stop('obs.selection must be a function of a D-dimensional vector')
    }
  }
  if (is.null(survey$selection$obs.selection)) {
    survey$selection$obs.sel.cov = NULL
  } else {
    if (is.null(survey$tmp$obs.sel.cov)) {
      if (is.null(survey$data$x.err)) {
        survey$selection$obs.sel.cov = 1e-10
      } else {
        survey$selection$obs.sel.cov = max(1e-10,mean(survey$data$x.err^2))
      }
    } else {
      survey$selection$obs.sel.cov = max(1e-10,survey$tmp$obs.sel.cov)
    }
  }
  
  # put filter on grid
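  # for every grid point, obs.filter is the Gaussian-weighted average of obs.selection
  # over the grid (kernel covariance obs.sel.cov), i.e. the expected fraction of objects
  # with that true value whose observed value passes obs.selection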
  n.mesh = dim(survey$grid$x)[1]
  n.dim = dim(survey$data$x)[2]
  survey$grid$obs.filter = rep(1,n.mesh)
  if (!is.null(survey$selection$obs.selection)) {
    inv.covariance = solve(survey$selection$obs.sel.cov)
    gauss = rep(NA,n.mesh)
    for (i in seq(n.mesh)) {
      mu = t(array(survey$grid$x[i,],c(n.dim,n.mesh)))
      dx = survey$grid$x-mu
      for (j in seq(n.mesh)) {
        gauss[j] = exp(-dx[j,]%*%inv.covariance%*%dx[j,]/2)
      }
      survey$grid$obs.filter[i] = sum(gauss*survey$selection$obs.selection(survey$grid$x))/sum(gauss)
    }
  }
  
  invisible(survey)
  
}

#' @export
.corefit <- function(survey, supress.warning = FALSE) {
  
  # simplify input
  x = survey$data$x
  gdf = survey$model$gdf
  p.initial = survey$options$p.initial
  x.mesh = survey$grid$x
  x.mesh.dv = survey$grid$dvolume
  keep.eddington.bias = survey$options$keep.eddington.bias
  parameter.prior = survey$options$prior
  obs.filter = survey$grid$obs.filter
  
  # get array sizes
  n.data = dim(x)[1]
  n.dim = dim(x)[2]
  n.mesh = dim(x.mesh)[1]
  
  # Input handling
  if (!survey$options$correct.lss.bias) veff.mesh = c(survey$grid$veff)
  if (is.null(survey$data$x.err) & !survey$options$correct.lss.bias) {
    n.iterations = 1
  } else {
    n.iterations = survey$options$n.iterations
  }
  
  # Iterative algorithm
  running = TRUE
  k = 0
  offset = 0
  chain = array(NA,c(n.iterations,length(p.initial)+1))
  
  # define debiasing function
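  # The debiased source density is the sum of the posterior PDFs of the data points:
  # each observed PDF rho.observed[[i]] is multiplied by the predicted source counts
  # gdf*Veff (acting as a prior) and renormalized; iterating this against refitted
  # parameters removes the Eddington bias.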
  debias = function(p) {
    rho.unbiased = array(0,n.mesh)
    if (is.null(survey$data$x.err) | keep.eddington.bias) {
      for (i in seq(n.data)) {
        rho.unbiased = rho.unbiased+survey$tmp$rho.observed[[i]]
      }
    } else {
      # predicted source counts (up to a factor x.mesh.dv)
      prior = gdf(x.mesh,p)*veff.mesh
      prior[!is.finite(prior)] = 0
      prior = pmax(0,prior)
      for (i in seq(n.data)) {
        rho.corrected = survey$tmp$rho.observed[[i]]*prior
        rho.corrected = rho.corrected/(sum(rho.corrected)*x.mesh.dv)
        rho.unbiased = rho.unbiased+rho.corrected
      }
    }
    return(rho.unbiased)
  }
  
  while (running) {
    
    k = k+1
    
    # determine veff LSS
    if (survey$options$correct.lss.bias) {
      veff.lss = .get.veff.lss(survey,p.initial)
      veff.mesh = c(veff.lss(x.mesh))
    }
    
    # make unbiased source density function
    rho.unbiased = debias(p.initial)
    
    # make -ln(L)
    neglogL = function(p) {
      phi = gdf(x.mesh,p)*obs.filter
      # safety operations (adding about 50% computation time)
      phi[!is.finite(phi)] = 0
      phi = pmax(.Machine$double.xmin,phi) # also vectorizes the array
      # end safety operations
      return(sum(phi*veff.mesh-log(phi)*rho.unbiased)*x.mesh.dv-offset-parameter.prior(p))
    }
    
    # test
    if (all(!is.finite(gdf(x.mesh,p.initial)))) stop('cannot evaluate GDF at initial parameters provided')
    test = try(neglogL(p.initial),silent=TRUE)
    if (!is.finite(test)) stop('cannot evaluate likelihood at initial parameters provided')
    
    # maximize ln(L)
    opt = optim(p.initial,neglogL,hessian=TRUE,method=survey$options$method,control=list(maxit=1e5,reltol=survey$options$reltol))
    offset = opt$value+offset
    chain[k,] = c(opt$par,opt$value)
    
    # assess convergence
    if ((is.null(survey$data$x.err) & !survey$options$correct.lss.bias) | keep.eddington.bias) { # exit without extra iterations
      converged = opt$convergence==0
      running = FALSE
    } else {
      # compare successive likelihood values to detect stalling
      if (k==1) {
        converged = FALSE
        d.old = Inf
      } else {
        d = abs(opt$value-value.old)
        converged = d>=d.old
        d.old = d
      }
      value.old = opt$value
      
      if (converged) {
        running = FALSE
      } else if (k == n.iterations) {
        converged = FALSE
        running = FALSE
        if (!supress.warning) {
          cat('WARNING: Maximum number of iterations reached. Consider increasing n.iterations and/or providing better initial parameters.\n')
        }
      }
      
      # prepare initial values for next iteration
      p.initial = opt$par
    }
  }
  
  # make output
  if (det(opt$hessian)<1e-12) {
    converged = FALSE
    cov = NULL
    if (!supress.warning) {
      cat('WARNING: Fit ill-conditioned. Consider providing better initial parameters or selection arguments.\n')
    }
    ln.evidence = NA
  } else {
    cov = solve(opt$hessian)
    n.para = length(opt$par)
    ln.evidence = -offset+0.5*n.para*log(2*pi)+0.5*log(det(cov))
  }
  
  # finalize output
  fit = list(p.best = opt$par, p.covariance = cov, lnL = function(p) -neglogL(p),
             status = list(n.iterations = k, converged = converged, chain = chain[1:k,]), ln.evidence = ln.evidence)
  return(fit)
  
}

.make.prior.pdfs = function(survey) {
  
  # simplify input
  x = survey$data$x
  x.err = survey$data$x.err
  x.mesh = survey$grid$x
  x.mesh.dv = survey$grid$dvolume
  
  # get array sizes
  n.data = dim(x)[1]
  n.dim = dim(x)[2]
  n.mesh = dim(x.mesh)[1]
  
  rho.observed = list()

  if (is.null(x.err)) {
  
    for (i in seq(n.data)) {
      difference = colSums(abs(t(x.mesh)-x[i,]))
      index = which.min(difference)[1]
      rho.observed[[i]] = array(0,n.mesh)
      rho.observed[[i]][index] = rho.observed[[i]][index]+1/x.mesh.dv
    }
    
  } else {
    
    if (is.function(x.err)) {
      
      for (i in seq(n.data)) {
        rho.observed[[i]] = c(x.err(c(x.mesh),i))
      }
      
    } else {
    
      # Make inverse covariances
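      # standard deviations are floored at dx/2.35, the sigma of a Gaussian whose
      # FWHM equals one grid cell, so no prior PDF is narrower than the grid resolution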
      invC = array(NA,c(n.data,n.dim,n.dim))
      if (length(dim(x.err))==2) {
        if (!(dim(x.err)[1]==n.data & dim(x.err)[2]==n.dim)) {
          stop('Unknown format for x.err in .make.prior.pdfs.')
        }
        if (n.dim==1) {
          for (i in seq(n.data)) {
            invC[i,,] = 1/max(survey$grid$dx/2.35,x.err[i,])^2
          }
        } else {
          for (i in seq(n.data)) {
            invC[i,,] = diag(1/max(survey$grid$dx/2.35,x.err[i,])^2)
          }
        }
      } else if (length(dim(x.err))==3) {
        if (n.dim==1) stop('Unknown format for x.err in .make.prior.pdfs.')
        if (!(dim(x.err)[1]==n.data & dim(x.err)[2]==n.dim & dim(x.err)[3]==n.dim)) {
          stop('Unknown format for x.err in .make.prior.pdfs.')
        }
        for (i in seq(n.data)) {
          invC[i,,] = solve(x.err[i,,])
        }
      } else {
        stop('Unknown format for x.err in .make.prior.pdfs.')
      }
      
      # make a priori observed PDFs
      rho.observed = list()
      for (i in seq(n.data)) {
        d = x[i,]-t(x.mesh)
        rho.observed[[i]] = exp(-colSums(d*(invC[i,,]%*%d))/2)
      }
    }
    
    # normalize all prior PDFs
    for (i in seq(n.data)) {
      rho.observed[[i]] = rho.observed[[i]]/sum(rho.observed[[i]])/x.mesh.dv
    }
  }
  
  return(rho.observed)
  
}

.add.Gaussian.errors <- function(survey) {
  
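  # propagate the parameter covariance to ln(gdf) on the grid via linear error
  # propagation with the Jacobian; exponentiating +/- one standard deviation in
  # log space yields the asymmetric 68%-confidence errors on the gdf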
  f = function(p) log(survey$model$gdf(survey$grid$x,p))
  J = jacobian(f,survey$fit$p.best)
  C.psi = J%*%survey$fit$p.covariance%*%t(J)
  sd.psi = sqrt(diag(C.psi))
  survey$grid$gdf.error.neg = (1-exp(-sd.psi))*survey$grid$gdf
  survey$grid$gdf.error.pos = (exp(+sd.psi)-1)*survey$grid$gdf
  invisible(survey)
  
}

.jackknife = function(survey) {
  
  # input handling
  n.data = survey$data$n.data
  np = survey$model$n.para
  if (n.data<=np) stop('Jackknifing requires more data points than model parameters.')
  n.jn = min(n.data,survey$options$n.jackknife)
  if (survey$options$n.jackknife<2) stop('n.jackknife must be at least 2.')
  
  # set up jackknife survey
  jn = survey
  jn$options$p.initial = survey$fit$p.best
  jn$grid$veff = survey$grid$veff*(n.data-1)/n.data
  jn$data$n.data = n.data-1
  jn$options$n.iterations = 1
  
  # run jackknifing resampling
  rejected = sample(seq(n.data),size=n.jn,replace=FALSE)
  p.new = array(NA,c(n.jn,np))
  for (i in seq(n.jn)) {
    cat(sprintf('\rJackknifing %4.2f%%',100*i/n.jn))
    list = setdiff(seq(n.data),rejected[i])
    jn$data$x = as.matrix(survey$data$x[list,])
    jn$tmp$rho.observed = survey$tmp$rho.observed
    jn$tmp$rho.observed[[rejected[i]]] = NULL # drop the PDF of the removed data point
    p.new[i,] = .corefit(jn, supress.warning = TRUE)$p.best
  }
  cat('\r')
  
  # correct estimator bias
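  # first-order jackknife bias correction: p.corrected = N*p.best - (N-1)*mean(p.jackknife),
  # which removes the O(1/N) term of the MLE bias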
  p.reduced = apply(p.new, 2, mean, na.rm = TRUE)
  survey$fit$p.best.mle.bias.corrected = n.data*survey$fit$p.best-(n.data-1)*p.reduced
  survey$fit$gdf.mle.bias.corrected = function(x) survey$model$gdf(x,survey$fit$p.best.mle.bias.corrected)
  survey$fit$scd.mle.bias.corrected = function(x) survey$fit$gdf.mle.bias.corrected(x)*survey$selection$veff(x)
  survey$grid$gdf.mle.bias.corrected = survey$fit$gdf.mle.bias.corrected(survey$grid$x)
  survey$grid$scd.mle.bias.corrected = survey$fit$scd.mle.bias.corrected(survey$grid$x)
  
  invisible(survey)
  
}

.resample = function(survey) {
  
  # input handling
  n.data = survey$data$n.data
  np = survey$model$n.para
  if (n.data<2*np) stop('Bootstrapping requires at least twice as many data points as model parameters.')
  if (survey$options$n.bootstrap<2) stop('n.bootstrap must be at least 2.')
  
  # set up resample survey
  b = survey
  b$options$p.initial = survey$fit$p.best
  b$selection$veff = b$selection$veff.no.lss = survey$selection$veff
  b$options$correct.lss.bias = survey$options$correct.lss.bias & survey$options$lss.errors
  
  # randomly resample and refit the DF
  p.new = array(NA,c(survey$options$n.bootstrap,np))
  for (iteration in seq(survey$options$n.bootstrap)) {
    cat(sprintf('\rResampling: %4.2f%%',100*iteration/survey$options$n.bootstrap))
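    # draw the resample size from a Poisson distribution (floored at the number of
    # model parameters), so the bootstrap also captures Poisson noise in the total counts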
    b$data$n.data = max(np,rpois(1,n.data))
    s = sample.int(n.data,b$data$n.data,replace=TRUE)
    b$data$x = as.matrix(survey$data$x[s,])
    b$tmp$rho.observed = list()
    for (i in seq(b$data$n.data)) {
      b$tmp$rho.observed[[i]] = survey$tmp$rho.observed[[s[i]]]
    }
    if (!is.null(survey$data$r)) b$data$r = survey$data$r[s]
    p.new[iteration,] = .corefit(b, supress.warning = TRUE)$p.best
  }
  cat('\r')
  
  # compute covariance
  survey$fit$p.covariance.resample = array(NA,c(np,np))
  for (i in seq(np)) {
    for (j in seq(np)) {
      survey$fit$p.covariance.resample[i,j] = var(p.new[1:iteration,i],p.new[1:iteration,j],na.rm=TRUE)
    }
  }
  
  # make parameter quantiles
  q = c(0.02,0.16,0.84,0.98)
  p.quant = array(NA,c(length(q),np))
  for (i in seq(np)) {
    p.quant[,i] = quantile(p.new[,i],q,names=FALSE,na.rm=TRUE)
  }
  survey$fit$p.quantile.02 = p.quant[1,]
  survey$fit$p.quantile.16 = p.quant[2,]
  survey$fit$p.quantile.84 = p.quant[3,]
  survey$fit$p.quantile.98 = p.quant[4,]

  # make DF quantiles
  s = array(NA,c(survey$options$n.bootstrap,survey$grid$n.points))
  for (i in seq(survey$options$n.bootstrap)) {
    s[i,] = survey$model$gdf(survey$grid$x,p.new[i,])
  }
  y.quant = array(NA,c(4,survey$grid$n.points))
  for (i in seq(survey$grid$n.points)) {
    list = !is.na(s[,i]) & is.finite(s[,i]) & (s[,i]>0)
    y.quant[,i] = quantile(s[list,i],q,names=FALSE)
  }
  survey$grid$gdf.quantile.02 = y.quant[1,]
  survey$grid$gdf.quantile.16 = y.quant[2,]
  survey$grid$gdf.quantile.84 = y.quant[3,]
  survey$grid$gdf.quantile.98 = y.quant[4,]
  
  invisible(survey)

}

.dfposteriors <- function(survey) {
  
  if (is.null(survey$data$x.err)) stop('Posterior data PDFs can only be produced if the data is uncertain, i.e. if x.err is given.')
  
  # Input handling
  x = survey$data$x
  x.err = survey$data$x.err
  x.mesh = survey$grid$x
  x.mesh.dv = survey$grid$dvolume
  n.data = dim(x)[1]
  n.dim = dim(x)[2]
  n.mesh = dim(x.mesh)[1]
  
  # Make priors
  prior = survey$grid$scd
  prior[!is.finite(prior)] = 0
  prior = pmax(0,prior)
  
  # produce posteriors
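  # posterior PDF of data point i = measurement PDF times the predicted source counts
  # (Bayes' theorem with scd = gdf*veff acting as prior), renormalized on the grid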
  m0 = m1 = md = array(NA,c(n.data,n.dim))
  rho.unbiased = rho.unbiased.sqr = array(0,dim(x.mesh)[1])
  for (i in seq(n.data)) {
    
    # make posterior PDF for data point i
    rho.corrected = survey$tmp$rho.observed[[i]]*prior
    s = sum(rho.corrected)
    rho.unbiased = rho.unbiased+rho.corrected/(s*x.mesh.dv)
    rho.unbiased.sqr = rho.unbiased.sqr+(rho.corrected/(s*x.mesh.dv))^2
    
    # mean, standard deviation and mode
    for (j in seq(n.dim)) {
      m0[i,j] = sum(x.mesh[,j]*rho.corrected)/s
      m1[i,j] = sqrt(sum((x.mesh[,j]-m0[i,j])^2*rho.corrected)/s)
    }
    md[i,] = x.mesh[which.max(rho.corrected),]
  }
  
  survey$posterior = list(x.mean = m0, x.stdev = m1, x.mode = md, x.random = m0+m1*array(rnorm(n.data*n.dim),c(n.data,n.dim)))
  survey$grid$scd.posterior = rho.unbiased
  survey$grid$effective.counts = rho.unbiased^2/rho.unbiased.sqr # this equation gives the effective number of sources per bin
  survey$grid$effective.counts[!is.finite(survey$grid$effective.counts)] = 0
  
  invisible(survey)
}