dffit: Fit a distribution function, such as a galaxy mass function

Description Usage Arguments Value Author(s) Examples

View source: R/dffit.R


This function fits galaxy mass function (MF) to a discrete set of N galaxies with noisy data. More generally, dffit finds the most likely P-dimensional distribution function (DF) generating N objects i=1,...,N with uncertain measurements P observables. For instance, if the objects are galaxies, it can fit a MF (P=1), a mass-size distribution (P=2) or the mass-spin-morphology distribution (P=3). A full description of the algorithm can be found in Obreschkow et al. (2017).


dffit(x, selection = NULL, x.err = NULL, distance = NULL,
  phi = "Schechter", p.initial = NULL, n.iterations = 100,
  n.resampling = NULL, correct.mle.bias = FALSE, correct.lss.bias = FALSE,
  lss.sigma = NULL, write.fit = TRUE, x.grid = list(seq(4, 12, 0.01)))



Normally x is a N-element vector, representing the log-masses (log10(M/Msun)) of N galaxies. More generally, x can be either a vector of N elements or a matrix of N-by-P elements, containing the values of one or P observables of N objects, respectively.


Specifies the effective volume Veff(xval) in which a galaxy of log-mass xval can be observed; or, more generally, the volume in which an object of observed values xval[1:P] can be observed. This volume can be specified in five ways: (1) If selection is a single positive number, it will be interpreted as a constant volume, Veff(xval)=selection, in which all galaxies are fully observable. Veff(xval)=0 is assumed outside the "observed domain". This domain is defined as min(x)<=xval<=max(x) for one observable (P=1), or as min(x[,j])<=xval[j]<=max(x[,j]) for all j=1,...,P if P>1. This mode can be used for volume-complete surveys or for simulated galaxies in a box. (2) If selection is a vector of N elements, they will be interpreted as the effective volumes for each of the N galaxies. Veff(xval) is interpolated (linearly in 1/Veff) for other values xval. Veff(xval)=0 is assumed outside the observed domain. (3) selection can be a function of P variables, which directly specivies the effective volume for any xval, i.e. Veff(xval)=selection(xval). (4) selection can also be a list (selection = list(Veff.values, Veff.fct)) of an N-element vector Veff.values and a P-dimensional function Veff.fct. In this case, the effective volume is computed using a hybrid scheme of modes (2) and (3): Veff(xval) will be interpolated from the N values of Veff.values inside the observed domain, but set equal to Veff.fct outside this domain. (5) Finally, selection can be a list of two functions and two numbers: selection = list(f, V, rmin, rmax), where f = function(xval,r) is the isotropic selection function and V = function(r) is the total survey volume as a function of comoving distance r, and rmin and rmax are the distance limits of the survey.


Optional vector or array specifying the observational errors of x. If x is a vector then x.err must also be a vector of same length. Its elements are interpreted as the standard deviations of Gaussian uncertainties in x. If x is a N-by-P matrix representing N objects with P observables, then x.err must be either a N-by-P matrix or a N-by-P-by-P array. In the first case, the elements x.err[i,] are interpreted as the standard deviations of Gaussian uncertainties on x[i,]. In the second case, the P-by-P matrices x.err[i,,] are interpreted as the covariance matrices of the P observed values x[i,].


Optional vector of N elements specifying the comoving distances of the N galaxies. This vector is only needed if correct.lss.bias = TRUE.


Either a string or a function specifying the DF to be fitted. A string is interpreted as the name of a predefined mass function. Available options are 'Schechter' for Schechter function (3 parameters), 'PL' for a power law (2 parameters), or 'MRP' for an MRP function (4 parameters). Alternatively, phi = function(xval,p) can be any function of the P observable(s) xval and a list of parameters p. IMPORTANT: The function phi(xval,p) must be fully vectorized in xval, i.e. it must output a vector of N elements if xval is an N-by-P array (such as x). Note that if phi is given as a function, the argument p.initial is mandatory.


Initial model parameters for fitting the DF.


Maximum number of iterations in the repeated fit-and-debias algorithm to evaluate the maximum likelihood.


Integer (>0) specifying the number of iterations for the resampling of the most likely DF used to evaluate realistic parameter uncertainties with quantiles. If n.resampling = NULL, no resampling is performed.


The maximum likelihood estimator (MLE) of a finite dataset can be biased - a general property of the ML approach. If TRUE, dffit also outputs the parameters, where this estimator bias has been corrected, to first order in 1/N, using jackknifing.


If TRUE the distance values are used to correct for the observational bias due to galaxy clustering (large-scale structure).


Cosmic variance of survey volume. Explicitly, lss.sigma is interpreted as the expected relative RMS on the total number of galaxies in the survey volume. This value must be determined from a cosmological model.


If TRUE, the best-fitting parameters are displayed in the console.


sets the grid on which the numerical integration is performed. This grid must be specified as x.grid = list(x1, ..., xp), where xi is an equally spaced vector with the grid values for the i-th observable. In the case of a MF (a 1-dimensional DF), x.grid = list(seq(xmin,xmax,by=dx)), where xmin and xmax are the minimum/maximum values of the log-mass considered in the numerical integratino and dx is the grid spacing.


Returns a structured list. The sublist input contains all relevant input arguments. The sublist fit contains all the output arguments of the MLE algorithm. The output can be visualized using mfplot, plot.df and dfwrite.


Danail Obreschkow


# basic example
data = mfdata()
df = dffit(data$x, data$selection)
mfplot(df, xlim=c(2e6,5e10))

# show fitted parameter PDFs and covariances

# show fitted effective volume function

# include measurement errors in fit and determine Schechter function uncertainties from resampling
df = dffit(data$x, data$selection, data$x.err, n.resampling = 1e2)
mfplot(df, uncertainty.type=3, nbins=10, bin.xmin=6.5, bin.xmax=9.5, xlim=c(2e6,5e10), ylim=c(2e-3,1.5))

# evaluate bias corrected maximum-likelihood estimator and add to plot
df = dffit(data$x, data$selection, data$x.err, write.fit=FALSE, correct.mle.bias=TRUE)
mfplot(df, show.uncertainties=FALSE, nbins=0, show.bias.correction=TRUE, col.bias.correction="blue", add=TRUE)

# evaluate posteriors of the mass measurements and visualize the change in the mass mode between observation and posterior
# i.e. the Eddington bias correction
df = posterior.data(df)
plot(data$x,10^df$posterior$x.mode.correction,log='x',pch=20,xlab='Mass',ylab='Eddington bias correction = posterior/observed mass mode')

obreschkow/gdftools documentation built on June 10, 2017, 12:40 a.m.