# dffit: Fit a distribution function, such as a galaxy mass function In obreschkow/gdftools: Fitting of generative distribution functions such as galaxy mass functions

## Description

This function fits galaxy mass function (MF) to a discrete set of `N` galaxies with noisy data. More generally, `dffit` finds the most likely `P`-dimensional distribution function (DF) generating `N` objects `i=1,...,N` with uncertain measurements `P` observables. For instance, if the objects are galaxies, it can fit a MF (`P=1`), a mass-size distribution (`P=2`) or the mass-spin-morphology distribution (`P=3`). A full description of the algorithm can be found in Obreschkow et al. (2017).

## Usage

 ```1 2 3 4``` ```dffit(x, selection = NULL, x.err = NULL, distance = NULL, phi = "Schechter", p.initial = NULL, n.iterations = 100, n.resampling = NULL, correct.mle.bias = FALSE, correct.lss.bias = FALSE, lss.sigma = NULL, write.fit = TRUE, x.grid = list(seq(4, 12, 0.01))) ```

## Arguments

 `x` Normally `x` is a `N`-element vector, representing the log-masses (log10(M/Msun)) of `N` galaxies. More generally, `x` can be either a vector of `N` elements or a matrix of `N-by-P` elements, containing the values of one or `P` observables of `N` objects, respectively. `selection` Specifies the effective volume `Veff(xval)` in which a galaxy of log-mass `xval` can be observed; or, more generally, the volume in which an object of observed values `xval[1:P]` can be observed. This volume can be specified in five ways: (1) If `selection` is a single positive number, it will be interpreted as a constant volume, `Veff(xval)=selection`, in which all galaxies are fully observable. `Veff(xval)=0` is assumed outside the "observed domain". This domain is defined as `min(x)<=xval<=max(x)` for one observable (`P=1`), or as `min(x[,j])<=xval[j]<=max(x[,j])` for all `j=1,...,P` if `P>1`. This mode can be used for volume-complete surveys or for simulated galaxies in a box. (2) If `selection` is a vector of `N` elements, they will be interpreted as the effective volumes for each of the `N` galaxies. `Veff(xval)` is interpolated (linearly in `1/Veff`) for other values `xval`. `Veff(xval)=0` is assumed outside the observed domain. (3) `selection` can be a function of `P` variables, which directly specivies the effective volume for any `xval`, i.e. `Veff(xval)=selection(xval)`. (4) `selection` can also be a list (`selection = list(Veff.values, Veff.fct)`) of an `N`-element vector `Veff.values` and a `P`-dimensional function `Veff.fct`. In this case, the effective volume is computed using a hybrid scheme of modes (2) and (3): `Veff(xval)` will be interpolated from the `N` values of `Veff.values` inside the observed domain, but set equal to `Veff.fct` outside this domain. (5) Finally, `selection` can be a list of two functions and two numbers: `selection = list(f, V, rmin, rmax)`, where `f = function(xval,r)` is the isotropic selection function and `V = function(r)` is the total survey volume as a function of comoving distance `r`, and `rmin` and `rmax` are the distance limits of the survey. `x.err` Optional vector or array specifying the observational errors of `x`. If `x` is a vector then `x.err` must also be a vector of same length. Its elements are interpreted as the standard deviations of Gaussian uncertainties in `x`. If `x` is a `N-by-P` matrix representing `N` objects with `P` observables, then `x.err` must be either a `N-by-P` matrix or a `N-by-P-by-P` array. In the first case, the elements `x.err[i,]` are interpreted as the standard deviations of Gaussian uncertainties on `x[i,]`. In the second case, the `P-by-P` matrices `x.err[i,,]` are interpreted as the covariance matrices of the `P` observed values `x[i,]`. `distance` Optional vector of `N` elements specifying the comoving distances of the `N` galaxies. This vector is only needed if `correct.lss.bias = TRUE`. `phi` Either a string or a function specifying the DF to be fitted. A string is interpreted as the name of a predefined mass function. Available options are `'Schechter'` for Schechter function (3 parameters), `'PL'` for a power law (2 parameters), or `'MRP'` for an MRP function (4 parameters). Alternatively, `phi = function(xval,p)` can be any function of the `P` observable(s) `xval` and a list of parameters `p`. IMPORTANT: The function `phi(xval,p)` must be fully vectorized in `xval`, i.e. it must output a vector of `N` elements if `xval` is an `N-by-P` array (such as `x`). Note that if `phi` is given as a function, the argument `p.initial` is mandatory. `p.initial` Initial model parameters for fitting the DF. `n.iterations` Maximum number of iterations in the repeated fit-and-debias algorithm to evaluate the maximum likelihood. `n.resampling` Integer (>0) specifying the number of iterations for the resampling of the most likely DF used to evaluate realistic parameter uncertainties with quantiles. If `n.resampling = NULL`, no resampling is performed. `correct.mle.bias` The maximum likelihood estimator (MLE) of a finite dataset can be biased - a general property of the ML approach. If `TRUE`, `dffit` also outputs the parameters, where this estimator bias has been corrected, to first order in `1/N`, using jackknifing. `correct.lss.bias` If `TRUE` the `distance` values are used to correct for the observational bias due to galaxy clustering (large-scale structure). `lss.sigma` Cosmic variance of survey volume. Explicitly, `lss.sigma` is interpreted as the expected relative RMS on the total number of galaxies in the survey volume. This value must be determined from a cosmological model. `write.fit` If `TRUE`, the best-fitting parameters are displayed in the console. `x.grid` sets the grid on which the numerical integration is performed. This grid must be specified as `x.grid = list(x1, ..., xp)`, where `xi` is an equally spaced vector with the grid values for the i-th observable. In the case of a MF (a 1-dimensional DF), `x.grid = list(seq(xmin,xmax,by=dx))`, where `xmin` and `xmax` are the minimum/maximum values of the log-mass considered in the numerical integratino and `dx` is the grid spacing.

## Value

Returns a structured list. The sublist `input` contains all relevant input arguments. The sublist `fit` contains all the output arguments of the MLE algorithm. The output can be visualized using `mfplot`, `plot.df` and `dfwrite`.

## Author(s)

Danail Obreschkow

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25``` ```# basic example data = mfdata() df = dffit(data\$x, data\$selection) mfplot(df, xlim=c(2e6,5e10)) # show fitted parameter PDFs and covariances dfplotcov(df) # show fitted effective volume function dfplotveff(df) # include measurement errors in fit and determine Schechter function uncertainties from resampling df = dffit(data\$x, data\$selection, data\$x.err, n.resampling = 1e2) mfplot(df, uncertainty.type=3, nbins=10, bin.xmin=6.5, bin.xmax=9.5, xlim=c(2e6,5e10), ylim=c(2e-3,1.5)) # evaluate bias corrected maximum-likelihood estimator and add to plot df = dffit(data\$x, data\$selection, data\$x.err, write.fit=FALSE, correct.mle.bias=TRUE) mfplot(df, show.uncertainties=FALSE, nbins=0, show.bias.correction=TRUE, col.bias.correction="blue", add=TRUE) # evaluate posteriors of the mass measurements and visualize the change in the mass mode between observation and posterior # i.e. the Eddington bias correction df = posterior.data(df) plot(data\$x,10^df\$posterior\$x.mode.correction,log='x',pch=20,xlab='Mass',ylab='Eddington bias correction = posterior/observed mass mode') lines(10^seq(5,10,by=0.01),df\$fit\$functions\$source.count(seq(5,10,by=0.01))*1e-2+1) abline(h=1,v=5.4e7) ```

obreschkow/gdftools documentation built on June 10, 2017, 12:40 a.m.