dfmockdata: Generate mock data
In obreschkow/dftools: Fitting distribution functions such as galaxy mass functions


This function produces a mock survey with observed log-masses x.obs with Gaussian uncertainties and distances r, using a custom mass function (MF) and selection function.

dfmockdata(
  n = NULL,
  seed = 1,
  veff = NULL,
  f = NULL,
  dVdr = NULL,
  gdf = function(x) dfmodel(x, c(-2, 11, -1.3), type = "Schechter"),
  g = NULL,
  sigma = 0,
  rmin = 0,
  rmax = 20,
  xmin = 2,
  xmax = 13,
  shot.noise = FALSE,
  verbose = FALSE
)

`n`	Number of objects (galaxies) to be generated. If `n=NULL`, the number is determined from the mass function (`gdf`) and the selection criteria (specified by `f` and `dVdr`). Otherwise, the survey volume (specified by the derivative `dVdr`) is automatically multiplied by the scaling factor required to obtain the requested number of objects `n`.
`seed`	An integer used as seed for the random number generator. To generate different realizations with the same survey specifications, it suffices to vary this number.
`veff`	is the effective volume function `veff(x)`, defined as the cosmic volume in which sources of log-mass `x` can be detected by the survey. If this function is specified, `f`, `dVdr` and `g` cannot be specified.
`f`	is the selection function `f(x,r)`, giving the ratio between the expected number of detected galaxies and the true number of galaxies of log-mass `x` at comoving distance `r`. Normally this function is bounded between 0 and 1. It takes the value 1 at distances where objects of mass `x` are easily detected, and 0 at distances where such objects are impossible to detect. A rapid, continuous drop from 1 to 0 normally occurs at the limiting distance `rmax`, at which a galaxy of log-mass `x` can be picked up. `f(x,r)` can never be smaller than 0, but values larger than 1 are conceivable if there is a large number of false positive detections in the survey. The default is `f = function(x,r) erf((1-1e3*r/sqrt(10^x))*20)*0.5+0.5`, which mimics a sensitivity-limited survey with a fuzzy limit.
`dVdr`	is the function `dVdr(r)`, specifying the derivative of the survey volume `V(r)` as a function of comoving distance `r`. This survey volume is simply the total observed volume, irrespective of the detection probability, which is already specified by the function `f`. Normally, the survey volume is given by `V(r)=Omega*r^3/3`, where `Omega` is the solid angle of the survey. Hence, the derivative is `dVdr(r)=Omega*r^2`. The default is `Omega=2.13966` [steradians], chosen such that the expected number of galaxies is exactly 1000 when combined with the default selection function `f(x,r)`.
`gdf`	is the 'generative distribution function', i.e. the underlying mass function, from which the galaxies are drawn. This function is a function of log-mass `x`. It returns the expected number of galaxies per unit of cosmic volume `V` and log-mass `x`. The default is a Schechter function.
`g`	function of distance `r` describing the number-density variation of galaxies due to cosmic large-scale structure (LSS). Explicitly, `g(r)>0` is the number density at `r` relative to the number density without LSS. Values between 0 and 1 correspond to underdense regions, values larger than 1 to overdense regions. In the absence of LSS, `g(r)=1`. Note that `g` is automatically rescaled such that its average value in the survey volume is 1.
`sigma`	Gaussian observing errors in log-mass `x`, which are automatically added to the survey. `sigma` can be (1) a scalar, (2) a vector of `n` elements, or (3) a function of the true log-mass `x`.
`rmin, rmax`	Minimum and maximum distance of the survey. Outside these limits the function `f(x,r)` will automatically be assumed to be 0.
`xmin, xmax`	Minimum and maximum log-mass in the survey. For optimal performance, specify these boundaries such that they certainly contain all sources generated by the survey, but do not span a much larger range.
`shot.noise`	Logical flag. If set to `TRUE`, the number of galaxies in the survey can differ from the expected number, following a Poisson distribution.
`verbose`	Logical flag. If set to `TRUE`, some information will be displayed in the console while generating the mock survey.
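
The three accepted forms of `sigma` can be illustrated without the package. The sketch below is an assumption about the behaviour described above, not the code in `R/dfmockdata.R`; the helper `expand_sigma` and the sample log-masses are hypothetical and only serve to show how a scalar, a vector, or a function of the true log-mass could each be turned into one uncertainty per galaxy.

```r
set.seed(1)
x.true <- c(8.2, 9.5, 10.1, 10.8, 11.3)  # made-up true log-masses

# Hypothetical helper: expand sigma (scalar, vector or function)
# into one Gaussian uncertainty per galaxy
expand_sigma <- function(sigma, x.true) {
  if (is.function(sigma)) return(sigma(x.true))
  if (length(sigma) == 1) return(rep(sigma, length(x.true)))
  stopifnot(length(sigma) == length(x.true))
  sigma
}

# (1) scalar: the same error for every galaxy
err.scalar <- expand_sigma(0.3, x.true)

# (3) function: errors that grow towards lower masses (arbitrary choice)
x.err <- expand_sigma(function(x) 0.1 + 0.02 * (11 - x), x.true)

# Observed log-masses: true values perturbed by Gaussian noise of width x.err
x <- x.true + rnorm(length(x.true), mean = 0, sd = x.err)
```

In this picture `x`, `x.err` and `x.true` play the roles of the list elements of the same names returned by `dfmockdata`.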

dfmockdata returns a list of arrays and scalars:

`x`	Array of observed log-masses.
`x.err`	Gaussian uncertainties on `x`.
`x.true`	Array of true log-masses, i.e. the values of `x` before they were perturbed by random uncertainties `x.err`.
`r`	Array of comoving distances, only available if a function `f` is given.
`f`	Selection function provided as input argument.
`g`	Cosmic LSS function provided as input argument.
`dVdr`	Derivative of survey volume provided as input argument, but rescaled to the requested number of galaxies `n`.
`veff`	Function returning the effective volume as a function of log-mass `x`.
`veff.values`	Array of effective volumes for each galaxy.
`scd`	Function returning the expected source count density as a function of log-mass `x`.
`rmin,rmax`	Range of comoving distances `r`, spanned by the survey. Same as input arguments.
`xmin,xmax`	Range of log-masses `x` provided as input argument. This range is generally larger than the range spanned by the values of `x` and is meant to span the maximally conceivable range of `x` given the survey specifications.
`rescaling.factor`	Value of rescaling factor applied to the cosmic volume to match the requested number of galaxies `n`.
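
To make the meaning of `scd` and `rescaling.factor` concrete, here is a minimal base-R sketch under two stated assumptions: the Schechter function is taken in the log-mass form written in `gdf` below with parameters (log10 normalisation, log10 characteristic mass, faint-end slope), and the fuzzy default selection function is replaced by a sharp distance limit at `1e-3*sqrt(10^x)`. The names `veff_sharp`, `scd_sketch`, `n.expected` and `n.requested` are illustrative and not part of the package.

```r
# Assumed Schechter form in log-mass x, with p = (log10 phi*, log10 M*, alpha)
gdf <- function(x, p = c(-2, 11, -1.3)) {
  mu <- 10^(x - p[2])
  log(10) * 10^p[1] * mu^(p[3] + 1) * exp(-mu)
}

Omega <- 2.13966  # default solid angle quoted for dVdr
rmax <- 20
xmin <- 2
xmax <- 13

# Sharp-limit approximation of the default selection function:
# a galaxy of log-mass x is detected out to 1e-3*sqrt(10^x), capped at rmax
veff_sharp <- function(x) Omega / 3 * pmin(rmax, 1e-3 * sqrt(10^x))^3

# Source count density: expected number of detected galaxies per unit log-mass
scd_sketch <- function(x) gdf(x) * veff_sharp(x)

# Integrate in two pieces, split at the kink where the limit reaches rmax
x.kink <- log10((rmax / 1e-3)^2)
n.expected <- integrate(scd_sketch, xmin, x.kink)$value +
  integrate(scd_sketch, x.kink, xmax)$value

# Factor by which the volume would have to be scaled to obtain n.requested objects
n.requested <- 5000
rescaling <- n.requested / n.expected
```

Under these assumptions `n.expected` should come out close to the 1000 galaxies quoted for the default `Omega`, so `rescaling` is close to `n.requested/1000`.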

Danail Obreschkow

dffit

# draw 1000 galaxies with mass errors of 0.3 dex from a Schechter function
# with parameters (-2,11,-1.3) and a preset selection function
mock = dfmockdata(sigma = 0.3)

# plot the distance-log(mass) relation of observed data, true data, and approximate survey limit
plot(mock$r,mock$x,col='blue')
points(mock$r,mock$x.true,pch=20)
x = seq(5,11,0.01)
lines(1e-3*sqrt(10^x),x,col='red')

# These data can then be used to fit a MF in several ways. For instance,
# assuming that the effective volume function Veff(x) is known:
selection = mock$veff
survey = dffit(mock$x, selection, mock$x.err)

# or assuming that Veff is known only on a galaxy-by-galaxy basis
selection = mock$veff.values
dffit(mock$x, selection, mock$x.err)

# or assuming that Veff is known on a galaxy-by-galaxy basis, but approximated analytically
# outside the range of observed galaxy masses
selection = list(mock$veff.values, mock$veff)
dffit(mock$x, selection, mock$x.err)

# or assuming that the full selection function f(x,r) and the observing volume
# derivative dVdr(r) are known
selection = list(mock$f, mock$dVdr, mock$rmin,mock$rmax)
dffit(mock$x, selection, mock$x.err)
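
The last selection form above carries the same information as an effective volume function. As a rough illustration, the base-R sketch below integrates the default `f(x,r)` and `dVdr(r)` quoted in the argument list over the distance range. It assumes that the effective volume is the integral of `f(x,r)*dVdr(r)` from `rmin` to `rmax` without large-scale structure (`g(r)=1`); `erf` is defined through `pnorm` because base R has no error function, and `veff_sketch` is a hypothetical name, not the package's implementation.

```r
# Error function expressed through the normal CDF (base R has no erf)
erf <- function(z) 2 * pnorm(z * sqrt(2)) - 1

# Default selection function and volume derivative as quoted in the arguments
f <- function(x, r) erf((1 - 1e3 * r / sqrt(10^x)) * 20) * 0.5 + 0.5
Omega <- 2.13966
dVdr <- function(r) Omega * r^2

rmin <- 0
rmax <- 20

# Assumed definition: Veff(x) = integral of f(x,r)*dVdr(r) over [rmin, rmax]
veff_sketch <- function(x) {
  sapply(x, function(xi) {
    integrate(function(r) f(xi, r) * dVdr(r), rmin, rmax)$value
  })
}

# Total survey volume, reached when f = 1 over the whole distance range
vmax <- Omega * (rmax^3 - rmin^3) / 3

v <- veff_sketch(c(7, 8, 12))
```

Low-mass galaxies are only visible nearby, so `v` increases with log-mass and saturates at `vmax` once the detection limit lies beyond `rmax`.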