edcdf: Cumulative distribution function of time of schooling




edcdf graphs the cumulative distribution function (CDF) of time of schooling for any group of countries, using the set of estimates developed in Jorda and Alonso (2017).


edcdf(countries, init.y, final.y, database)



countries: character vector with the country codes of the countries to be used. Some macro-regions are already defined and can be used instead of the country codes: South Asia, Europe and Central Asia, Middle East and North Africa, Latin America and the Caribbean, Advanced Economies, Sub-Saharan Africa, East Asia and the Pacific (see data_country).


init.y: the first year for which the function is calculated. Available years are 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010.


final.y: the last year for which the function is calculated. Available years are 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010.


database: population subgroup for which the function is calculated. The following options are available:

  1. "total15": Total population aged over 15.

  2. "total25": Total population aged over 25.

  3. "male15": Male population aged over 15.

  4. "male25": Male population aged over 25.

  5. "female15": Female population aged over 15.

  6. "female25": Female population aged over 25.


We use the set of estimates developed in Jorda and Alonso (2017), where the generalized gamma distribution (Stacy, 1962) is used to model the time that individuals attend school until they complete the educational cycle or decide to drop out. The reason is twofold. First, the generalized gamma distribution is a parsimonious model that nests most of the parametric assumptions described in the literature (see Marshall and Olkin, 2007). Second, it can model one- and zero-mode distributions and represent several types of hazard rates. This flexibility makes it a strong candidate to model the distribution of education. The model includes as particular cases most of the distributions commonly used in survival analysis, including the Weibull, the exponential, and the gamma distributions, so it would converge to any of its special cases if needed.
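
The nesting property described above can be checked numerically. The following is a minimal sketch in base R, assuming Stacy's (1962) form of the generalized gamma with scale a and shape parameters b and k; the function name pgengamma_sketch and the parameter values are illustrative and are not taken from the educineq source or from the published estimates.

```r
# CDF of the generalized gamma distribution in Stacy's (1962) form.
# Assumed parameterization (not taken from the package source):
#   f(x) = b * x^(b*k - 1) * exp(-(x/a)^b) / (a^(b*k) * gamma(k)),  x > 0
# with scale a > 0 and shape parameters b > 0, k > 0.
pgengamma_sketch <- function(q, a, b, k) {
  stopifnot(a > 0, b > 0, k > 0)
  # If X follows this distribution, (X / a)^b is gamma distributed with shape k
  pgamma((pmax(q, 0) / a)^b, shape = k)
}

x <- seq(0, 20, by = 0.5)  # years of schooling (illustrative grid)

# k = 1 gives the Weibull distribution
all.equal(pgengamma_sketch(x, a = 8, b = 2, k = 1),
          pweibull(x, shape = 2, scale = 8))
# b = 1 gives the gamma distribution
all.equal(pgengamma_sketch(x, a = 3, b = 1, k = 2.5),
          pgamma(x, shape = 2.5, scale = 3))
# b = 1 and k = 1 give the exponential distribution
all.equal(pgengamma_sketch(x, a = 8, b = 1, k = 1),
          pexp(x, rate = 1 / 8))
```

Each comparison uses only distribution functions from the stats package, so the sketch runs without educineq installed.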

To accommodate time- and country-varying parameters, the distribution of education of each country and year is estimated by non-linear least squares (see Jorda and Alonso (2017) for a further description of the estimation strategy). The distribution of education of a particular group or region of countries is defined as a mixture of the national distributions, weighted by their population shares.
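
The population-weighted mixture can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code used by edcdf: the three countries, their parameters, and their populations are hypothetical, and the helper names pgg and mixture_cdf are introduced here only for the example.

```r
# Hypothetical national parameters and populations; illustrative values only,
# not estimates from Jorda and Alonso (2017).
national <- data.frame(
  country = c("A", "B", "C"),
  a = c(6, 9, 11), b = c(1.5, 2, 2.5), k = c(1.2, 1, 0.9),
  pop = c(50, 30, 20)
)

# Generalized gamma CDF (Stacy form, assumed parameterization)
pgg <- function(q, a, b, k) pgamma((pmax(q, 0) / a)^b, shape = k)

# Regional CDF: mixture of the national CDFs weighted by population shares
mixture_cdf <- function(q, params) {
  w <- params$pop / sum(params$pop)  # population shares, summing to 1
  comp <- mapply(function(a, b, k) pgg(q, a, b, k),
                 params$a, params$b, params$k)
  comp <- matrix(comp, nrow = length(q))  # rows: q values, columns: countries
  as.vector(comp %*% w)
}

x <- seq(0, 20, by = 1)  # years of schooling (illustrative grid)
regional <- mixture_cdf(x, national)
```

Because the weights are non-negative and sum to one, the regional curve is itself a valid CDF and lies between the lowest and highest national curves at every point.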


edcdf returns a graph of the evolution of the CDF of education over the specified period.


Jorda, V. and Alonso, J. M. (2017). New estimates on educational attainment using a continuous approach (1970-2010). World Development, 90, 281-293. http://www.sciencedirect.com/science/article/pii/S0305750X16305010

Marshall, A. W. and Olkin, I. (2007). Life distributions: Structure of nonparametric, semiparametric, and parametric families. New York: Springer.

Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics, 33, 1187-1192.

See Also

GenGamma.orig, data_country. Visit http://www.educationdata.unican.es for more information on the construction of the dataset and the available countries.


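library(educineq)
# CDF for the female population aged over 25 in South Asia, 1980 to 1990
edcdf(countries = "South Asia", init.y = 1980, final.y = 1990, database = "female25")
# CDF for the male population aged over 25 in the Nordic countries, 1995 to 2010
edcdf(countries = c("DNK", "FIN", "ISL", "NOR", "SWE"), init.y = 1995,
      final.y = 2010, database = "male25")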

educineq documentation built on May 30, 2017, 3:47 a.m.