
childdevdata


The goal of childdevdata is to support innovation in child development. The package

  1. Makes anonymous microdata available to the research community;
  2. Adopts a simple naming schema for developmental milestones;
  3. Supports multiple measurement instruments;
  4. Eases joint analyses of the data.

The current version bundles milestone data from ten studies, containing 1,116,061 assessments made on 10,831 unique children during 28,465 visits, covering 21 different instruments.

Installation

You can install the released version of childdevdata from CRAN with

install.packages("childdevdata")

You can install the development version of childdevdata from GitHub with

install.packages("remotes")
remotes::install_github("d-score/childdevdata")

Example

The following example visualises how the proportion of toddlers that are able to walk increases with age.

library(childdevdata)
library(ggplot2)

# we use the Dutch SMOCC data
data <- with(gcdg_nld_smocc, 
             data.frame(age = round(agedays/365.25, 4),
                        walk = ddigmd068))
ggplot(na.omit(data), aes(age, walk)) +
  geom_point(cex = 0.5) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"), 
              se = FALSE, lwd = 0.5) +
  theme_bw()
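The trend in the plot can also be tabulated. The sketch below is not part of childdevdata: `pass_rate_by_age()` is a hypothetical helper, and it assumes that milestones are scored 1 (pass) or 0 (fail). It bins age and computes the share of passes per bin, shown here on a few made-up records.

```r
# Hypothetical helper (not part of childdevdata): share of passes per age bin.
# Assumes `age` is in years and `score` is 1 (pass), 0 (fail) or NA.
pass_rate_by_age <- function(age, score, breaks) {
  keep <- !is.na(age) & !is.na(score)          # drop incomplete records
  bins <- cut(age[keep], breaks = breaks, right = FALSE)
  tapply(score[keep], bins, mean)              # mean of 0/1 = proportion passing
}

# Made-up records, for illustration only
toy_age  <- c(0.8, 0.9, 1.1, 1.2, 1.6, 1.7, 1.8)
toy_walk <- c(0,   0,   0,   1,   1,   NA,  1)

rates <- pass_rate_by_age(toy_age, toy_walk, breaks = c(0.5, 1, 1.5, 2))
print(rates)
```

With the package installed, the same helper could be applied to `data$age` and `data$walk` from the example above, using age breaks of your own choice.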

Overview of available datasets and documentation

The package contains multiple datasets. Obtain the list of datasets by

data(package = "childdevdata")$results[, "Item"]

The documentation of the data can be found by typing into the console:

?gcdg_col_lt42m

The size of the data is

dim(gcdg_col_lt42m)

The first six rows and first nine columns are

head(gcdg_col_lt42m[, 1:9])

The first seven columns are administrative and background variables. Columns eight and up hold the milestone scores.
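This layout makes it easy to separate the two kinds of columns by position. A minimal sketch follows, in which `split_columns()` is a hypothetical helper (not part of the package) and the toy data frame uses placeholder column names rather than the real ones.

```r
# Hypothetical helper: split a data frame by position into background
# columns (the first `n_admin`) and milestone columns (the rest).
split_columns <- function(df, n_admin = 7) {
  list(
    background = df[, seq_len(n_admin), drop = FALSE],
    milestones = df[, -seq_len(n_admin), drop = FALSE]
  )
}

# Made-up frame: 7 placeholder background columns and 2 milestone columns
toy <- data.frame(
  v1 = 1:3, v2 = 1:3, v3 = 1:3, v4 = 1:3, v5 = 1:3, v6 = 1:3, v7 = 1:3,
  item_a = c(1, 0, NA),
  item_b = c(NA, NA, 1)
)
parts <- split_columns(toy)

# Number of non-missing scores per milestone
n_scored <- colSums(!is.na(parts$milestones))
print(n_scored)
```

With the package installed, `split_columns(gcdg_col_lt42m)` should give the same kind of split for the real data, assuming the seven-column layout holds for that dataset.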

Combining data

Concatenating two or more datasets is straightforward using dplyr. The following code concatenates all publicly available GCDG datasets.

library(dplyr)
alldata <- bind_rows(gcdg_chl_1, gcdg_chn, gcdg_col_lt42m, gcdg_col_lt45m, gcdg_ecu, 
                     gcdg_jam_lbw, gcdg_jam_stunted, gcdg_mdg, gcdg_nld_smocc, gcdg_zaf)
dim(alldata)

Both the number of rows and the number of columns have increased. Milestones that do not appear in a particular dataset get missing (NA) scores for all of its records.
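The effect is easiest to see on a small scale. The sketch below uses two made-up cohorts with placeholder item names (not real milestone names) to show how `bind_rows()` fills the columns that a cohort lacks with NA.

```r
library(dplyr)

# Made-up cohorts that share one milestone (item_a) but not the others
cohort_x <- data.frame(cohort = "X", agedays = c(100, 200),
                       item_a = c(1, 0), item_b = c(0, 1))
cohort_y <- data.frame(cohort = "Y", agedays = c(150, 250, 350),
                       item_a = c(1, 1, 0), item_c = c(0, 1, 1))

# Stack the rows; the result holds the union of the columns
stacked <- bind_rows(cohort_x, cohort_y)
dim(stacked)

# Share of missing scores per milestone (rows) and cohort (columns)
missing_share <- sapply(
  split(stacked[, c("item_a", "item_b", "item_c")], stacked$cohort),
  function(x) colMeans(is.na(x))
)
print(missing_share)
```

A share of 1 marks a milestone that was never administered in that cohort, which is worth checking before pooling cohorts in a joint analysis.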

The number of records per cohort by sex is

table(alldata$cohort, alldata$sex)

Calculating D-score and DAZ

The dscore package calculates the D-score [@vanbuuren2014] and its age-adjusted Z-score (DAZ) for all cases:

library(dscore)
alldata$age <- round(alldata$agedays/365.25, 4)
d <- dscore(alldata)
head(d)
dim(d)

We visualise the D-score distribution by age per cohort as

alldata <- bind_cols(alldata, d)
ggplot(alldata, aes(age, d, group = cohort)) +
  geom_point(cex = 0.3) +
  facet_wrap(~ cohort) +
  ylab("D-score") + xlab("Age (years)") +
  theme_bw()

Why this package?

We all want our children to grow and prosper. While there is no shortage of apps and instruments to track child development, it is often unclear which data went into the construction of these tools. To improve measurement and norm setting of child development, we need child-level response data per milestone and age. However, no such public dataset seems to exist. The childdevdata package fills that void.

The package grew out of a project in which we collected milestone data from 16 cohorts. See @weber2019 and http://d-score.org/dbook2/ for results. Ten cohort owners graciously decided to make their data available for third parties. We are grateful to them.

How to use the data?

Tremendous effort has gone into the collection and harmonisation of the data. You can use the data in this package under the CC BY 4.0 license. Basically, this means that you may share and adapt the data, on the condition that you give appropriate credit and clearly indicate any changes you've made. See the license text for details.

We expect that you will properly cite the source data when you use the data in your own product or publication, as follows:

The citation of the childdevdata package is

@software{stef_van_buuren_2021_4700229,
  author       = {Stef van Buuren and
                  Iris Eekhout and
                  Marta Rubio Codina and
                  Orazio Attanasio and
                  Costas Meghir and
                  Emla Fitzsimons and
                  Sally Grantham-McGregor and
                  Maria Caridad Araujo and
                  Susan Walker and
                  Susan Chang and
                  Christine Powell and
                  Ann Weber and
                  Lia Fernald and
                  Paul Verkerk and
                  Linda Richter and
                  Betsy Lozoff},
  title        = {D-score/childdevdata: childdevdata 1.1.0},
  month        = apr,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v1.1.0},
  doi          = {10.5281/zenodo.4700229},
  url          = {https://doi.org/10.5281/zenodo.4700229}
}

Want to contribute?

Do you have similar data and want to help others to advance the field? Please let us know. We hope that the childdevdata package may continue to grow into a valuable resource for developers and researchers worldwide.

Acknowledgement

This study was supported by the Bill & Melinda Gates Foundation. The contents are the sole responsibility of the authors and may not necessarily represent the official views of the Bill & Melinda Gates Foundation or other agencies that may have supported the primary data studies used in the present study.

References



D-score/childdevdata documentation built on April 26, 2023, 8:21 a.m.