A project to generate realistic synthetic unit-level longitudinal education data to empower collaboration in education analytics.
The package is organized into the following functions:
simpop() -- the overall function that runs the simulation. It calls many subfunctions to simulate different elements of the student data.
cleaners -- functions which take the output from the simpop function and reshape it into data formats for different analyses. Currently only two cleaners are supported, ceds_cleaner() and sdp_cleaner(), which prepare the data into a CEDS-like format and into the Strategic Data Project college-going analysis file specification, respectively.
sim_control() -- a function that controls all of the parameters of the simpop simulation. The details of this function are covered in the vignettes.
To use OpenSDPsynthR, follow the instructions below:
The development version of the package can be installed with install_github(). To use this command you will need to install the devtools package.
devtools::install_github("opensdp/OpenSDPsynthR")
Load the package
library(OpenSDPsynthR)
The main function of the package is simpop, which generates a list of data elements corresponding to simulated educational careers, K-20, for a user-specified number of students. In R, a list is a data structure that can contain multiple data elements of different structures, so it can emulate the multiple tables of a Student Information System (SIS).
out <- simpop(nstu = 500, seed = 213, control = sim_control(nschls = 3))
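To make the list-of-tables idea concrete without running a simulation, here is a minimal base R sketch. The table names and columns below are hypothetical stand-ins and are not the tables that simpop returns.

```r
# Toy stand-in for a Student Information System: a named list of data frames.
# Table and column names are hypothetical, for illustration only.
sis <- list(
  students = data.frame(sid = c("001", "002", "003"),
                        grade = c(3L, 4L, 3L),
                        stringsAsFactors = FALSE),
  schools = data.frame(schid = c("A", "B"),
                       name = c("North", "South"),
                       stringsAsFactors = FALSE)
)

names(sis)          # the "tables" held in the list
sapply(sis, nrow)   # number of rows in each table
head(sis$students)  # pull out a single table with `$`
```

The same accessors (names(), `$`, `[[`) apply to the list returned by simpop.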
Currently ten tables are produced:
names(out)
Data elements produced include:
There are two tables of metadata about the assessment data above to be used in cases where multiple types of student assessment are analyzed together.
library(dplyr)  # provides bind_rows(), arrange(), select(), and %>%

# Build a lookup of every column in every simulated table
table_names <- data.frame(table = NULL, column = NULL)
for (i in seq_along(out)) {
  table_name <- names(out)[[i]]
  columns <- names(out[[i]])
  tmp <- data.frame(table = table_name, column = columns,
                    stringsAsFactors = FALSE)
  table_names <- bind_rows(table_names, tmp)
}
head(out$demog_master %>% arrange(sid) %>% select(1:4))
head(out$stu_year, 10)
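Because the simulated data arrive as separate tables, an analysis will often need to combine them on a shared student identifier. The sketch below uses toy data and base R's merge(). The sid key mirrors the column used above, but the other columns, and the assumption that both tables carry sid, are hypothetical.

```r
# Toy tables; every column except `sid` is hypothetical.
demog <- data.frame(sid = c("001", "002", "003"),
                    sex = c("F", "M", "F"),
                    stringsAsFactors = FALSE)
stu_year <- data.frame(sid = c("001", "001", "002", "003"),
                       year = c(2015L, 2016L, 2015L, 2015L),
                       grade = c(3L, 4L, 5L, 3L),
                       stringsAsFactors = FALSE)

# One row per student-year, with demographics repeated on each row
joined <- merge(stu_year, demog, by = "sid")
joined
```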
You can reformat the synthetic data for use in specific types of projects. Currently two functions exist to format the simulated data into an analysis file matching the SDP College-going data specification and a CEDS-like data specification. More of these functions are planned in the future.
cgdata <- sdp_cleaner(out)
ceds <- ceds_cleaner(out)
By default, you only need to specify the number of students to simulate in the simpop command. The package has default simulation parameters that create a small school district with two schools.
names(sim_control())
These parameters can have complex structures to allow for conditional and random generation of data. Parameters fall into four categories:
simglm function
For more details, see the simulation control vignette.
vignette("Controlling the Data Simulation", package = "OpenSDPsynthR")
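As a rough sketch of overriding a default, the control object can be built first and then passed to simpop. It uses only the arguments shown earlier (nstu, seed, control, nschls). The requireNamespace() guard is an addition here so the snippet does nothing if the package is not installed.

```r
# Run only when OpenSDPsynthR is available
if (requireNamespace("OpenSDPsynthR", quietly = TRUE)) {
  library(OpenSDPsynthR)

  # Start from the defaults and override the number of schools
  ctrl <- sim_control(nschls = 3)

  # Simulate with the customized control object
  out_custom <- simpop(nstu = 500, seed = 213, control = ctrl)
  names(out_custom)
}
```

Storing the control object in a variable keeps the parameter choices in one place, so they can be reused across several simulation runs.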
OpenSDPsynthR is part of the OpenSDP project.
OpenSDP is an online, public repository of analytic code, tools, and training intended to foster collaboration among education analysts and researchers in order to accelerate the improvement of our school systems. The community is hosted by the Strategic Data Project, an initiative of the Center for Education Policy Research at Harvard University. We welcome contributions and feedback.
These materials were originally authored by the Strategic Data Project.