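library(SGP)
library(SGPdata)
# TRUE only when the document is being rendered to HTML; identical() also
# handles the case where the option is unset (NULL)
is_html_output <- function() {
  identical(knitr::opts_knit$get("rmarkdown.pandoc.to"), "html")
}
knitr::opts_chunk$set(collapse = TRUE, comment = "", prompt = TRUE, fig.dpi = 96)
if (is_html_output()) {
  options(width = 1000)
}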
There are two common formats for representing longitudinal (time-dependent) student assessment data: WIDE and LONG. In WIDE format, each case/row represents a unique student and the columns represent variables associated with that student at different times. In LONG format, the time-dependent data for a student is spread across multiple rows of the data set. The SGPdata package, installed along with the SGP package, includes exemplar WIDE and LONG data sets (sgpData and sgpData_LONG, respectively) to assist in setting up your data.
The choice between WIDE and LONG format is driven by many conditions. In terms of the analyses that can be performed using the SGP package, the WIDE data format is used by the lower level functions studentGrowthPercentiles and studentGrowthProjections, whereas the higher level wrapper functions utilize the LONG data format. For all but the simplest one-off analyses, you're likely better off formatting your data in LONG format and using the higher level functions. This is particularly true if you plan on running SGP analyses operationally year after year, where LONG data has numerous preparation and storage benefits over WIDE data.
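To make the difference between the two layouts concrete, here is a minimal sketch that converts a small WIDE table into LONG form with base R's reshape(). The toy data frame and its values are invented for illustration and are not taken from sgpData; only the column naming pattern mirrors it.

```r
# Toy WIDE data: one row per student, one GRADE/SS column pair per year
# (values are made up for illustration)
wide <- data.frame(
  ID = c("a", "b"),
  GRADE_2016 = c(3, 4),
  GRADE_2017 = c(4, 5),
  SS_2016 = c(410, 455),
  SS_2017 = c(432, NA)
)

# Stack the per-year columns so that each row is one student-year record
long <- reshape(
  wide,
  direction = "long",
  idvar = "ID",
  varying = list(c("GRADE_2016", "GRADE_2017"), c("SS_2016", "SS_2017")),
  v.names = c("GRADE", "SCALE_SCORE"),
  timevar = "YEAR",
  times = c(2016, 2017)
)

# LONG data only needs rows for records that actually exist
long <- long[!is.na(long$SCALE_SCORE), ]
rownames(long) <- NULL
long
```

A missing year costs a column full of NA values in WIDE format, whereas in LONG format the row is simply absent.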
Longitudinal data in WIDE format is usually the most "intuitive" longitudinal format for those new to longitudinal/time-dependent data. Each row of the data set provides all the data for the individual case, with the variable names indicating what time period the data is from. Though intuitive, the data is often difficult to work with, particularly in situations where data is frequently added to the data set.
The data set sgpData is an anonymized panel data set comprising 5 years of annual, vertically scaled assessment data in WIDE format. This exemplar data set models the format for data used with the lower level studentGrowthPercentiles and studentGrowthProjections functions.
head(sgpData)
The WIDE data format illustrated by sgpData and utilized by the SGP package can accommodate any number of occurrences but must follow a specific column order. Variable names are irrelevant; position in the data set is what's important:
In sgpData above, the first column, ID, provides the unique student identifier. The next 5 columns, GRADE_2013, GRADE_2014, GRADE_2015, GRADE_2016, and GRADE_2017, provide the grade level of the student assessment score in each of the 5 years. The last 5 columns, SS_2013, SS_2014, SS_2015, SS_2016, and SS_2017, provide the scale scores associated with the student in each of the 5 years. In most cases the student does not have 5 years of test data, so the years without data show the missing value (NA).
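Because position rather than name determines how the columns are read, a quick layout check before running an analysis can catch misordered data. The following is a minimal sketch on an invented toy data frame. The checks (an odd column count, unique IDs, numeric score columns) are illustrative assumptions about what is worth verifying, not validation performed by the SGP package itself.

```r
# Toy WIDE data following the ID / grades / scale scores column order
# (values are made up for illustration)
wide <- data.frame(
  ID = c("a", "b", "c"),
  GRADE_2016 = c(3, 4, NA),
  GRADE_2017 = c(4, 5, 3),
  SS_2016 = c(410, 455, NA),
  SS_2017 = c(432, 470, 398)
)

# One ID column, then equal-sized blocks of grade and score columns
n_panels <- (ncol(wide) - 1) / 2
grade_cols <- 2:(1 + n_panels)
score_cols <- (2 + n_panels):ncol(wide)

layout_ok <- n_panels == round(n_panels) &&      # blocks are the same size
  !anyDuplicated(wide[[1]]) &&                   # one row per student
  all(vapply(wide[score_cols], is.numeric, logical(1)))  # scores are numeric

names(wide)[grade_cols]
names(wide)[score_cols]
layout_ok
```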
Using WIDE format data like sgpData with the SGP package is, in general, straightforward.
sgp_g4 <- studentGrowthPercentiles(
  panel.data = sgpData,
  sgp.labels = list(my.year = 2015, my.subject = "Reading"),
  percentile.cuts = c(1, 35, 65, 99),
  grade.progression = c(3, 4))
Please consult the SGP data analysis vignette for more comprehensive documentation on how to use sgpData (and WIDE data formats in general) for SGP analyses.
The data set sgpData_LONG is an anonymized panel data set comprising 5 years of annual, vertically scaled assessment data in LONG format for two content areas (ELA and Mathematics). This exemplar data set models the format for data used with the higher level functions abcSGP, prepareSGP, analyzeSGP, combineSGP, summarizeSGP, visualizeSGP, and outputSGP.
head(sgpData_LONG)
We recommend LONG formatted data for operational analyses. Managing data in LONG format is simpler than managing data in WIDE format. For example, when updating analyses with another year of data, the new data is appended to the bottom of the existing LONG data set. All higher level functions in the SGP package are designed for use with LONG format data. In addition, these functions often assume the existence of state-specific meta-data in the embedded SGPstateData object.
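As a rough illustration of the update step described above, adding a year to LONG data is a row bind. This sketch uses base R and invented toy records. The object names long_2016, long_2017, and long_all are hypothetical, and a real update would use the full set of required variables.

```r
# Existing LONG data (toy values for illustration)
long_2016 <- data.frame(
  VALID_CASE = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR = "2016",
  ID = c("a", "b"),
  GRADE = c("3", "4"),
  SCALE_SCORE = c(410, 455),
  stringsAsFactors = FALSE
)

# Newly arrived records for the next year, with the same columns
long_2017 <- data.frame(
  VALID_CASE = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR = "2017",
  ID = c("a", "b"),
  GRADE = c("4", "5"),
  SCALE_SCORE = c(432, 470),
  stringsAsFactors = FALSE
)

# Appending a year adds rows; no existing column or row has to change
long_all <- rbind(long_2016, long_2017)
table(long_all$YEAR)
```

The equivalent update in WIDE format would require adding new columns and realigning every student row.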
See the SGP package documentation for more comprehensive guidance on how to use sgpData_LONG for SGP calculations.
There are 7 required variables when using LONG data with SGP analyses: VALID_CASE, CONTENT_AREA, YEAR, ID, SCALE_SCORE, GRADE, and ACHIEVEMENT_LEVEL (only required if running student growth projections). LAST_NAME and FIRST_NAME are required if creating individual level student growth and achievement plots. All other variables are demographic/student categorization variables used for creating student aggregates by the summarizeSGP function.
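A small helper can confirm that a LONG data set carries the required variables before it is handed to the higher level functions. This is a minimal sketch. The helper missing_required() and the toy data frame are hypothetical and not part of the SGP package. Only the variable names come from the list above.

```r
# Required variable names, as listed above
required_vars <- c("VALID_CASE", "CONTENT_AREA", "YEAR", "ID",
                   "SCALE_SCORE", "GRADE", "ACHIEVEMENT_LEVEL")

# Hypothetical helper: returns the required names absent from the data
missing_required <- function(long_data, required = required_vars) {
  setdiff(required, names(long_data))
}

# Toy LONG data that lacks ACHIEVEMENT_LEVEL (values are made up)
toy_long <- data.frame(
  VALID_CASE = "VALID_CASE",
  CONTENT_AREA = "ELA",
  YEAR = "2017",
  ID = c("a", "b"),
  SCALE_SCORE = c(432, 470),
  GRADE = c("4", "5"),
  stringsAsFactors = FALSE
)

missing_required(toy_long)
```

An empty result means every required variable is present. Here the helper reports the one column the toy data leaves out.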
The sgpData_LONG data set contains data for 5 years across 2 content areas (ELA and Mathematics).
The data set sgptData_LONG is an anonymized panel data set comprising 8 windows (3 windows annually) of assessment data in LONG format for 3 content areas (Early Literacy, Mathematics, and Reading). This data set is similar to the sgpData_LONG data set, without the demographic variables and with an additional DATE variable indicating the date associated with the student assessment record.
head(sgptData_LONG)
The data set sgpData_INSTRUCTOR_NUMBER is an anonymized student-instructor lookup table that provides instructor information associated with each student's test record. Note that just as each teacher can (and will) have more than one student associated with them, a student can have more than one teacher associated with their test record. That is, multiple teachers could be assigned to the student in a single content area for a given year.
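The many-to-many relationship can be seen by joining test records to a lookup table. The sketch below uses base R's merge() on invented toy data. The join keys (ID, CONTENT_AREA, YEAR) and the INSTRUCTOR_NUMBER column are assumptions made for illustration, so check the actual column names with names(sgpData_INSTRUCTOR_NUMBER) before adapting it.

```r
# Toy student test records (values are made up)
students <- data.frame(
  ID = c("a", "b"),
  CONTENT_AREA = "MATHEMATICS",
  YEAR = "2017",
  SCALE_SCORE = c(432, 470),
  stringsAsFactors = FALSE
)

# Toy lookup table: student "a" has two instructors in the same
# content area and year (column names are assumed for illustration)
instructors <- data.frame(
  ID = c("a", "a", "b"),
  CONTENT_AREA = "MATHEMATICS",
  YEAR = "2017",
  INSTRUCTOR_NUMBER = c("T01", "T02", "T01"),
  stringsAsFactors = FALSE
)

# The join repeats a test record once per associated instructor
linked <- merge(students, instructors, by = c("ID", "CONTENT_AREA", "YEAR"))
linked
```

After the join, a student's test record appears once for each instructor, so counts taken on the linked table are counts of student-instructor pairs rather than of students.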
head(sgpData_INSTRUCTOR_NUMBER)
If you have a contribution or topic request for this vignette, don't hesitate to write or set up an issue on GitHub.