Automatically download, join, and clean the NHS Digital Maternity Services Monthly Statistics (MSMS) data, which is derived from the Maternity Services Data Set (MSDS). When NHS Digital releases new data, you can easily download it and join it with the data you have already downloaded.
Each month, NHS Digital releases Maternity Services Monthly Statistics, which are derived from the Maternity Services Data Set. Multiple CSV and XLSX files are released each month, addressing different parts of the available data.
Working with the raw data in this form is time-consuming, and involves downloading the raw files, handling file naming inconsistencies, and joining data from multiple months together to form a clean time-series dataset.
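To give a feel for the manual work described above, here is a minimal base R sketch of joining two monthly extracts whose file and column names are inconsistent. It is not the package's implementation: the file names, column names, and values are all invented for illustration.

```r
# Illustrative only: two made-up monthly extracts with inconsistent names.
# File names, column names, and values are invented for this sketch.
tmp <- tempfile("msms_sketch_")
dir.create(tmp)

write.csv(
  data.frame(Org_Code = "RX1", Measure = "TotalBabies", Value = 10),
  file.path(tmp, "msms-jan-2021-exp-data.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(org_code = "RX1", measure = "TotalBabies", value = 12),
  file.path(tmp, "MSMS_Feb2021_data.csv"),
  row.names = FALSE
)

# Read one file, lower-casing the column names so the months line up,
# and record which file each row came from.
read_month <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  names(df) <- tolower(names(df))
  df$source_file <- basename(path)
  df
}

# Read every CSV in the folder and stack the rows into one data frame.
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_month))

print(combined)
```

Even in this toy case the column names have to be normalised before the rows can be stacked. With hundreds of real files, the same kind of fix-up is needed many times over.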
This package enables an automated data pipeline by:
Implementing an example plotting function that quickly demonstrates the volume of data available.
Potential future work:
You can install from GitHub using the {remotes} package with:
```r
# install.packages("remotes")
remotes::install_github("https://github.com/ThomUK/MSDSpipeline")

# Load the package
library(MSDSpipeline)
```
Using the package is a two-stage process. First, the data must be downloaded locally. Next, each of the 3 groups of data contained in MSDS (measures, data, and dq) must be joined together and tidied. Once tidied, the resulting data frames are ready for use in your analysis.
1. Download the data.
```r
# Download the data to your local machine, or a destination of your choice.
# This will begin downloading 780MB+ and 300+ files to your machine.
# Files are also sorted into subfolders, according to the information contained in each file.
# The download can be cancelled in RStudio by clicking the red button in the console window.
msds_download_data(destination = "data/msds_download")
```
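Once the download has finished, a quick check of what arrived can be useful. The sketch below uses only base R and assumes the same destination folder as in the call above. It makes no assumption about the subfolder names the package creates.

```r
# Same destination as passed to msds_download_data().
download_dir <- "data/msds_download"

# List every downloaded file, including those in subfolders.
# Returns an empty character vector if the folder does not exist yet.
downloaded <- list.files(download_dir, recursive = TRUE)

# Total number of files found.
length(downloaded)

# Number of files in each subfolder ("." means the top level).
table(dirname(downloaded))
```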
2. Tidy the data.
```r
# Tidy the data you need. This will combine and tidy data, including consolidating column names,
# fixing date formats, and altering data and unit columns in a consistent way.
measures_data <- msds_tidy_measures()
exp_data <- msds_tidy_data()
dq_data <- msds_tidy_dq()
```
3. Do your analysis. Some demo plotting functions are included below to illustrate the available data.
```r
# Measure
plot_demo_measure(measures_data, "CQIMPreterm", "RX1")

# Exp-data
plot_demo_data(exp_data, "TotalBabies", "RX1")

# DQ
plot_demo_dq(dq_data, "RX1")
```
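For your own analysis, a typical first step is to pull out one measure for one organisation and look at it over time. The sketch below shows one way to do that in base R on a stand-in data frame. The column names (`org_code`, `measure`, `period`, `value`), the `XXX` organisation code, and all values are assumptions made for illustration. Check `names(measures_data)` for the columns the tidy functions actually return.

```r
# Stand-in for a tidied data frame. Column names and values are assumptions,
# not the package's documented output.
example_data <- data.frame(
  org_code = c("RX1", "RX1", "RX1", "XXX"),
  measure  = c("CQIMPreterm", "CQIMPreterm", "TotalBabies", "CQIMPreterm"),
  period   = as.Date(c("2021-02-01", "2021-01-01", "2021-01-01", "2021-01-01")),
  value    = c(5.8, 6.1, 410, 7.0)
)

# Keep one measure for one organisation, ordered by reporting period.
one_series <- subset(example_data, org_code == "RX1" & measure == "CQIMPreterm")
one_series <- one_series[order(one_series$period), ]

# Plot the series over time with base graphics.
plot(one_series$period, one_series$value, type = "b",
     xlab = "Reporting period", ylab = "Value",
     main = "CQIMPreterm, RX1 (made-up values)")
```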
I am always interested to hear from others working with maternity data. If you spot a problem, please raise an issue, or make a PR.
This source data could be collated with a project similar to this one, but no project currently exists.