synthetic_us_2010 | R Documentation |
A dataset containing exposure, confounders, and outcome for causal inference studies. The dataset is hosted on Harvard Dataverse (doi:10.7910/DVN/L7YF2G). It was produced from five different resources; see https://github.com/NSAPH-Projects/synthetic_data/ for the data processing pipelines. The five resources are described below.
Exposure Data
The exposure parameter is PM2.5. Di et al. (2019) provide daily and annual PM2.5 estimates at 1 km × 1 km grid cells across the contiguous United States. The data can be downloaded from Di et al. (2021). Features in this category start with the qd_ prefix.
Census Data
The main reference for census data is the United States Census Bureau, which runs numerous studies and surveys at different geographical resolutions. We use the 2010 American Community Survey at the county level (acs5). Features in this category start with the cs_ prefix.
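Because each source is marked by a column-name prefix, features from one source can be pulled out by matching on that prefix. The following is a minimal sketch on a toy data frame: the column names and the `select_by_prefix` helper are hypothetical placeholders for illustration, not the dataset's actual variable names or part of any package.

```r
# Toy data frame; the column names are hypothetical placeholders,
# not the dataset's actual variable names.
toy <- data.frame(
  qd_example    = c(8.1, 9.4),
  cs_example_a  = c(0.12, 0.08),
  cs_example_b  = c(0.30, 0.25),
  other_example = c("a", "b")
)

# Hypothetical helper: keep only the columns whose names start with `prefix`.
# drop = FALSE keeps the result a data frame even when one column matches.
select_by_prefix <- function(df, prefix) {
  df[, startsWith(names(df), prefix), drop = FALSE]
}

exposure_cols <- select_by_prefix(toy, "qd_")  # exposure features
census_cols   <- select_by_prefix(toy, "cs_")  # census features

names(exposure_cols)
names(census_cols)
```

The same helper should apply unchanged to the real data frame once it is loaded, for example `select_by_prefix(synthetic_us_2010, "cs_")`.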
CDC Data
The Centers for Disease Control and Prevention (CDC) provides the Behavioral Risk Factor Surveillance System (Centers for Disease Control and Prevention 2021), the nation’s premier system of health-related telephone surveys, which collect state data about U.S. residents regarding their health-related risk behaviors.
GridMET Data
The Climatology Lab at the University of California, Merced, provides the GridMET data (Abatzoglou 2013), a daily surface meteorological dataset covering the contiguous United States.
CMS Data
The Centers for Medicare and Medicaid Services (CMS) provides synthetic data at the county level for 2008-2010 (Centers for Medicare & Medicaid Services 2021).
The definition of each variable is provided below. All data are for 2010, aggregated to the county level, and cover the contiguous United States.
data(synthetic_us_2010)
A data frame with 3109 rows and 46 variables:
Mean PM2.5 (µg/m³)
The proportion of the population below the poverty level among 65 years and over.
The proportion of Hispanic or Latino population among 65 years and over.
The proportion of Black or African American population among 65 years and over.
The proportion of White population among 65 years and over.
The proportion of American Indian or Alaska native population among 65 years and over.
The proportion of Asian population among 65 years and over.
The proportion of other races population among 65 years and over.
The proportion of the population with below high school level education among 65 years and over.
Median Household income in the past 12 months (in 2010 inflation-adjusted dollars) where householder is 65 years and over.
Median house value (USD)
Total Population
Area of each county (square miles)
Population density (people per square mile).
Body Mass Index.
The proportion of current smokers.
The proportion of some days smokers.
The proportion of former smokers.
The proportion of never smokers.
The proportion of respondents whose smoking status is not known.
Annual mean of daily minimum temperature (K)
The mean of daily minimum temperature during summer (K)
The mean of daily minimum temperature during winter (K)
Annual mean of daily maximum temperature (K)
The mean of daily maximum temperature during summer (K)
The mean of daily maximum temperature during winter (K)
Annual mean of daily minimum relative humidity (%)
The mean of daily minimum relative humidity during summer (%)
The mean of daily minimum relative humidity during winter (%)
Annual mean of daily maximum relative humidity (%)
The mean of daily maximum relative humidity during summer (%)
The mean of daily maximum relative humidity during winter (%)
Annual mean of daily mean specific humidity (kg/kg)
The mean of daily mean specific humidity during summer (kg/kg)
The mean of daily mean specific humidity during winter (kg/kg)
The proportion of deceased patients.
The proportion of White patients.
The proportion of Black patients.
The proportion of Hispanic patients.
The proportion of Other patients.
The proportion of Female patients.
The region that the county is located in.
NORTHEAST=c("NY","MA","PA","RI","NH","ME","VT","CT","NJ")
SOUTH=c("DC","VA","NC","WV","KY","SC","GA","FL","AL","TN","MS","AR","MD","DE","OK","TX","LA")
MIDWEST=c("OH","IN","MI","IA","MO","WI","MN","SD","ND","IL","KS","NE")
WEST=c("MT","CO","WY","ID","UT","NV","CA","OR","WA","AZ","NM")
Federal Information Processing Standards (FIPS) code, a unique ID for each county.
County, State name.
State abbreviation.
State numerical code.
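The region definition listed above can be expressed as a lookup from state abbreviation to region. This is a minimal sketch that only restates the four groups given in this documentation; the names `regions`, `state_to_region`, and `region_of` are hypothetical and are not part of the dataset or any package.

```r
# Region membership as listed in the variable description above.
regions <- list(
  NORTHEAST = c("NY", "MA", "PA", "RI", "NH", "ME", "VT", "CT", "NJ"),
  SOUTH     = c("DC", "VA", "NC", "WV", "KY", "SC", "GA", "FL", "AL", "TN",
                "MS", "AR", "MD", "DE", "OK", "TX", "LA"),
  MIDWEST   = c("OH", "IN", "MI", "IA", "MO", "WI", "MN", "SD", "ND", "IL",
                "KS", "NE"),
  WEST      = c("MT", "CO", "WY", "ID", "UT", "NV", "CA", "OR", "WA", "AZ",
                "NM")
)

# Invert the list into a named character vector: state abbreviation -> region.
state_to_region <- setNames(
  rep(names(regions), lengths(regions)),
  unlist(regions, use.names = FALSE)
)

# Hypothetical helper: look up the region for one or more state abbreviations.
# States outside the four groups (for example "AK") yield NA.
region_of <- function(state) {
  unname(state_to_region[state])
}

region_of(c("NY", "TX", "OH", "CA"))
```

The lookup covers 49 entries (the 48 contiguous states plus DC), which matches the stated coverage of the contiguous United States.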
Abatzoglou, John T. 2013. “Development of Gridded Surface Meteorological Data for Ecological Applications and Modelling.” International Journal of Climatology 33 (1): 121–31. doi:10.1002/joc.3413.
Centers for Disease Control and Prevention. 2021. “Behavioral Risk Factor Surveillance System.” https://www.cdc.gov/brfss/annual_data/annual_2010.htm/.
Centers for Medicare & Medicaid Services. 2021. “CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF).” https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf.
Di, Qian, Heresh Amini, Liuhua Shi, Itai Kloog, Rachel Silvern, James Kelly, M Benjamin Sabath, et al. 2019. “An Ensemble-Based Model of PM2.5 Concentration Across the Contiguous United States with High Spatiotemporal Resolution.” Environment International 130: 104909. doi:10.1016/j.envint.2019.104909.
Di, Qian, Yaguang Wei, Alexandra Shtein, Carolynne Hultquist, Xiaoshi Xing, Heresh Amini, Liuhua Shi, et al. 2021. “Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-Km Grids, V1 (2000 - 2016).” NASA Socioeconomic Data and Applications Center (SEDAC). doi:10.7927/0rvr-4538.