In leerichardson/spew: SPEW Framework for Generating Synthetic Ecosystems

#!/bin/env/Rscript
args <- commandArgs(TRUE)
input_dir <- args[1]
input_dir <- paste0(input_dir, "/")
## input_dir <- "/mnt/beegfs1/data/shared_group_data/syneco/spew_1.2.0/americas/south_america/pry"
## input_dir <- "/mnt/beegfs1/data/shared_group_data/syneco/spew_1.2.0/asia/southern_asia/ind"
cur_dir <- input_dir
library(spew)
library(devtools)
library(reshape2)
data(iso3)
country_name <- iso3$country_name[iso3$iso3 == tolower(basename(input_dir))]

`r toupper(country_name)`

library(devtools)
library(data.table)
library(maptools)
library(ggmap)
library(ggplot2)
library(RColorBrewer)
#setwd(input_dir)
# Params 
output_dir <- input_dir
region <- basename(input_dir)
varsToSummarize = list(vars_hh = "base", vars_p = "base")
sampSize =  10^4
vars_hh <- NULL
doPrint <- FALSE
ipums_fs <- spew:::summarizeFileStructure(output_dir, doPrint)
ipums_list <- spew:::summarize_ipums(output_dir, ipums_fs, doPrint = doPrint,
        sampSize = sampSize, readFun = data.table::fread)
ipums_sum_list <- ipums_list
hh_list <- ipums_list$hh_sum_list

There is/are r ipums_fs$nLevels level(s) of nested ecosystems in r toupper(country_name).

There is/are r nrow(ipums_fs$paths_df) lowest level sub-regions.

Data

The raw counts, microdata, and shapefiles are from IPUMS-I.

Maps

data_path <- input_dir
plot_name <- paste0("diags_", region, "-tn.png")
map_type <- "toner-lite"
savePlot <- FALSE
g <- spew:::plot_region_diags(ipums_sum_list, ipums_fs,
            data_path = data_path, map_type = map_type,
            savePlot = savePlot, plot_name = plot_name)

The above map shows a sub-sample of the different sub regions. Each region has a sample of up to r prettyNum(sampSize, big.mark = ",") households.

Synthetic Households {.tabset}

# Households summary
hh_list <- ipums_list$hh_sum_list
hh_sum_df <- do.call('rbind', lapply(hh_list, "[[", 1))
hh_sum_df$Region <- toupper(hh_sum_df$region_id)
df <- hh_sum_df[, c("Region", "nRecords")]
tot <- sum(hh_sum_df$nRecords)

Total Synthetic Households: r prettyNum(tot, big.mark = ",")

Summaries

Below is a bar chart of the number of households in each region.

cbbPalette <- c("#999999", "#E69F00", "#56B4E9",
    "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
cols <- rep(cbbPalette, length.out = nrow(df))
colScale <- scale_fill_manual(values = cols)
g <- ggplot(df, aes(x = Region, y = nRecords, fill = Region)) +
    geom_bar(stat = "identity") + ggtitle("Household Counts") +
    labs(x = "Region", y = "Number of Households") + colScale +
    theme_light() + 
    theme(axis.text.x = element_text(angle = 90), legend.position = "none") 
print(g)

Column Names

There are r length(ipums_list$header_hh) columns in the synthetic household ecosystem. They are:

print(ipums_list$header_hh)

Synthetic Persons {.tabset}

# People  summary
p_list <- ipums_list$p_sum_list
p_sum_df <- do.call('rbind', lapply(p_list, "[[", 1))
p_sum_df$Region <- toupper(p_sum_df$region_id)
df <- p_sum_df[, c("Region", "nRecords")]
tot <- sum(p_sum_df$nRecords)

Total Synthetic Persons: r prettyNum(tot, big.mark = ",")

Summary

```r
if( as.character(country_name) %in% c("china", "india", "slovenia", "nigeria"){
    print("This population currently does not have summary statistics.")
} else{
    p_mf <- lapply(p_list, "[[", 2)
    regions <- df$Region
    df2<- do.call('cbind', lapply(p_mf, "[[", 1))
    df_sf <- data.frame(t(df2))
    df_sf <- df_sf / rowSums(df_sf)
    colnames(df_sf) <- c("Male", "Female")

    df_sf$Region <- regions
    cols <- c("darkblue", "lightpink")
    colScale <- scale_fill_manual(values = cols)
    df_melt <- melt(df_sf, id.vars = "Region", varnames = c("Male", "Female"))
    colnames(df_melt)[2:3] <- c("Sex", "Percentage")
    p <- ggplot(data=df_melt, aes(x=Region, y=Percentage, fill=Sex)) +
        geom_bar(stat="identity") + coord_flip() + ggtitle("Ratio of Males to Females") + colScale + theme_light()
    print(p)
}

### Column Names
There are `r length(ipums_list$header_p)` columns in the synthetic household ecosystem.  They are:

```r
print(ipums_list$header_p)

Generation Information

This report was generated on r Sys.time() by spew, an R package used to generate populations throughout the world. Please see our spew Github repo and our previously generated regions at epimodels.org. We are a part of the Informatics Services Group MIDAS branch at Carnegie Mellon University and University of Pittsburgh and are supported by 1 U24 GM110707-01 NIH/NIGMS grant. Please send any comments to sventura@stat.cmu.edu.

leerichardson/spew documentation built on May 21, 2019, 1:39 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com