datgz_get: Return PMA output data from dat.gz data file and YAML metadata
In mgunther87/ipumsPMA: Common functions for IPUMS PMA staff


This function provides an R-based alternative to checking output data in Stata. It uses a YAML metadata file stored in the specified workspace folder or in the PMA output_data folder.

It loads output data as a tibble with columns of the Haven "labelled" class (following the convention in the ipumsr package).
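As a rough illustration of what a "labelled" column looks like, the sketch below builds a small tibble by hand with the haven and tibble packages. The variable name `urban` and its value labels are hypothetical and are not taken from any PMA sample; the data returned by datgz_get() will have whatever variables and labels the YAML metadata defines.

```r
library(haven)
library(tibble)

# Hypothetical variable and labels, for illustration only
dat <- tibble(
  urban = labelled(c(1, 2, 1), labels = c(Urban = 1, Rural = 2))
)

# The column stores the numeric codes and carries the value labels as an attribute
is.labelled(dat$urban)
attr(dat$urban, "labels")

# Convert the codes to a factor that uses the label text
as_factor(dat$urban)
```

Keeping the numeric codes and the labels together is what allows the data to be checked against the codes expected in Stata output while still being readable in R.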

Any warnings from the sample log file will also be reported, or else a message will show that the log file contains none.

datgz_get(sample, workspace, job_number)

`sample`	A sample that has been "run" and is located in either the specified workspace or in the PMA output_data folder.
`workspace`	Optional: if specified, must be a character string matching the name of an open PMA workspace. If not specified, the function will search in the PMA output_data/current folder instead.
`job_number`	Optional integer: the number associated with a "jobs" folder in the specified workspace. If job_number is provided, data will be read from that folder rather than from the output_data/current folder in the workspace. If no workspace is provided, job_number will be ignored, and data will be read from the PMA output_data folder.

The user may specify an open PMA workspace that contains either an "output_data/current" folder or a job folder (created after the specified sample has been "run"). If no workspace is specified, the function reads output data from "pkg/ipums/PMA/output_data/current". The YAML file is then sought in the "syntax" subfolder at the chosen location.
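The folder selection described here can be sketched in base R as follows. The helper `resolve_output_dir()` is hypothetical and is not part of ipumsPMA; it assumes the workspace is given as a path relative to the working directory, as in the examples on this page, and it only builds the path without checking that the folder exists.

```r
# Hypothetical helper, not part of ipumsPMA: picks the folder to read from
resolve_output_dir <- function(workspace = NULL, job_number = NULL,
                               default_dir = "pkg/ipums/PMA/output_data/current") {
  if (is.null(workspace)) {
    # Without a workspace, job_number is ignored and the PMA folder is used
    return(default_dir)
  }
  if (!is.null(job_number)) {
    # A job number takes precedence over output_data/current in the workspace
    return(file.path(workspace, "jobs", as.character(job_number)))
  }
  file.path(workspace, "output_data", "current")
}

resolve_output_dir()                   # PMA output_data/current folder
resolve_output_dir("my_workspace")     # workspace output_data/current
resolve_output_dir("my_workspace", 3)  # workspace jobs/3
resolve_output_dir(job_number = 3)     # job_number ignored, PMA folder
```

Under these assumptions, the YAML file would be looked up in `file.path(resolve_output_dir(...), "syntax")`.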

This function works much like mpctools::read_output_from_yaml(), except that 1) the YAML file and data file are found automatically in the specified workspace or the PMA output folder, and 2) the character encoding "ASCII" is used because it seems to handle PMA YAML files without error.

Matt Gunther

## Not run: 
# Default: looks in pma/output_data/current
datgz_get("ke2015a_hh")

# Uses ./my_workspace/output_data/current
datgz_get("ke2015a_hh", "my_workspace") 

# Uses ./my_workspace/jobs/3
datgz_get("ke2015a_hh", "my_workspace", 3) 

## End(Not run)