View source: R/micro_read_yield.r
read_ipums_micro_yield (R Documentation)
Reads a dataset downloaded from the IPUMS extract system, but does so by
returning an object that can read a group of lines at a time. This is a more
flexible way to read data in chunks than functions like
read_ipums_micro_chunked(): it lets you do things like read parts of multiple
files at the same time and reset to the beginning more easily than with the
chunked functions. Note that while the other read_ipums_micro*() functions
can read from .csv(.gz) or .dat(.gz) files, these functions can only read
from .dat(.gz) files.
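As a minimal sketch of reading "at the same time", the code below opens two independent yield objects and advances them by different amounts. It assumes the example extract cps_00006.xml shipped with ipumsr (the one used in the Examples section); both readers point at the same file only for convenience, and in practice they could point at different extracts.

```r
library(ipumsr)

# Example extract shipped with ipumsr (the one used in the Examples section).
# Both readers use the same file here only for convenience.
ddi_path <- ipums_example("cps_00006.xml")

reader_a <- read_ipums_micro_yield(ddi_path, verbose = FALSE)
reader_b <- read_ipums_micro_yield(ddi_path, verbose = FALSE)

# Advance the two readers by different amounts; each tracks its own position.
first_a <- reader_a$yield(n = 100)
first_b <- reader_b$yield(n = 10)
pos_a_before_reset <- reader_a$cur_pos

# Rewind only reader_a; reader_b keeps its place.
reader_a$reset()
```

Because each object keeps its own position, a loop can pull a chunk from one reader, then a chunk from another, without either losing its place.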
read_ipums_micro_yield(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)

read_ipums_micro_list_yield(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
ddi: Either a filepath to a DDI xml file downloaded from the website, or a
vars: Names of variables to load. Accepts a character vector of names, or
data_file: Specify a directory to look for the data file. If NULL (the default), it will look in the same directory as the DDI file.
verbose: Logical, indicating whether to print progress information to the console.
var_attrs: Variable attributes to add from the DDI; defaults to adding all (val_labels, var_label and var_desc). See
lower_vars: Only if reading a DDI from a file, a logical indicating whether to convert variable names to lowercase (default is FALSE, in line with IPUMS conventions). Note that this argument will be ignored if argument
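A short sketch of the vars, lower_vars and verbose arguments, assuming the cps_00006.xml example extract shipped with ipumsr and assuming it contains the variables YEAR and STATEFIP (STATEFIP appears in the Examples section). The two options are shown on separate readers so the example does not depend on whether variable selection happens before or after lowercasing.

```r
library(ipumsr)

ddi_path <- ipums_example("cps_00006.xml")

# Load only two variables (assumed to exist in this example extract).
two_vars <- read_ipums_micro_yield(
  ddi_path,
  vars = c("YEAR", "STATEFIP"),
  verbose = FALSE
)
two_vars_chunk <- two_vars$yield(n = 5)

# Convert all variable names to lowercase while reading.
lowered <- read_ipums_micro_yield(ddi_path, lower_vars = TRUE, verbose = FALSE)
lowered_chunk <- lowered$yield(n = 5)
names(lowered_chunk)
```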
These functions return an R6 object (IpumsLongYield or IpumsListYield) with the following methods:
yield(n = 10000): Reads the next "yield" from the data and returns a tbl_df (or a list of tbl_df for read_ipums_micro_list_yield()) with up to n rows. It returns NULL if no rows are left, or all remaining rows if fewer than n are available.
reset(): Resets the data so that the next yield reads from the start.
is_done(): Returns whether the file has been completely read.
cur_pos: A property containing the number of the next row to be read (1-indexed).
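The sketch below exercises all four members on the cps_00006.xml example extract shipped with ipumsr. It relies only on the behavior described above: cur_pos starts at 1 and advances by the number of rows returned, yield() returns NULL once the data is exhausted, and reset() rewinds. The full read with read_ipums_micro() is there only to check the row count.

```r
library(ipumsr)

ddi_path <- ipums_example("cps_00006.xml")
y <- read_ipums_micro_yield(ddi_path, verbose = FALSE)

# cur_pos is the next row to be read, so it starts at 1 and advances by the
# number of rows each yield() returns.
start_pos <- y$cur_pos
first_ten <- y$yield(n = 10)
pos_after_ten <- y$cur_pos

# Keep yielding until NULL comes back; the last non-NULL chunk may be short.
total_rows <- nrow(first_ten)
repeat {
  chunk <- y$yield(n = 2500)
  if (is.null(chunk)) break
  total_rows <- total_rows + nrow(chunk)
}
finished <- y$is_done()

# reset() rewinds to the first row without re-creating the object.
y$reset()
pos_after_reset <- y$cur_pos

# Whole-file read, used only to check that the chunks covered every row.
full_rows <- nrow(read_ipums_micro(ddi_path, verbose = FALSE))
```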
An R6 object that inherits from hipread::HipYield (see 'Details' for more information).
hipread::HipYield -> hipread::HipLongYield -> IpumsLongYield
new(): IpumsLongYield$new(ddi, vars = NULL, data_file = NULL, verbose = TRUE, var_attrs = c("val_labels", "var_label", "var_desc"), lower_vars = FALSE)
yield(): IpumsLongYield$yield(n = 10000)
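A quick way to confirm the inheritance chain listed above is to inspect class() on the object returned by read_ipums_micro_yield(). This sketch assumes the cps_00006.xml example extract and relies on the standard R6 behavior of listing a class, its ancestors and "R6" in the class attribute.

```r
library(ipumsr)

long_yield <- read_ipums_micro_yield(
  ipums_example("cps_00006.xml"),
  verbose = FALSE
)

# class() lists the R6 class, each ancestor class, and finally "R6".
class(long_yield)
```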
hipread::HipYield -> hipread::HipListYield -> IpumsListYield
new(): IpumsListYield$new(ddi, vars = NULL, data_file = NULL, verbose = TRUE, var_attrs = c("val_labels", "var_label", "var_desc"), lower_vars = FALSE)
yield(): IpumsListYield$yield(n = 10000)
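For the list variant, each yield() returns a list of tbl_df rather than a single tbl_df. A minimal sketch, assuming the cps_00006.xml example extract (the Examples section reads it as a list too); the element names depend on the record types in the extract, so they are printed rather than assumed.

```r
library(ipumsr)

list_yield <- read_ipums_micro_list_yield(
  ipums_example("cps_00006.xml"),
  verbose = FALSE
)

# Each yield is a plain list whose elements are data frames.
chunk <- list_yield$yield(n = 100)
names(chunk)
vapply(chunk, nrow, integer(1))
```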
Other ipums_read: read_ipums_micro_chunked(), read_ipums_micro(), read_ipums_sf(), read_nhgis(), read_terra_area(), read_terra_micro(), read_terra_raster()
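To contrast the yield interface with read_ipums_micro_chunked(), the sketch below counts Minnesota rows both ways and should get the same total. It assumes the cps_00006.xml example extract, uses IpumsSideEffectCallback (ipumsr's callback for chunked reads run only for side effects), and calls haven::as_factor explicitly; na.rm = TRUE guards against rows where STATEFIP is missing.

```r
library(ipumsr)

ddi_path <- ipums_example("cps_00006.xml")

# Chunked: ipumsr drives the loop and calls the callback once per chunk.
mn_chunked <- 0
count_mn <- IpumsSideEffectCallback$new(function(x, pos) {
  mn_chunked <<- mn_chunked +
    sum(haven::as_factor(x$STATEFIP) == "Minnesota", na.rm = TRUE)
})
invisible(read_ipums_micro_chunked(
  ddi_path, count_mn, chunk_size = 1000, verbose = FALSE
))

# Yield: your own loop drives the reader, so you can pause, reset, or
# interleave it with other readers between chunks.
mn_yield <- 0
y <- read_ipums_micro_yield(ddi_path, verbose = FALSE)
while (!y$is_done()) {
  chunk <- y$yield(n = 1000)
  mn_yield <- mn_yield +
    sum(haven::as_factor(chunk$STATEFIP) == "Minnesota", na.rm = TRUE)
}
```

The chunked version hides the loop inside ipumsr; the yield version keeps control flow in your code, which is what makes resetting and interleaving straightforward.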
library(ipumsr)

# An example using "long" data
long_yield <- read_ipums_micro_yield(ipums_example("cps_00006.xml"))

# Get first 10 rows
long_yield$yield(10)

# Get 20 more rows now
long_yield$yield(20)

# See what row we're on now
long_yield$cur_pos

# Reset to beginning
long_yield$reset()

# Read the whole thing in chunks and count Minnesotans
total_mn <- 0
while (!long_yield$is_done()) {
  cur_data <- long_yield$yield(1000)
  total_mn <- total_mn + sum(as_factor(cur_data$STATEFIP) == "Minnesota", na.rm = TRUE)
}
total_mn

# Can also read hierarchical data as list:
list_yield <- read_ipums_micro_list_yield(ipums_example("cps_00006.xml"))
list_yield$yield(10)
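The counting loop above generalizes to any per-chunk summary. The sketch below wraps it in a small helper, yield_fold(), which is hypothetical (not part of ipumsr): it rewinds the reader, folds a function over every chunk, and rewinds again so the object can be reused. It assumes the cps_00006.xml example extract and uses haven::as_factor explicitly.

```r
library(ipumsr)

# Hypothetical helper (not part of ipumsr): rewind a yield object, fold
# f(acc, chunk) over all of its chunks, then rewind again for reuse.
yield_fold <- function(y, f, init, n = 10000) {
  y$reset()
  acc <- init
  while (!y$is_done()) {
    acc <- f(acc, y$yield(n))
  }
  y$reset()
  acc
}

long_yield <- read_ipums_micro_yield(
  ipums_example("cps_00006.xml"),
  verbose = FALSE
)

# Accumulate a total row count and a Minnesota count in one pass.
totals <- yield_fold(
  long_yield,
  function(acc, chunk) {
    acc + c(
      rows = nrow(chunk),
      mn = sum(haven::as_factor(chunk$STATEFIP) == "Minnesota", na.rm = TRUE)
    )
  },
  init = c(rows = 0, mn = 0),
  n = 1000
)
totals
```

Resetting at both ends is a design choice: it makes the helper safe to call on a reader that has already been partially consumed, at the cost of discarding that reader's current position.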