read_ascii | R Documentation |
read_ascii helps format ASCII data files downloaded from the Roper Center.
read_ascii(
  file,
  total_cards = 1,
  var_names,
  var_cards = 1,
  var_positions,
  var_widths,
  card_pattern,
  respondent_pattern
)
file | A path to an ASCII data file. |
total_cards | For multicard files, the number of cards in the file. |
var_names | A string vector of variable names. |
var_cards | For multicard files, a numeric vector of the cards on which var_names are recorded. |
var_positions | A numeric vector of the column positions in which var_names are recorded. |
var_widths | A numeric vector of the widths used to record var_names. |
card_pattern | For use when the file does not contain a line for every card for every respondent (or contains extra lines that correspond to no respondent), a regular expression that matches the file's card identifier; e.g., if the card number is stored in the last digit of each line, "\\d$". |
respondent_pattern | For use when the file does not contain a line for every card for every respondent (or contains extra lines that correspond to no respondent), a regular expression that matches the file's respondent identifier; e.g., if the respondent number is stored in the first four digits of each line, preceded by a space, "(?<=^\\s)\\d{4}". |
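To see what patterns of this kind pick out, the following minimal sketch applies them to two invented lines using base R only. The sample lines and their layout (a leading space, a four-digit respondent number, and the card number in the last column) are assumptions made for illustration; they do not come from any Roper Center file, and this is not necessarily how read_ascii applies the patterns internally. The patterns are written as R strings, so each backslash is doubled.

```r
# Two made-up lines: a leading space, a four-digit respondent number,
# some filler data, and the card number in the last column.
lines <- c(" 0001 23145 1",
           " 0001 98765 2")

card_pattern <- "\\d$"                   # the last digit of the line
respondent_pattern <- "(?<=^\\s)\\d{4}"  # four digits after one leading space

# perl = TRUE is required in base R for the lookbehind (?<=...)
card <- regmatches(lines, regexpr(card_pattern, lines, perl = TRUE))
respondent <- regmatches(lines, regexpr(respondent_pattern, lines, perl = TRUE))

card        # "1" "2"
respondent  # "0001" "0001"
```

Testing a candidate pattern on a handful of lines from the actual file in this way is a quick check before passing it to read_ascii.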
Many older Roper Center datasets are available only in ASCII format, which is notoriously difficult to work with. The read_ascii function facilitates the process of extracting selected variables from ASCII datasets. For single-card files, one can simply identify the names, positions, and widths of the needed variables from the codebook and pass them to read_ascii's var_names, var_positions, and var_widths arguments. Multicard datasets are more complicated. In the best case, the file contains one line per card per respondent; then, the user can extract the needed variables by adding only the var_cards and total_cards arguments. When this condition is violated (there is not a line for every card for every respondent, or there are extra lines), the function will throw an error and ask the user to specify the additional arguments card_pattern and respondent_pattern.
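The best-case multicard layout described above can be illustrated with a small base R sketch. Everything here is hypothetical: the four sample lines, the column positions, the variable names age and q5, and the helper extract_var are invented for illustration, and the code is not the package's implementation. It only shows how a card number, a starting column, and a width together locate a variable when every respondent has exactly total_cards consecutive lines.

```r
# Hypothetical two-card file for two respondents: one line per card per
# respondent, so lines 1-2 belong to respondent 1 and lines 3-4 to respondent 2.
lines <- c("0001 1 52 3",
           "0001 2 7 41",
           "0002 1 38 1",
           "0002 2 2 19")
total_cards <- 2

# Arrange the lines as one row per respondent and one column per card
cards <- matrix(lines, ncol = total_cards, byrow = TRUE)

# Hypothetical helper: pull a fixed-width field from a given card
extract_var <- function(cards, card, position, width) {
  substr(cards[, card], position, position + width - 1)
}

age <- extract_var(cards, card = 1, position = 8, width = 2)
q5  <- extract_var(cards, card = 2, position = 10, width = 2)

age  # "52" "38"
q5   # "41" "19"
```

If a respondent were missing a card, or the file contained an extra line, the rows of this matrix would fall out of step with the respondents, which is why the card and respondent identifiers must then be matched explicitly.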
A data frame containing any variables specified in the var_names argument, plus a numeric respondent identifier and as many string card variables (card1, card2, ...) as specified by the total_cards argument.
## Not run:
# a single-card file
roper_download("USAIPO1982-1197G", # Gallup Poll for June 25-28, 1982
               download_dir = tempdir()) # remember to specify a directory for your download
gallup1982 <- read_ascii(file = file.path(tempdir(), "USAIPO1982-1197G",
                                          "1197.dat"),
                         var_names = c("q09j", "weight"),
                         var_positions = c(38, 1),
                         var_widths = c(1, 1))

# a multi-card file, with extra lines that make the card_pattern and
# respondent_pattern arguments necessary
roper_download("USAIPOCNUS1996-9603008", # Gallup/CNN/USA Today Poll: Politics/1996 Election
               download_dir = tempdir()) # remember to specify a directory for your download
gallup1996 <- read_ascii(file = file.path(tempdir(), "USAIPOCNUS1996-9603008",
                                          "a9603008.dat"),
                         var_names = c("q43a", "q44", "weight"),
                         var_cards = c(6, 6, 1),
                         var_positions = c(62, 64, 13),
                         var_widths = c(1, 1, 3),
                         total_cards = 7,
                         card_pattern = "(?<=^.{10})\\d",
                         # (a digit, preceded by the start of the line
                         # and ten other characters)
                         respondent_pattern = "(?<=^\\s{2})\\d{4}")
                         # (four digits, preceded by the start of the line
                         # and two whitespace characters)
## End(Not run)