# rmarkdown::render("README.Rmd")
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "README/",
  cache = FALSE)

Introduction

multiline is an R package for reading data from multiline fixed-width-formatted (FWF) files. This format is like that of typical FWF files, except that data for a given observation wraps after some number of columns to span a fixed number of rows.

Digitized punch card data are often found in multiline FWF format. If the data for an observation exceeded the horizontal space on a card (conventionally 80 columns), additional decks of cards were used. When the decks were digitized, their rows were often interleaved so that the data for each observation appear in consecutive rows, one for each card.

Installation

Install from GitHub with devtools:

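# Install devtools only if it is missing; requireNamespace() checks without attaching it
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")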
devtools::install_github("jamesdunham/multiline")

Background

Consider the following multiline FWF (MFWF) data. As with FWF data, parsing requires the column positions of each field (i.e., variable). In addition, we need the line position of each field.

123456789
789      
987654321
987      

Parsing requires the number of lines per observation and, for each field, its line and column positions. Suppose there are 2 lines per observation in the data; field1 occupies columns 1-4 of line 1; field2, columns 5-9 of line 1; and field3, columns 1-3 of line 2.

123456789  [line 1, obs. 1]
789        [line 2, obs. 1]
987654321  [line 1, obs. 2]
987        [line 2, obs. 2]

The purpose of multiline is to read such data into a tidy table:

obs  field1  field2  field3
  1    1234   56789     789
  2    9876   54321     987
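To make the mapping from lines to fields concrete, here is a minimal base R sketch of the layout described above. It does not use multiline and is not how the package is implemented; the names raw, lines_per_obs, line_no, and tidy_sketch are made up for this illustration.

```r
# Illustration only: base R, not multiline's implementation.
raw <- c("123456789", "789      ", "987654321", "987      ")
lines_per_obs <- 2

# Position of each line within its observation: 1, 2, 1, 2
line_no <- rep(seq_len(lines_per_obs), length.out = length(raw))

line1 <- raw[line_no == 1]  # first card of each observation
line2 <- raw[line_no == 2]  # second card of each observation

# Cut each field out of its line by column position
tidy_sketch <- data.frame(
  obs    = seq_along(line1),
  field1 = as.integer(substr(line1, 1, 4)),
  field2 = as.integer(substr(line1, 5, 9)),
  field3 = as.integer(substr(line2, 1, 3))
)
tidy_sketch
```

Hard-coding the substr() calls like this does not scale beyond a few fields, which is why a declarative table of positions is the more practical interface.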

Usage

Specify the column and line positions of each field in a table or list of tables. multiline imports the fwf_ functions from readr to help with this task.

As a list:

library(multiline)

# One table of column positions per line of an observation
positions <- list(
  fwf_positions(start = c(1, 5), end = c(4, 9), col_names = c('field1', 'field2')),
  fwf_positions(start = 1, end = 3, col_names = 'field3'))
positions

The line position of each field is implicit in the list order. Here, field1 and field2 are in line 1 and field3 is in line 2.
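When fields are contiguous, it can be easier to give widths than start and end columns. The sketch below calls readr directly and assumes readr is installed; whether multiline re-exports fwf_widths() alongside fwf_positions() is not confirmed here, so the example loads readr itself. The names by_position and by_width are illustrative.

```r
library(readr)  # assumption: readr is installed

# Line 1 described by start and end columns, as in the list above
by_position <- fwf_positions(start = c(1, 5), end = c(4, 9),
                             col_names = c("field1", "field2"))

# The same line described by field widths (4 columns, then 5)
by_width <- fwf_widths(widths = c(4, 5),
                       col_names = c("field1", "field2"))

# The two specifications should describe identical fields
same_begin <- all(by_position$begin == by_width$begin)
same_end   <- all(by_position$end == by_width$end)
same_names <- identical(by_position$col_names, by_width$col_names)
c(same_begin, same_end, same_names)
```

Either form should work as an element of the list, since both produce the same kind of table.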

Given the data:

d <- "123456789\n789\n987654321\n987"
d

read_multiline() returns a tidy table with observations in rows and fields in columns. Note that read_multiline() requires the number of tables in the list of positions to exactly match the number of lines per observation in the MFWF data (here, lines = 2).

tidy <- read_multiline(d, lines = 2, positions)
tidy
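Before parsing a larger file, it can help to confirm that its shape is consistent with the specification. The following base R sketch checks the two conditions implied above: one set of positions per line, and a total line count that is a multiple of the lines per observation. It is independent of multiline, and whether read_multiline() performs similar checks itself is not stated here. The names txt, lines_per_obs, fields_by_line, raw, n_obs, and by_obs are hypothetical.

```r
# Illustration only: sanity checks in base R, independent of multiline.
txt <- "123456789\n789\n987654321\n987"
lines_per_obs <- 2L

# Stand-in for the list of positions: field names grouped by line
fields_by_line <- list(c("field1", "field2"), "field3")

# One entry per line of an observation
stopifnot(length(fields_by_line) == lines_per_obs)

# Split the text into physical lines
raw <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# Every observation must be complete
if (length(raw) %% lines_per_obs != 0) {
  stop("Number of lines is not a multiple of lines per observation")
}
n_obs <- length(raw) %/% lines_per_obs

# Group the lines belonging to each observation
by_obs <- split(raw, rep(seq_len(n_obs), each = lines_per_obs))
by_obs
```

A failed check usually points to a missing or extra card for some observation, which would otherwise shift every later record.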


jamesdunham/multiline documentation built on May 20, 2019, 2:25 p.m.