parseScript_SPSS: Parse SPSS Syntax Script for Fixed-Width Data Files

View source: R/parseScript_SPSS.R

parseScript_SPSS    R Documentation


Description

Parses an SPSS syntax script (.sps) file and returns information describing the layout of fixed-width data files.

Usage

parseScript_SPSS(
  spsFilePath,
  verbose = FALSE,
  outputFormat = c("data.frame"),
  encoding = getOption("encoding")
)

Arguments

spsFilePath

a character value of the file path to the SPSS script to parse.

verbose

a logical value indicating whether parsing activity should be printed to the console. The default is FALSE.

outputFormat

a character value indicating the format of the returned object. See Details for information on each format. Currently, only the "data.frame" format is supported.

encoding

a character value specifying the encoding that the base readLines function uses when reading the file given by spsFilePath. Adjust this parameter only if the file's original encoding is known, if string values are not read correctly, or if other errors occur. See ?readLines for details about its use for file encoding.

Details

NOT CURRENTLY EXPORTED! In the future this function could potentially be moved into a separate R package. This parseScript_SPSS function should be used in all cases; older SPSS script parsers should gradually be transitioned to use it where possible to maximize code reuse.

The SPSS syntax script parser is focused on gathering details for use with fixed-width data files. This function scans for the following SPSS commands:

  • FILE HANDLE

  • DATA LIST

  • VARIABLE LABEL

  • VALUE LABEL

  • MISSING VALUE
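To make the fixed-width layout concrete, the sketch below parses a simplified DATA LIST column specification such as "ID 1-5 NAME 6-25 (A) SCORE 26-28" into the variableName, Start, End, Width, and Attributes fields listed under Details. parseDataListSpec is a hypothetical helper written for illustration, not the EdSurvey implementation. It assumes every variable is written as "NAME start-end" with an optional parenthesized format; record markers such as /2, single-column variables, and TO ranges are not handled.

```r
# Hypothetical helper, not part of EdSurvey: parse a simplified DATA LIST
# fixed-column specification of the form "NAME start-end (FORMAT)".
parseDataListSpec <- function(spec) {
  # Groups: 1 = name, 2 = start column, 3 = end column, 5 = optional format
  pattern <- paste0(
    "([A-Za-z][A-Za-z0-9_]*)[[:space:]]+",
    "([0-9]+)[[:space:]]*-[[:space:]]*([0-9]+)",
    "([[:space:]]*\\(([^)]*)\\))?"
  )
  # Extract every "NAME start-end (FORMAT)" chunk from the specification
  m <- regmatches(spec, gregexpr(pattern, spec))[[1]]
  out <- data.frame(
    variableName = sub(pattern, "\\1", m),
    Start = as.integer(sub(pattern, "\\2", m)),
    End = as.integer(sub(pattern, "\\3", m)),
    # An absent format yields an empty string
    Attributes = sub(pattern, "\\5", m),
    stringsAsFactors = FALSE
  )
  # SPSS column ranges are inclusive, so 1-5 spans five columns
  out$Width <- out$End - out$Start + 1L
  out
}

layout <- parseDataListSpec("ID 1-5 NAME 6-25 (A) SCORE 26-28")
```

A regular expression keeps the sketch short; a production parser would more likely tokenize the command so it can track record markers and variable ranges.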

The outputFormat argument determines the type of object returned. The following formats are currently supported:

  • data.frame

    • variableName - The variable name as defined in the script

    • Start - The starting column position of the variable in the fixed-width layout

    • End - The ending column position of the variable in the fixed-width layout

    • Width - The number of columns the variable occupies in the fixed-width layout

    • Attributes - Any SPSS attributes defined for the variable in the DATA LIST command. These are typically only field formats.

    • RecordNumber - Some fixed-width data files are "multi-line", meaning that one record of data can span multiple rows in the file. RecordNumber indicates which line of the record the variable is assigned to.

    • Labels - The descriptive label associated with the variable name to give more detail or context.

    • labelValues - For categorical variables, each stored value is typically assigned a longer descriptive label. This string encodes those mappings: the '^' symbol delimits each individual value-label pair, and within each pair the text to the left of the '=' symbol is the stored value while the remaining text to the right of '=' is the label for that value.

    • dataType - A best guess of the data type (either 'numeric' or 'character') made without examining the data file.

    • missingValues - If the script includes a MISSING VALUE command, this lists the values considered missing. Multiple values are delimited by a ';' (semicolon).
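The delimited strings in the labelValues and missingValues columns can be unpacked with a few lines of base R. The two helpers below, decodeLabelValues and decodeMissingValues, are hypothetical and written only to illustrate the '^', '=', and ';' conventions described above; the sample strings are made up rather than taken from a real script.

```r
# Hypothetical helper, not part of EdSurvey: turn a labelValues string
# such as "1=Male^2=Female" into a named character vector.
decodeLabelValues <- function(x) {
  # Treat NA or empty strings as "no labels"
  if (is.na(x) || !nzchar(x)) return(character(0))
  # '^' separates the individual value=label pairs
  pairs <- strsplit(x, "^", fixed = TRUE)[[1]]
  pairs <- pairs[grepl("=", pairs, fixed = TRUE)]
  # Split each pair at its first '=': stored value left, label right
  eq <- regexpr("=", pairs, fixed = TRUE)
  values <- substr(pairs, 1L, eq - 1L)
  labels <- substr(pairs, eq + 1L, nchar(pairs))
  stats::setNames(labels, values)
}

# Hypothetical helper: split a missingValues string on ';'
decodeMissingValues <- function(x) {
  if (is.na(x) || !nzchar(x)) return(character(0))
  # Drop any whitespace around each value
  trimws(strsplit(x, ";", fixed = TRUE)[[1]])
}

lv <- decodeLabelValues("1=Male^2=Female^9=Omitted")
mv <- decodeMissingValues("8; 9")
```

Splitting only at the first '=' keeps any later '=' characters inside the label text.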

Value

Returns an object containing the parsed script information, in the format specified by the outputFormat argument.
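As one possible way to use a layout in the data.frame format, the sketch below converts it into the widths argument of utils::read.fwf. It inserts negative widths to skip columns not covered by any variable and reuses dataType as colClasses. The layout and data values are invented for illustration, fwfWidths is a hypothetical helper, and only single-line records (RecordNumber of 1) are handled.

```r
# Hypothetical layout in the data.frame format described above; the values
# are invented for illustration, not produced by parseScript_SPSS.
layout <- data.frame(
  variableName = c("ID", "SCORE"),
  Start = c(1L, 8L),
  End = c(5L, 10L),
  Width = c(5L, 3L),
  RecordNumber = c(1L, 1L),
  dataType = c("character", "numeric"),
  stringsAsFactors = FALSE
)
# The helper below assumes variables are ordered by starting column
layout <- layout[order(layout$Start), ]

# Hypothetical helper: build read.fwf() widths for single-line records,
# using negative widths to skip columns not covered by any variable
fwfWidths <- function(layout) {
  widths <- integer(0)
  pos <- 1L  # next unread column
  for (i in seq_len(nrow(layout))) {
    gap <- layout$Start[i] - pos
    if (gap > 0L) widths <- c(widths, -gap)
    widths <- c(widths, layout$Width[i])
    pos <- layout$End[i] + 1L
  }
  widths
}

# Write a small sample fixed-width file (columns 6-7 are filler)
tmp <- tempfile(fileext = ".dat")
writeLines(c("00001xx123", "00002xx045"), tmp)

dat <- utils::read.fwf(
  tmp,
  widths = fwfWidths(layout),
  col.names = layout$variableName,
  colClasses = layout$dataType  # 'numeric' and 'character' are valid colClasses
)
unlink(tmp)
```

Reading ID as character preserves its leading zeros. For multi-line records, read.fwf also accepts a list of width vectors, one per line, which could be built by grouping the layout on RecordNumber.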

Author(s)

Tom Fink


EdSurvey documentation built on Nov. 2, 2023, 6:25 p.m.