mergeFiles: Merge text files adding names

View source: R/fileUtils.R

Description

Sequentially concatenates text files, adding the source filename to the start of each line. Files may have headers if the number of header lines is known and is the same in all files. The header is taken from the first file; headers in the other files are dropped, but a warning is generated for each one that does not match the first file's header. The output file path can be specified; otherwise a temporary file is created. A vector of strings can be supplied to use as the line prefixes instead of the source filenames.
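
The core operation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: prefixLines is a hypothetical helper, and it ignores headers, the output file, and the keepEmpty option.

```r
# Hypothetical helper: prefix every line of each file with a name and a
# delimiter, then concatenate the results in input order.
prefixLines <- function(inFiles, names = inFiles, delim = "\t") {
  stopifnot(length(inFiles) >= 1L, length(names) == length(inFiles))
  unlist(Map(function(path, name) {
    lines <- readLines(path)
    # paste0() treats a zero-length vector as "", which would produce a
    # stray "name<delim>" entry, so files with no lines are skipped here.
    if (length(lines) == 0L) character(0) else paste0(name, delim, lines)
  }, inFiles, names), use.names = FALSE)
}

# Two small input files created with base R only.
fileA <- tempfile(); writeLines(c("one", "two"), fileA)
fileB <- tempfile(); writeLines("three", fileB)

merged <- prefixLines(c(fileA, fileB), names = c("A", "B"))
print(merged)
```

The length check mirrors the documented requirement that names and inFiles have the same length.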

Usage

mergeFiles(
  inFiles,
  outFile = tempfile(pattern = "merged", fileext = ".tmp"),
  names = inFiles,
  delim = "\t",
  headerLines = 0L,
  colName = "FILE",
  keepEmpty = FALSE
)

Arguments

inFiles

Required. The file paths to concatenate.

outFile

The file path to use for output. Defaults to a temporary file named <tempdir()>/merged<random>.tmp. If the file already exists, it will be overwritten.

names

The names to prefix to the output lines. Defaults to inFiles. Must be a vector of the same length as inFiles.

delim

The separator between the prefixed file name column and the source file lines. Defaults to a tab, \t.

headerLines

The number of header lines. Defaults to 0. All files must have the same number of header lines. It is an error if a file has fewer lines than required by this parameter. A warning is generated for each file whose header differs from the first.

colName

The header for the name column, if headerLines > 0. Every header line will have this prefixed, separated by delim. Default is FILE.

keepEmpty

Set TRUE to have empty files treated as if they contained a single empty line. This results in a line in the output file with just the name and delim. By default, empty files are ignored. For files with headers, empty files are those that contain no lines other than the header (which should end with an EOL character).

Details

It is possible to just concatenate files without a prefix if all applicable values are set to the empty string (i.e. names, delim, and possibly colName). A blank line for empty files will be included only if keepEmpty is set TRUE.
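
A minimal sketch of plain concatenation, assuming the JefferysRUtils package is installed and exports mergeFiles. Because names must have the same length as inFiles, the empty string is repeated once per file. The input files here are hypothetical and have no headers, so colName is not needed.

```r
# Two small input files created with base R only.
fileA <- tempfile(); writeLines(c("one", "two"), fileA)
fileB <- tempfile(); writeLines("three", fileB)
inFiles <- c(fileA, fileB)

# One empty name per input file, to satisfy the length requirement.
emptyNames <- rep("", length(inFiles))

# Assumption: JefferysRUtils is installed and exports mergeFiles.
# The call is skipped when the package is not available.
if (requireNamespace("JefferysRUtils", quietly = TRUE)) {
  outFile <- JefferysRUtils::mergeFiles(inFiles, names = emptyNames, delim = "")
  print(readLines(outFile))
}
```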

Value

Returns the output file name, which is important when the output is created as a temporary file.

Errors

Must specify at least one input file.

If no files are specified, the function exits with an error.

Not enough lines in file to have expected header: "file".

There are fewer lines in the file "file" than header lines, so it can't possibly have the same header, let alone any data.

Parameters inFiles= and names= must be vectors of the same length.

Since names are being used instead of file paths in the output file, it does not make sense to allow wrapping here. If you want the same name for multiple files or the same file with multiple names, just include it in the relevant parameter more than once.

Warnings

File headings differ between first file and "file"

If headerLines is set (> 0), that many lines will be read from the first file and used as the heading in the output file. Each following file is then checked to ensure it has the same heading. If the heading does not match, this warning is signaled.
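
The check described above can be sketched in base R. This is an illustration only, not the package's code: checkHeaders is a hypothetical helper, and it assumes every file has at least headerLines lines.

```r
# Hypothetical helper: warn for each file whose first headerLines lines
# differ from those of the first file. Returns the first file's header.
checkHeaders <- function(inFiles, headerLines = 1L) {
  firstHeader <- readLines(inFiles[1], n = headerLines)
  for (path in inFiles[-1]) {
    header <- readLines(path, n = headerLines)
    if (!identical(header, firstHeader)) {
      warning('File headings differ between first file and "', path, '"')
    }
  }
  firstHeader
}

# Two files whose headers differ.
fileA <- tempfile(); writeLines(c("DESC | THING", "One | fish,"), fileA)
fileB <- tempfile(); writeLines(c("KIND | THING", "red | fish,"), fileB)

# Capture the warning message instead of letting it print.
caught <- tryCatch(checkHeaders(c(fileA, fileB)),
                   warning = function(w) conditionMessage(w))
print(caught)
```

Note that tryCatch stops at the first warning. To let a merge run to completion while still recording every warning, withCallingHandlers together with invokeRestart("muffleWarning") is the usual base R alternative.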

Examples

# Create a couple of temp files to merge
header <- "DESC | THING"
contentA <- c("One | fish,", "two | fish;")
contentB <- c("red | fish,", "blue | fish.")
inFileA <- makeTempFile( c( header, contentA ))
inFileB <- makeTempFile( c( header, contentB ))
empty <- makeTempFile( header )

# Merge files
# The commented-out call below would fail with an error, because names
# has two entries for three input files:
# tempFile <- mergeFiles( c(inFileA, empty, inFileB),
#                        names= c("A", "B"), headerLines= 1L )
tempFile <- mergeFiles( c(inFileA, empty, inFileB),
                        names= c("A", "B", "C"), headerLines= 1L )
readLines(tempFile)
#> [1] "FILE\tDESC | THING"
#> [2] "A\tOne | fish,"
#> [3] "A\ttwo | fish;"
#> [4] "C\tred | fish,"
#> [5] "C\tblue | fish."

tempFile <- mergeFiles(
   c(inFileA, empty, inFileB), names= c("A", "B", "C"), headerLines= 1L,
   colName= 'stanza', delim= ": ", keepEmpty= TRUE )
readLines(tempFile)
#> [1] "stanza: DESC | THING"
#> [2] "A: One | fish,"
#> [3] "A: two | fish;"
#> [4] "B: "
#> [5] "C: red | fish,"
#> [6] "C: blue | fish."


jefferys/JefferysRUtils documentation built on June 18, 2024, 4:39 a.m.