rbindFiles: Combine a sequence of files by rows

rbindFiles    R Documentation

Combine a sequence of files by rows

Description

Takes a sequence of files and combines them by rows, without reading the full files into memory. This is especially useful when dealing with large datasets, where the reading of entire files may be time consuming and require a large amount of memory.

Usage

rbindFiles(infiles, outfile, col.sep, header = FALSE, ask = TRUE, 
verbose = FALSE, add.file.number = FALSE, blank.lines.skip = FALSE)

Arguments

infiles

A character vector of names (and paths) of the files to combine.

outfile

A character string giving the name of the combined output file. The name of the file is relative to the current working directory, unless the file name contains a definite path.

col.sep

Specifies the separator used to split the columns in the files. To split at all types of spaces or blank characters, set col.sep = "[[:space:]]" or col.sep = "[[:blank:]]".
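As a small illustration of the two character classes mentioned for col.sep, the sketch below shows how they behave in base R's strsplit. Whether rbindFiles uses strsplit internally is an assumption; the sample line is hypothetical.

```r
# Hypothetical sample line containing a space and a tab
line <- "a b\tc"

# "[[:blank:]]" matches a space or a tab
fields_blank <- strsplit(line, "[[:blank:]]")[[1]]

# "[[:space:]]" matches any whitespace character (space, tab, newline, ...)
fields_space <- strsplit(line, "[[:space:]]")[[1]]

# A literal single space splits at the space only, not at the tab
fields_literal <- strsplit(line, " ")[[1]]

print(fields_blank)
print(fields_literal)
```

With a literal separator such as " ", tab-separated fields stay joined, which is why the character classes are suggested for files with mixed blank characters.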

header

A logical variable which indicates if the first line in each file contains the names of the variables. If "TRUE", outfile will display this header in its first row, assuming the headers for each file are identical. Equals FALSE by default, i.e. no headers assumed.

ask

Logical. Default is "TRUE". If set to "FALSE", an already existing outfile will be overwritten without asking.

verbose

Logical. Default is "FALSE". If set to "TRUE", the line number is displayed for each iteration, i.e. each combined line.

add.file.number

A logical variable which equals "FALSE" by default. If "TRUE", an extra first column will be added to the outfile, consisting of the file numbers for each line.

blank.lines.skip

Logical. Default is "FALSE". If "TRUE", lineByLine ignores blank lines in the input.

Details

The function rbind combines R objects by rows. However, reading large data files into R may require a large amount of memory and be extremely time consuming. rbindFiles avoids reading the full files into memory: it reads the files line by line, possibly modifies each line, then writes the line to outfile. However, if header, verbose, add.file.number and blank.lines.skip are all set to "FALSE" (their default values), the files are appended directly, and no line-by-line processing takes place. In the case where infiles contains only one file and no output or modifications are requested (verbose, add.file.number and blank.lines.skip equal "FALSE"), an identical copy of this file is made.
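The two strategies described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not Haplin's implementation: the helper rbind_files_sketch, the output separator " " and the column label "file" are all hypothetical, and the verbose, ask and blank.lines.skip options are left out.

```r
# Hypothetical helper, not part of Haplin: illustrates the two strategies only
rbind_files_sketch <- function(infiles, outfile, header = FALSE,
                               add.file.number = FALSE) {
  if (!header && !add.file.number) {
    # Direct append: whole files are copied without inspecting single lines
    file.create(outfile)
    for (f in infiles) file.append(outfile, f)
    return(invisible(NULL))
  }
  out <- file(outfile, open = "w")
  on.exit(close(out))
  for (i in seq_along(infiles)) {
    con <- file(infiles[i], open = "r")
    first <- TRUE
    # Read one line at a time, so only a single line is held in memory
    while (length(line <- readLines(con, n = 1)) > 0) {
      is_header <- header && first
      first <- FALSE
      # Keep the header of the first file only (headers assumed identical)
      if (is_header && i > 1) next
      if (add.file.number) {
        # Assumption: the header of the extra first column is labelled "file"
        prefix <- if (is_header) "file" else i
        line <- paste(prefix, line, sep = " ")
      }
      writeLines(line, out)
    }
    close(con)
  }
  invisible(NULL)
}

# Usage on two small temporary files
d <- tempfile()
dir.create(d)
f1 <- file.path(d, "a.txt")
f2 <- file.path(d, "b.txt")
writeLines(c("id x", "1 10"), f1)
writeLines(c("id x", "2 20"), f2)
out <- file.path(d, "combined.txt")
rbind_files_sketch(c(f1, f2), out, header = TRUE, add.file.number = TRUE)
print(readLines(out))
```

The direct-append branch never parses a line, which is why it is the faster path when no header handling or per-line modification is needed.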

Value

There is no useful output; the objective of rbindFiles is to produce outfile.

Note

Combining the files by reading each file line by line is less time efficient than appending the files directly. For this reason, if header = FALSE, changing the values of the logical variables verbose, add.file.number and blank.lines.skip from "FALSE" to "TRUE" should not be done unless absolutely necessary.

Author(s)

Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no

References

Web Site: https://haplin.bitbucket.io

See Also

cbindFiles, lineByLine

Examples

## Not run: 

# Combines the three infiles, by rows
rbindFiles(infiles = c("myfile1.txt", "myfile2.txt", "myfile3.txt"), 
  outfile = "myfile_combined_by_rows.txt", col.sep = " ", header = TRUE, verbose = TRUE)


## End(Not run)

Haplin documentation built on Sept. 11, 2024, 7:13 p.m.