read_dir: Read and Merge Files from Directory

read_dir    R Documentation

Description

Reads data files from any given directory as data frames and merges them into a single data frame (using data.table::rbindlist).
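
As a rough illustration of the behaviour described above, the following sketch does the same kind of read-and-merge by hand with list.files, data.table::fread, and data.table::rbindlist. It is not the package's implementation: the temporary directory, the file names, and the column names are made up for the example, and fill = TRUE is an assumption made here so that files with differing columns would still bind.

```r
library(data.table)

# hypothetical demo directory with two small CSV files
demo_dir = file.path(tempdir(), "read_dir_sketch")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(subject = 1, rt = c(310, 95)),
       file.path(demo_dir, "exp3_s1.csv"))
fwrite(data.table(subject = 2, rt = c(450, 520)),
       file.path(demo_dir, "exp3_s2.csv"))

# select the files by regex, read each one, then bind them into one table
files = list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
merged_sketch = rbindlist(lapply(files, fread), fill = TRUE)

# merged_sketch now holds the rows of both files
nrow(merged_sketch)
```

Reading every file into a list first and binding once at the end avoids growing a data frame file by file, which is typically the slow part of a manual loop.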

Usage

read_dir(
  pattern = "*[.]",
  path = ".",
  reader_function = data.table::fread,
  ...,
  subdirs = FALSE,
  filt = NULL,
  hush = FALSE
)

Arguments

pattern

Regular expression ("regex"; as string or NULL) for selecting files (passed to the list.files function). If NULL, all files at the specified path will be read in. To select, for example, a specific extension like ".txt", the pattern can be given as "\\.txt$" (for CSV files, "\\.csv$", etc.). Files ending with e.g. "group2.txt" can be specified as "group2\\.txt$". Files starting with "exp3" can be specified as "^exp3". Files starting with "exp3" AND ending with the ".txt" extension can be specified as "^exp3.*\\.txt$". To read in a single file, specify the full filename (e.g. "exp3_subject46_group2.txt"). (See ?regex for more details.)

path

Path to the directory from which the files should be selected and read. The default "." means the current working directory (as returned by getwd()). Either set the correct working directory in advance (see setwd, path_neat), or enter a relative or full path (e.g. "C:/research" or "/home/projects", etc.).

reader_function

A function to be used for reading the files, data.table::fread by default.

...

Any arguments to be passed on to the chosen reader_function.

subdirs

Logical (FALSE by default). If TRUE, searches for files in subdirectories as well (relative to the given path).

filt

An expression to filter, by column values, each data file after it is read and before it is merged with the other data. (The expression should use column names alone; see Examples.)

hush

Logical. If FALSE (default), prints all data file names as they are being read (along with related warnings).
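
To make the pattern and subdirs arguments more concrete, here is a small base R sketch of how such patterns select files when handed to list.files, to which pattern is passed according to the description above. The directory and file names are hypothetical, and the assumption that subdirs = TRUE behaves like recursive = TRUE in list.files is mine rather than something stated in this page.

```r
# hypothetical demo directory with one subdirectory
demo_dir = file.path(tempdir(), "pattern_sketch")
dir.create(file.path(demo_dir, "session2"),
           recursive = TRUE, showWarnings = FALSE)
file_names = c(
    "exp3_subject46_group2.txt",
    "exp3_subject47_group1.txt",
    "exp4_subject01_group2.txt",
    "exp3_notes.csv",
    "session2/exp3_subject48_group2.txt"
)
invisible(file.create(file.path(demo_dir, file_names)))

# files ending with ".txt", top level only
txt_top = list.files(demo_dir, pattern = "\\.txt$")

# files starting with "exp3" AND ending with ".txt", top level only
exp3_txt = list.files(demo_dir, pattern = "^exp3.*\\.txt$")

# same pattern, but searching subdirectories as well
exp3_txt_all = list.files(demo_dir, pattern = "^exp3.*\\.txt$",
                          recursive = TRUE)
```

Note that list.files matches the pattern against file names only, not against the directory part of the path, so "^exp3" still matches a file inside "session2".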

Note

This function is very similar to the readbulk::read_bulk function. One important difference, however, is the use of data.table, which greatly speeds up the process. Another important difference is that files can be selected based on any regex pattern. Furthermore, this function allows pre-filtering of each file (see filt). Data files may include a significant amount of unnecessary data, and filtering prevents such data from being merged.
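
The idea behind pre-filtering can be sketched in base R as follows: each file is reduced to the rows of interest right after it is read, so the unneeded rows never reach the merged data frame. This is only an illustration of the concept, not the package's code; the files, the column names subject and rt, and the condition rt > 150 are all hypothetical. With read_dir, such a condition would be supplied through filt using column names alone, presumably in the form rt > 150.

```r
# hypothetical demo files, each containing some rows that are not needed
demo_dir = file.path(tempdir(), "filt_sketch")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(subject = 1, rt = c(310, 95, 420)),
          file.path(demo_dir, "s1.csv"), row.names = FALSE)
write.csv(data.frame(subject = 2, rt = c(120, 505)),
          file.path(demo_dir, "s2.csv"), row.names = FALSE)

files = list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read each file and keep only the rows of interest before merging
filtered_list = lapply(files, function(f) {
    one_file = read.csv(f)
    one_file[one_file$rt > 150, ]
})

# merge the already filtered pieces into one data frame
filtered_df = do.call(rbind, filtered_list)
```

Filtering per file keeps the peak memory use closer to the size of the data that is actually wanted, rather than the size of everything on disk.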

See Also

data.table::rbindlist

Examples



# first, set current working directory
# e.g. to script's path with setwd(path_neat())

# read all text files in current working directory
merged_df = read_dir("\\.txt$")
# merged_df now has all data

# to use utils::read.table for reading (slower than fread)
# (with some advisable options passed to it)
merged_df = read_dir(
    '\\.txt$',
    reader_function = read.table,
    header = TRUE,
    fill = TRUE,
    quote = "\"",
    stringsAsFactors = FALSE
)



neatStats documentation built on Dec. 8, 2022, 1:13 a.m.