load_raw: Load raw read data

load_raw    R Documentation

Load raw read data

Description

Loads raw read data and formats it for use with the feedr functions. This is merely a wrapper around steps that you could carry out yourself. Its usefulness depends on how standardized your data are and on whether you have extra details that need to be addressed.

Usage

load_raw(
  r_file,
  tz = Sys.timezone(),
  tz_disp = NULL,
  dst = FALSE,
  details = 1,
  logger_pattern = NA,
  time_format = "mdy HMS",
  extra_pattern = NULL,
  extra_name = NULL,
  sep = "",
  skip = 0,
  verbose = TRUE,
  feeder_pattern
)

Arguments

r_file

Character. The location of a single file to load.

tz

Character. The time zone the date/times are in (should match one of the zones produced by OlsonNames()). Attempts to use user's system timezone, if none supplied. Defaults to UTC if all else fails.

tz_disp

Character. The time zone the date/times should be displayed in (if not the same as tz; should match one of the zones produced by OlsonNames()). Defaults to tz if none supplied.

dst

Logical. Whether or not to use Daylight Saving Time (DST). When set to FALSE, time zones are converted to the Etc/GMT+X time zones, which do not include DST. Note that this overrides the time zone specification: a time zone of America/Vancouver, which would normally include DST in the summer, is transformed to a time zone with the same GMT offset but without DST.

details

Numeric. Where to find logger details: 0 (file name), 1 (first line) or 2 (first two lines). See Details.

logger_pattern

Character. A regular expression matching the logger id in the file name. NA (default) matches file name (extension omitted) or first line of the file (See the details argument). Alternatively, [GPR]{2,3}[0-9]{1,2} would match the names of TRU loggers.

time_format

Character. The date/time format of the 'date' and 'time' columns combined. Defaults to "mdy HMS". Should be in formats usable by the parse_date_time() function from the lubridate package (e.g., "ymd HMS", "mdy HMS", "dmy HMS", etc.). See details for more information.

extra_pattern

Character vector. A vector of regular expressions matching any extra information in the file or directory names or in the first line of the file.

extra_name

Character vector. A vector of column names matching the order of extra_pattern for storing the results of the pattern.

sep

Character. An override for the separator in the read.table() call (see sep = under ?read.table for more details).

skip

Numeric. Extra lines to skip in addition to the lines specified by details.

verbose

Logical. Whether to include progress messages or not.

feeder_pattern

Deprecated. Use logger_pattern.
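
To illustrate what the dst argument described above implies, the following base R sketch compares a DST-aware zone with a fixed-offset Etc/GMT zone. It does not call load_raw() and is not taken from the feedr source; the timestamp is made up. Note the sign convention of these zone names: Etc/GMT+8 is eight hours behind UTC.

```r
# Illustration only: this does not reproduce load_raw() internals.
# A made-up summer timestamp, interpreted in two time zones.
x <- "2016-07-01 12:00:00"

with_dst <- as.POSIXct(x, tz = "America/Vancouver")  # observes DST in summer
no_dst   <- as.POSIXct(x, tz = "Etc/GMT+8")          # fixed offset, never DST

# UTC offset of each zone at that moment, as a "+hhmm" / "-hhmm" string
offset_dst    <- format(with_dst, "%z")
offset_no_dst <- format(no_dst, "%z")

print(c(with_dst = offset_dst, no_dst = offset_no_dst))
```

The two offsets differ by one hour in summer, so the same clock time refers to different instants depending on dst.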

Details

Data is assumed to contain three columns (without column names) corresponding to animal_id, date and time (without date). By default they are expected to be separated by white space, but the sep argument can be modified to reflect other separators, such as comma- or tab-separated data.
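
As a rough sketch of the expected layout, the base R code below reads two made-up records with read.table(), first separated by white space and then by commas. The tag ids, dates and column handling are assumptions for illustration; load_raw() may process the file differently internally.

```r
# Hypothetical file contents: animal_id, date, time (no header row)
raw_lines <- c(
  "0620000514 12/01/2015 10:48:49",
  "041868D861 12/01/2015 10:53:12"
)
cols <- c("animal_id", "date", "time")

# sep = "" splits on any run of white space
ws <- read.table(text = raw_lines, sep = "", col.names = cols,
                 colClasses = "character")

# The same records, comma-separated, need sep = ","
csv_lines <- gsub(" ", ",", raw_lines)
cs <- read.table(text = csv_lines, sep = ",", col.names = cols,
                 colClasses = "character")

print(ws)
print(cs)
```

Reading every column as character keeps leading zeros in the ids, which would be lost if they were parsed as numbers.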

The columns date and time will be combined to extract the date/time of each event. Thus, the time_format argument specifies the order of the combined date and time columns and should be in a format usable by the parse_date_time() function from the lubridate package (e.g., "ymd HMS", "mdy HMS", "dmy HMS", etc.). For example, the default "mdy HMS" expects a date column in the format month/day/year and a time column in the format H:M:S (note that separators and leading zeros are ignored, thus month-day-year is equivalent to month/day/year; see the orders argument of the parse_date_time() function for more information). More complex formats can also be specified. For example, 09/30/16 2:00 pm can be specified by time_format = "mdy HM p".
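
The formats described above can be tried directly with lubridate before loading a file. This is a minimal sketch with made-up values; it assumes the lubridate package is installed and that time_format is passed on as the orders argument, which is how the paragraph above describes it.

```r
library(lubridate)

# Made-up date and time columns, pasted together as one string each
slash <- paste("12/01/2015", "10:48:49")
dash  <- paste("12-01-2015", "10:48:49")

# Separators are ignored, so both strings parse with the same order
a <- parse_date_time(slash, orders = "mdy HMS", tz = "UTC")
b <- parse_date_time(dash,  orders = "mdy HMS", tz = "UTC")
print(a)
print(b)

# A 12-hour clock with an am/pm indicator
pm <- parse_date_time("09/30/16 2:00 pm", orders = "mdy HM p", tz = "UTC")
print(pm)
```

If a value cannot be parsed with the given order, parse_date_time() returns NA with a warning, which makes this a quick way to check a candidate time_format.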

Logger details are the logger_id and the lat/lon for the logger. A details value of 0 means that the logger_id is in the file name, as defined by the pattern logger_pattern. A value of 1 means that the logger_id is in the first line of the file, also defined by the pattern logger_pattern. A value of 2 means that, in addition to the logger_id being in the first line of the file, the lat/lon information is on the second line, in the format "latitude, longitude", both in decimal format (spacing doesn't matter, but the comma does).
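
The layout for details = 2 can be sketched in base R as follows. The file contents, coordinates and variable names are hypothetical, and the code only mimics the description above rather than reproducing what load_raw() does internally.

```r
# Hypothetical file contents for details = 2:
# line 1 holds the logger id, line 2 holds "latitude, longitude"
file_lines <- c(
  "GPR13DATA",
  "53.89086,   -122.81933",
  "0620000514 12/01/2015 10:48:49"
)

# Extract the logger id from the first line with a logger_pattern
logger_pattern <- "[GPR]{2,3}[0-9]{1,2}"
logger_id <- regmatches(file_lines[1], regexpr(logger_pattern, file_lines[1]))

# Split the second line on the comma; surrounding spaces are trimmed away
coords <- as.numeric(trimws(strsplit(file_lines[2], ",")[[1]]))
lat <- coords[1]
lon <- coords[2]

# The remaining lines are the reads themselves
reads <- read.table(text = file_lines[-(1:2)],
                    col.names = c("animal_id", "date", "time"),
                    colClasses = "character")

print(logger_id)
print(c(lat = lat, lon = lon))
print(reads)
```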

Examples

## Not run: 
# Load a single raw file:
r <- load_raw("GPR13DATA_2015_12_01.csv")

# Modify logger pattern (match only "GPR13")
r <- load_raw("GPR13DATA_2015_12_01.csv", logger_pattern = "[GPR]{2,3}[0-9]{1,2}")

# Modify logger pattern (match ids like: 2300, 2500, 2550)
r <- load_raw("2300.csv", logger_pattern = "[0-9]{4}")

# Load a file where the logger id is detected as the first line in the file,
# not the file name (details = 1 is the default, as is skip = 0):
r <- load_raw("2016-01-01_09_30.csv", details = 1)

# Note that the following won't work because the pattern matches both the
# logger id as well as the year:
r <- load_raw("2300_2015_12_01.csv", logger_pattern = "[0-9]{4}")

# Extract extra data to be stored in another column:
r <- load_raw("2300.csv", extra_pattern = "exp[0-9]{1}", extra_name = "experiment")


## End(Not run)
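
The ambiguous pattern in the example above can be checked without feedr. The base R sketch below lists every match of "[0-9]{4}" in that file name (extension omitted). The anchored pattern at the end is only a suggestion that has not been tested with load_raw().

```r
# File name from the example above, extension omitted
file_name <- "2300_2015_12_01"

# gregexpr() finds every match, showing why the pattern is ambiguous
hits <- regmatches(file_name, gregexpr("[0-9]{4}", file_name))[[1]]
print(hits)

# Assumption: anchoring to the start of the name leaves only the logger id
anchored <- regmatches(file_name, regexpr("^[0-9]{4}", file_name))
print(anchored)
```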

steffilazerte/feedr documentation built on Jan. 27, 2023, 3:46 a.m.