sim_data: Randomly generate data for internal testing of 'auto_rate()"s...

View source: R/sim_data.R

sim_dataR Documentation

Randomly generate data for internal testing of auto_rate()'s linear method.

Description

Generate data of size len that is coerced to mimic common respirometry data. This is an internal function not intending for public use, though may prove of interest or utility. We may modify this function at any time, which may irreversibly change the outputs. This function was first created to test auto_rate() using the other internal function, test_lin(), but we decided to publish the code as it is an effective (and visually-appealing) tool for teaching, testing and visualising purposes. This function is by no means comprehensive and we encourage users to generate data that suit their own unique situations.

Usage

sim_data(len = 300, type = "default", sd = 0.05, preview = TRUE)

Arguments

len

numeric. Defaults at 300. Number of observations in the dataset.

type

character. What kind of data should the function generate? Available for use: "default", "corrupted" and "segmented".

sd

numeric. Defaults at 0.05. This is the amount of noise to add to the system, randomly generated as a standard deviation based on the entire data set.

preview

logical. Defaults to TRUE. Plots the generated data as an xy.

Details

sim_data() creates 3 types of data that we think are common in respirometry or oxygen flux data. The data types can be selected using the type input:

  • "default": data is made up of a linear segment of known length and slope, with a non-linear segment, generated by a sine or cosine function depending on whether the slope is positive or negative, appended to the beginning of the data. The shape of the dataset is designed to mimic many similar data whereby the initial sections of the data are often non-linear. Here the slope is randomly generated using rnorm(1, 0, 0.025), the length of the inital segment randomly generated using floor(abs(rnorm(1, .25*len, .05*len))) where len is the total number of observations in the data, and the amplitude of the segment also randomly generated using rnorm(1, .8, .05).

  • "corrupted": same as "default", but "corrupted" data is inserted randomly at any point in the linear segment. The data corruption is chosen as a sudden dip in the reading, which recovers. This event mimics equipment interference that sometimes happens, but does not necessarily invalidate the dataset if the corrupted section is omitted from analysis, The dip is generated by a cosine function of fixed amplitude of 1, and the length is randomly generated.

  • "segmented": same as "default", but the data is modified to contain two linear segments. The slope of the second linear segment is randomly picked at between 0.5 and 0.6 of the first linear segment. Its length is also randomly generated but always smaller than the first segment.

Normally-distributed noise is added to the dataset to add variation, which can be modified using the "sd" input.

Value

A list containing the dataframe, the slope and the length of the linear section of the data, to be used for analysis in the function test_lin().

See Also

test_lin()

Examples

# Generate data of length 200
sim_data(len = 200)

# Generate data that contains a "corruption"
sim_data(type = "corrupted")

# Generate noisy data
sim_data(type = "segmented", sd = .2)

# Generate "perfect" non-noisy data
sim_data(sd = 0)

januarharianto/respR documentation built on April 20, 2024, 4:34 p.m.