1 - Data Preparation
In swaRmverse: Swarm Space Creation

1.1 Input data - trackdf

The swaRmverse package uses the trackdf package to standardize the input dataset. The data are expected to be trajectories (id, x, y, t), for example generated by GPS or video tracking. First, let's load an example video-tracking dataset that ships with trackdf:

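library(swaRmverse)

# Example video-tracking data distributed with the trackdf package
raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))
# Drop the rows flagged in the 'ignore' column
raw <- raw[!raw$ignore, ]
head(raw)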
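If you want to experiment without a real tracking file, a toy dataset in the same long format (one row per individual per time step, with id, x, y and t columns) can be built in base R. This is a hypothetical example: the object name toy, the number of individuals and frames, and the random-walk movement are all assumptions made for illustration and are not part of trackdf or swaRmverse.

```r
# Hypothetical toy trajectories in the long (id, x, y, t) format:
# 3 individuals, each observed over 5 consecutive frames.
set.seed(42)
n_ind <- 3
n_frames <- 5
toy <- data.frame(
  id = rep(paste0("ind", seq_len(n_ind)), each = n_frames),
  t  = rep(seq_len(n_frames), times = n_ind)
)
# Random walk per individual: cumulative sum of small Gaussian steps,
# computed separately within each id
toy$x <- ave(rnorm(nrow(toy), sd = 0.5), toy$id, FUN = cumsum)
toy$y <- ave(rnorm(nrow(toy), sd = 0.5), toy$id, FUN = cumsum)
head(toy)
```

Because toy$t holds frame indices rather than real times, it would need the same origin, period and tz treatment as raw$frame below.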

1.2 Transform data

trackdf takes as input a vector for each positional time series (x, y), along with a vector of ids and a vector of times. Time is converted to the POSIXct date-time format. Without additional information, the package uses UTC as the time zone, the current time as the origin of the experiment, and 1 second as the sampling period (the time between consecutive observations). If your t column already contains real time rather than frame or sample indices (e.g., c(1, 2, 3, 4)), the period does not need to be specified. For more details, see https://swarm-lab.github.io/trackdf/index.html. Here, the t column holds frame numbers, one every 0.04 s (25 frames per second), so let's specify these attributes and create our main dataset (as a data frame):

data_df <- set_data_format(raw_x = raw$x,
                          raw_y = raw$y,
                          raw_t = raw$frame,
                          raw_id = raw$track_fixed,
                          origin = "2020-02-1 12:00:21",
                          period = "0.04S",
                          tz = "America/New_York"
                          )

head(data_df)
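As a rough picture of what origin, period and tz do to a column of frame numbers, the base-R sketch below converts frames into POSIXct timestamps by hand. It assumes that the first frame maps exactly to the origin and that each subsequent frame adds one period; trackdf's exact convention may differ, so treat this as an illustration rather than the package's implementation. The variable names are hypothetical.

```r
# Sketch (assumption): timestamp = origin + (frame - first frame) * period
origin <- as.POSIXct("2020-02-01 12:00:21", tz = "America/New_York")
period_s <- 0.04                 # seconds between consecutive frames (25 fps)
frames <- c(1, 2, 3, 26)         # hypothetical frame numbers
timestamps <- origin + (frames - min(frames)) * period_s
# Show fractional seconds and the time zone abbreviation
format(timestamps, "%Y-%m-%d %H:%M:%OS3 %Z")
```

With 25 frames per second, frame 26 falls one second after frame 1.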

Notice that a 'set' column has been added to the dataset. swaRmverse uses this column as the main unit for grouping the tracks into separate events. By default, the set is the date of data collection.
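Conceptually, a date-based set groups every observation recorded on the same calendar day. The base-R sketch below mimics that idea on a small hypothetical table (obs) and counts the distinct individuals per set. It is not swaRmverse's code: the package builds the 'set' column itself, and the exact label format it uses may differ from the Date values shown here.

```r
# Hypothetical observations spanning two days
obs <- data.frame(
  id = c("a", "b", "a", "c"),
  t  = as.POSIXct(c("2020-02-01 12:00:21", "2020-02-01 12:05:00",
                    "2020-02-02 09:00:00", "2020-02-02 09:00:01"),
                  tz = "America/New_York")
)
# Use the local calendar date (in the data's time zone) as the set label
obs$set <- as.Date(obs$t, tz = "America/New_York")
# Number of distinct individuals per set
ids_per_set <- tapply(obs$id, obs$set, function(ids) length(unique(ids)))
ids_per_set
```

Passing tz explicitly to as.Date() matters: observations near midnight can otherwise be assigned to a different day than their local date.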

1.3 Multi-species or multi-context data

As mentioned above, swaRmverse uses the date as the default unit for organizing the data. However, if several separate observations were conducted on the same day, or if the data need an additional label such as context or species, this information can be passed to the set_data_format function. For instance, suppose the dataset contains data from two different contexts:

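# dummy column: first half of the rows labeled "ctx1", second half "ctx2"
# (works whether nrow(raw) is even or odd)
raw$context <- ifelse(seq_len(nrow(raw)) <= nrow(raw) / 2, "ctx1", "ctx2")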

Any additional vector can be passed to the function (here, raw_context), and it will be combined with the date to define the set:

data_df <- set_data_format(raw_x = raw$x,
                          raw_y = raw$y,
                          raw_t = raw$frame,
                          raw_id = raw$track_fixed,
                          origin = "2020-02-1 12:00:21",
                          period = "0.04 seconds",
                          tz = "America/New_York",
                          raw_context = raw$context
                          )

head(data_df)
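One way to picture the combined set is as a single key built from the date and the extra label. The base-R sketch below does this with paste() on a small hypothetical table; the "_" separator and the date-first ordering are assumptions made for illustration and are not necessarily the format swaRmverse produces, so inspect the set column of your own data_df to see the actual labels.

```r
# Hypothetical observations: two contexts on one day, one context on the next
obs <- data.frame(
  t = as.POSIXct(c("2020-02-01 12:00:21", "2020-02-01 12:00:22",
                   "2020-02-01 12:30:00", "2020-02-02 08:00:00"),
                 tz = "America/New_York"),
  context = c("ctx1", "ctx1", "ctx2", "ctx1")
)
# Combine the local date with the context label into one set key
obs$set <- paste(as.Date(obs$t, tz = "America/New_York"), obs$context, sep = "_")
# Rows per set
table(obs$set)
```

Observations from the same day but different contexts now fall into separate sets, which is the behavior the extra vector is meant to provide.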