source("setup.R")

This is a quick tutorial covering a couple functions you can use to clean and fix problems in your data.

These functions can be skipped, or used one after the other. It's up to you. Basically, they are wrapper functions for cleaning your data while maintaining appropriate categorical (factor) levels.

This tutorial assumes that you already know how to load/import your data. We will be using the feedr package example data set: finches.

head(finches)

check_ids()

This function can be used to remove any animal_ids that are present, but which you know aren't really animals. For example, if you use a 'wand' to test the deployment of your loggers, this is an animal_id that you should remove prior to analysis. Further, occasionally there are animal_ids that are error codes (e.g. 0000000000), you may wish to determine why these are present (you probably should!), but once again, for analysis they should be removed.

This function works by comparing the list of animal_ids in the data to an external, animal_id data set. The data set is expected to have at least two columns: animal_id and species. The species column should either contain species identity (e.g. House Finch or HOFI) or the error code (e.g. wand, or error).

In addition to removing error or wand ids, this function will also report which animal_ids are in your data sets, but not in your animal_id index, and which are in your animal_id index, but not in your data sets. This will help you determine whether you are detecting RFID tags that are not in your master index (weird!) and whether some RFID tags have never been recorded by a logger.

With no error/wand ids

Let's load an animal index file. Note that there are no errors or wands ids.

animal_index <- read.csv("./Data/animal_index.csv")
animal_index

You need to give this function a data set, and it will return a cleaned data set. Here we'll save it as r_clean:

r_clean <- check_ids(finches, ids = animal_index)

This output shows that all the animal_ids in the data are also in the index and vice versa. Further, there were no omitted ids (error or wand ids).

Note: You can also skip loading the index and simply provide check_ids() with the location of the index file:

r_clean <- check_ids(finches, ids = "./Data/animal_index.csv")

With error/wand ids

Let's see how it works if you did have a 'wand' or 'error' code in your index file that matched a animal_id in your data set.

animal_index$species <- as.character(animal_index$species)
animal_index$species[1] <- "wand"
animal_index$species[2] <- "error"
animal_index
r_clean <- check_ids(finches, ids = animal_index)

Here we omitted two animal_ids, one associated with a wand (0620000514) and one with an error (041868D861).

Note that nothing else changed.

animal_ids present in data set but not in the index

animal_index <- animal_index[-5, ]
animal_index
r_clean <- check_ids(finches, ids = animal_index)

check_problems()

This function is only necessary if, for some reason, you're getting errors in the recorded animal_ids.

This function will correct all instances of an animal_id according to the list provided.

problems <- read.csv("./Data/problems.csv")
problems

Original animal_ids:

finches$animal_id[1:5]

Fix problems and new animal_ids:

r_clean <- check_problems(finches, problems = problems)
r_clean$animal_id[1:5]

Note that the animal_ids have been modified, but also that the factor levels have been updated to match.

Now that your data has been cleaned of erroneous or problematic data, it is ready to be transformed.


Back to top
Go back to home | Go back to loading/importing data | Continue with transformations



steffilazerte/feedr documentation built on Jan. 27, 2023, 3:46 a.m.