source("setup.R")
This is a quick tutorial covering a couple functions you can use to clean and fix problems in your data.
These functions can be skipped, or used one after the other. It's up to you. Basically, they are wrapper functions for cleaning your data while maintaining appropriate categorical (factor) levels.
This tutorial assumes that you already know how to load/import your data. We will be using the feedr
package example data set: finches.
head(finches)
check_ids()
This function can be used to remove any animal_ids
that are present, but which you know aren't really animals. For example, if you use a 'wand' to test the deployment of your loggers, this is an animal_id
that you should remove prior to analysis. Further, occasionally there are animal_ids
that are error codes (e.g. 0000000000), you may wish to determine why these are present (you probably should!), but once again, for analysis they should be removed.
This function works by comparing the list of animal_ids
in the data to an external, animal_id
data set. The data set is expected to have at least two columns: animal_id
and species
. The species
column should either contain species identity (e.g. House Finch or HOFI) or the error code (e.g. wand, or error).
In addition to removing error or wand ids, this function will also report which animal_id
s are in your data sets, but not in your animal_id
index, and which are in your animal_id
index, but not in your data sets. This will help you determine whether you are detecting RFID tags that are not in your master index (weird!) and whether some RFID tags have never been recorded by a logger.
Let's load an animal index file. Note that there are no errors or wands ids.
animal_index <- read.csv("./Data/animal_index.csv") animal_index
You need to give this function a data set, and it will return a cleaned data set. Here we'll save it as r_clean:
r_clean <- check_ids(finches, ids = animal_index)
This output shows that all the animal_id
s in the data are also in the index and vice versa. Further, there were no omitted ids (error or wand ids).
Note: You can also skip loading the index and simply provide check_ids()
with the location of the index file:
r_clean <- check_ids(finches, ids = "./Data/animal_index.csv")
Let's see how it works if you did have a 'wand' or 'error' code in your index file that matched a animal_id
in your data set.
animal_index$species <- as.character(animal_index$species) animal_index$species[1] <- "wand" animal_index$species[2] <- "error" animal_index
r_clean <- check_ids(finches, ids = animal_index)
Here we omitted two animal_id
s, one associated with a wand (0620000514) and one with an error (041868D861).
Note that nothing else changed.
animal_id
s present in data set but not in the indexanimal_index <- animal_index[-5, ] animal_index
r_clean <- check_ids(finches, ids = animal_index)
check_problems()
This function is only necessary if, for some reason, you're getting errors in the recorded animal_id
s.
This function will correct all instances of an animal_id
according to the list provided.
problems <- read.csv("./Data/problems.csv") problems
Original animal_id
s:
finches$animal_id[1:5]
Fix problems and new animal_id
s:
r_clean <- check_problems(finches, problems = problems) r_clean$animal_id[1:5]
Note that the animal_id
s have been modified, but also that the factor levels have been updated to match.
Now that your data has been cleaned of erroneous or problematic data, it is ready to be transformed.
Back to top
Go back to home | Go back to loading/importing data | Continue with transformations
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.