The goal of stepfinder is to detect a step in one-dimensional data.
This package is yet to be released on CRAN. In the future, you will be able to install the released version of stepfinder from CRAN with:
install.packages("stepfinder")
For now, install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("matiasandina/stepfinder")
This package contains a semi-automated pipeline. This means the user will be prompted several times and should be familiar with the pipeline.
Although these functions should work for any type of one-dimensional data, there are many references to animals. This is because the package was developed to detect steps while tracking animal positions.
library(stepfinder)
#> Loading required package: ggplot2
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Minimally, you would have a data.frame with 3 columns: frameID, x, and y. The first idea is to inspect the detections. The package provides examples under data.
# load data from github
# df <- read.csv("https://raw.githubusercontent.com/matiasandina/stepfinder/master/data/sample_detection.csv")
# or from package .Rdata
head(sample_detection)
#> frameID x y
#> 1 1 93 111
#> 2 2 95 111
#> 3 3 96 111
#> 4 4 95 111
#> 5 5 95 111
#> 6 6 95 111
diagnostic <- diagnose_detection(sample_detection)
If your data does not have an id column, random IDs will be assigned on each run of diagnose_detection().
No id found, assigning random id.
Internal id assignment is not reproducible, if IDs matter assign IDs beforehand!
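If IDs matter, one way to follow the advice in the message above is to add the id column yourself before calling the pipeline. The sketch below uses made-up toy data and an arbitrary label ("animal_01"); it assumes only what the message states, namely that the column is named id. Whether the package expects a particular type for that column is not documented here, so treat the character label as an assumption.

```r
# Toy data in the frameID / x / y layout described above (values are made up)
toy_detection <- data.frame(
  frameID = 1:5,
  x = c(93, 95, 96, 95, 95),
  y = 111
)

# Assign a fixed id up front so it stays the same across runs.
# "animal_01" is an arbitrary label chosen for this example.
toy_detection$id <- "animal_01"

head(toy_detection)
```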
You are prompted to see plots and diagnose whether detections were correct.
Press [enter] to see animal path: >
And to see the velocity plots.
Press [enter] to see diagnostics: >
Finally, you have to answer whether the detection was good or bad.
Detection was correct [Yy/Nn]: >
Here’s an example of a wrong detection (Prompts not included).
# read data from GitHub
df_wrong <- read.csv("https://raw.githubusercontent.com/matiasandina/stepfinder/master/data/sample_wrong_detection.csv")
# or just from package .Rdata
diagnostic <- diagnose_detection(sample_wrong_detection)
The path does not make it clear that there were wrong detections. However, the errors are evident in the velocity plots.
These plots also give a good estimate for the velocity we will set as the threshold for bad detections (see v_thresh in ?fix_detection_jumps).
The idea is to do this process for multiple detections and later filter out those that need to be fixed.
You can easily run multiple detections by wrapping diagnose_detection in an lapply call. This applies to all the functions of the package.
# Make a list of data frames (here, the two datasets shipped with the package)
list_of_df <- list(sample_detection, sample_wrong_detection)
lapply(list_of_df, diagnose_detection)
The workhorse for fixing detections is fix_detection_jumps. This function prints a good amount of information about possibly wrong detections (those xy with abs(diff(xy)) > v_thresh). Because we are looking for steps, every bad detection should have a companion. Thus, fix_detection_jumps calls cluster_candidate_list to attempt to cluster possible bad detections in pairs. Later on, it prompts the user for proper removal and interpolation of steps. See ?fix_detection_jumps. A basic example is shown below.
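To make the abs(diff(xy)) > v_thresh criterion concrete, here is a minimal base R sketch on a synthetic trace. It is not the package's internal code: the data, the threshold value, and the variable names are assumptions chosen for illustration.

```r
# Synthetic 1D trace with an artificial step covering frames 21 to 30
x <- rep(100, 50)
x[21:30] <- 350

# Illustrative threshold; not a package default
v_thresh <- 50

# Frame-to-frame difference: dx[i] = x[i + 1] - x[i]
dx <- diff(x)

# Candidates are the positions where the jump exceeds the threshold
candidates <- which(abs(dx) > v_thresh)

data.frame(positions = candidates, diff_val = dx[candidates])
```

The two candidates have large differences of opposite sign, which is the signature of a step rather than a genuine fast movement.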
The default uses derivatives to find the possible candidates and convolution to find whether there’s a step around the candidates. Convolution is implemented through find_step().
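The general idea of finding a step by convolution can be sketched in base R as sliding a step-shaped kernel over the signal: at each position, subtract the mean of the previous k samples from the mean of the next k samples. This is only an illustration of the technique; it is not the code inside find_step(), and the helper step_response() and the window size k are hypothetical.

```r
# Synthetic 1D trace with an artificial step covering frames 21 to 30
x <- rep(100, 50)
x[21:30] <- 350

# Hypothetical helper: response of a step-shaped kernel of half-width k.
# out[i] = mean of the k samples after i minus mean of the k samples up to i.
step_response <- function(x, k) {
  n <- length(x)
  stopifnot(n >= 2 * k)
  out <- rep(NA_real_, n)
  for (i in k:(n - k)) {
    out[i] <- mean(x[(i + 1):(i + k)]) - mean(x[(i - k + 1):i])
  }
  out
}

resp <- step_response(x, k = 5)

# Strongest positive response marks the step up, strongest negative the step down
step_up   <- which.max(resp)
step_down <- which.min(resp)
c(step_up = step_up, step_down = step_down)
```

A positive peak followed by a negative one corresponds to a step up that later returns to baseline.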
fixed_data <- fix_detection_jumps(sample_wrong_detection)
Info will be printed for x and y; only x is presented below.
We found 4 possible candidates...
[1] "This clusters were found..."
[1] 1 1 2
positions x_clust diff_val
1 320 1 249
2 401 1 -279
3 16483 2 -30
We are looking for detections that have a large difference (in this case diff_val ~ 250), have opposite signs, and come in pairs. Positions 320 and 401 look like an actual step; the other one looks like a genuinely high velocity.
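The pairing logic described above can be sketched with the values from the printed table. The rule used here (neighbouring candidates with opposite signs whose magnitudes differ by less than 25% of the larger one) is a hypothetical stand-in, not what cluster_candidate_list actually does.

```r
# Candidate table copied from the output above
candidates <- data.frame(
  positions = c(320, 401, 16483),
  diff_val  = c(249, -279, -30)
)

# Compare each candidate (a) with the next one (b)
n <- nrow(candidates)
a <- candidates[-n, ]
b <- candidates[-1, ]

# Hypothetical pairing rule: opposite sign and similar magnitude
opposite <- sign(a$diff_val) != sign(b$diff_val)
similar  <- abs(abs(a$diff_val) - abs(b$diff_val)) <
  0.25 * pmax(abs(a$diff_val), abs(b$diff_val))

pairs <- data.frame(start = a$positions, end = b$positions)[opposite & similar, ]
pairs
```

Under this rule only positions 320 and 401 form a pair; the candidate at 16483 is left unpaired, consistent with reading it as real movement.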
Inspecting positions in x
Press [enter] to see velocity plots: >
We can see that the spikes in velocity are artificial. We could skip removal (1) but would like to remove them (2).
Diagnose detection:
Good Detection --> Keep or Bad Detection --> Remove?? [(1/2)]: >
Using convolution to find step.
Derivate goes positive to negative,
Prediction is step-up
Analyzing position close to bad detections
Finally, the user still has the last say on whether to replace the data or not.
Are you happy with interpolation [Yy/Nn]? : >
If you liked the interpolation (and entered ‘y’), you will see:
Modifying data...
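For intuition about what replacing a step by interpolation means, here is a minimal sketch using linear interpolation with stats::approx on synthetic data. The interpolation method the package applies is not stated here, so linear interpolation is an assumption, as are the data and the range of bad frames.

```r
# Synthetic trace: a slow ramp with an artificial +250 step on frames 21 to 30
x <- as.numeric(100:149)
x[21:30] <- x[21:30] + 250

# Frames inside the step, taken as given for this example
bad <- 21:30
frames <- seq_along(x)

# Replace the bad frames by linear interpolation between the good neighbours
x_fixed <- x
x_fixed[bad] <- approx(
  x = frames[-bad],
  y = x[-bad],
  xout = frames[bad]
)$y

range(x_fixed)
```

After the replacement, the trace follows the underlying ramp again and the spikes in the frame-to-frame difference disappear.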
Sometimes, convolution can’t detect steps. In that case, we can just try to use the candidates from derivatives. This works most of the time and might become the default behavior in future versions of the package.
fixed_detections <- fix_detection_jumps(sample_wrong_detection, use_convolution = FALSE)
The procedure is almost identical to before, but you will see:
No convolution.
Finding steps from derivatives
Analyzing position close to bad detections
Sometimes, you have to go full manual.
fix_detection_jumps(df_wrong, manual_removal = TRUE)
We found 4 possible candidates...
Entering manual mode....
Analyze x
Diagnose detection:
Good Detection --> Keep or Bad Detection --> Remove?? [(1/2)]: >
When you try to remove (2), you will see:
Select range from possible candidates.
320 401 16483
We know that 401 is the one we need to select (the length of the data.frame is included in the candidates for the very special case in which a step is detected until the end of the data).
Analyzing position close to bad detections
Are you happy with interpolation [Yy/Nn]? : >
This is a preliminary release. Please file issues to make this software work better.