View source: R/mt_filter_unique.R
mt_filter_unique | R Documentation |
move2 object
mt_filter_unique: returns a move2 from which duplicated records have been removed
mt_unique: returns a logical vector indicating the unique records
By default, records that share the same timestamp and track identifier are considered duplicates and are filtered.
mt_filter_unique(x, ...)
mt_unique(
  x,
  criterion = c("subsets", "subsets_equal", "sample", "first", "last"),
  additional_columns = NULL,
  ...
)
x |
The |
... |
Arguments passed on to the |
criterion |
The criterion to decide what records to filter out. For more information see Details below. |
additional_columns |
In some cases different sensors or tracking devices might have the same combination of time and track identifier. It might, for example, be desirable to retain records from an accelerometer and a GPS recorded at the same time. This argument can be used to indicate additional columns to include in the grouping within which records should not be duplicated. See the examples below for its usage. |
To make an informed choice of how to remove duplicates, we recommend first trying to understand why the data set has duplicates.
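One way to start investigating is to look at the duplicated groups side by side. The following is a minimal base R sketch, assuming a plain data.frame with hypothetical columns `track`, `time` and `temperature` standing in for real tracking data; it does not use the move2 API.

```r
# Hypothetical toy data: a plain data.frame standing in for tracking records.
records <- data.frame(
  track = c("a", "a", "a", "b", "b"),
  time = as.POSIXct(c(
    "2024-01-01 00:00:00", "2024-01-01 00:00:00", "2024-01-01 01:00:00",
    "2024-01-01 00:00:00", "2024-01-01 01:00:00"
  ), tz = "UTC"),
  temperature = c(NA, 20.5, 21.0, 19.0, 19.5)
)

# Flag every row whose track/time combination occurs more than once
# (both the first and any later occurrence).
key <- records[, c("track", "time")]
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Show the duplicated groups next to each other for inspection.
dup_rows <- records[is_dup, ]
dup_rows <- dup_rows[order(dup_rows$track, dup_rows$time), ]
print(dup_rows)
```

In this toy data the two flagged rows differ only in a missing `temperature` value, which is the situation the "subsets" criterion is meant for.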
Several methods for filtering duplicates are available; the method is selected through the criterion argument:
"subsets": Only records that are a subset of other records are omitted.
Some tracking devices first transmit a smaller dataset that does not contain all information, so some
records may be the same as others except that they contain additional NA values.
This strategy only omits those (duplicated) records. As a result, duplicates that contain unique information are
retained, so the dataset is not guaranteed to be free of duplicates afterwards.
"subsets_equal": The same as "subsets", except that equivalence is not tested exactly with base::identical();
instead base::all.equal() is used. This allows small numeric differences to be considered
equal. It can, however, reduce speed considerably.
"sample": In this case one record is randomly selected from the duplicated records.
"first": Select the first location from a set of duplicated locations. Note that reordering the data will affect
which record is selected. For Movebank data no specific order is enforced, so make sure the order of the locations is what you expect (the same goes for "last").
"last": Select the last location from a set of duplicated locations.
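The difference between "subsets" and "subsets_equal" can be illustrated with a small base R sketch. This is not how move2 implements the criteria; the helper `is_subset()` and the record contents are hypothetical and only show the idea of a record being a subset of another, compared either exactly with identical() or with tolerance through all.equal().

```r
# Two hypothetical records sharing a timestamp and track identifier.
full    <- list(x = 5.1, y = 52.3, temperature = 20.5)
partial <- list(x = 5.1, y = 52.3, temperature = NA_real_)

# Hypothetical helper: `a` counts as a subset of `b` if every non-missing
# value in `a` also appears in `b`, as judged by the comparison `equal`.
is_subset <- function(a, b, equal = identical) {
  known <- !vapply(a, is.na, logical(1))
  all(mapply(function(u, v) isTRUE(equal(u, v)), a[known], b[known]))
}

is_subset(partial, full) # TRUE: partial only lacks the temperature
is_subset(full, partial) # FALSE: full carries extra information

# A record with a tiny numeric difference in one coordinate.
noisy <- list(x = 5.1 + 1e-12, y = 52.3, temperature = NA_real_)
is_subset(noisy, full)                    # FALSE under exact comparison
is_subset(noisy, full, equal = all.equal) # TRUE within numeric tolerance
```

The tolerant comparison does more work per pair of values, which is consistent with the note above that "subsets_equal" can be considerably slower.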
mt_unique returns a logical vector indicating the unique records.
mt_filter_unique returns a filtered move2 object.
Other filter: mt_filter_movebank_visible(), mt_filter_per_interval()
m <- mt_sim_brownian_motion(1:2)[rep(1:4, 4), ]
m$sensor_type <- as.character(gl(2, 4))
m$sensor_type_2 <- as.character(gl(2, 8))
table(mt_unique(m, "sample"))
mt_filter_unique(m[, c("time", "track", "geometry")])
mt_filter_unique(m[, c("time", "track", "geometry", "sensor_type")],
  additional_columns = sensor_type
)
if (requireNamespace("dplyr")) {
  mt_filter_unique(m, additional_columns = across(all_of(c("sensor_type", "sensor_type_2"))))
}
mt_filter_unique(m, "sample")
mt_filter_unique(m, "first")
m$sensor_type[1:12] <- NA
mt_filter_unique(m[, c("time", "track", "geometry", "sensor_type")])
## Sometimes it is desirable to not consider specific columns for finding
## the unique records, for example a record identifier like `event_id`
## in Movebank. This can be done by reducing the data.frame used to identify
## the unique records, e.g.:
m$event_id <- seq_len(nrow(m))
m[mt_unique(m |> dplyr::select(-event_id, -ends_with("type_2"))), ]
## Note that because we subset the full original data.frame the
## columns are not lost
## This example retains, from each set of duplicates, the entry with the
## fewest columns containing NA values
require(dplyr)
mv <- mt_read(mt_example())
mv <- dplyr::bind_rows(mv, mv[1:10, ])
mv[, "eobs:used-time-to-get-fix"] <- NA
mv_no_dup <- mv %>%
  mutate(n_na = rowSums(is.na(pick(everything())))) %>%
  arrange(n_na) %>%
  mt_filter_unique(criterion = "first") %>%
  arrange(mt_track_id()) %>%
  arrange(mt_track_id(), mt_time())