step_geodist | R Documentation |
step_geodist() creates a specification of a recipe step that will calculate the distance from points on a map to a reference location.
step_geodist(
  recipe,
  lat = NULL,
  lon = NULL,
  role = "predictor",
  trained = FALSE,
  ref_lat = NULL,
  ref_lon = NULL,
  is_lat_lon = TRUE,
  log = FALSE,
  name = "geo_dist",
  columns = NULL,
  keep_original_cols = TRUE,
  skip = FALSE,
  id = rand_id("geodist")
)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
lon , lat |
Selector functions to choose which variables are
used by the step. See |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
ref_lon , ref_lat |
Single numeric values for the location of the reference point. |
is_lat_lon |
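A logical: are the coordinates in latitude and longitude? If TRUE, the Haversine formula is used and the result is in meters; if FALSE, the Pythagorean theorem is used. |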
log |
A logical: should the distance be transformed by the natural log function? |
name |
A single character value to use for the new predictor column. If a column exists with this name, an error is issued. |
columns |
A character string of the selected variable names. This field is a placeholder and will be populated once prep() is used. |
keep_original_cols |
A logical to keep the original variables in the output. Defaults to TRUE. |
skip |
A logical. Should the step be skipped when the recipe is baked by bake()? |
id |
A character string that is unique to this step to identify it. |
step_geodist uses the Pythagorean theorem to calculate Euclidean distances if is_lat_lon is FALSE. If is_lat_lon is TRUE, the Haversine formula is used to calculate the great-circle distance in meters.
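The two distance calculations described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper names haversine_m and euclid_dist are hypothetical, and the Earth radius of 6371000 meters is an assumed mean value that may differ from the constant recipes uses.

```r
# Hypothetical helper: great-circle distance in meters via the Haversine formula.
# `radius` is an assumed mean Earth radius, not necessarily the value used by recipes.
haversine_m <- function(lat, lon, ref_lat, ref_lon, radius = 6371000) {
  to_rad <- pi / 180
  phi1 <- lat * to_rad
  phi2 <- ref_lat * to_rad
  d_phi <- (ref_lat - lat) * to_rad
  d_lambda <- (ref_lon - lon) * to_rad
  a <- sin(d_phi / 2)^2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: planar Euclidean distance via the Pythagorean theorem,
# in whatever units the input coordinates use.
euclid_dist <- function(x, y, ref_x, ref_y) {
  sqrt((x - ref_x)^2 + (y - ref_y)^2)
}

# Both helpers are vectorized over the point coordinates.
haversine_m(c(38.89, 38.91), c(-77.03, -77.04), 38.8986312, -77.0062457)
euclid_dist(3, 4, 0, 0)
```

With log = TRUE, the step would additionally apply the natural log to whichever distance is computed.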
An updated version of recipe with the new step added to the sequence of any existing operations.
When you tidy() this step, a tibble is returned with columns latitude, longitude, ref_latitude, ref_longitude, is_lat_lon, name, and id:
latitude: character, name of latitude variable
longitude: character, name of longitude variable
ref_latitude: numeric, location of latitude reference point
ref_longitude: numeric, location of longitude reference point
is_lat_lon: logical, whether the coordinates are in latitude and longitude
name: character, name of resulting variable
id: character, id of this step
The underlying operation does not allow for case weights.
https://en.wikipedia.org/wiki/Haversine_formula
Other multivariate transformation steps: step_classdist(), step_classdist_shrunken(), step_depth(), step_ica(), step_isomap(), step_kpca(), step_kpca_poly(), step_kpca_rbf(), step_mutate_at(), step_nnmf(), step_nnmf_sparse(), step_pca(), step_pls(), step_ratio(), step_spatialsign()
library(recipes)
library(dplyr) # for arrange()

data(Smithsonian, package = "modeldata")

# How close are the museums to Union Station?
near_station <- recipe(~., data = Smithsonian) %>%
  update_role(name, new_role = "location") %>%
  step_geodist(
    lat = latitude, lon = longitude, log = FALSE,
    ref_lat = 38.8986312, ref_lon = -77.0062457,
    is_lat_lon = TRUE
  ) %>%
  prep(training = Smithsonian)

# Return the processed training data, sorted by distance to the station
bake(near_station, new_data = NULL) %>%
  arrange(geo_dist)

# Inspect the settings recorded for the first (and only) step
tidy(near_station, number = 1)