strr_ghost: Function to identify STR ghost hostels

View source: R/strr_ghost.R

strr_ghost    R Documentation

Function to identify STR ghost hostels

Description

strr_ghost takes reported STR listing locations and identifies possible "ghost hostels", that is, clusters of private-room STR listings operating in a single building.

Usage

strr_ghost(
  property,
  start_date = NULL,
  end_date = NULL,
  property_ID = property_ID,
  host_ID = host_ID,
  multi_date = TRUE,
  created = created,
  scraped = scraped,
  distance = 205,
  min_listings = 3,
  listing_type = listing_type,
  private_room = "Private room",
  EH_check = FALSE,
  entire_home = "Entire home/apt",
  geom_type = c("point", "polygon"),
  quiet = FALSE
)

Arguments

property

A data frame of STR listings with sf or sp point geometries in a projected coordinate system. If the data frame does not have spatial attributes, an attempt will be made to convert it to sf using strr_as_sf. The result will be transformed into the Web Mercator projection (EPSG: 3857) for distance calculations. To use a projection more suitable to the data, supply an sf or sp object.

start_date

A character string of format YYYY-MM-DD indicating the first date for which to run the analysis. If NULL (default), all dates will be used. This argument is ignored if 'multi_date' is FALSE.

end_date

A character string of format YYYY-MM-DD indicating the last date for which to run the analysis. If NULL (default), all dates will be used. This argument is ignored if 'multi_date' is FALSE.

property_ID

The name of a character or numeric variable in the property object which uniquely identifies STR listings.

host_ID

The name of a character or numeric variable in the property object which uniquely identifies STR hosts.

multi_date

A logical scalar. Should the analysis be run for separate dates (controlled by the 'created', 'scraped', 'start_date' and 'end_date' arguments), or only run a single time, treating all listings as simultaneously active?

created

The name of a date variable in the property object which gives the creation date for each listing. This argument is ignored if 'multi_date' is FALSE.

scraped

The name of a date variable in the property object which gives the last-scraped date for each listing. This argument is ignored if 'multi_date' is FALSE.

distance

A numeric scalar. The radius (in the units of the CRS) of the buffer which will be drawn around points to determine possible ghost hostel locations.

min_listings

A numeric scalar. The minimum number of listings to be considered a ghost hostel.

listing_type

The name of a character variable in the property object which identifies private-room listings. Set this argument to FALSE to use all listings in the 'property' table.

private_room

A character string which identifies the value of the 'listing_type' variable to be used to find ghost hostels. This field is ignored if 'listing_type' is FALSE.

EH_check

A logical scalar. Should ghost hostels be checked against possible duplicate entire-home listings operated by the same host? This field is ignored if 'listing_type' is FALSE.

entire_home

A character string which identifies the value of the 'listing_type' variable to be used to find possible duplicate entire-home listings. This argument is ignored if either 'listing_type' or 'EH_check' is FALSE.

quiet

A logical scalar. Should the function execute quietly, or should it print status updates as it runs (default)?

geom_type

A character string, either "point" or "polygon", which identifies the type of geometry which should be appended to the function output. Point geometries will be calculated faster than polygon geometries, and will require less memory.
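The interaction of 'start_date', 'end_date', 'created' and 'scraped' when 'multi_date' is TRUE can be illustrated with a minimal base R sketch. It assumes that a listing counts as active on a date when that date falls between its creation date and its last-scraped date; this rule, the sample data and the name active_by_date are illustrative assumptions, not taken from the package source.

```r
# Hypothetical listings with creation and last-scraped dates
property <- data.frame(
  property_ID = c("a", "b", "c"),
  created = as.Date(c("2019-01-01", "2019-03-15", "2019-06-01")),
  scraped = as.Date(c("2019-12-31", "2019-04-30", "2019-12-31")),
  stringsAsFactors = FALSE
)

start_date <- as.Date("2019-04-01")
end_date <- as.Date("2019-04-03")

# One analysis date per day in the requested window
dates <- seq(start_date, end_date, by = "day")

# Assumed rule: a listing is active on a date if created <= date <= scraped
active_by_date <- lapply(seq_along(dates), function(i) {
  d <- dates[i]
  property$property_ID[property$created <= d & property$scraped >= d]
})
names(active_by_date) <- as.character(dates)
```

With 'multi_date' set to FALSE, no such date filtering would apply and all three listings would be treated as simultaneously active.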

Details

A function for identifying possible "ghost hostels", that is, clusters of private-room STR listings operating in a single building. The function works by intersecting the possible locations of listings operated by a single host with each other, to find areas which could be the common location of the listings, and which could thus be one or more housing units subdivided into private rooms rather than a set of geographically disparate listings. The function can optionally run its analysis separately for each date within a time period, and can also check for possible duplication with entire-home listings operated by the same host.
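The geometric idea can be sketched in base R under simplifying assumptions. Two circular buffers of radius 'distance' overlap when their centres are at most twice that distance apart, so a pairwise distance matrix gives a quick screen for listings that could share a location. This is only a necessary condition (three buffers can overlap pairwise without sharing a common area) and is not the package's actual implementation; the sample coordinates and the names overlaps, overlap_count and candidates are hypothetical.

```r
# Hypothetical private-room listings for one host, in projected coordinates
# (units of the CRS, here assumed to be metres)
listings <- data.frame(
  property_ID = c("a", "b", "c", "d"),
  x = c(0, 100, 200, 5000),
  y = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)
distance <- 205
min_listings <- 3

# Pairwise Euclidean distances between listing locations
d <- as.matrix(dist(listings[, c("x", "y")]))
dimnames(d) <- list(listings$property_ID, listings$property_ID)

# Two buffers of radius `distance` overlap if centres are <= 2 * distance apart
overlaps <- d <= 2 * distance

# Number of listings (including itself) whose buffer overlaps each listing's
overlap_count <- rowSums(overlaps)

# Listings that could belong to a cluster of at least `min_listings`
candidates <- names(overlap_count)[overlap_count >= min_listings]
```

Here listing "d" sits far from the others and drops out, while "a", "b" and "c" remain as candidates for a shared location.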

Value

The output will be a tidy data frame of identified ghost hostels, organized with the following fields:

'ghost_ID': an identifier for each unique ghost hostel cluster.

'date': the date on which the ghost hostel was detected, if the 'created' and 'scraped' arguments are supplied.

'host_ID' (or whatever name was passed to the host_ID argument): the ID number of the host operating the ghost hostel.

'listing_count': how many separate listings comprised the ghost hostel.

'housing_units': an estimate of how many housing units the ghost hostel occupies, calculated as 'ceiling(listing_count / 4)'.

'property_IDs': a list of the property_ID (or whatever name was passed to the property_ID argument) values from the listings comprising the ghost hostel.

'EH_check': if EH_check is TRUE, a list of possible entire-home listing duplicates.

'data': a nested tibble of additional variables present in the property object.

'geometry': the geometries (points or polygons, according to 'geom_type') representing the possible locations of each ghost hostel.
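As a worked example of the 'housing_units' estimate, the formula 'ceiling(listing_count / 4)' given above can be evaluated in base R for a few hypothetical listing counts (the input values are illustrative only):

```r
# Hypothetical listing counts for five ghost hostels
listing_count <- c(3, 4, 5, 8, 9)

# Estimated housing units: one unit per four private-room listings, rounded up
housing_units <- ceiling(listing_count / 4)
housing_units
# 1 1 2 2 3
```

A ghost hostel with up to four listings is therefore estimated to occupy a single housing unit, and each additional group of up to four listings adds one more.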


UPGo-McGill/strr documentation built on Feb. 24, 2024, 6:15 p.m.