# R/dplyr-funcs-doc.R

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# Generated by using data-raw/docgen.R -> do not edit by hand

#' Functions available in Arrow dplyr queries
#'
#' The `arrow` package contains methods for 37 `dplyr` table functions, many of
#' which are "verbs" that transform one or more tables.
#' The package also maps 211 R functions to the corresponding
#' functions in the Arrow compute library. These mappings let you write code
#' inside `dplyr` methods that calls R functions, including many from packages
#' like `stringr` and `lubridate`; the calls are translated to Arrow and run
#' on the Arrow query engine (Acero). This document lists all of the mapped
#' functions.
#'
#' # `dplyr` verbs
#'
#' Most verb functions return an `arrow_dplyr_query` object, similar in spirit
#' to a `dbplyr::tbl_lazy`. This means that the verbs do not eagerly evaluate
#' the query on the data. To run the query, call either `compute()`,
#' which returns an `arrow` [Table], or `collect()`, which pulls the resulting
#' Table into an R `tibble`.
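#'
#' As a minimal sketch of this lazy behavior (the table and its column names
#' are made up for illustration, and the example assumes that `arrow` and
#' `dplyr` are installed):
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # A small in-memory Arrow Table; `x` and `g` are hypothetical columns
#' tbl <- arrow_table(x = 1:5, g = c("a", "a", "b", "b", "b"))
#'
#' # Verbs only build up a query; nothing is evaluated yet
#' query <- tbl %>%
#'   filter(x > 2) %>%
#'   mutate(y = x * 2)
#' class(query)
#'
#' # compute() evaluates the query and keeps the result as an Arrow Table
#' result_table <- compute(query)
#'
#' # collect() evaluates the query and returns an R tibble
#' result_tibble <- collect(query)
#' ```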
#'
#' * [`anti_join()`][dplyr::anti_join()]: the `copy` and `na_matches` arguments are ignored
#' * [`arrange()`][dplyr::arrange()]
#' * [`collapse()`][dplyr::collapse()]
#' * [`collect()`][dplyr::collect()]
#' * [`compute()`][dplyr::compute()]
#' * [`count()`][dplyr::count()]
#' * [`distinct()`][dplyr::distinct()]: `.keep_all = TRUE` not supported
#' * [`explain()`][dplyr::explain()]
#' * [`filter()`][dplyr::filter()]
#' * [`full_join()`][dplyr::full_join()]: the `copy` and `na_matches` arguments are ignored
#' * [`glimpse()`][dplyr::glimpse()]
#' * [`group_by()`][dplyr::group_by()]
#' * [`group_by_drop_default()`][dplyr::group_by_drop_default()]
#' * [`group_vars()`][dplyr::group_vars()]
#' * [`groups()`][dplyr::groups()]
#' * [`inner_join()`][dplyr::inner_join()]: the `copy` and `na_matches` arguments are ignored
#' * [`left_join()`][dplyr::left_join()]: the `copy` and `na_matches` arguments are ignored
#' * [`mutate()`][dplyr::mutate()]: window functions (e.g. things that require aggregation within groups) not currently supported
#' * [`pull()`][dplyr::pull()]: the `name` argument is not supported; returns an R vector by default but this behavior is deprecated and will return an Arrow [ChunkedArray] in a future release. Provide `as_vector = TRUE/FALSE` to control this behavior, or set `options(arrow.pull_as_vector)` globally.
#' * [`relocate()`][dplyr::relocate()]
#' * [`rename()`][dplyr::rename()]
#' * [`rename_with()`][dplyr::rename_with()]
#' * [`right_join()`][dplyr::right_join()]: the `copy` and `na_matches` arguments are ignored
#' * [`select()`][dplyr::select()]
#' * [`semi_join()`][dplyr::semi_join()]: the `copy` and `na_matches` arguments are ignored
#' * [`show_query()`][dplyr::show_query()]
#' * [`slice_head()`][dplyr::slice_head()]: slicing within groups not supported; Arrow datasets do not have row order, so head is non-deterministic; `prop` only supported on queries where `nrow()` is knowable without evaluating
#' * [`slice_max()`][dplyr::slice_max()]: slicing within groups not supported; `with_ties = TRUE` (dplyr default) is not supported; `prop` only supported on queries where `nrow()` is knowable without evaluating
#' * [`slice_min()`][dplyr::slice_min()]: slicing within groups not supported; `with_ties = TRUE` (dplyr default) is not supported; `prop` only supported on queries where `nrow()` is knowable without evaluating
#' * [`slice_sample()`][dplyr::slice_sample()]: slicing within groups not supported; `replace = TRUE` and the `weight_by` argument not supported; `n` only supported on queries where `nrow()` is knowable without evaluating
#' * [`slice_tail()`][dplyr::slice_tail()]: slicing within groups not supported; Arrow datasets do not have row order, so tail is non-deterministic; `prop` only supported on queries where `nrow()` is knowable without evaluating
#' * [`summarise()`][dplyr::summarise()]: window functions not currently supported; arguments `.drop = FALSE` and `.groups = "rowwise"` not supported
#' * [`tally()`][dplyr::tally()]
#' * [`transmute()`][dplyr::transmute()]
#' * [`ungroup()`][dplyr::ungroup()]
#' * [`union()`][dplyr::union()]
#' * [`union_all()`][dplyr::union_all()]
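#'
#' Some of the caveats noted in the list can be handled by passing the
#' affected arguments explicitly. The sketch below illustrates this for
#' `slice_max()` and `pull()`; the `sales` table is hypothetical, and the
#' example assumes that `arrow` and `dplyr` are installed:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' sales <- arrow_table(id = 1:4, amount = c(10, 40, 30, 20))
#'
#' # with_ties = TRUE (the dplyr default) is not supported, so set it to FALSE
#' top_two <- sales %>%
#'   slice_max(amount, n = 2, with_ties = FALSE) %>%
#'   collect()
#'
#' # State the return type of pull() instead of relying on the default
#' amount_vec <- sales %>%
#'   filter(amount > 15) %>%
#'   pull(amount, as_vector = TRUE)
#' amount_arr <- sales %>%
#'   filter(amount > 15) %>%
#'   pull(amount, as_vector = FALSE)
#' ```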
#'
#' # Function mappings
#'
#' In the list below, any differences in behavior or support between Acero and
#' the R function are listed. If no notes follow the function name, then you
#' can assume that the function works in Acero just as it does in R.
#'
#' Functions can be called either as `pkg::fun()` or just `fun()`, i.e. both
#' `str_sub()` and `stringr::str_sub()` work.
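#'
#' A short sketch of the two calling styles, using a hypothetical `fruit`
#' table and assuming that `arrow` and `dplyr` are installed:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' fruit <- arrow_table(name = c("apple", "banana"))
#'
#' # Unqualified call
#' bare <- fruit %>%
#'   mutate(prefix = str_sub(name, 1, 3)) %>%
#'   collect()
#'
#' # Namespace-qualified call
#' qualified <- fruit %>%
#'   mutate(prefix = stringr::str_sub(name, 1, 3)) %>%
#'   collect()
#'
#' # Both forms are expected to give the same column
#' identical(bare$prefix, qualified$prefix)
#' ```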
#'
#' In addition to these functions, you can call any of Arrow's 262 compute
#' functions directly. Arrow has many functions that don't map to an existing R
#' function. Where an R function mapping does exist, you can still call the
#' Arrow function directly if you don't want the adaptations that the mapping
#' applies to make Acero behave like R. These functions are listed in the
#' [C++ documentation](https://arrow.apache.org/docs/cpp/compute.html), and
#' in the function registry in R, they are named with an `arrow_` prefix, such
#' as `arrow_ascii_is_decimal`.
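#'
#' As an illustrative sketch of calling a compute function directly (the
#' `codes` table is hypothetical, and the example assumes that `arrow` and
#' `dplyr` are installed):
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Compute functions available in this build of Arrow (listed without prefix)
#' "ascii_is_decimal" %in% list_compute_functions()
#'
#' # Hypothetical data for illustration
#' codes <- arrow_table(code = c("123", "abc"))
#'
#' # Inside a dplyr verb, the same function is called with the arrow_ prefix
#' checked <- codes %>%
#'   mutate(is_decimal = arrow_ascii_is_decimal(code)) %>%
#'   collect()
#' ```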
#'
#' ## arrow
#'
#' * [`add_filename()`][arrow::add_filename()]
#' * [`cast()`][arrow::cast()]
#'
#' ## base
#'
#' * [`!`][!()]
#' * [`!=`][!=()]
#' * [`%%`][%%()]
#' * [`%/%`][%/%()]
#' * [`%in%`][%in%()]
#' * [`&`][&()]
#' * [`*`][*()]
#' * [`+`][+()]
#' * [`-`][-()]
#' * [`/`][/()]
#' * [`<`][<()]
#' * [`<=`][<=()]
#' * [`==`][==()]
#' * [`>`][>()]
#' * [`>=`][>=()]
#' * [`ISOdate()`][base::ISOdate()]
#' * [`ISOdatetime()`][base::ISOdatetime()]
#' * [`^`][^()]
#' * [`abs()`][base::abs()]
#' * [`acos()`][base::acos()]
#' * [`all()`][base::all()]
#' * [`any()`][base::any()]
#' * [`as.Date()`][base::as.Date()]: Multiple `tryFormats` not supported in Arrow.
#' Consider using the specialised lubridate parsing functions `ymd()`, `ydm()`, etc.
#' * [`as.character()`][base::as.character()]
#' * [`as.difftime()`][base::as.difftime()]: only supports `units = "secs"` (the default)
#' * [`as.double()`][base::as.double()]
#' * [`as.integer()`][base::as.integer()]
#' * [`as.logical()`][base::as.logical()]
#' * [`as.numeric()`][base::as.numeric()]
#' * [`asin()`][base::asin()]
#' * [`ceiling()`][base::ceiling()]
#' * [`cos()`][base::cos()]
#' * [`data.frame()`][base::data.frame()]: `row.names` and `check.rows` arguments not supported;
#' `stringsAsFactors` must be `FALSE`
#' * [`difftime()`][base::difftime()]: only supports `units = "secs"` (the default);
#' `tz` argument not supported
#' * [`endsWith()`][base::endsWith()]
#' * [`exp()`][base::exp()]
#' * [`floor()`][base::floor()]
#' * [`format()`][base::format()]
#' * [`grepl()`][base::grepl()]
#' * [`gsub()`][base::gsub()]
#' * [`ifelse()`][base::ifelse()]
#' * [`is.character()`][base::is.character()]
#' * [`is.double()`][base::is.double()]
#' * [`is.factor()`][base::is.factor()]
#' * [`is.finite()`][base::is.finite()]
#' * [`is.infinite()`][base::is.infinite()]
#' * [`is.integer()`][base::is.integer()]
#' * [`is.list()`][base::is.list()]
#' * [`is.logical()`][base::is.logical()]
#' * [`is.na()`][base::is.na()]
#' * [`is.nan()`][base::is.nan()]
#' * [`is.numeric()`][base::is.numeric()]
#' * [`log()`][base::log()]
#' * [`log10()`][base::log10()]
#' * [`log1p()`][base::log1p()]
#' * [`log2()`][base::log2()]
#' * [`logb()`][base::logb()]
#' * [`max()`][base::max()]
#' * [`mean()`][base::mean()]
#' * [`min()`][base::min()]
#' * [`nchar()`][base::nchar()]: `allowNA = TRUE` and `keepNA = TRUE` not supported
#' * [`paste()`][base::paste()]: the `collapse` argument is not yet supported
#' * [`paste0()`][base::paste0()]: the `collapse` argument is not yet supported
#' * [`pmax()`][base::pmax()]
#' * [`pmin()`][base::pmin()]
#' * [`round()`][base::round()]
#' * [`sign()`][base::sign()]
#' * [`sin()`][base::sin()]
#' * [`sqrt()`][base::sqrt()]
#' * [`startsWith()`][base::startsWith()]
#' * [`strftime()`][base::strftime()]
#' * [`strptime()`][base::strptime()]: accepts a `unit` argument not present in the `base` function.
#' Valid values are "s", "ms" (default), "us", "ns".
#' * [`strrep()`][base::strrep()]
#' * [`strsplit()`][base::strsplit()]
#' * [`sub()`][base::sub()]
#' * [`substr()`][base::substr()]: `start` and `stop` must be length 1
#' * [`substring()`][base::substring()]
#' * [`sum()`][base::sum()]
#' * [`tan()`][base::tan()]
#' * [`tolower()`][base::tolower()]
#' * [`toupper()`][base::toupper()]
#' * [`trunc()`][base::trunc()]
#' * [`|`][|()]
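#'
#' A brief sketch combining a few of the base functions above inside
#' `mutate()`. The `people` table is hypothetical, and the example assumes
#' that `arrow` and `dplyr` are installed. Note that `paste()` and `paste0()`
#' are used without `collapse`, and `substr()` receives length-1 `start` and
#' `stop` values, in line with the notes in the list:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' people <- arrow_table(
#'   first = c("Ada", "Alan"),
#'   last = c("Lovelace", "Turing")
#' )
#'
#' described <- people %>%
#'   mutate(
#'     full = paste(first, last),
#'     last_length = nchar(last),
#'     initials = paste0(substr(first, 1, 1), substr(last, 1, 1))
#'   ) %>%
#'   collect()
#' ```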
#'
#' ## bit64
#'
#' * [`as.integer64()`][bit64::as.integer64()]
#' * [`is.integer64()`][bit64::is.integer64()]
#'
#' ## dplyr
#'
#' * [`across()`][dplyr::across()]
#' * [`between()`][dplyr::between()]
#' * [`case_when()`][dplyr::case_when()]: `.ptype` and `.size` arguments not supported
#' * [`coalesce()`][dplyr::coalesce()]
#' * [`desc()`][dplyr::desc()]
#' * [`if_all()`][dplyr::if_all()]
#' * [`if_any()`][dplyr::if_any()]
#' * [`if_else()`][dplyr::if_else()]
#' * [`n()`][dplyr::n()]
#' * [`n_distinct()`][dplyr::n_distinct()]
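#'
#' The following sketch uses `case_when()`, `between()`, and `if_else()` in a
#' single `mutate()` call. The `measurements` table is hypothetical, the
#' example assumes that `arrow` and `dplyr` are installed, and `case_when()`
#' is called without the unsupported `.ptype` and `.size` arguments:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' measurements <- arrow_table(x = c(1, 5, 10))
#'
#' labelled <- measurements %>%
#'   mutate(
#'     size = case_when(
#'       x < 3 ~ "small",
#'       x < 8 ~ "medium",
#'       TRUE ~ "large"
#'     ),
#'     in_range = between(x, 2, 9),
#'     capped = if_else(x > 8, 8, x)
#'   ) %>%
#'   collect()
#' ```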
#'
#' ## lubridate
#'
#' * [`am()`][lubridate::am()]
#' * [`as_date()`][lubridate::as_date()]
#' * [`as_datetime()`][lubridate::as_datetime()]
#' * [`ceiling_date()`][lubridate::ceiling_date()]
#' * [`date()`][lubridate::date()]
#' * [`date_decimal()`][lubridate::date_decimal()]
#' * [`day()`][lubridate::day()]
#' * [`ddays()`][lubridate::ddays()]
#' * [`decimal_date()`][lubridate::decimal_date()]
#' * [`dhours()`][lubridate::dhours()]
#' * [`dmicroseconds()`][lubridate::dmicroseconds()]
#' * [`dmilliseconds()`][lubridate::dmilliseconds()]
#' * [`dminutes()`][lubridate::dminutes()]
#' * [`dmonths()`][lubridate::dmonths()]
#' * [`dmy()`][lubridate::dmy()]: `locale` argument not supported
#' * [`dmy_h()`][lubridate::dmy_h()]: `locale` argument not supported
#' * [`dmy_hm()`][lubridate::dmy_hm()]: `locale` argument not supported
#' * [`dmy_hms()`][lubridate::dmy_hms()]: `locale` argument not supported
#' * [`dnanoseconds()`][lubridate::dnanoseconds()]
#' * [`dpicoseconds()`][lubridate::dpicoseconds()]: not supported
#' * [`dseconds()`][lubridate::dseconds()]
#' * [`dst()`][lubridate::dst()]
#' * [`dweeks()`][lubridate::dweeks()]
#' * [`dyears()`][lubridate::dyears()]
#' * [`dym()`][lubridate::dym()]: `locale` argument not supported
#' * [`epiweek()`][lubridate::epiweek()]
#' * [`epiyear()`][lubridate::epiyear()]
#' * [`fast_strptime()`][lubridate::fast_strptime()]: non-default values of `lt` and `cutoff_2000` not supported
#' * [`floor_date()`][lubridate::floor_date()]
#' * [`force_tz()`][lubridate::force_tz()]: Timezone conversion from non-UTC timezone not supported;
#' `roll_dst` values of 'error' and 'boundary' are supported for nonexistent times,
#' `roll_dst` values of 'error', 'pre', and 'post' are supported for ambiguous times.
#' * [`format_ISO8601()`][lubridate::format_ISO8601()]
#' * [`hour()`][lubridate::hour()]
#' * [`is.Date()`][lubridate::is.Date()]
#' * [`is.POSIXct()`][lubridate::is.POSIXct()]
#' * [`is.instant()`][lubridate::is.instant()]
#' * [`is.timepoint()`][lubridate::is.timepoint()]
#' * [`isoweek()`][lubridate::isoweek()]
#' * [`isoyear()`][lubridate::isoyear()]
#' * [`leap_year()`][lubridate::leap_year()]
#' * [`make_date()`][lubridate::make_date()]
#' * [`make_datetime()`][lubridate::make_datetime()]: only supports UTC (default) timezone
#' * [`make_difftime()`][lubridate::make_difftime()]: only supports `units = "secs"` (the default);
#' providing both `num` and `...` is not supported
#' * [`mday()`][lubridate::mday()]
#' * [`mdy()`][lubridate::mdy()]: `locale` argument not supported
#' * [`mdy_h()`][lubridate::mdy_h()]: `locale` argument not supported
#' * [`mdy_hm()`][lubridate::mdy_hm()]: `locale` argument not supported
#' * [`mdy_hms()`][lubridate::mdy_hms()]: `locale` argument not supported
#' * [`minute()`][lubridate::minute()]
#' * [`month()`][lubridate::month()]
#' * [`my()`][lubridate::my()]: `locale` argument not supported
#' * [`myd()`][lubridate::myd()]: `locale` argument not supported
#' * [`parse_date_time()`][lubridate::parse_date_time()]: `quiet = FALSE` is not supported.
#' Available formats are H, I, j, M, S, U, w, W, y, Y, R, T.
#' On Linux and OS X, a, A, b, B, Om, p, r are additionally available.
#' * [`pm()`][lubridate::pm()]
#' * [`qday()`][lubridate::qday()]
#' * [`quarter()`][lubridate::quarter()]
#' * [`round_date()`][lubridate::round_date()]
#' * [`second()`][lubridate::second()]
#' * [`semester()`][lubridate::semester()]
#' * [`tz()`][lubridate::tz()]
#' * [`wday()`][lubridate::wday()]
#' * [`week()`][lubridate::week()]
#' * [`with_tz()`][lubridate::with_tz()]
#' * [`yday()`][lubridate::yday()]
#' * [`ydm()`][lubridate::ydm()]: `locale` argument not supported
#' * [`ydm_h()`][lubridate::ydm_h()]: `locale` argument not supported
#' * [`ydm_hm()`][lubridate::ydm_hm()]: `locale` argument not supported
#' * [`ydm_hms()`][lubridate::ydm_hms()]: `locale` argument not supported
#' * [`year()`][lubridate::year()]
#' * [`ym()`][lubridate::ym()]: `locale` argument not supported
#' * [`ymd()`][lubridate::ymd()]: `locale` argument not supported
#' * [`ymd_h()`][lubridate::ymd_h()]: `locale` argument not supported
#' * [`ymd_hm()`][lubridate::ymd_hm()]: `locale` argument not supported
#' * [`ymd_hms()`][lubridate::ymd_hms()]: `locale` argument not supported
#' * [`yq()`][lubridate::yq()]: `locale` argument not supported
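#'
#' A small sketch of parsing strings with `ymd()` (without the unsupported
#' `locale` argument) and then extracting date components. The `events` table
#' and its column names are hypothetical, and the example assumes that `arrow`
#' and `dplyr` are installed:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' events <- arrow_table(when = c("2023-11-25", "2024-02-29"))
#'
#' parsed <- events %>%
#'   mutate(parsed_on = ymd(when)) %>%
#'   mutate(
#'     yr = year(parsed_on),
#'     mon = month(parsed_on),
#'     leap = leap_year(parsed_on)
#'   ) %>%
#'   collect()
#' ```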
#'
#' ## methods
#'
#' * [`is()`][methods::is()]
#'
#' ## rlang
#'
#' * [`is_character()`][rlang::is_character()]
#' * [`is_double()`][rlang::is_double()]
#' * [`is_integer()`][rlang::is_integer()]
#' * [`is_list()`][rlang::is_list()]
#' * [`is_logical()`][rlang::is_logical()]
#'
#' ## stats
#'
#' * [`median()`][stats::median()]: approximate median (t-digest) is computed
#' * [`quantile()`][stats::quantile()]: `probs` must be length 1;
#' approximate quantile (t-digest) is computed
#' * [`sd()`][stats::sd()]
#' * [`var()`][stats::var()]
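#'
#' As a sketch of the notes above, the grouped summary below passes a single
#' value to `probs`. Because the median and quantile are approximate, no
#' expected values are shown. The `readings` table is hypothetical, and the
#' example assumes that `arrow` and `dplyr` are installed:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' readings <- arrow_table(
#'   g = c("a", "a", "a", "b", "b"),
#'   x = c(1, 2, 3, 10, 20)
#' )
#'
#' # `probs` must be length 1, so compute one quantile per summary column
#' summary_tbl <- readings %>%
#'   group_by(g) %>%
#'   summarise(
#'     med = median(x),
#'     q90 = quantile(x, probs = 0.9),
#'     spread = sd(x)
#'   ) %>%
#'   collect()
#' ```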
#'
#' ## stringi
#'
#' * [`stri_reverse()`][stringi::stri_reverse()]
#'
#' ## stringr
#'
#' Pattern modifiers `coll()` and `boundary()` are not supported in any functions.
#'
#' * [`str_c()`][stringr::str_c()]: the `collapse` argument is not yet supported
#' * [`str_count()`][stringr::str_count()]: `pattern` must be a length 1 character vector
#' * [`str_detect()`][stringr::str_detect()]
#' * [`str_dup()`][stringr::str_dup()]
#' * [`str_ends()`][stringr::str_ends()]
#' * [`str_length()`][stringr::str_length()]
#' * [`str_like()`][stringr::str_like()]
#' * [`str_pad()`][stringr::str_pad()]
#' * [`str_remove()`][stringr::str_remove()]
#' * [`str_remove_all()`][stringr::str_remove_all()]
#' * [`str_replace()`][stringr::str_replace()]
#' * [`str_replace_all()`][stringr::str_replace_all()]
#' * [`str_split()`][stringr::str_split()]: Case-insensitive string splitting and splitting into 0 parts not supported
#' * [`str_starts()`][stringr::str_starts()]
#' * [`str_sub()`][stringr::str_sub()]: `start` and `end` must be length 1
#' * [`str_to_lower()`][stringr::str_to_lower()]
#' * [`str_to_title()`][stringr::str_to_title()]
#' * [`str_to_upper()`][stringr::str_to_upper()]
#' * [`str_trim()`][stringr::str_trim()]
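#'
#' A short sketch using several of the `stringr` functions above with plain
#' string patterns. The `desserts` table is hypothetical, and the example
#' assumes that `arrow` and `dplyr` are installed:
#'
#' ```r
#' library(arrow)
#' library(dplyr)
#'
#' # Hypothetical data for illustration
#' desserts <- arrow_table(s = c("Apple pie", "banana split", "cherry tart"))
#'
#' matched <- desserts %>%
#'   filter(str_detect(s, "an")) %>%
#'   mutate(
#'     upper = str_to_upper(s),
#'     len = str_length(s),
#'     snake = str_replace_all(s, " ", "_")
#'   ) %>%
#'   collect()
#' ```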
#'
#' ## tibble
#'
#' * [`tibble()`][tibble::tibble()]
#'
#' ## tidyselect
#'
#' * [`all_of()`][tidyselect::all_of()]
#' * [`contains()`][tidyselect::contains()]
#' * [`ends_with()`][tidyselect::ends_with()]
#' * [`everything()`][tidyselect::everything()]
#' * [`last_col()`][tidyselect::last_col()]
#' * [`matches()`][tidyselect::matches()]
#' * [`num_range()`][tidyselect::num_range()]
#' * [`one_of()`][tidyselect::one_of()]
#' * [`starts_with()`][tidyselect::starts_with()]
#'
#' @name acero
#'
#' @aliases arrow-functions arrow-verbs arrow-dplyr
NULL

# arrow documentation built on Nov. 25, 2023, 1:09 a.m.