R/dplyr-funcs-augmented.R

Defines functions register_bindings_augmented add_filename

Documented in add_filename

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#' Add the data filename as a column
#'
#' This function exists only inside `arrow` `dplyr` queries, and it is valid
#' only when querying a `FileSystemDataset`.
#'
#' To use filenames generated by this function in subsequent pipeline steps, you
#' must either call \code{\link[dplyr:compute]{compute()}} or
#' \code{\link[dplyr:collect]{collect()}} first. See Examples.
#'
#' @return A `FieldRef` \code{\link{Expression}} that refers to the filename
#' augmented column.
#'
#' @examples \dontrun{
#' open_dataset("nyc-taxi") %>% mutate(
#'   file =
#'     add_filename()
#' )
#'
#' # To use the new filename column in a later verb such as mutate(), call
#' # compute() first
#' open_dataset("nyc-taxi") %>%
#'   mutate(file = add_filename()) %>%
#'   compute() %>%
#'   mutate(filename_length = nchar(file))
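#'
#' # Alternatively, a sketch that assumes the result fits in memory: call
#' # collect() instead of compute() to pull the data into a data frame, then
#' # continue with regular dplyr verbs
#' open_dataset("nyc-taxi") %>%
#'   mutate(file = add_filename()) %>%
#'   collect() %>%
#'   mutate(filename_length = nchar(file))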
#' }
#'
#' @keywords internal
add_filename <- function() Expression$field_ref("__filename")

# Register add_filename() so that it can be called inside arrow dplyr queries
register_bindings_augmented <- function() {
  register_binding("arrow::add_filename", add_filename)
}

arrow documentation built on Sept. 11, 2024, 8:02 p.m.