Other functions in bulkreadr"
In bulkreadr: The Ultimate Tool for Reading Data in Bulk

knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE, 
  warning = FALSE,
  comment = "#>",
  fig.path = "man/figures/",
  out.width = "100%")

options(tibble.print_min = 5, tibble.print_max = 5)

options(rmarkdown.html_vignette.check_title = FALSE)

The bulkreadr package in R includes specialized functions beyond bulk data reading, aimed at enhancing data analysis efficiency. These functions are designed to operate on individual vectors, except for inspect_na() and fill_missing_values(), which work on data frames.

pull_out()

pull_out() is similar to [. It acts on vectors, matrices, arrays and lists to extract or replace parts. It is pleasant to use with the magrittr (⁠%>%⁠) and base(|>) operators.

library(bulkreadr)
library(dplyr)

top_10_richest_nig <- c("Aliko Dangote", "Mike Adenuga", "Femi Otedola", "Arthur Eze", "Abdulsamad Rabiu", "Cletus Ibeto", "Orji Uzor Kalu", "ABC Orjiakor", "Jimoh Ibrahim", "Tony Elumelu")

top_10_richest_nig %>% 
  pull_out(c(1, 5, 2))

top_10_richest_nig %>% 
  pull_out(-c(1, 5, 2))

convert_to_date()

convert_to_date() parses an input vector into POSIXct date-time object. It is also powerful to convert from excel date number like 42370 into date value like 2016-01-01.

## ** heterogeneous dates **

dates <- c(
  44869, "22.09.2022", NA, "02/27/92", "01-19-2022",
  "13-01-  2022", "2023", "2023-2", 41750.2, 41751.99,
  "11 07 2023", "2023-4"
  )

# Convert to POSIXct or Date object

convert_to_date(dates)

# It can also convert date time object to date object 

convert_to_date(lubridate::now())

inspect_na()

inspect_na() summarizes the rate of missingness in each column of a data frame. For a grouped data frame, the rate of missingness is summarized separately for each group.

# dataframe summary

inspect_na(airquality)

Grouped dataframe summary

airquality %>% 
  group_by(Month) %>% 
  inspect_na()

fill_missing_values()

fill_missing_values() is an efficient function that addresses missing values in a data frame. It uses imputation by function, also known as column-based imputation, to impute the missing values. It supports various imputation methods for continuous variables, including minimum, maximum, mean, median, harmonic mean, and geometric mean. For categorical variables, missing values are replaced with the mode of the column. This approach ensures accurate and consistent replacements derived from individual columns, resulting in a complete and reliable dataset for improved analysis and decision-making.

df <- tibble::tibble(
  Sepal_Length = c(5.2, 5, 5.7, NA, 6.2, 6.7, 5.5),
  Sepal.Width = c(4.1, 3.6, 3, 3, 2.9, 2.5, 2.4),
  Petal_Length = c(1.5, 1.4, 4.2, 1.4, NA, 5.8, 3.7),
  Petal_Width = c(NA, 0.2, 1.2, 0.2, 1.3, 1.8, NA),
  Species = c("setosa", NA, "versicolor", "setosa",
    NA, "virginica", "setosa"
  )
)

df

Impute using the mean method for continuous variables

#' df <- tibble::tibble(
#' Sepal_Length = c(5.2, 5, 5.7, NA, 6.2, 6.7, 5.5),
#' Petal_Length = c(1.5, 1.4, 4.2, 1.4, NA, 5.8, 3.7),
#' Petal_Width = c(NA, 0.2, 1.2, 0.2, 1.3, 1.8, NA),
#' Species = c("setosa", NA, "versicolor", "setosa",
#'            NA, "virginica", "setosa")
#' )

result_df_mean <- fill_missing_values(df, method = "mean")

result_df_mean

Impute using the geometric mean for continuous variables and specify variables Petal_Length and Petal_Width

result_df_geomean <- fill_missing_values(df, selected_variables = c
("Petal_Length", "Petal_Width"), method = "geometric")

result_df_geomean

Impute missing values (NAs) in a grouped data frame

You can use the fill_missing_values() in a grouped data frame by using other grouping and map functions. Here is an example of how to do this:

sample_iris <- tibble::tibble(
Sepal_Length = c(5.2, 5, 5.7, NA, 6.2, 6.7, 5.5),
Petal_Length = c(1.5, 1.4, 4.2, 1.4, NA, 5.8, 3.7),
Petal_Width = c(0.3, 0.2, 1.2, 0.2, 1.3, 1.8, NA),
Species = c("setosa", "setosa", "versicolor", "setosa",
          "virginica", "virginica", "setosa")
)

sample_iris

sample_iris %>%
  group_by(Species) %>%
  group_split() %>%
  map_df(fill_missing_values, method = "median")

Any scripts or data that you put into this service are public.

bulkreadr documentation built on June 8, 2025, 9:36 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bulkreadr
The Ultimate Tool for Reading Data in Bulk

Other functions in bulkreadr"
In bulkreadr: The Ultimate Tool for Reading Data in Bulk

pull_out()

convert_to_date()

inspect_na()

fill_missing_values()

Impute missing values (NAs) in a grouped data frame

Try the bulkreadr package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

bulkreadr The Ultimate Tool for Reading Data in Bulk

Other functions in bulkreadr" In bulkreadr: The Ultimate Tool for Reading Data in Bulk

pull_out()

convert_to_date()

inspect_na()

fill_missing_values()

Impute missing values (NAs) in a grouped data frame

Try the bulkreadr package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

bulkreadr
The Ultimate Tool for Reading Data in Bulk

Other functions in bulkreadr"
In bulkreadr: The Ultimate Tool for Reading Data in Bulk