My personal tools for data wrangling and analysis, a collection of random stuff. It mostly relies on the tidyverse package.
You can install the development version from GitHub with:
```r
# install.packages("devtools")
devtools::install_github("Lightbridge-KS/lbr")
```
I want to use `readr::read_csv()` on multiple CSV files, so I combine `read_csv()` with `lapply()` to read from a directory path, with a little help from the `fs` package for getting file paths and `grep()` for file-name pattern matching. The result is `read_dir_csv()`:
```r
library(lbr)

# Read from the current working directory by default
read_dir_csv()
# File names are used as the names of each data frame in the returned list

# Give a directory path
read_dir_csv("path/to/dir")

# Can also read from sub-directories of the given directory
read_dir_csv("path/to/dir", recurse = TRUE)

# Can specify a regular expression of file names to read
read_dir_csv(regexp = "[[:digit:]]+") # CSV file names containing numbers

# If strict_csv_ext = FALSE and the regex does not require file names ending with .csv,
# it can read other file types as well (if you like)
read_dir_csv(regexp = "[[:digit:]]+\\.txt$", strict_csv_ext = FALSE) # Try reading .txt files

# If you want file names in snake_case (requires the snakecase package)
read_dir_csv(snake_case = TRUE)
```
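For intuition, here is a minimal sketch of the `read_csv()` + `lapply()` + `fs` idea described above. It is illustrative only and not the package's actual implementation; the helper name `read_dir_csv_sketch()` and its defaults are made up for this example.

```r
library(fs)
library(readr)

# Minimal sketch of reading every matching CSV in a directory into a named list.
# Not the actual implementation of read_dir_csv(); the name and defaults are assumptions.
read_dir_csv_sketch <- function(path = ".", regexp = "\\.csv$") {
  paths <- fs::dir_ls(path, type = "file")                  # list files in the directory
  paths <- paths[grep(regexp, fs::path_file(paths))]        # keep files matching the pattern
  dfs   <- lapply(paths, readr::read_csv)                   # read each file into a tibble
  names(dfs) <- fs::path_ext_remove(fs::path_file(paths))   # name list elements by file name
  dfs
}
```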
Now you can supply any file-reading function you want to read from a directory!

`read.dir()` reads multiple files from a directory using a user-supplied function.

- The first argument, `fun`, is any function you want to use as the reading engine (the first argument of `fun` must be a file path), e.g. `utils::read.csv`.
- The second argument, `path`, is a path to the directory.
- The third argument, `pattern`, is a regular expression matching the file names and extensions you want to read.
- `...`: arguments passed on to `fun`.
```r
# Read .csv files from the working directory (default) using `utils::read.csv`.
read.dir(utils::read.csv) # default `pattern` is "\\.csv$"

# Read .xlsx files from a directory using `readxl::read_excel`.
# Must specify a regular expression to match the file extension.
read.dir(readxl::read_excel, path = "path/to/dir", pattern = "\\.xlsx$")

# Read .rds files; to also read from sub-directories, set `recursive = TRUE`.
read.dir(readRDS, pattern = "\\.rds$", path = "path/to/dir", recursive = TRUE)

# Read files using multiple engines from multiple paths with multiple file extensions.
params <- list(
  fun     = c(readr::read_csv, readxl::read_excel),
  path    = c("path/to/dir_1", "path/to/dir_2"),
  pattern = c("\\.csv$", "\\.xlsx$")
)

purrr::pmap(params, read.dir)

# or using base R
ls <- vector("list", 2)

for (i in seq_along(params[[1]])) {
  ls[[i]] <- read.dir(fun = params[[1]][[i]],
                      path = params[[2]][[i]],
                      pattern = params[[3]][[i]])
}
```
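For intuition, a generic directory reader along these lines can be sketched with base R alone. This is a rough sketch under the assumptions noted in the comments, not the code of `read.dir()` itself.

```r
# Rough base-R sketch of a generic directory reader (illustrative only,
# not the actual implementation of read.dir()).
read_dir_sketch <- function(fun, path = ".", pattern = "\\.csv$",
                            recursive = FALSE, ...) {
  files <- list.files(path, pattern = pattern,
                      full.names = TRUE, recursive = recursive)
  out <- lapply(files, fun, ...)   # apply the user-supplied reading engine to each file
  names(out) <- basename(files)    # name results by file name
  out
}

# e.g. read_dir_sketch(utils::read.csv) reads all .csv files in the working directory
```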
`corrplot_df()` is a wrapper around `corrplot::corrplot()` that accepts a data frame as input. You can set a p-value cutoff to exclude non-significant correlations from the plot. The correlation method ("pearson", "spearman", or "kendall") can be specified via the `method_cor` argument, and `...` is passed on to the arguments of `corrplot::corrplot()`.
```r
library(lbr)

# Plot correlations of the numeric columns (non-numeric columns are dropped automatically)
corrplot_df(iris)
```
mtcars %>% corrplot_df(method_plot = "circle", # specify graphic method of correlation method_cor = "pearson", # method of calculation of correlation type = "lower", # display lower half of correlation plot sig.level = 0.01, # p-value = 0.01 insig = "pch" # insignificant will be crossed out )
mtcars %>% corrplot_df(method_plot = "number", # Show correlation as number method_cor = "spearman",# Calculate correlation by "spearman" method type = "upper", # display upper half of correlation plot sig.level = 0.05, # p-value = 0.05 insig = "blank" # insignificant will be blank (default) )
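For readers curious how such a wrapper can be put together, the sketch below combines `cor()`, `corrplot::cor.mtest()`, and `corrplot::corrplot()`. It is a minimal illustration, not the actual implementation of `corrplot_df()`, and the helper name and argument defaults are assumptions.

```r
library(dplyr)
library(corrplot)

# Minimal sketch of a corrplot wrapper for data frames (illustrative only,
# not the package's implementation of corrplot_df()).
corrplot_df_sketch <- function(df, method_cor = "pearson", sig.level = 0.05, ...) {
  num_df <- dplyr::select(df, where(is.numeric))                # keep numeric columns only
  corr   <- cor(num_df, method = method_cor)                    # correlation matrix
  p_mat  <- corrplot::cor.mtest(num_df, method = method_cor)$p  # p-values per pair
  corrplot::corrplot(corr, p.mat = p_mat, sig.level = sig.level, ...)
}

# corrplot_df_sketch(mtcars, method_cor = "spearman", type = "upper")
```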