knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
This is a collection of R utility functions for me, but maybe also for you.
Functions may be added, specifications of functions may change or become obsolete, and names may change without notice.
Install the development version from GitHub:
install.packages("remotes")
remotes::install_github("indenkun/infun", build_vignettes = TRUE)
Load the library.
library(infun)
Make sample data for the README (example.data).
example.data <- data.frame(value1 = 1:10, value2 = c(1:3, "strings", 5:10), value3 = c(1:3, "strings", 5:10), value4 = 11:20)
find.not.numeric.value()
This function finds where in a vector there are values that cannot be converted to numbers.
If you specify a column of a data frame with [], it behaves in the same way.
If you input a data frame that contains multiple columns, it returns the locations in each column that contains values that cannot be converted to a number.
The fourth value in value2 of example.data is a string. find.not.numeric.value() shows where in the vector the values are that would be forced to NA when converted to numeric type by as.numeric(). If there is no such value, NA is returned.
find.not.numeric.value(example.data$value1)
find.not.numeric.value(example.data$value2)
find.not.numeric.value(example.data[1])
find.not.numeric.value(example.data[2])
find.not.numeric.value(example.data)
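As a rough illustration of the idea, the positions that as.numeric() would coerce to NA can be located in base R as follows. This is a minimal sketch, not the implementation in {infun}: the helper name which_not_numeric() is hypothetical, and unlike find.not.numeric.value() it returns an empty vector rather than NA when nothing is found.

```r
# Hypothetical helper, not part of {infun}.
# Returns the positions of values that as.numeric() would turn into NA.
# Values that are already NA in the input are not reported.
which_not_numeric <- function(x) {
  x <- as.character(x)
  converted <- suppressWarnings(as.numeric(x))
  which(is.na(converted) & !is.na(x))
}

value2 <- c(1:3, "strings", 5:10)
which_not_numeric(value2)  # position of "strings"
which_not_numeric(1:10)    # nothing to report
```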
find.same.value.col()
This function finds columns in a data frame that consist of the same values as another column.
If you run find.same.value.col() on example.data, you will see that the second and third columns of the sample data have the same values.
The result is returned in a list format.
find.same.value.col(example.data)
unique_col()
unique_col() is a function that removes duplicate columns from a data frame; it is the column version of {base}'s unique().
unique_col(example.data)
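For comparison, the same kind of result can be obtained in base R by treating the data frame as a list of columns. This is only a sketch under that assumption and may differ from what unique_col() does internally (for example in how it compares columns of different types).

```r
example.data <- data.frame(value1 = 1:10,
                           value2 = c(1:3, "strings", 5:10),
                           value3 = c(1:3, "strings", 5:10),
                           value4 = 11:20)

# duplicated() on a list flags elements that repeat an earlier element,
# so dropping them keeps the first copy of each distinct column.
deduplicated <- example.data[!duplicated(as.list(example.data))]
names(deduplicated)
```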
find.not.integer.value()
This function finds non-integer values in a vector.
The input can be of any type, but it must be a vector consisting only of numbers.
If a string or other value is entered, a warning message is displayed and NA is returned.
If you get a warning message that a non-numeric value was entered, try find.not.numeric.value() to find the non-numeric value.
If you input a data frame that contains multiple columns, it returns the locations in each column that contains non-integer values.
example.data.integer <- data.frame(Item1 = 1:10, Item2 = c(1:5, 6.5, 7.5, 8:10), Item3 = c(1:6, "strings", 8:10))
Returns the location as a number if a value is not an integer. If there are no non-integer values in a vector consisting of numbers, NA is returned.
If "logical" is specified in where, a vector of logical type is returned.
find.not.integer.value(example.data.integer$Item1)
find.not.integer.value(example.data.integer[2])
find.not.integer.value(example.data.integer[1:2])
find.not.integer.value(example.data.integer$Item3)
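The underlying check can be sketched in base R with the modulo operator. This is an illustration only, not the code used by find.not.integer.value(), and it assumes the input is already numeric.

```r
item2 <- c(1:5, 6.5, 7.5, 8:10)

# A number is an integer value when it leaves no remainder modulo 1.
not_integer <- item2 %% 1 != 0
which(not_integer)  # positions as numbers
not_integer         # the same information as a logical vector
```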
find.not.as.Date.value()
This function finds where in a vector there are values that cannot be converted to Date using as.Date() in {base}.
example.data.Date <- data.frame(Date1 = c("2021-7-28", "2021-08-08", "2021-08-24", "2021-09-05"),
                                Date2 = c("2021-7-28", "NOTDATE", "NOTDATE", "2021-09-05"),
                                Date3 = c("210728", "21/08/08", "21/Aug-24", "21sep5"))
find.not.as.Date.value(example.data.Date$Date1)
find.not.as.Date.value(example.data.Date$Date2)
find.not.as.Date.value(example.data.Date$Date3)
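One way to reproduce this kind of check in base R is to try as.Date() on each element separately and treat an error or an NA result as a failure. The helper which_not_date() below is hypothetical and is only a sketch of the idea; the package function may decide differently for borderline formats.

```r
# Hypothetical helper, not part of {infun}.
# Tries as.Date() element by element; an error or an NA result
# counts as "cannot be converted".
which_not_date <- function(x) {
  failed <- vapply(x, function(value) {
    tryCatch(is.na(as.Date(value)), error = function(e) TRUE)
  }, logical(1), USE.NAMES = FALSE)
  which(failed)
}

date2 <- c("2021-7-28", "NOTDATE", "NOTDATE", "2021-09-05")
which_not_date(date2)
```

Checking element by element matters because as.Date() picks a format from the first non-missing element when it is given a whole vector.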
find.not.as_date.value()
This function finds where in a vector there are values that cannot be converted to Date using as_date() in {lubridate}.
There is a slight difference between the values that can be converted to Date by {lubridate}'s as_date() and those that can be converted by {base}'s as.Date().
find.not.as_date.value(example.data.Date$Date1)
find.not.as_date.value(example.data.Date$Date2)
# as_date() converts even relatively fuzzy forms if they can be changed to a date type, while as.Date() operates relatively more strictly.
find.not.as_date.value(example.data.Date$Date3)
add.str()
Replaces all the items in a specific column of a data frame with an arbitrary string and combines the result with the original data frame. The converted column will be of string type because it contains strings such as ALL.
You need to specify the column to use as key by its column name.
example.data.add.all <- add.str(example.data, "value1")
head(example.data.add.all, 20)
random.Date()
random.Date() is a function that randomly creates a vector of dates of a specified size between two specified dates.
random.Date(from = "2021/1/1", to = "2021/4/1", size = 10)
#>  [1] "2021-01-21" "2021-03-17" "2021-03-09" "2021-02-20" "2021-02-24"
#>  [6] "2021-02-22" "2021-02-06" "2021-03-24" "2021-02-03" "2021-03-11"
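The same idea can be sketched in base R by sampling from the sequence of days between the two dates. The helper random_dates() is hypothetical, and sampling with replacement is an assumption made here; random.Date() itself may behave differently.

```r
# Hypothetical helper, not part of {infun}.
random_dates <- function(from, to, size) {
  candidates <- seq(as.Date(from), as.Date(to), by = "day")
  # Assumption: dates may repeat, so sample with replacement.
  sample(candidates, size, replace = TRUE)
}

set.seed(1)
dates <- random_dates("2021/1/1", "2021/4/1", size = 10)
dates
```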
age.cal()
age.cal() is a function that calculates the number of years (age, by default), months, and days from one specified date to another.
age.cal(from = c("2000/1/1", "2010/1/1"), to = "2021/4/1")
tableone.rename.*()
These functions change the headline strings in the item names of a table created with {tableone} to any string.
tableone.rename.overall() changes the "Overall" string in the item name of a table created with {tableone}'s CreateTableOne() to any string.
# This is the code to create a sample table in `{tableone}`.
library(tableone)
iris.table <- CreateTableOne(data = iris)
iris.table
# Rename "Overall" to "ALL".
tableone.rename.overall(iris.table, rename.str = "ALL")
tableone.rename.headline() is a function that changes any heading (including Overall) to any string by giving the heading before and after the change as a formula.
# This is the code to create a sample table in `{tableone}`.
library(tableone)
library(survival)
data(pbc)
varsToFactor <- c("status","trt","ascites","hepato","spiders","edema","stage")
pbc[varsToFactor] <- lapply(pbc[varsToFactor], factor)
vars <- c("time","status","age","sex","ascites","hepato",
          "spiders","edema","bili","chol","albumin",
          "copper","alk.phos","ast","trig","platelet",
          "protime","stage")
tableOne <- CreateTableOne(vars = vars, strata = c("trt"), data = pbc, addOverall = TRUE)
tableOne
# Rename headline name "1" to "D-penicillamine", "2" to "placebo".
# Names that contain hyphens will be evaluated as negative in the formula, so they must be enclosed in quotation marks.
tableone.rename.headline(tableOne, rename.headline = list(1 ~ "D-penicillamine", 2 ~ placebo))
seq_geometric()
This function generates a sequence with a common ratio, also known as a geometric sequence.
By specifying the first term in from, the last term or the closest value to the last term in to, and the common ratio in by.ratio, you can obtain a geometric sequence of "first term * common ratio ^ n" from "from" to the closest value to "to".
seq_geometric(from = 1, to = 128, by.ratio = 2)
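The formula "first term * common ratio ^ n" can be written out directly in base R. The function geometric_seq() below is a hypothetical sketch rather than the code behind seq_geometric(); in particular, it assumes a positive first term and a ratio greater than 1, and it stops at the last term that does not exceed the upper bound.

```r
# Hypothetical helper, not part of {infun}.
geometric_seq <- function(from, to, ratio) {
  stopifnot(from > 0, to >= from, ratio > 1)
  terms <- from
  # Keep multiplying by the common ratio while the next term stays within `to`.
  while (terms[length(terms)] * ratio <= to) {
    terms <- c(terms, terms[length(terms)] * ratio)
  }
  terms
}

geometric_seq(from = 1, to = 128, ratio = 2)
geometric_seq(from = 3, to = 100, ratio = 3)
```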
Rtools.pacman.package.*()
These functions search for packages that can be installed by Rtools' pacman. In short, they are wrappers for some of the functions of pacman in Rtools.
They can only be used in a Windows environment where Rtools40 or later is installed.
You may not be able to use the functions with Rtools42 (on R 4.2.x). Please configure Rtools42 before executing the functions.
Rtools.pacman.package.list() is a function that outputs a list of the packages in the repository that can be installed by Rtools' pacman. By specifying arguments, you can extract only those packages that are already installed, or only those that are not yet installed.
package.list <- Rtools.pacman.package.list()
# It's too long, so show part of it in head()
head(package.list)
Rtools.pacman.package.find() is a function that displays a list of the packages in the repository that can be installed by pacman in Rtools and that match the string given as an argument. If no matching package is found, it returns NA.
package.list.curl <- Rtools.pacman.package.find("curl")
# It's too long, so show part of it in head()
head(package.list.curl)
scale.data.frame()
scale.data.frame() is a method of the generic function scale() that centers and/or scales the numeric columns of a data frame.
The non-numeric values in the data frame remain unchanged.
In short, it is the data frame method for {base}'s scale().
Because it is a method of scale(), call it with scale() when the {infun} library is loaded. If the object is a data frame, the method is dispatched automatically.
If you want to call it explicitly, use infun:::scale.data.frame().
If you want to explicitly use {base}'s scale() after loading {infun} as a library, you can call scale.default().
z.iris <- scale(iris)
# It's too long, so show part of it in head()
head(z.iris)
save_gtsummary()
This function outputs a table created by the gtsummary package to PowerPoint or Word, or as an image file.
It just wraps {gtsummary}'s as_flex_table() and {flextable}'s save_as_*() functions.
Supported filename extensions: .pptx, .docx, .png, .pdf, .jpg.
library(gtsummary)
library(tidyverse)
# Sample code for gtsummary
tbl_summary_ex1 <- trial %>%
  select(age, grade, response) %>%
  tbl_summary()
# Output the table created by gtsummary to PowerPoint(.pptx).
tbl_summary_ex1 %>%
  save_gtsummary(path = "table.pptx")
round_any_*()
round_any() rounds a numeric vector to the nearest terms of an arithmetic sequence with an arbitrary common difference.
If a value already matches a term of the arithmetic sequence, it is output as is.
If the type argument is ceiling, values are rounded up to the nearest term, and if the type argument is floor, they are rounded down.
example.vector <- seq(0, 1, 0.1)
example.vector
round_any(example.vector, by = 0.25, type = "ceiling")
round_any(example.vector, by = 0.25, type = "floor")
round_any_ceiling() is a simplified version of round_any() with the type argument fixed to ceiling and origin fixed to 0.
round_any_floor() is a simplified version of round_any() with the type argument fixed to floor and origin fixed to 0.
round_any_*() is faster than round_any() in most cases, because the internal processing is vectorized.
However, in rare cases, round_any_*() may not return accurate values because of R's internal floating point arithmetic. round_any() creates a sequence of numbers and compares against it, so it gives accurate rounding results.
round_any_ceiling(example.vector, 0.25)
round_any_floor(example.vector, 0.25)
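A common vectorized way to round to multiples of a step in base R is to divide by the step, apply ceiling() or floor(), and multiply back. Whether round_any_*() is implemented exactly like this is an assumption; the sketch only shows the technique and why floating point division can matter.

```r
example.vector <- seq(0, 1, 0.1)
step <- 0.25

# Divide by the step, round the quotient, then scale back (origin fixed to 0).
rounded_up   <- ceiling(example.vector / step) * step
rounded_down <- floor(example.vector / step) * step

rounded_up
rounded_down
```

Because the quotient is computed in floating point, a value that should sit exactly on a multiple of the step can occasionally land just above or below it, which is the kind of rare inaccuracy described above.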
rand_moji()
Function to create a random Japanese (kanji or hiragana) string. Only the range of regular kanji is supported.
It is also reproducible with set.seed().
rand_moji(length = 3, size = 3, moji = "kanji")
#> [1] "缶販微" "症凸噴" "侍沖忌"
rand_moji(length = 3, size = 3, moji = "hiragana")
#> [1] "にうぁ" "もんゐ" "えヴり"
It is a random string, so it does not reflect the normal rules of Japanese. In the case of hiragana, characters that do not normally appear at the beginning of a string, such as Sutegana and "n", will also appear at the beginning.
Katakana strings are not supported; convert hiragana strings using a function such as stringi::stri_trans_general() in the {stringi} package.
hiragana.moji <- rand_moji(length = 3, size = 3, moji = "hiragana")
hiragana.moji
#> [1] "つざそ" "がせど" "せこへ"
katakana.moji <- stringi::stri_trans_general(hiragana.moji, "hiragana-katakana")
katakana.moji
#> [1] "ツザソ" "ガセド" "セコヘ"
str_remove_sandwich()
Delete a string of characters sandwiched between specific characters.
The specified string must be a single character, and the first and last characters of the string must be different.
str_remove_sandwich("西馬音内《にしもない》は雄勝郡羽後町《おがちぐんうごまち》です。", start_pattern = "《", end_pattern = "》")
Please escape characters that need to be escaped, such as ().
str_remove_sandwich("dplyr (≥ 0.8.3), arabic2kansuji (≥ 0.1.0)", "\\(", "\\)")
str_extract_sandwich()
Extracts strings sandwiched between specified strings.
The specified string must be a single character, and the first and last characters of the string must be different.
str_extract_sandwich("西馬音内《にしもない》は雄勝郡羽後町《おがちぐんうごまち》です。", start_pattern = "《", end_pattern = "》")
Please escape characters that need to be escaped, such as ().
str_extract_sandwich("dplyr (≥ 0.8.3), arabic2kansuji (≥ 0.1.0)", "\\(", "\\)")
subset_interchange_col()
For any two columns specified in the data frame (say columns A and B), rows whose combination of A and B also appears with the two values swapped are returned, either as a data frame or as row numbers.
For example, if column A has "TOM" and "BOB", and the same respective rows in column B have "BOB" and "TOM", the rows will be extracted as interchangeable.
A row with the same value in columns A and B is also determined to be interchangeable and is extracted.
example.interchange <- data.frame(X = c("TOM", "BOB", "JOHN", "POP"),
                                  Y = c("BOB", "TOM", "BEE", "TOO"),
                                  Z = seq(10, 40, by = 10))
subset_interchange_col(example.interchange, "X", "Y")
subset_interchange_col(example.interchange, "X", "Y", out.put = "num")
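The rule described above can be sketched in base R by checking whether each swapped pair also occurs as an ordinary pair. This is an illustration under that reading of the rule, not the implementation of subset_interchange_col(); the tab separator is an arbitrary choice that assumes the values contain no tabs.

```r
example.interchange <- data.frame(X = c("TOM", "BOB", "JOHN", "POP"),
                                  Y = c("BOB", "TOM", "BEE", "TOO"),
                                  Z = seq(10, 40, by = 10))

# Encode each row as "X<tab>Y" and its swapped form as "Y<tab>X".
forward <- paste(example.interchange$X, example.interchange$Y, sep = "\t")
swapped <- paste(example.interchange$Y, example.interchange$X, sep = "\t")

# A row is interchangeable when its swapped form exists among the rows.
# Rows with X equal to Y match themselves, so they are included too.
interchangeable <- swapped %in% forward

example.interchange[interchangeable, ]  # as a data frame
which(interchangeable)                  # as row numbers
```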
list2data.frame_*()
Functions to convert a list into a data frame.
For a list whose elements have different lengths, the data frame is built to the length of the longest element, and the missing places of shorter elements are filled with NA.
list2data.frame_cbind() makes each element of the list a column.
list2data.frame_rbind() makes each element of the list a row.
multi_length_list <- list(A = 1, B = 1:2, C = 1:3, D = c(1, NA, 3:4), E = c(1, NA))
list2data.frame_cbind(multi_length_list)
list2data.frame_rbind(multi_length_list)
Of course, lists of the same length can also be converted to data frames.
equal_length_list <- list(a = 1:4, b = 5:8, c = 9:12)
list2data.frame_cbind(equal_length_list)
list2data.frame_rbind(equal_length_list)
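The NA padding can be sketched in base R by extending every element to the length of the longest one before binding. The helper pad_to() is hypothetical and the result may differ from the package functions in details such as row and column names.

```r
multi_length_list <- list(A = 1, B = 1:2, C = 1:3, D = c(1, NA, 3:4), E = c(1, NA))

# Hypothetical helper: assigning a longer length pads a vector with NA.
pad_to <- function(x, n) {
  length(x) <- n
  x
}

longest <- max(lengths(multi_length_list))
padded <- lapply(multi_length_list, pad_to, n = longest)

as_columns <- as.data.frame(padded)                 # each element becomes a column
as_rows <- as.data.frame(do.call(rbind, padded))    # each element becomes a row

as_columns
as_rows
```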
objects_length*()
objects_length() returns the lengths of the input objects as a vector.
objects_length_all_equal() returns TRUE if the lengths of all input objects are equal, and FALSE if any one of them is different.
objects_length_num_equal() returns TRUE if at least one of the input objects has a length equal to the length specified by .num.
objects_length_num_equal_quantity() returns the number of input objects whose length is equal to the length specified by .num. If .quantity is specified, it returns TRUE if that number is equal to the specified quantity.
x <- 1:3
y <- 1:6
z <- 1:3
objects_length(x, y, z)
objects_length_all_equal(x, y, z)
objects_length_num_equal(x, y, z, .num = 6)
objects_length_num_equal_quantity(x, y, z, .num = 3)
objects_length_num_equal_quantity(x, y, z, .num = 3, .quantity = 2)
var_()
var_() computes an interval estimate of the population variance of x and performs a hypothesis test against a given population variance.
The sample variance used for the estimate is the unbiased variance computed with stats::var().
It also calculates the variance under the assumption that the given values are the whole population.
The result is returned as an object of class "htest".
var_(iris$Sepal.Length)
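For reference, the textbook chi-squared interval for a population variance can be computed in base R as below. This assumes normally distributed data and a 95% confidence level, and it is a sketch of the standard method rather than a statement of how var_() is implemented.

```r
x <- iris$Sepal.Length
n <- length(x)
alpha <- 0.05  # assumption: 95% confidence level

# Unbiased sample variance (divides by n - 1).
sample_var <- var(x)

# Chi-squared interval: (n - 1) * s^2 / chi-squared quantiles with n - 1 df.
conf_int <- (n - 1) * sample_var / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)

# Variance when the values are treated as the whole population (divides by n).
population_var <- sample_var * (n - 1) / n

sample_var
conf_int
population_var
```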
label_vertical()
label_vertical() is a function that converts the axis labels of a ggplot2 graph to vertical writing.
It does not actually implement vertical writing; it just breaks the line after every character.
If horizontal bars are not replaced with vertical bars, unnatural Japanese notation may result. By default, some horizontal bars are specified with vertical_list() and replaced with vertical bars.
touhoku <- c("青森県", "秋田県", "岩手県", "山形県", "宮城県", "福島県")
scales::demo_discrete(touhoku)
scales::demo_discrete(touhoku, labels = label_vertical())
Line breaks can be expressed when the text consists only of Japanese, but misalignment may occur when half-width characters are included or when proportional fonts are used.
tiiki <- c("秋田県\n東北", "東京都\n関東", "大阪府\n関西")
scales::demo_discrete(tiiki)
scales::demo_discrete(tiiki, labels = label_vertical(line_feed = "\n"))
mode_() and mode_data.frame()
mode_() is a function that calculates the mode and its frequency for a vector or a data frame.
mode_(iris["Sepal.Length"])
If a data frame with multiple columns is given, the most frequent combination of values and its frequency are computed.
Large data frames cannot be calculated properly.
mode_(iris[c(1, 5)])
mode_data.frame() calculates the mode and its frequency for each column of the data frame.
The result is a data frame with the column name, mode, and frequency. More than one row may be returned for a column, as the mode may not be unique.
mode_data.frame(iris)
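The mode of a single vector can be sketched in base R with table(). The helper vector_mode() is hypothetical and only illustrates the idea, including the case where the mode is not unique; it is not the code behind mode_().

```r
# Hypothetical helper, not part of {infun}.
vector_mode <- function(x) {
  counts <- table(x)
  is_top <- as.vector(counts) == max(counts)
  # Every value that reaches the maximum count is reported as a mode.
  data.frame(mode = names(counts)[is_top],
             freq = as.vector(counts)[is_top])
}

vector_mode(c("a", "b", "b", "c", "c"))
vector_mode(iris$Sepal.Length)
```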
dummy_code()
Given a variable x with n distinct values, create n new dummy variables coded 0/1 for the presence (1) or absence (0) of each value.
By specifying a delimiter character, this function can also create dummy codes from a single value that holds multiple items separated by commas or other delimiters.
df_sample <- data.frame(sample = c("a,b", "b", "c", "c,a", "a,b,c"))
dummy_code(df_sample$sample, split = ",")
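The splitting step can be sketched in base R with strsplit(). This shows the general technique only; the shape and naming of the output of dummy_code() may differ.

```r
sample_values <- c("a,b", "b", "c", "c,a", "a,b,c")

# Split each entry on the delimiter and collect the distinct items.
parts <- strsplit(sample_values, ",", fixed = TRUE)
items <- sort(unique(unlist(parts)))

# One row per entry, one 0/1 column per distinct item.
dummy <- t(vapply(parts,
                  function(p) as.integer(items %in% p),
                  integer(length(items))))
colnames(dummy) <- items
dummy
```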
replace_match()
Function to replace a value that exactly matches a pattern with a replacement. Given vectors of equal length for pattern and replacement, a value matching the first value of pattern is replaced with the first value of replacement, and so on. This means that a large number of patterns and replacements can be specified in a vector.
pref_list <- c("あきた", "秋田", "秋田県", "あき田", "秋た", "東京都")
pattern <- c("あきた", "秋田", "あき田", "秋た")
replacement <- c("秋田県", "秋田県", "秋田県", "秋田県")
replace_match(pref_list, pattern = pattern, replacement = replacement)
If a value is specified for the nomatch argument, that value is returned for any value that does not match a pattern and is not replaced; if nomatch is not specified, the original value is output.
replace_match(pref_list, pattern = pattern, replacement = replacement, nomatch = NA_character_)
replace_match(pref_list, pattern = pattern, replacement = replacement, nomatch = "変換不要")
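Exact-match replacement of this kind can be sketched in base R with match(), which returns the position of each value in the pattern vector or NA when there is none. This is an illustration of the lookup idea, not the implementation of replace_match().

```r
pref_list <- c("あきた", "秋田", "秋田県", "あき田", "秋た", "東京都")
pattern <- c("あきた", "秋田", "あき田", "秋た")
replacement <- c("秋田県", "秋田県", "秋田県", "秋田県")

# Position of each value in `pattern`; NA means no exact match.
idx <- match(pref_list, pattern)

# Keep the original value where nothing matched.
keep_original <- ifelse(is.na(idx), pref_list, replacement[idx])

# Or substitute a fixed value where nothing matched.
with_nomatch <- ifelse(is.na(idx), NA_character_, replacement[idx])

keep_original
with_nomatch
```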
na.omit_select()
By default, returns the data frame with the rows removed that contain NA in the selected columns.
If .retrieve = FALSE is specified, only the rows with NA in the selected columns are returned.
Multiple columns may be specified as the columns to be selected.
example_data <- data.frame(value1 = c(1, 2, NA, NA, 10),
                           value2 = c(1, NA, 3:5),
                           value3 = c(NA, 1, 2, NA, 10))
na.omit_select(example_data, value2, value3)
na.omit_select(example_data, value2, value3, .retrieve = FALSE)
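A comparable selection can be written in base R with complete.cases() restricted to the chosen columns. The sketch assumes that a row counts as "containing NA" when any of the selected columns is NA; na.omit_select() may define this differently.

```r
example_data <- data.frame(value1 = c(1, 2, NA, NA, 10),
                           value2 = c(1, NA, 3:5),
                           value3 = c(NA, 1, 2, NA, 10))

# TRUE for rows with no NA in the selected columns (value1 is ignored).
complete_rows <- complete.cases(example_data[c("value2", "value3")])

without_na <- example_data[complete_rows, ]   # rows kept by default
only_na <- example_data[!complete_rows, ]     # rows with NA in a selected column

without_na
only_na
```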
hosmer_test()
The Hosmer-Lemeshow Goodness of Fit (GOF) Test checks the quality of logistic regression models.
It obtains its statistic by dividing the observed and expected values into several arbitrary subgroups.
The observed and expected values are generally divided into subgroups based on quantiles of the expected value, for example by taking deciles of the expected value.
This method is used in the hoslem.test() function of the {ResourceSelection} package and the performance_hosmer() function of the {performance} package.
However, there are several variations on how to divide the subgroups, and this function uses a method in which the expected values are ordered from smallest to largest and divided so that the subgroups have as nearly the same number of samples as possible.
The division into subgroups differs depending on whether simple is TRUE or FALSE. See Details in the documentation.
The result is a list of class "htest".
data("Titanic")
df <- data.frame(Titanic)
df <- data.frame(Class = rep(df$Class, df$Freq),
                 Sex = rep(df$Sex, df$Freq),
                 Age = rep(df$Age, df$Freq),
                 Survived = rep(df$Survived, df$Freq))
model <- glm(Survived ~ . ,data = df, family = binomial())
HL <- hosmer_test(model)
HL
cbind(HL$observed, HL$expected)
readme()
Access the package README in a browser. With the package installed, open the README of the installed package on CRAN or GitHub in a browser.
If the package was installed from CRAN, it accesses the CRAN package web page with the README; if there is no README, an empty web page is displayed.
If the package was installed from GitHub, the web page of the package on GitHub is accessed.
readme("infun")
to_times()
Create a times object of the {chron} package by taking only the time elements from a POSIXlt or POSIXct object, or from a chron object of the {chron} package.
x <- as.POSIXlt("2024/12/13 12:00:00")
x
to_times(x)
library(chron)
x <- as.chron(x)
x
to_times(x)
If the time zone of the POSIXct object differs from the system time zone, the time zone must also be specified within to_times().
Sys.timezone()
x <- as.POSIXct("2024/12/13 12:00:00", tz = "America/New_York")
x
# incorrect
to_times(x)
# correct
to_times(x, tz = "America/New_York")
head_tail()
Extracts and displays only the specified number of rows and columns of the data frame, and uses "..." to denote the omitted rest. If the specified number of rows and columns is greater than that of the original data frame, the data is displayed as it is.
head_tail(mtcars, n = 3L, col_n = 3L)
knitr::kable(head_tail(mtcars, n = 3L, col_n = 3L), align = "r")
mlest2()
Function to compute maximum likelihood estimates of the mean vector and covariance matrix, based on the mlest() function in the {mvnmle} package.
The mlest() function in the {mvnmle} package computes the estimates internally using the nlm() function, but the solution may not converge if an appropriate number of iterations is not specified. If the solution does not converge, this function recalculates the solution while increasing the number of iterations, up to the specified max_iterlim.
# Solution does not converge and stop.code becomes 4
mvnmle::mlest(airquality)
# stop.code is 1 or 2 because the solution of maximum likelihood estimates is convergent.
# The stop.code follows the description of nlm().
mlest2(airquality)
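The retry idea can be illustrated with nlm() alone on a toy objective. In nlm(), a return code of 4 means the iteration limit was exceeded, so the loop below raises iterlim until another code is returned or a cap is reached. The objective, the starting values, the doubling schedule, and the variable names are all assumptions made for this sketch; mlest2() may increase the limit differently.

```r
# Toy objective with its minimum at c(1, 2); stands in for a likelihood.
objective <- function(p) sum((p - c(1, 2))^2)

iterlim <- 1          # deliberately too small to start with
max_iterlim <- 1000   # upper limit, playing the role of max_iterlim

repeat {
  fit <- nlm(objective, p = c(10, 10), iterlim = iterlim)
  # Code 4 means the iteration limit was hit; stop on any other code
  # or once the cap has been reached.
  if (fit$code != 4 || iterlim >= max_iterlim) break
  iterlim <- min(iterlim * 2, max_iterlim)
}

fit$code
fit$estimate
iterlim
```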
LittleMCAR_test()
The LittleMCAR_test() function performs Little's MCAR test, internally using the mlest2() function.
{BaylorEdPsych}, which has been removed from CRAN, has a function that performs Little's MCAR test using the mlest() function from the {mvnmle} package. However, as described for mlest2(), the mlest() function in the {mvnmle} package may exit without the solution converging in its default behavior, and correct statistics may not be calculated.
Therefore, LittleMCAR_test() performs Little's MCAR test using mlest2() internally, whose solution converges.
The upper limit of the number of calculations can be set with max_iterlim. If the solution has not converged by the upper limit, stop.code will be 4, so increase max_iterlim and recalculate.
The result is a list of class "htest".
LittleMCAR_test(airquality)
# stop.code is 1 or 2 because the solution of maximum likelihood estimates is convergent.
# The stop.code follows the description of nlm().
LittleMCAR_test(airquality)$stop.code
{purrr}
{stats}
{utils}
{gtsummary}
{flextable}
{tools}
{lubridate}
{dplyr}
{knitr}
{rmarkdown}
{chron}
{mvnmle}
MIT.