If you are lazy like me, then you might find this package useful. typeless provides shorthands for common functions and code expressions.
library(typeless)
Tired of typing as.character, as.numeric or as.factor? Don't. Use these instead:
ac(120)  # as.character(120)
an("2")  # as.numeric("2")
af("a")  # as.factor("a")
ai(2.1)  # as.integer(2.1)
al(0)    # as.logical(0)
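If you're curious how little code such shorthands need, they are presumably plain aliases of the base coercers. The definitions below are illustrative guesses, not necessarily the package's actual source:

# illustrative only - one plausible way to define such aliases
ac <- as.character
an <- as.numeric
af <- as.factor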
Need to coerce a tabular data object to a data.frame again? Your ML algorithm requires your tabular data to be a matrix? Use these:
df <- adf(matrix(0, 2, 2)) # as.data.frame(matrix(0, 2, 2))
class(df)
mx <- am(data.frame(a = "a", b = "b")) # as.matrix(data.frame(a = "a", b = "b"))
class(mx)
Replace colnames and row.names with cn and rn:
cn(iris)
rn(head(mtcars))
Tired of counting missing values with length(which(is.na(x)))? Use nas instead (it is vectorized, too):
nas(airquality)
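In case you wonder what nas() might look like under the hood, here is a minimal sketch; nas_sketch and its column-wise counting behaviour are assumptions, not the package's verified source:

# hypothetical sketch: count NAs per column for tabular input, overall for vectors
nas_sketch <- function(x) {
  if (is.null(dim(x))) sum(is.na(x)) else colSums(is.na(x))
}
nas_sketch(airquality) # per-column NA counts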
Dividing a floating point number usually yields another floating point number. A minor nuisance is that the division operator / does not let the user specify the number of decimal places to display in the resulting computation. This often means an additional call to the round function, which is quite annoying. Hence, the function dvf (Divide and Format) saves those occasional function calls by letting you explicitly specify the decimal places of interest.
dvf(1, 3)               # default is 3 decimal places
dvf(1, 3, decimals = 1) # 1 decimal place
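A helper with this behaviour fits in one line; dvf_sketch below is a hypothetical stand-in, assuming dvf() simply rounds the quotient:

# hypothetical sketch of a divide-and-format helper
dvf_sketch <- function(x, y, decimals = 3) round(x / y, decimals)
dvf_sketch(1, 3)               # 0.333
dvf_sketch(1, 3, decimals = 1) # 0.3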
The mean function extracts the mean. The median function extracts the median. The mode function extracts the storage mode of an object (wait... what?). That's correct: neither the base nor the stats package provides a function to extract the most frequently appearing value in a vector, a.k.a. the mode. Since the function name mode is already taken, we can use mod:
mod(c("a", "a", "b", "c", "a")) # mode is "a" mod(c(1:10, 3)) # mode is 3
There are many packages that provide imputation methods, such as mice and norm. However, sometimes a very swift and naive imputation is sufficient. Four imputation methods I use quite commonly are covered by the simp (simple imputation) function:
vec <- c(1, 1, 4, NA)
simp(vec)           # default is imputation by mean
simp(vec, "median") # impute with median
simp(vec, "zero")   # impute with zero
simp(vec, "flag")   # impute with an "NA" flag - suits categorical features
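A naive imputer covering these four methods takes only a few lines; simp_sketch is a hypothetical illustration, assuming simp() dispatches on a method string:

# hypothetical sketch of a simple imputer
simp_sketch <- function(x, method = "mean") {
  fill <- switch(method,
                 mean   = mean(x, na.rm = TRUE),
                 median = median(x, na.rm = TRUE),
                 zero   = 0,
                 flag   = "NA") # note: flag coerces the vector to character
  x[is.na(x)] <- fill
  x
}
simp_sketch(c(1, 1, 4, NA)) # 1 1 4 2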
Many times we are interested in seeing values as percentages rather than proportions. This can be achieved easily with scales::percent_format(), but it's even shorter with pf():
pf(91/100)
pf(47/51)
pf(c(0.2, 0.8)) # vector input
When dealing with large values (in plots, for example), it is sometimes useful to display them in comma format. cf() is the short way to do it (a shorthand for scales::comma_format()):
cf(1000000L)
cf(seq(1e+10, 5e+10, 1e+10)) # vector input works as well
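Since both functions are described as shorthands for the scales formatters, a plausible sketch is simply the following (pf_sketch and cf_sketch are assumed names, not the package's actual definitions):

# hypothetical sketches wrapping the scales formatters named above
pf_sketch <- function(x) scales::percent_format()(x)
cf_sketch <- function(x) scales::comma_format()(x)
pf_sketch(0.91)     # "91%"
cf_sketch(1000000L) # "1,000,000"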
Sometimes we want to convert unix time objects into datetimes in order to analyse temporal effects (e.g. month, weekday, etc.). The wrapper convert_unix() offers a comfortable alternative to base R's as.POSIXct(). With convert_unix() there is no need to specify the timezone, and whether to account for milliseconds is specified as an argument rather than hard-coding a division of the original unix time values by 1,000.
# given the format yyyy-mm-dd, 24 * 60 * 60 * 1000 (milliseconds considered) represents 1970-01-02
day_ms <- 24 * 60 * 60 * 1000
convert_unix(day_ms)

# with milliseconds disregarded, unix time is 24 * 60 * 60 - simply set the ms argument to FALSE
day <- day_ms / 1000
convert_unix(day, ms = FALSE)

# for a vector of unix time objects
one_day <- seq(day, day * 2, 60 * 60) # every hour between 1970-01-02 and 1970-01-03, no milliseconds
convert_unix(one_day, ms = FALSE)
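For reference, a wrapper with this interface could be sketched as follows; convert_unix_sketch and its UTC default are assumptions, not necessarily how the package implements it:

# hypothetical sketch: divide by 1,000 when ms = TRUE, then delegate to as.POSIXct()
convert_unix_sketch <- function(x, ms = TRUE, tz = "UTC") {
  if (ms) x <- x / 1000
  as.POSIXct(x, origin = "1970-01-01", tz = tz)
}
convert_unix_sketch(24 * 60 * 60 * 1000) # "1970-01-02 UTC"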