Description

If you are lazy like me, then you might find this package useful. typeless provides shorthand for common functions and code expressions.

library(typeless)

Variable Type Conversions

Tired of typing as.character, as.numeric or as.factor?
Don't.
Use these instead:

ac(120) # as.character(120)
an("2") # as.numeric("2")
af("a") # as.factor("a")
ai(2.1) # as.integer(2.1)
al(0)   # as.logical(0)
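Aliases like these can be defined in a single line each. A sketch of how such shorthands might look (an assumption for illustration, not necessarily the package's actual implementation):

```r
# One-line aliases for the common as.* coercion functions
ac <- as.character
an <- as.numeric
af <- as.factor
ai <- as.integer
al <- as.logical

ac(120)  # "120"
an("2")  # 2
```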

Object Type Conversions

Need to coerce a tabular data object to a data.frame again?
Does your ML algorithm require your tabular data to be a matrix?
Use these:

df <- adf(matrix(0,2,2)) # as.data.frame(matrix(0,2,2))
class(df)
mx <- am(data.frame(a = "a", b = "b")) # as.matrix(data.frame(a = "a", b = "b"))
class(mx)

Tabular Data Object Metadata

Replace colnames and row.names with cn and rn:

cn(iris)
rn(head(mtcars))

Tired of counting missing values with length(which(is.na(x)))?
Use nas instead (it is vectorized, too):

nas(airquality)
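A minimal base-R sketch of what nas might do, counting NAs per column for a data.frame and overall for a vector (an assumed implementation, not the package source):

```r
# Count missing values: per column for data.frames, total for vectors
nas <- function(x) {
  if (is.data.frame(x)) sapply(x, function(col) sum(is.na(col)))
  else sum(is.na(x))
}

nas(c(1, NA, 3, NA))  # 2
nas(airquality)       # named vector of per-column NA counts
```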

Common Operations

Division

Dividing one number by another usually yields a floating-point result.
A minor nuisance is that the division operator / does not let the user choose how many decimal places to display in the result.
This often means an additional call to the round function, which is quite annoying.
Hence, the function dvf (Divide and Format) saves the occasional extra call by letting you specify the decimal places of interest explicitly.

dvf(1,3) # default is 3 decimal points
dvf(1, 3, decimals = 1) # 1 decimal point
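A plausible one-line sketch of dvf, combining the division and the rounding step described above (assumed, not the package source):

```r
# Divide and round to a chosen number of decimal places in one call
dvf <- function(x, y, decimals = 3) round(x / y, decimals)

dvf(1, 3)                # 0.333
dvf(1, 3, decimals = 1)  # 0.3
</imports>```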

Mode

The mean function extracts the mean.
The median function extracts the median.
The mode function extracts the storage mode of an object (wait... what?)
That's correct: neither the base nor the stats package provides a function to extract the most frequently appearing value in a vector, a.k.a. the mode. Since the function name mode is already taken, we can use mod:

mod(c("a", "a", "b", "c", "a")) # mode is "a"
mod(c(1:10, 3)) # mode is 3
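A base-R sketch of how mod could be implemented with table() (a hypothetical implementation; ties resolve to whichever value table() lists first):

```r
# Most frequent value in a vector
mod <- function(x) {
  tab <- table(x)
  val <- names(tab)[which.max(tab)]
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(val) else val
}

mod(c("a", "a", "b", "c", "a"))  # "a"
mod(c(1:10, 3))                  # 3
```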

Imputation

There are many packages that provide imputation methods, such as mice and norm. However, sometimes a very swift and naive imputation is sufficient. I use 4 imputation methods quite commonly.

Introducing the simp (simple imputation) function:

vec <- c(1, 1, 4, NA)
simp(vec) # default is imputation by mean
simp(vec, "median") # impute with median
simp(vec, "zero")   # impute with zero
simp(vec, "flag")   # impute with "NA" flag - suits categorical features
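The four methods above can be sketched in a few lines of base R (an assumed implementation of simp, not the package source):

```r
# Naive imputation: replace NAs with mean, median, zero, or an "NA" flag
simp <- function(x, method = "mean") {
  fill <- switch(method,
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    zero   = 0,
    flag   = "NA")  # coerces the vector to character; meant for categoricals
  x[is.na(x)] <- fill
  x
}

vec <- c(1, 1, 4, NA)
simp(vec)          # 1 1 4 2
simp(vec, "zero")  # 1 1 4 0
```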

Display Percent

Many times we are interested in seeing values as percents rather than proportions. This can be achieved easily with scales::percent_format(), but it'd be even shorter with pf():

pf(91/100)
pf(47/51)
pf(c(0.2, 0.8)) # vector input
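A hedged base-R equivalent of pf (the package presumably wraps scales::percent_format(); this sketch only assumes whole-percent output):

```r
# Format a proportion (or vector of proportions) as a percent string
pf <- function(x) paste0(format(round(x * 100)), "%")

pf(91/100)       # "91%"
pf(c(0.2, 0.8))  # "20%" "80%"
```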

Display Comma Format

When dealing with large values (in plots, for example), it is sometimes useful to display them in comma format. cf() is the short way to do it (a shorthand for scales::comma_format()):

cf(1000000L)
cf(seq(1e+10, 5e+10, 1e+10)) # vector input works as well
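An assumed base-R equivalent of cf, for reference (the package wraps scales::comma_format()):

```r
# Format numbers with comma separators instead of scientific notation
cf <- function(x) format(x, big.mark = ",", scientific = FALSE, trim = TRUE)

cf(1000000L)  # "1,000,000"
```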

Convert Unix Time into Datetime

Sometimes we want to convert Unix time values into datetime objects in order to analyse temporal
effects (e.g. month, weekday, etc.).

The wrapper convert_unix() offers a comfortable alternative to base R's as.POSIXct().
With convert_unix() there is no need to specify the time zone, and whether the input includes
milliseconds is controlled by an argument rather than by hard-coding a division of the original Unix time values by 1,000.

# given the format yyyy-mm-dd, 24 * 60 * 60 * 1000 (milliseconds included) represents 1970-01-02
day_ms <- 24 * 60 * 60 * 1000
convert_unix(day_ms)

# with milliseconds excluded, the Unix time is 24 * 60 * 60 - simply set the ms argument to FALSE
day <- day_ms / 1000
convert_unix(day, ms = FALSE)

# for a vector of unix time objects
one_day <- seq(day, day * 2, 60 * 60) # every hour between 1970-01-02 and 1970-01-03, no milliseconds
convert_unix(one_day, ms = FALSE)
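A sketch of what convert_unix might look like underneath (an assumed implementation; UTC is fixed here so the result does not depend on the local time zone):

```r
# Convert Unix time (seconds or milliseconds since the epoch) to POSIXct
convert_unix <- function(x, ms = TRUE) {
  secs <- if (ms) x / 1000 else x
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

convert_unix(24 * 60 * 60 * 1000)       # 1970-01-02 UTC
convert_unix(24 * 60 * 60, ms = FALSE)  # 1970-01-02 UTC
```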


ShaulAb/typeless documentation built on May 28, 2019, 3:15 p.m.