In TaylorAndrew/atClean: Methods for cleaning data

Clean character strings:

trim() will remove whitespace from either the left side of a character string, right side of a character string, or both sides.

Usage: trim(" A string ", side = "left")

substrRight() provides a wrapper for substr() where it will take a substring indexing at the right side of a character string.

Usage: substrRight("12March2014", n = 4)

Dealing with factors:

applyLabels() is a wrapper for factor() which will apply a common label to one or more variables in data.frame.

Usage: applyLabels(data = df, varlist = c("var1", "var2"), labels = c("No", "Yes"))

reFactor() reorders the ordering of the levels of a factor variable. If the order is not given, the factor levels are reverse-ordered.

Usage: reFactor(x, index_order = c(3, 1, 2))

Expanding and Collapsing Data:

dichot() converts a vector to 0 and 1s, converting all data in matching group1 to 0 and all data matching group2 to 1, with all other data being set to NA

Usage: dichot(x, group1=c("No", "no"), group2=c("Yes", "yes"))

dummyCode() creates a standard set of variables dummy coded based on a input vector.

Usage: dummyCode(x)

dropAllNA() will drop all columns, rows, or both, where all entries in the data.frame are NA.

Usage: dropAllNA(df)

multChoiceCondense() takes a set of variables in a dataset and collapses them into a single variable.

Usage: multChoiceCondense(data = df, varList = c())

Other data cleaning methods:

logical_join() provides booliean variable appended onto data1, indicating which rows have matching observations in data2

Usage: logical_join(df1, df2, by =) df1 %>% logical_join(df2, by =)

normalize() normalize transforms a numeric vector to percentile ranks

Usage: normalize(x)

rmOutliers() converts outleirs (as defined as being some x number of standard deviations from the mean) to either NA, mean +/- x standard deviations, or the mean.

Usage: rmOutliers(x, sdCut = 3, method = "remove")

t_() is an expansion of the t() function, in which, if x is a data.frame(), it is transposed, and the first column of the output data.frame is the column names from the input data.frame.

Usage: t_(x)

TaylorAndrew/atClean documentation built on May 9, 2019, 4:21 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com