2019-12-26
zhoufang

A set of convenience functions around data manipulation.
lapply_preserve_names wrapper around base R lapply, which gives access only to the value of each element and not to its other attributes, such as its name
> foo = list("fruit1"="apple", "fruit2" = "banana", "num" = c(1, 2))
> lapply(foo, function(x)print(names(x)))
NULL
NULL
NULL
...
> lapply_preserve_names(foo, function(x)print(names(x)))
[1] "fruit1"
[1] "fruit2"
[1] "num"
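One way such a wrapper could work is to iterate over positions and pass each single-element subset `x[i]`, which keeps its name, instead of the bare element. The following is only a minimal sketch under that assumption, not the package's actual implementation; `lapply_names_sketch` is a hypothetical name.

```r
# Hypothetical sketch; not the package's implementation.
lapply_names_sketch <- function(x, f, ...) {
  # x[i] (single brackets) keeps the element's name; x[[i]] would drop it.
  out <- lapply(seq_along(x), function(i) f(x[i], ...))
  names(out) <- names(x)
  out
}

foo <- list("fruit1" = "apple", "fruit2" = "banana", "num" = c(1, 2))
res <- lapply_names_sketch(foo, function(x) names(x))
```

Here each call receives a length-one list, so `names(x)` is available inside the function.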
list_files_fwd_slash wrapper function for list.files that accepts copied Windows paths containing backslashes
is_fct_or_chr identify columns of factor or character type
headtail a quick glance into a dataframe combining n head and tail rows
get_mode get the mode of a vector
ff_quantile function factory to create nth quantile function, such as q5 <- ff_quantile(0.05)
unique_row extract unique row combinations and show row counts
sum_table given 2 columns, summarize counts of each unique value-pair combination
sum_col summarize a data frame with concise and useful summary statistics
sum_row generate row-wise summary for missing and unique values
sum_df sister function to sum_col and sum_row, returns a list of results from each
sum_missing summary statistics of top n missing value columns
contain_value examine if a data holder contains 'value' in a pre-defined term
any_dups output summary of duplicated rows by given keys
trim_ws remove leading and trailing whitespace
trim_ws_df remove leading and trailing whitespace within any character or factor type columns of a data.frame
move_left move column or columns to far left hand side of a dataframe
add_wmy add week, weekday, month, year columns based on datetime column
subset_by_quantile filter a column by removing points beyond top and/or bottom percentage thresholds
add_empty_rows append empty rows to a dataframe until it reaches a target number of rows
puff_my_df artificially increase the number of rows to make up any missing combinations of label columns for every unique id; functions similarly to tidyr::crossing and base R's expand.grid.
> foo <- data.frame(
+     "name"   = c("Rachel", "Chandler", "Rachel", "Monica", "Rachel", "Chandler"),
+     "year"   = c(2010, 2010, 2012, 2010, 2011, 2012),
+     "size"  =  c("XS", "S",  "M", "S", "S", "M"),
+     "results" = c(1,2,3,4,5,6)
+ )
# new rows are introduced with 'results' of NA
> puff_my_df(foo, "name", c("year", "size"))
       name year size results
1    Rachel 2010   XS       1
3    Rachel 2012    M       3
5    Rachel 2011    S       5
11   Rachel 2010    S      NA
2  Chandler 2010    S       2
6  Chandler 2012    M       6
12 Chandler 2010   XS      NA
21 Chandler 2011    S      NA
4    Monica 2010    S       4
13   Monica 2010   XS      NA
22   Monica 2012    M      NA
31   Monica 2011    S      NA
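The output above is consistent with crossing every unique id with every label combination observed in the data and then left-joining the original rows. The following base R sketch illustrates that idea only; `puff_sketch` is a hypothetical name and this is not the package's implementation. Note that `merge` sorts by the key columns, so the row order differs from the output shown above.

```r
# Hypothetical sketch in base R; not the package's implementation.
puff_sketch <- function(df, id, labels) {
  ids    <- unique(df[id])       # one row per unique id
  combos <- unique(df[labels])   # label combinations observed in the data
  grid   <- merge(ids, combos, by = NULL)  # Cartesian product of the two
  # Left join: combinations absent for an id get NA in the remaining columns.
  merge(grid, df, by = c(id, labels), all.x = TRUE)
}

foo <- data.frame(
  "name"    = c("Rachel", "Chandler", "Rachel", "Monica", "Rachel", "Chandler"),
  "year"    = c(2010, 2010, 2012, 2010, 2011, 2012),
  "size"    = c("XS", "S", "M", "S", "S", "M"),
  "results" = c(1, 2, 3, 4, 5, 6)
)
puffed <- puff_sketch(foo, "name", c("year", "size"))
```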
coalesce_join join data frames x and y with coalescing: for columns with the same name, fill in missing values in x with the corresponding values from y
multi_join wrapper function for coalesce_join that joins together a list of data frames
rm_single_unique_col remove column(s) that contain a single unique value
remove_empty_rows remove duplicated rows of the original data frame, or of a subset of it if column names are passed in
rm_dups_w_less_data remove duplicated rows by keeping, within each group of duplicates, only the observation with the most non-missing columns; if several observations tie, keep the first one by row index. A wrapper function for any_dups.
insert_nas randomly replace values in a data object with NAs
rm_na remove NAs from a vector
remove_duplicates
fill_na_as_missing fill NAs and empty cells with fillers such as a character
fill_as_na sister function to fill_na_as_missing, replace cells with NA if they match a given string
clean_by_id remove rows where the id columns are all missing, and rows where all columns but the id columns are missing; optionally remove columns where all values are missing.
encode_col replace column names with alpha-numeric sequences, returns a named vector (of name-value pairs) for column name look-up; Returns a list.
decode_col sister function to encode_col to revert encoded list of dataframes
get_name sister function to encode_col, retrieve column names based on id(s)
get_id sister function to encode_col, retrieve column id based on name(s)
str_find wrapper function to locate a string based on the regex patterns that precede and follow it
copy_unique copy unique values to clipboard
save_csv a custom way to output CSV files
save_csv_from_a_list a wrapper function of save_csv that applies to a list of data frames
load_csv a custom way to load CSV files
from_excel wrapper function for convenient copy from Excel into a data frame ('Trick' via @SuzanBaert on twitter)
write_fwf write fixed width format text file
fread2 wrapper function for data.table::fread that converts blank cells to NA on read
load_files load file paths in a specified location
file.copy.content.only enable selective copying of files to another directory; base R's file.copy function can only copy an entire directory to another directory
insp_dir perform a non-intrusive inspection of a given directory and return a dataframe with essential file information sorted by file size
clear_dir given a directory, delete files matching a regex pattern, excluding files that match the 'avoid' argument
gen_key a wrapper function for sodium::keygen to generate and convert keys
encrypt a wrapper function to encrypt file using a key
decrypt a wrapper function to decrypt file using a key
open_encrypt a wrapper function for the workflow of decrypt, source and encrypt back files
format_to_percentage convert integer/floating point number format to percentage format
format_num format numbers to limit digits after decimal point
format_datetime format datetime columns to a specified format
cap_str capitalize the first letter of each word in the string
conv_fct_to_chr convert all columns of factor type to character type
conv_chr_to_fct convert all columns of character type to factor type
secs_to_date
dats_to_date convert POSIXct times that were coerced to numeric back to POSIXct datetime format
multiplot
plot_boxplot boxplot with confidence intervals and summary statistics such as p-val and risk
corr_to_df filter highly correlated features and output them in tidy format
remove_high_corr_features wrapper function for caret::findCorrelation: find and remove highly correlated features
model_lm run lm model and print a curated summary output
hclust_wss hierarchical clustering with a scree plot of within-cluster sums of squares for easier selection of the number of clusters; a wrapper function for the 'fastcluster' package methods hclust (for matrix) and hclust.vector (for vector)
fill_map a wrapper function for gstat::gstat that uses a univariate or multivariate geostatistical model with x, y coordinates as input and fills the z column with predictions for each id group; puff_my_df was also written to make all entries available for each group
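As an illustration of the function-factory pattern mentioned for ff_quantile above, a factory can capture the probability once and return a function that computes that quantile for any vector. The following is a minimal sketch under that assumption; `ff_quantile_sketch` and its `na.rm` default are hypothetical and may differ from the package's own function.

```r
# Hypothetical sketch; the package's ff_quantile may differ.
ff_quantile_sketch <- function(p) {
  force(p)  # evaluate p now so the returned closure holds a fixed value
  function(x, na.rm = TRUE) {
    # names = FALSE returns a plain number instead of a named vector
    stats::quantile(x, probs = p, na.rm = na.rm, names = FALSE)
  }
}

q5  <- ff_quantile_sketch(0.05)
q50 <- ff_quantile_sketch(0.5)
median_value <- q50(c(1, 2, 3, 4, 5))
```

Functions built this way can be passed directly to summary helpers that expect a function of a single vector.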