if (!exists('dont_run_setup')) {
  dont_run_setup <- FALSE
}
if (!dont_run_setup) {
  knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
  )
}
library(loose.rock)
loose.rock::base.dir(file.path(tempdir(), 'run-cache'))
A collection of functions to improve workflows in survival analysis and data science. Features include the generation of balanced datasets, live retrieval of protein-coding genes from two public databases, and the generation of random matrices based on a covariance matrix.
The work has been mainly supported by two grants: FCT SFRH/BD/97415/2013 and the EU Commission under SOUND project with contract number 633974.
The only prerequisite is the biomaRt Bioconductor package, which cannot be installed automatically from CRAN.
All other dependencies should be installed when running the install command.
if (!require("BiocManager"))
  install.packages("BiocManager")
BiocManager::install("loose.rock")

# use the package
library(loose.rock)
coding.genes(): downloads protein-coding genes from external databases
gen.synth.xdata(): generates a random matrix with a pre-determined covariance
balanced.cv.folds() and balanced.train.and.test(): get balanced train/test sets and cv folds
run.cache(): caches the results of a function
proper(): capitalizes a string using regular expressions
my.colors(): my own palette
my.symbols(): same, with symbols for plots

library(dplyr)
The example below shows only a random sample of 15 protein-coding genes.
coding.genes() %>%
  dplyr::arrange(external_gene_name) %>%
  { dplyr::slice(., sample(seq(nrow(.)), 15)) } %>%
  knitr::kable()
Balanced splitting is especially relevant for survival or binary outcomes where one category has few cases, which need to be well distributed among the train/test data sets or the cross-validation folds.
The example below sets aside 90% of the data for the training set. As the samples are already divided into two sets (set1 and set2), it performs the 90% separation for each and then joins the results (with option join.all = T).
set1 <- c(rep(TRUE, 8), FALSE, rep(TRUE, 9), FALSE, TRUE)
set2 <- !set1
cat(
  'Set1', '\n', set1, '\n\n',
  'Set2', '\n', set2, '\n\n',
  'Training / Test set using logical indices', '\n\n'
)
set.seed(1985)
balanced.train.and.test(set1, set2, train.perc = .9)

set1 <- which(set1)
set2 <- which(set2)
cat(
  '##### Same sets but using numeric indices', '\n\n',
  'Set1', '\n', set1, '\n\n',
  'Set2', '\n', set2, '\n\n',
  'Training / Test set using numeric indices', '\n'
)
set.seed(1985)
balanced.train.and.test(set1, set2, train.perc = .9)
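The idea of splitting each set separately and then joining the pieces can be sketched in base R. This is only an illustration under stated assumptions, not the loose.rock implementation: the helper name balanced_split_sketch, its train_perc argument, and the choice to round the training size down are all hypothetical.

```r
# Hypothetical helper, not the loose.rock implementation:
# take the same fraction of indices from every set, then join the pieces.
balanced_split_sketch <- function(..., train_perc = 0.9) {
  # accept logical or numeric indices, as in the example above
  sets <- lapply(list(...), function(s) if (is.logical(s)) which(s) else s)
  train <- lapply(sets, function(ix) {
    n_train <- floor(length(ix) * train_perc)  # assumption: round down
    # index into ix explicitly so a length-1 set is not expanded by sample()
    ix[sample.int(length(ix), n_train)]
  })
  test <- Map(setdiff, sets, train)
  list(train = sort(unlist(train)), test = sort(unlist(test)))
}

set.seed(1985)
set1 <- c(rep(TRUE, 8), FALSE, rep(TRUE, 9), FALSE, TRUE)
parts <- balanced_split_sketch(set1, !set1, train_perc = 0.9)
parts$train
parts$test
```

Because the rare set (two samples here) is split on its own, one of its samples ends up in each partition instead of both landing in the training set by chance.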
xdata1 <- gen.synth.xdata(10, 5, .2)
xdata2 <- gen.synth.xdata(10, 5, .75)

cat('Using .2^|i-j| to generate co-variance matrix\n\n')
cat('X generated\n\n')
data.frame(xdata1)
cat('cov(X)\n\n')
data.frame(cov(xdata1))
draw.cov.matrix(xdata1) + ggplot2::ggtitle('X1 Covariance Matrix')

cat('Using .75^|i-j| to generate co-variance matrix (plotting correlation)\n\n')
cat('X generated\n\n')
data.frame(xdata2)
cat('cor(X)\n\n')
data.frame(cor(xdata2, method = 'pearson'))
draw.cov.matrix(xdata2, fun = cor, method = 'pearson') + ggplot2::ggtitle('X2 Pearson Correlation Matrix')
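One common way to draw data whose covariance follows the rho^|i-j| pattern printed above is to multiply independent normal draws by a Cholesky factor of the target matrix. The sketch below assumes that approach; it is not taken from the gen.synth.xdata() source, and synth_xdata_sketch is a hypothetical name.

```r
# Hypothetical helper, not the loose.rock implementation.
synth_xdata_sketch <- function(n_obs, n_vars, rho) {
  idx <- seq_len(n_vars)
  # target covariance matrix with entries rho^|i-j|
  sigma <- rho^abs(outer(idx, idx, "-"))
  # independent standard normal draws, one row per observation
  z <- matrix(rnorm(n_obs * n_vars), nrow = n_obs)
  # chol() returns upper-triangular R with t(R) %*% R == sigma,
  # so the rows of z %*% R have covariance sigma
  z %*% chol(sigma)
}

set.seed(1985)
x <- synth_xdata_sketch(10000, 5, 0.75)
round(cov(x), 2)
```

With only 10 observations, as in the example above, the sample covariance can differ noticeably from the target; the larger sample here is used so that cov(x) sits close to it.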
run.cache() uses a cache to save and retrieve results. The cache is created automatically from the arguments and the source code of the function, so that if any of those change, the cache is regenerated.
Caution: Files are not deleted so the cache directory can become rather big.
Set a temporary directory to save all caches (optional)
base.dir(file.path(tempdir(), 'run-cache'))
Run the sum function twice
a <- run.cache(sum, 1, 2)
b <- run.cache(sum, 1, 2)
all(a == b)
Run the rnorm function with an explicit seed (otherwise repeated calls would return the same cached random numbers)
a <- run.cache(rnorm, 5, seed = 1985)
b <- run.cache(rnorm, 5, seed = 2000)
all(a == b)
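The caching behaviour described above, where the cache depends on both the arguments and the function's source code, can be illustrated with a small base R sketch. It is an assumption-laden toy, not how run.cache() is implemented: results are kept in an in-memory environment rather than in files, and the names cache_env, cache_key and run_cache_sketch are hypothetical.

```r
# Hypothetical in-memory cache, not the loose.rock implementation.
cache_env <- new.env()

cache_key <- function(fun, args) {
  # fingerprint the function source and its arguments with an md5 digest
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(c(deparse(fun), deparse(args)), tmp)
  unname(tools::md5sum(tmp))
}

run_cache_sketch <- function(fun, ...) {
  args <- list(...)
  key <- cache_key(fun, args)
  if (!exists(key, envir = cache_env, inherits = FALSE)) {
    # cache miss: evaluate once and store the result under the key
    assign(key, do.call(fun, args), envir = cache_env)
  }
  get(key, envir = cache_env)
}

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1  # count how often the body really runs
  x^2
}
a <- run_cache_sketch(slow_square, 4)
b <- run_cache_sketch(slow_square, 4)
calls
#> [1] 1
```

Changing either the argument or the body of slow_square changes the digest, so the next call is evaluated again instead of being read from the cache.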
One such utility is proper(), a function that capitalizes a string.
x <- "OnE oF sUcH iS a proPer function that capitalizes a string."
proper(x)
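Capitalizing each word with a regular expression can be done in a single gsub() call. The sketch below is one possible way to do it and is not claimed to be the body of proper(); proper_sketch is a hypothetical name.

```r
# Hypothetical helper, not the loose.rock implementation.
proper_sketch <- function(text) {
  # lower-case everything, then upper-case the first letter of each word;
  # the \\U replacement escape is only available with perl = TRUE
  gsub("\\b([a-z])", "\\U\\1", tolower(text), perl = TRUE)
}

proper_sketch("hello wORLD")
#> [1] "Hello World"
```

Note that the word boundary also matches after an apostrophe, so a word such as "don't" would become "Don'T" with this pattern.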
my.colors() and my.symbols() can be used to improve plot readability.
xdata <- -10:10
plot(
  xdata, 1/10 * xdata * xdata + 1,
  type = "l", pch = my.symbols(1), col = my.colors(1), cex = .9,
  xlab = '', ylab = '', ylim = c(0, 20)
)
grid(NULL, NULL, lwd = 2) # grid lines at the tick marks of both axes
for (ix in 2:22) {
  points(
    xdata, 1/10 * xdata * xdata + ix,
    pch = my.symbols(ix), col = my.colors(ix), cex = .9
  )
}