README.md
In BioStaCs/BSCgas: Greedy AUC stepwise

BioStaCs

Data manipulation and analytic tools for GWAS data.

The following need to be integrated into our projects:

C/Cpp
Python/Perl
R
Java

TODO LIST:

Use text files to store data: csv/txt.
Provide a README file that includes:
Data description
Column description (types: Numeric/Categorical/Ordered; category: what the variable represents)
Header/No header
How NA/NULL/missing values are handled
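
As an illustration of these conventions, here is a minimal sketch in R of reading a csv file that has a header row and explicit missing-value codes. The file contents, the column names, and the set of strings treated as missing are all hypothetical; a real data set would document its own.

```r
# Hypothetical csv content; in practice this would live in a text file.
csv_text <- "id,genotype,age,severity
s1,AA,34,low
s2,AG,NA,high
s3,NULL,51,
s4,GG,47,medium"

# header = TRUE: the first row holds the column names.
# na.strings: every code that the README declares as "missing".
dat <- read.csv(text = csv_text, header = TRUE,
                na.strings = c("NA", "NULL", ""),
                stringsAsFactors = FALSE)

# Declare column types as described in the README (assumed here).
dat$genotype <- factor(dat$genotype)                  # Categorical
dat$severity <- factor(dat$severity,
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)                # Ordered

# Count missing values per column.
na_counts <- colSums(is.na(dat))
print(na_counts)
```

Listing the missing-value codes in one place (`na.strings`) keeps the reading code in line with what the README states.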

Additional packages are needed:

# Requires R > 3.0
library(devtools)
devtools::install_github("hadley/lineprof")
devtools::install_github("wch/shiny-slickgrid")

To use it, one can try:

library(lineprof)
source(find_ex("read-delim.r"))
wine <- find_ex("wine.csv")
x <- lineprof(read_delim(wine, sep = ","), torture = TRUE)
shine(x)

It will open a web page such as:

[Figure: Profiling]

The profile view shows more detail for each line of code.

This package has some built-in functions to monitor and clean up memory:

Built-in functions

lsos() shows memory usage in a neat format; showMemoryUse() shows memory usage and runs gc() automatically.
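
The implementations of these helpers are not shown here. As a rough illustration of what such a helper can do, the following sketch lists objects by size using base R only. `list_object_sizes` is a hypothetical name and is not a function of this package.

```r
# Hypothetical helper (not part of this package): list the objects in an
# environment with their sizes in bytes, largest first.
list_object_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  if (length(nms) == 0) {
    return(data.frame(name = character(0), bytes = numeric(0)))
  }
  bytes <- vapply(nms, function(n) {
    as.numeric(utils::object.size(get(n, envir = env)))
  }, numeric(1))
  out <- data.frame(name = nms, bytes = unname(bytes),
                    stringsAsFactors = FALSE)
  out[order(out$bytes, decreasing = TRUE), , drop = FALSE]
}

# Example in a scratch environment.
e <- new.env()
assign("small", 1:10, envir = e)
assign("big", numeric(1e5), envir = e)
sizes <- list_object_sizes(e)
print(sizes)

# Remove the large object, then trigger garbage collection.
rm("big", envir = e)
invisible(gc())
```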

Besides,

Outer Packages

bigalgebra      BIGMEM
biganalytics    BIGMEM
bigmemory       BIGMEM
bigtabulate     BIGMEM
synchronicity   BIGMEM

GAS (Greedy AUC Stepwise) is a classification framework that has been successfully applied in our SpermatogenesisOnline project. For a binary classification problem, GAS automatically maximizes the area under the ROC curve (AUC) with a pre-specified number of variables: it addresses a K-sparse problem by greedily searching for the best K features that maximize AUC. The strategy is similar to forward selection: each step adds only one variable that is not already in the model and that increases the AUC. If GAS fails to find a solution with K variables, it instead outputs the model that yields the maximum AUC over the steps taken, up to the maximum allowed number.
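
The greedy search described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. The function names `auc` and `greedy_auc_stepwise`, the use of logistic regression (`glm`) as the scoring model, the in-sample AUC, the 0.5 starting baseline, and the simulated data are all assumptions made for the example.

```r
# AUC from the rank-sum (Mann-Whitney) statistic; y is 0/1, score is numeric.
auc <- function(y, score) {
  r <- rank(score)          # average ranks, so ties count as 1/2
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical greedy forward search: at each step add the single variable
# that raises the AUC most; stop at k variables or when nothing improves.
greedy_auc_stepwise <- function(x, y, k) {
  selected <- character(0)
  best_auc <- 0.5           # assumed baseline for the empty model
  while (length(selected) < k) {
    candidates <- setdiff(colnames(x), selected)
    if (length(candidates) == 0) break
    aucs <- vapply(candidates, function(v) {
      d <- data.frame(y = y, x[, c(selected, v), drop = FALSE])
      fit <- suppressWarnings(glm(y ~ ., data = d, family = binomial()))
      auc(y, fitted(fit))
    }, numeric(1))
    if (max(aucs) <= best_auc) break   # no candidate increases the AUC
    best_auc <- max(aucs)
    selected <- c(selected, candidates[which.max(aucs)])
  }
  # If fewer than k variables were found, this is the best model so far.
  list(variables = selected, auc = best_auc)
}

# Simulated data: only v1 and v3 carry signal.
set.seed(1)
n <- 300
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("v", 1:5)))
y <- rbinom(n, 1, plogis(2 * x[, "v1"] - 1.5 * x[, "v3"]))
res <- greedy_auc_stepwise(x, y, k = 2)
print(res)
```

The sketch scores candidates on the training data for brevity; a held-out or cross-validated AUC would give a less optimistic estimate.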

AUC