knitr::opts_chunk$set(echo = TRUE)
library(QuiPTsim)
setwd("~/amylogram/QuiPTsim/vignettes/")

QuiPTsim contains the source code used in an extensive study of the properties of the Quick Permutation Test (QuiPT). A comparison of QuiPT with other widely used feature selection algorithms is planned.

Simulation data

The following steps were carried out to create the datasets:

  1. Various alphabets were defined. Generated sequences use both the full alphabet and its simplifications (taken from the literature and the domain knowledge of Mr Burdukiewicz).
  2. Sequences were generated using both a uniform distribution of elements and previously computed element probabilities (amino acid (AA) occurrence counts in the AmpGram data). Sequence simplification was likewise based on the precomputed AA probabilities.
  3. Motifs were then injected into these sequences. Again, AA occurrence probabilities were precomputed on the AmpGram data. It is worth noting that motif injection is implemented so that motifs can overlap each other.
  4. Each n-gram occurrence matrix is accompanied by a list of the injected motifs and their masks.
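
The sequence generation and motif injection steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the QuiPTsim implementation: the helper names `simulate_sequence()` and `inject_motif()`, the four-letter toy alphabet, and the probabilities are all hypothetical. The sketch also assumes that a mask is a logical vector marking which motif positions are fixed, with the remaining positions acting as gaps that keep the underlying sequence.

```r
# Hypothetical sketch; none of these helpers are part of the QuiPTsim API.
set.seed(1)

alphabet <- c("a", "b", "c", "d")   # toy alphabet (assumption)
probs    <- c(0.4, 0.3, 0.2, 0.1)   # made-up element probabilities

# Draw a sequence uniformly (prob = NULL) or from given element probabilities.
simulate_sequence <- function(len, alphabet, prob = NULL) {
  sample(alphabet, size = len, replace = TRUE, prob = prob)
}

# Write the fixed (mask == TRUE) positions of `motif` into `s`, starting at
# position `start`. Gap positions (mask == FALSE) keep the original element.
inject_motif <- function(s, motif, mask, start) {
  stopifnot(length(motif) == length(mask),
            start >= 1,
            start + length(motif) - 1 <= length(s))
  idx <- start + which(mask) - 1
  s[idx] <- motif[mask]
  s
}

uniform_seq  <- simulate_sequence(10, alphabet)
weighted_seq <- simulate_sequence(10, alphabet, prob = probs)

# Two gapped motifs whose spans overlap: the first occupies positions 2-4
# (fixed at 2 and 4), the second occupies positions 3-5 (fixed at 3 and 5).
injected <- inject_motif(uniform_seq, c("a", "b", "a"), c(TRUE, FALSE, TRUE), start = 2)
injected <- inject_motif(injected,    c("d", "b", "d"), c(TRUE, FALSE, TRUE), start = 3)
```

In this sketch the second motif lands in the gap of the first, so both motifs remain intact even though their spans overlap. If two fixed positions coincided, the later injection would overwrite the earlier one.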

Simulation pipeline details

The function create_simulation_data() is a high-level wrapper around the functions implemented in the QuiPTsim package. It serves as a framework for automating data generation.

The results of a single simulation iteration are saved in the previously specified directory. The n-gram occurrence matrices (in RDS format) are saved along with a master data frame in CSV format that contains all the details of the prepared dataset. Each row represents a single matrix, whose location is given in the path column.
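
The layout described above, RDS matrices plus a master CSV with a path column, might be written and read back as in the following sketch. Everything except the `path` column name is an assumption made for illustration: the toy matrices, the file names, and the `replication` and `n_seq` columns are hypothetical and do not reflect the actual output of create_simulation_data().

```r
# Hypothetical layout; only the `path` column name comes from the text above.
out_dir <- file.path(tempdir(), "quiptsim_sketch")
dir.create(out_dir, showWarnings = FALSE)

# Two toy occurrence matrices standing in for simulation output.
m1 <- matrix(c(1L, 0L, 0L, 1L), nrow = 2,
             dimnames = list(NULL, c("ngram_1", "ngram_2")))
m2 <- matrix(c(0L, 1L, 1L, 1L), nrow = 2,
             dimnames = list(NULL, c("ngram_1", "ngram_2")))

# Save each matrix in RDS format.
paths <- file.path(out_dir, c("matrix_1.Rds", "matrix_2.Rds"))
saveRDS(m1, paths[1])
saveRDS(m2, paths[2])

# Master data frame: one row per matrix, with its location in `path`.
master <- data.frame(replication = 1:2,
                     n_seq = c(2L, 2L),
                     path = paths,
                     stringsAsFactors = FALSE)
master_file <- file.path(out_dir, "master.csv")
write.csv(master, master_file, row.names = FALSE)

# Reading the results back: load the master table, then each matrix by path.
master_read <- read.csv(master_file, stringsAsFactors = FALSE)
matrices <- lapply(master_read$path, readRDS)
```

Keeping the matrices in separate files and indexing them through one table means a subset of datasets can be selected by filtering rows of the master data frame before any matrix is loaded.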

Simulation highlights:

What did not work for us

Generating sequence probabilities using Markov chains

Generating alphabet probabilities with a given cosine similarity



jakubkala/QuiPTsim documentation built on Jan. 17, 2022, 11:27 p.m.