View source: R/dg_prepare_datasets.R
Prepare DiMSum-processed doubledeepPCA datasets for free energy estimation
dg_prepare_datasets(
  dataset_folder,
  abundancepca_files,
  bindingpca_files,
  wt_seq = "",
  train_test_split = c(10, 0.1),
  fitness_scale = "lin"
)
dataset_folder: absolute path to the dataset folder; the folder is created if it does not exist
abundancepca_files: absolute paths to the DiMSum-processed abundancePCA .RData files, each containing aa_seq (or aa_subs), fitness and sigma columns; multiple files are treated as independent measurements
bindingpca_files: absolute paths to the DiMSum-processed bindingPCA .RData files, each containing aa_seq (or aa_subs), fitness and sigma columns; multiple files are treated as independent measurements
wt_seq: a character string giving the wild-type (reference) amino acid sequence, used to derive aa_subs from aa_seq; if wt_seq is empty (the default ""), an aa_subs column is assumed to be present
train_test_split: a numeric vector c(A, B) defining A test splits of the variants, each of size B (a fraction of all variants); the default c(10, 0.1) corresponds to 10-fold cross-validation
fitness_scale: either "lin" or "log" (default "lin"); "lin" assumes fitness lies in [0, Inf) with wild-type fitness 1, and "log" assumes fitness lies in (-Inf, Inf) with wild-type fitness 0
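The argument conventions above can be illustrated with a few small R helpers. This is a minimal sketch under stated assumptions, not code from the package: all function names are hypothetical, the substitution notation (wild-type aa, position, mutant aa, joined by "_") is an assumed format, the split scheme assumes disjoint random splits, and the "log" scale is assumed to be the natural log of the "lin" scale.

```r
# Hypothetical helpers sketching the argument conventions of
# dg_prepare_datasets(); none of them is part of the package.

# TRUE if a data frame has the columns the input files are described as
# containing: fitness, sigma, and at least one of aa_seq / aa_subs.
has_required_cols <- function(df) {
  all(c("fitness", "sigma") %in% names(df)) &&
    any(c("aa_seq", "aa_subs") %in% names(df))
}

# Derive substitutions for one aa_seq relative to wt_seq.
# ASSUMED format: <wt aa><position><mutant aa>, joined by "_".
aa_seq_to_subs <- function(aa_seq, wt_seq) {
  stopifnot(nchar(aa_seq) == nchar(wt_seq))
  wt <- strsplit(wt_seq, "")[[1]]
  mut <- strsplit(aa_seq, "")[[1]]
  pos <- which(wt != mut)
  if (length(pos) == 0) return("")  # identical to the wild type
  paste0(wt[pos], pos, mut[pos], collapse = "_")
}

# Assign variants to A disjoint random test splits, each holding a
# fraction B of all variants (ASSUMED reading of train_test_split).
assign_test_splits <- function(n_variants, train_test_split = c(10, 0.1)) {
  n_splits <- as.integer(train_test_split[1])
  split_size <- floor(train_test_split[2] * n_variants + 1e-8)
  stopifnot(n_splits * split_size <= n_variants)
  perm <- sample.int(n_variants)  # random order of variant indices
  lapply(seq_len(n_splits), function(i) {
    perm[(i - 1) * split_size + seq_len(split_size)]
  })
}

# Convert fitness to the log scale.
# ASSUMED: "log" fitness = natural log of "lin" fitness, so wild type 1 -> 0.
to_log_fitness <- function(fitness, fitness_scale = c("lin", "log")) {
  fitness_scale <- match.arg(fitness_scale)
  if (fitness_scale == "log") return(fitness)
  stopifnot(all(fitness >= 0, na.rm = TRUE))
  log(fitness)  # log(0) gives -Inf, matching the (-Inf, Inf) log range
}
```

For example, aa_seq_to_subs("MAVP", "MKVL") returns "K2A_L4P", and with the default c(10, 0.1) on 100 variants, assign_test_splits() returns 10 disjoint index vectors of 10 variants each. Call set.seed() beforehand if the splits need to be reproducible.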
Value: writes an .RData file to $dataset_folder/data/fitness_dataset.RData containing the list varlist, which holds all variables needed to compute dG models for the dataset.
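To inspect the result, the saved varlist can be loaded back into R. A minimal sketch: load_varlist() is a hypothetical helper, not a package function, and it relies only on the output path and object name stated above.

```r
# Hypothetical helper (not part of the package): load the varlist object
# written to <dataset_folder>/data/fitness_dataset.RData.
load_varlist <- function(dataset_folder) {
  path <- file.path(dataset_folder, "data", "fitness_dataset.RData")
  if (!file.exists(path)) stop("no fitness dataset found at ", path)
  env <- new.env()
  load(path, envir = env)  # restore the saved objects into a private environment
  if (!exists("varlist", envir = env, inherits = FALSE)) {
    stop("file does not contain a 'varlist' object")
  }
  get("varlist", envir = env)
}
```

For example, varlist <- load_varlist("/path/to/dataset_folder") followed by str(varlist, max.level = 1) lists the top-level entries (the path is a placeholder). Loading into a fresh environment rather than the global workspace avoids silently overwriting an existing object named varlist.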