dg_prepare_datasets: Prepare DiMSum-processed doubledeepPCA datasets for free...

View source: R/dg_prepare_datasets.R

Description

Prepare DiMSum-processed doubledeepPCA datasets for free energy estimation

Usage

dg_prepare_datasets(
  dataset_folder,
  abundancepca_files,
  bindingpca_files,
  wt_seq = "",
  train_test_split = c(10, 0.1),
  fitness_scale = "lin"
)
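
A minimal call might look like the sketch below. The folder and file paths are hypothetical placeholders, and the `tempura::` prefix assumes the function is exported by the package. The call is guarded so that it only runs when tempura is installed and the input files exist.

```r
# Hypothetical paths; replace with the locations of your own DiMSum outputs.
dataset_folder <- file.path(tempdir(), "my_domain_dataset")
abundancepca_files <- "/path/to/abundancePCA_fitness_replicates.RData"
bindingpca_files <- "/path/to/bindingPCA_fitness_replicates.RData"

# Only attempt the call when both input files are actually there.
inputs_present <- all(file.exists(c(abundancepca_files, bindingpca_files)))

if (requireNamespace("tempura", quietly = TRUE) && inputs_present) {
  tempura::dg_prepare_datasets(
    dataset_folder = dataset_folder,
    abundancepca_files = abundancepca_files,
    bindingpca_files = bindingpca_files,
    wt_seq = "",                    # assumes an aa_subs column is present
    train_test_split = c(10, 0.1),  # documented default
    fitness_scale = "lin"           # documented default
  )
}
```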

Arguments

dataset_folder

absolute path to the dataset folder; the folder is created if it does not exist

abundancepca_files

absolute path(s) to the DiMSum-processed abundancePCA .RData file(s), each containing aa_seq (or aa_subs), fitness and sigma columns; multiple files are treated as independent measurements

bindingpca_files

absolute path(s) to the DiMSum-processed bindingPCA .RData file(s), each containing aa_seq (or aa_subs), fitness and sigma columns; multiple files are treated as independent measurements

wt_seq

character string giving the wild-type/reference amino acid sequence, used to extract aa_subs from aa_seq; if wt_seq is empty (the default), the aa_subs column is assumed to be present
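
To illustrate what extracting substitutions from a full sequence involves, here is a small base R sketch. The helper `extract_aa_subs()` is hypothetical and not part of tempura, and the `<wt><position><mutant>` notation is an assumption; the encoding the package actually uses for aa_subs is not described on this page.

```r
# Hypothetical helper, not part of tempura: list the substitutions of a
# variant relative to the wild-type sequence as "<wt><position><mutant>".
extract_aa_subs <- function(aa_seq, wt_seq) {
  stopifnot(nchar(aa_seq) == nchar(wt_seq))
  variant <- strsplit(aa_seq, "")[[1]]
  wild_type <- strsplit(wt_seq, "")[[1]]
  positions <- which(variant != wild_type)
  paste0(wild_type[positions], positions, variant[positions])
}

# One substitution at position 3 (T in the wild type, V in the variant).
extract_aa_subs("MKVLA", wt_seq = "MKTLA")
```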

train_test_split

c(A, B): the variants are split into A test sets, each containing a fraction B of the variants; default c(10, 0.1), i.e. 10-fold cross-validation
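
One way to picture such a split is sketched below. `make_test_splits()` is a hypothetical helper, not tempura's implementation, and it assumes the A test sets are disjoint (so A * B must not exceed 1); the package may draw its splits differently.

```r
# Hypothetical sketch: draw A disjoint test sets, each holding a fraction B
# of the variants. Returns a list of integer index vectors.
make_test_splits <- function(n_variants, train_test_split = c(10, 0.1)) {
  n_splits <- train_test_split[1]
  test_size <- floor(train_test_split[2] * n_variants)
  stopifnot(test_size >= 1, n_splits * test_size <= n_variants)
  shuffled <- sample.int(n_variants)  # random permutation of variant indices
  lapply(seq_len(n_splits), function(i) {
    shuffled[((i - 1) * test_size + 1):(i * test_size)]
  })
}

set.seed(1)
splits <- make_test_splits(n_variants = 200)
lengths(splits)  # ten test sets of 20 variants each
```

For each split, the variants outside the test set would serve as the training set.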

fitness_scale

either "lin" or "log"; for "lin", fitness is assumed to lie in [0, Inf] with wild-type fitness = 1; for "log", fitness is assumed to lie in [-Inf, Inf] with wild-type fitness = 0; default "lin"
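
The two conventions are consistent with a logarithmic relationship between the scales, as the sketch below shows. The fitness values are made up for illustration, and the use of the natural logarithm is an assumption; this page does not state the base.

```r
# Made-up linear-scale fitness values; wild type has fitness 1.
lin_fitness <- c(wild_type = 1, deleterious = 0.5, beneficial = 2)

# Natural log is an assumption here; the log base is not documented above.
log_fitness <- log(lin_fitness)

log_fitness[["wild_type"]]                # wild type maps to 0 on the log scale
all.equal(exp(log_fitness), lin_fitness)  # exp() recovers the linear scale
```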

Value

writes an .RData file to $dataset_folder/data/fitness_dataset.RData containing the list varlist, which holds all variables needed to compute dg models for the dataset
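
The written file can be inspected with base R, as in the sketch below. The dataset folder is a hypothetical placeholder, and the internal structure of varlist is not documented here, so the code only lists its top-level elements.

```r
# Hypothetical dataset folder; use the one passed to dg_prepare_datasets().
dataset_folder <- file.path(tempdir(), "my_domain_dataset")
dataset_file <- file.path(dataset_folder, "data", "fitness_dataset.RData")

if (file.exists(dataset_file)) {
  loaded_names <- load(dataset_file)  # load() returns the restored object names
  print(loaded_names)
  if ("varlist" %in% loaded_names) {
    str(varlist, max.level = 1)       # overview of the top-level elements
  }
}
```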


jschmiedel/tempura documentation built on Nov. 13, 2020, 3:53 a.m.