View source: R/assemble_training_data.R
assemble_training_data | R Documentation |
Prepares the training data for fitting TOP models. It splits the training data into n_partitions partitions (10 by default) and, for each partition, assembles the training data for all TF x cell type combinations.
assemble_training_data(
  tf_cell_table,
  logistic_model = FALSE,
  chip_col = "chip",
  training_chrs = paste0("chr", seq(1, 21, 2)),
  n_partitions = 10,
  n_cores = n_partitions,
  max_sites = 50000,
  seed = 1
)
tf_cell_table |
A data frame listing all TF x cell type combinations and the training data for each combination. It should have at least three columns: TF names, cell types, and file names of the individual training data for each TF x cell type combination. The individual training data should be in .rds or text (.txt or .csv) format. |
logistic_model |
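Logical. If TRUE, assembles training data for the logistic version of the model; if FALSE (default), assembles training data for the quantitative occupancy model. |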
chip_col |
The column name of ChIP data in the individual training data (default: ‘chip’). |
training_chrs |
Chromosomes used for training the model (default: odd chromosomes, chr1, chr3, ..., chr21) |
n_partitions |
Number of partitions to split the training data (default: 10). |
n_cores |
Number of cores to run in parallel (default: equal to n_partitions). |
max_sites |
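Max number of candidate sites to keep for each TF x cell type combination (default: 50000). To reduce computation time, randomly select max_sites candidate sites for a TF x cell type combination if its number of candidate sites exceeds max_sites. |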
seed |
The random seed used when sampling sites (default: 1). |
A list of n_partitions data frames (10 by default), each containing one partition of the training data for all TF x cell type combinations.
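The following base R sketch illustrates, on toy data, the kind of processing the arguments above describe for a single TF x cell type combination: keeping only the training chromosomes, capping the number of sites by seeded random sampling, and splitting the result into partitions. It is not the package's implementation. The column names chr and chip, the toy values, and the random assignment of rows to partitions are all assumptions made for illustration.

```r
# Toy candidate-site table for one TF x cell type combination.
# The column names 'chr' and 'chip' and all values are assumptions.
set.seed(1)
sites <- data.frame(
  chr  = sample(paste0("chr", 1:22), 2000, replace = TRUE),
  chip = rpois(2000, lambda = 5),
  stringsAsFactors = FALSE
)

# Default training chromosomes: chr1, chr3, ..., chr21.
training_chrs <- paste0("chr", seq(1, 21, 2))
training_sites <- sites[sites$chr %in% training_chrs, , drop = FALSE]

# Cap the number of sites by random sampling. The documented default for
# max_sites is 50000; a small value is used here so the cap takes effect.
max_sites <- 500
if (nrow(training_sites) > max_sites) {
  keep <- sample(nrow(training_sites), max_sites)
  training_sites <- training_sites[keep, , drop = FALSE]
}

# Assign rows at random to n_partitions groups of near-equal size
# (an assumed scheme), then split into a list of data frames.
n_partitions <- 10
partition_id <- sample(rep_len(seq_len(n_partitions), nrow(training_sites)))
partitions <- split(training_sites, partition_id)

length(partitions)
```

Fixing the seed before sampling makes the subsample and the partition assignment reproducible, which is presumably the purpose of the seed argument.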
## Not run:
# tf_cell_table should have three columns with:
# TF names, cell types, and paths to the training data files, like:
# | tf_name | cell_type | data_file |
# |:------------:|:-------------:|:------------------------:|
# | CTCF | K562 | CTCF.K562.data.rds |
# | CTCF | A549 | CTCF.A549.data.rds |
# | CTCF | GM12878 | CTCF.GM12878.data.rds |
# | ... | ... | ... |
# Assembles training data for the quantitative occupancy model,
# uses odd chromosomes for training, keeps at most 50000 candidate sites for
# each TF x cell type combination, and splits training data into 10 partitions.
assembled_training_data <- assemble_training_data(
  tf_cell_table,
  logistic_model = FALSE,
  chip_col = 'chip',
  training_chrs = paste0('chr', seq(1, 21, 2)),
  n_partitions = 10,
  max_sites = 50000
)
# Assembles training data for the logistic version of the model
assembled_training_data <- assemble_training_data(
  tf_cell_table,
  logistic_model = TRUE,
  chip_col = 'chip_label',
  training_chrs = paste0('chr', seq(1, 21, 2)),
  n_partitions = 10,
  max_sites = 50000
)
## End(Not run)
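The table sketched in the example comments could be built along the following lines. The column names (tf_name, cell_type, data_file) and file names are taken from those comments. Whether the function matches columns by name or by position is not stated here, so treat the names as an assumption, and treat the files as placeholders that need not exist on disk.

```r
# Illustrative TF x cell type table; column names follow the example comments.
tf_cell_table <- data.frame(
  tf_name   = c("CTCF", "CTCF", "CTCF"),
  cell_type = c("K562", "A549", "GM12878"),
  stringsAsFactors = FALSE
)

# File names of the form <TF>.<cell type>.data.rds (placeholder paths).
tf_cell_table$data_file <- paste(
  tf_cell_table$tf_name, tf_cell_table$cell_type, "data.rds",
  sep = "."
)

# Optional sanity check: list any training data files that are missing.
missing_files <- tf_cell_table$data_file[!file.exists(tf_cell_table$data_file)]

tf_cell_table
```

Checking for missing files before calling assemble_training_data can surface path problems early, before any parallel work starts.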