partition_dataset: Partition a synthetic dataset into training and test sets


View source: R/utils.R

Description

Partition a synthetic dataset into training and test sets.

Usage

partition_dataset(
  dt_obj,
  data_train_prcg = 0.5,
  region_train_prcg = 0.95,
  cpg_train_prcg = 0.5,
  is_synth = FALSE
)

Arguments

dt_obj

Melissa data object

data_train_prcg

Fraction of genomic regions that will be used in full for training, i.e. no CpGs in those regions are held out.

region_train_prcg

Fraction of genomic regions to keep for the training set; the remaining regions will have no coverage at all during training.

cpg_train_prcg

Fraction of CpGs in each genomic region to keep for the training set.

is_synth

Logical, whether we have synthetic data or not.
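
The interplay of the three fractions may be easier to see on a toy example. The sketch below uses only base R and is one reading of the argument descriptions above, not the package's implementation: the helpers make_region and split_cell, the single-cell list layout, and the two-column region matrices are all hypothetical.

```r
# Toy illustration only; NOT the Melissa implementation.
set.seed(1)

# Hypothetical region: 10 CpGs, column 1 = location, column 2 = binary state.
make_region <- function(n_cpgs = 10) {
  cbind(loc = sort(runif(n_cpgs, -1, 1)), state = rbinom(n_cpgs, 1, 0.5))
}

# Hypothetical single cell with 20 regions.
cell <- lapply(seq_len(20), function(i) make_region())

split_cell <- function(cell, data_train_prcg = 0.5,
                       region_train_prcg = 0.95, cpg_train_prcg = 0.5) {
  M <- length(cell)
  train <- cell
  test <- cell

  # 1. Choose the regions kept for training; the others are held out entirely.
  train_ind <- sort(sample.int(M, round(region_train_prcg * M)))
  test_ind <- setdiff(seq_len(M), train_ind)
  train[test_ind] <- list(NA)
  test[train_ind] <- list(NA)

  # 2. Among training regions, choose those used in full (no CpGs held out).
  n_full <- round(data_train_prcg * length(train_ind))
  full_ind <- train_ind[sample.int(length(train_ind), n_full)]

  # 3. In the remaining training regions, keep only a fraction of the CpGs
  #    for training and move the rest to the test set.
  for (m in setdiff(train_ind, full_ind)) {
    n_cpgs <- nrow(cell[[m]])
    keep <- sort(sample.int(n_cpgs, round(cpg_train_prcg * n_cpgs)))
    drop <- setdiff(seq_len(n_cpgs), keep)
    train[[m]] <- cell[[m]][keep, , drop = FALSE]
    test[[m]] <- cell[[m]][drop, , drop = FALSE]
  }

  list(train = train, test = test)
}

res <- split_cell(cell)

# Number of CpGs per region left in the training set (NA = region held out).
vapply(res$train, function(x) if (is.matrix(x)) nrow(x) else NA_integer_,
       integer(1))
```

Under this reading, region_train_prcg decides which regions are visible at all during training, data_train_prcg decides how many of those are left untouched, and cpg_train_prcg controls how sparse the remaining training regions become.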

Value

The Melissa object with the following changes. The 'met' element will now contain only the 'training' data. An additional element called 'met_test' will store the data that will be used during testing to evaluate the imputation performance. These data will not be seen by Melissa during inference.

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk

See Also

create_melissa_data_obj, melissa, filter_regions

Examples

# Partition the synthetic data from Melissa package
dt <- partition_dataset(melissa_encode_dt)
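
A slightly fuller variant of the example, setting the arguments from the Usage section explicitly and inspecting the two elements described under Value. The argument values are arbitrary, and fixing the seed assumes that the partition is drawn at random, which this page does not state.

```r
library(Melissa)

# Assumption: the split is random, so fix the seed for reproducibility.
set.seed(1)

dt <- partition_dataset(
  melissa_encode_dt,
  data_train_prcg = 0.2,
  region_train_prcg = 0.9,
  cpg_train_prcg = 0.4
)

# 'met' should now hold the training data and 'met_test' the held-out data.
names(dt)
length(dt$met)
length(dt$met_test)
```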

Melissa documentation built on Nov. 8, 2020, 5:37 p.m.