cumulative_count_union: Integrations cumulative count in time by sample
In ISAnalytics: Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies


Lifecycle: experimental

This function computes the cumulative number of integrations observed in each sample at different time points by assuming that if an integration is observed at time point "t" then it is also observed in time point "t+1".
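The assumption above amounts to a running union of the integrations seen up to each time point. The following is a minimal base R sketch of that idea on toy data, not the package's actual implementation; the integration identifiers and the `count` column name are hypothetical.

```r
# Toy data (hypothetical IDs): integrations observed at each zero-padded
# time point for a single sample.
observed <- list(
  "0030" = c("chr1:100", "chr2:200"),
  "0060" = c("chr2:200", "chr3:300"),
  "0180" = c("chr4:400")
)

# Process time points in chronological order.
observed <- observed[order(names(observed))]

# Running union: whatever was seen at time t is carried over to every
# later time point.
running_union <- Reduce(union, observed, accumulate = TRUE)

# Cumulative number of distinct integrations per time point.
cumulative <- data.frame(
  TimePoint = names(observed),
  count = lengths(running_union)
)
print(cumulative)
```

An integration seen again at a later time point, such as `chr2:200` here, is counted only once, which is why the sketch uses a union rather than a cumulative sum.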

cumulative_count_union(
  x,
  association_file = NULL,
  timepoint_column = "TimePoint",
  key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  include_tp_zero = FALSE,
  zero = "0000",
  aggregate = FALSE,
  ...
)

`x`	A simple integration matrix or an aggregated matrix (see details)
`association_file`	NULL or the association file for x if `aggregate` is set to TRUE
`timepoint_column`	What is the name of the time point column?
`key`	The aggregation key - must always contain the `timepoint_column`
`include_tp_zero`	Include timepoint 0?
`zero`	How is 0 coded in the data frame?
`aggregate`	Should x be aggregated?
`...`	Additional parameters to pass to `aggregate_values_by_key`

Input data frame

For the x parameter, the user can provide either a simple integration matrix, setting the aggregate parameter to TRUE, or a matrix already aggregated via aggregate_values_by_key. If the user supplies a matrix to be aggregated, the association_file parameter must not be NULL: aggregation will be done by an internal call to the aggregation function. If the user supplies an already aggregated matrix, the key parameter is the key that was used for aggregation. NOTE: for this operation it is mandatory that the time point column is included in the key.
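The requirement on the key can be checked before calling the function. The sketch below assumes a hypothetical helper, `check_key`, that is not part of the package; it only verifies that the time point column belongs to the key and that every key column exists in the data frame.

```r
# Hypothetical helper (not part of ISAnalytics): validate an aggregation key
# against an already aggregated data frame.
check_key <- function(df, key, timepoint_column = "TimePoint") {
  # The time point column must always be part of the key.
  if (!timepoint_column %in% key) {
    stop("The key must contain the time point column: ", timepoint_column)
  }
  # Every key column must be present in the data frame.
  missing_cols <- setdiff(key, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Key columns missing from x: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy aggregated matrix with the default key columns.
toy <- data.frame(
  SubjectID = "S1", CellMarker = "CD34", Tissue = "BM",
  TimePoint = "0030", Value = 10
)
check_key(toy, key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"))
```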

Assumptions on time point format

When an association file is imported with the functions provided by this package, it is correctly formatted for later use. The formatting process also pads time points: as a consequence, this function expects the time point column to be of type character and correctly padded with 0s. If the chosen time point column is detected as numeric, the function will attempt to convert it to character and pad it automatically. If you import the association file without using the import_association_file function, be sure to check the format of the chosen column to avoid undesired results.
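To illustrate why padding matters, the base R sketch below pads numeric time points and compares the resulting sort order with the unpadded one. The helper `pad_timepoints` is hypothetical, and the width of 4 is an assumption inferred from the default `zero = "0000"`; it is not necessarily how the package pads internally.

```r
# Hypothetical helper (not part of ISAnalytics): convert numeric time points
# to zero-padded character strings. The width of 4 is an assumption based on
# the default zero = "0000".
pad_timepoints <- function(tp, width = 4) {
  if (is.numeric(tp)) {
    tp <- sprintf(paste0("%0", width, "d"), as.integer(tp))
  }
  tp
}

raw <- c(180, 30, 0)
padded <- pad_timepoints(raw)

# Unpadded strings are compared character by character, so they do not
# follow chronological order when sorted.
print(sort(as.character(raw)))

# Padded strings all have the same width, so sorting them as text matches
# chronological order.
print(sort(padded))
```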

Returns a data frame.

Other Analysis functions: CIS_grubbs(), comparison_matrix(), compute_abundance(), sample_statistics(), separate_quant_matrices(), threshold_filter(), top_integrations()

op <- options(ISAnalytics.widgets = FALSE)

path_AF <- system.file("extdata", "ex_association_file.tsv",
    package = "ISAnalytics"
)
root_correct <- system.file("extdata", "fs.zip",
    package = "ISAnalytics"
)
root_correct <- unzip_file_system(root_correct, "fs")

association_file <- import_association_file(path_AF, root_correct,
    dates_format = "dmy"
)
matrices <- import_parallel_Vispa2Matrices_auto(
    association_file = association_file, root = NULL,
    quantification_type = c("seqCount", "fragmentEstimate"),
    matrix_type = "annotated", workers = 2, patterns = NULL,
    matching_opt = "ANY", multi_quant_matrix = FALSE
)

#### EXTERNAL AGGREGATION
aggregated <- aggregate_values_by_key(matrices$seqCount, association_file)
cumulative_count <- cumulative_count_union(aggregated)

#### INTERNAL AGGREGATION
cumulative_count_2 <- cumulative_count_union(matrices$seqCount,
    association_file,
    aggregate = TRUE
)

options(op)