segtoFreq: Calculate CNV frequency data from given segment data
In progenetix/pgxRpi: R wrapper for Progenetix

View source: R/segtoFreq.R

segtoFreq

R Documentation

Description

This function calculates the frequency of deletions and duplications in genomic bins from the given segment data.

Usage

segtoFreq(
  data,
  cnv_column_idx = 6,
  cohort_name = "unspecified cohort",
  assembly = "hg38",
  bin_size = 1e+06,
  overlap = 1000,
  soft_expansion = 0.1
)

Arguments

`data`	Segment data containing CNV states. The first four columns should represent sample ID, chromosome, start position, and end position, respectively. The fifth column can contain the number of markers or other relevant information. The column representing CNV states (with a column index of 6 or higher) should either contain "DUP" for duplications and "DEL" for deletions, or level-specific CNV states such as "EFO:0030072", "EFO:0030071", "EFO:0020073", and "EFO:0030068", which correspond to high-level duplication, low-level duplication, high-level deletion, and low-level deletion, respectively.
`cnv_column_idx`	Index of the column specifying the CNV state. Default is 6, based on the "pgxseg" format used in Progenetix. If the input segment data follows the general `.seg` file format, this index may need to be adjusted accordingly.
`cohort_name`	A string specifying the cohort name. Default is "unspecified cohort".
`assembly`	A string specifying the genome assembly version for CNV frequency calculation. Allowed options are "hg19" or "hg38". Default is "hg38".
`bin_size`	Size of genomic bins used to split the genome, in base pairs (bp). Default is 1,000,000.
`overlap`	Numeric value, in base pairs (bp), specifying how much a segment must overlap a bin for its CNV state to be counted in that bin. Default is 1,000.
`soft_expansion`	Fraction of `bin_size` used as the merge threshold for the outermost bins. Genomic bins are generated starting at the centromere and extending towards the telomeres on both sides. If the last bin is smaller than `soft_expansion * bin_size`, it is merged with the previous bin. Default is 0.1.
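
The `soft_expansion` and `overlap` rules described above can be illustrated with a small sketch. The helpers `arm_bin_widths()` and `overlap_bp()` are hypothetical names written for this illustration and are not part of pgxRpi; the actual implementation may differ, in particular in its coordinate convention and in whether the overlap threshold is inclusive.

```r
# Hypothetical helper (not a pgxRpi function): widths of the bins along one
# chromosome arm, listed from the centromere outward.
arm_bin_widths <- function(arm_length, bin_size = 1e6, soft_expansion = 0.1) {
  n_full    <- arm_length %/% bin_size   # number of full-size bins
  remainder <- arm_length %% bin_size    # size of the leftover last bin
  widths    <- rep(bin_size, n_full)
  if (remainder > 0) {
    if (n_full > 0 && remainder < soft_expansion * bin_size) {
      # Last bin is too small: merge it into the previous bin.
      widths[n_full] <- widths[n_full] + remainder
    } else {
      # Last bin is large enough to stand on its own.
      widths <- c(widths, remainder)
    }
  }
  widths
}

# Hypothetical helper: number of base pairs shared by a segment and a bin.
# Assumption: coordinates are treated as half-open intervals.
overlap_bp <- function(seg_start, seg_end, bin_start, bin_end) {
  pmax(0, pmin(seg_end, bin_end) - pmax(seg_start, bin_start))
}

# A 3.05 Mb arm: the 50 kb leftover is below 0.1 * 1e6, so it is merged.
merged_widths <- arm_bin_widths(3050000)

# A 3.4 Mb arm: the 400 kb leftover is kept as its own bin.
split_widths <- arm_bin_widths(3400000)

# Assumption: a segment contributes to a bin when the shared length reaches
# the `overlap` threshold (whether the comparison is inclusive is a guess).
overlap_threshold <- 1000
counts_in_bin <- overlap_bp(500, 2500, 1000, 2000) >= overlap_threshold
```

With the defaults, a leftover shorter than 100,000 bp is absorbed by its neighbour, so the outermost bin of an arm can be slightly larger than `bin_size`.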

Value

The binned CNV frequency data, stored in "pgxfreq" format.

Examples

library(pgxRpi)
## load necessary data (this step can be skipped in real use)
data("hg38_cytoband")
## get pgxseg data
seg <- read.table(system.file("extdata", "example.pgxseg", package = "pgxRpi"),
                  header = TRUE, sep = "\t")
## calculate frequency data
freq <- segtoFreq(seg)
## visualize
pgxFreqplot(freq)
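
When the input follows a general `.seg` layout rather than "pgxseg", the CNV state column has to be added and `cnv_column_idx` adjusted, as noted under Arguments. The sketch below shows one possible way to prepare such data. The toy values, the column names, and the ±0.15 cutoffs are invented for illustration and are not taken from pgxRpi; how `segtoFreq` treats segments without a "DUP" or "DEL" label is also not confirmed here.

```r
# Toy segments in a generic .seg-like layout; all values are invented.
seg_like <- data.frame(
  ID        = c("s1", "s1", "s2", "s2"),
  chrom     = c("1", "2", "1", "3"),
  loc.start = c(1000000, 5000000, 1500000, 20000000),
  loc.end   = c(4000000, 9000000, 3500000, 45000000),
  num.mark  = c(120, 80, 60, 300),
  seg.mean  = c(0.45, -0.60, 0.02, -0.30),
  stringsAsFactors = FALSE
)

# Hypothetical log2-ratio cutoffs; choose values suited to your platform.
gain_cutoff <- 0.15
loss_cutoff <- -0.15

# Derive a CNV state column; segments between the cutoffs are left as NA.
seg_like$cnv_state <- ifelse(
  seg_like$seg.mean >= gain_cutoff, "DUP",
  ifelse(seg_like$seg.mean <= loss_cutoff, "DEL", NA_character_)
)

# Position of the CNV state column, to be passed as `cnv_column_idx`.
cnv_idx <- which(names(seg_like) == "cnv_state")
```

The resulting data frame could then be passed on as `segtoFreq(seg_like, cnv_column_idx = cnv_idx)`; this call is not run here and is untested on the toy data.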

progenetix/pgxRpi documentation built on June 1, 2025, 1:06 p.m.