Analyze performance on simulations with a constant signal length in each set


Description

Takes the dataset and metafile produced by nhppSimConstWindowGen, together with the SegSeq output on the same data, and evaluates change-point precision and recall. The dataset must be in the format produced by nhppSimConstWindowGen for this function to work.

Usage

nhppSimConstWindowAnalysis(filePrefix, chromosomeN, distMetric=c(20,50,100,150,200,300,500,1000), cptLen=c(3,5,8,12,15,20,30,50,100), nPair=2, nRepeat=10, statistic="normal", grid.size="auto", takeN=5, maxNCut=60, minStat=5, verbose=FALSE, timing=TRUE, hasRun=FALSE, width=12, height=6)

Arguments

filePrefix

The first part of the filename of the data and metafiles generated by nhppSimConstWindowGen

chromosomeN

The number of the chromosome that the dataset emulates

distMetric

A set of thresholds for deciding whether a called change point is true. Called and true change points are first paired by a minimum-cost bipartite matching; a call is then deemed true if it is matched to a true change point within x reads, where x is the threshold. A larger value is a looser criterion.
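The matching step behind distMetric can be sketched as follows. matchChangePoints is a hypothetical helper, not the package's code: it assumes change-point positions are given in units of reads and relies on the fact that, for points on a line with absolute-difference cost, some optimal matching is non-crossing, so a dynamic program over the sorted positions finds a minimum-cost matching. The package itself may use a general assignment solver instead.

```r
# Hypothetical helper (not part of the package): minimum-cost bipartite
# matching between called and true change points on a line, with cost
# |called - true| in reads. Because an optimal matching can be chosen
# non-crossing, a DP over the sorted positions finds it.
matchChangePoints <- function(called, truth) {
  swap <- length(called) > length(truth)
  a <- sort(if (swap) truth else called)  # smaller set: every point matched
  b <- sort(if (swap) called else truth)  # larger set: some points unmatched
  n <- length(a); m <- length(b)
  if (n == 0) return(list(cost = 0, dist = numeric(0)))

  # cost[i + 1, j + 1]: min cost of matching a[1..i] into b[1..j]
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, ] <- 0  # matching no points of a costs nothing
  for (i in seq_len(n)) {
    for (j in i:m) {
      skip <- cost[i + 1, j]                 # leave b[j] unmatched
      take <- cost[i, j] + abs(a[i] - b[j])  # pair a[i] with b[j]
      cost[i + 1, j + 1] <- min(skip, take)
    }
  }

  # Backtrack to recover the distance of each matched pair
  dist <- numeric(n)
  i <- n; j <- m
  while (i > 0) {
    if (cost[i + 1, j + 1] == cost[i + 1, j]) {
      j <- j - 1                             # b[j] was skipped
    } else {
      dist[i] <- abs(a[i] - b[j])
      i <- i - 1; j <- j - 1
    }
  }
  list(cost = cost[n + 1, m + 1], dist = dist)
}

res <- matchChangePoints(called = c(100, 205, 400), truth = c(102, 200))
print(res$dist)
```

With calls at 100, 205 and 400 and true change points at 102 and 200, the matched distances are 2 and 5; the call at 400 stays unmatched and so can never count as true, whatever the threshold.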

cptLen

The second part of the filename of the data and metafiles generated by nhppSimConstWindowGen, indicating the length of the true signal, i.e. the constant width of the simulated signal (copy-number gain or loss) region. Can be a vector of values to test.

nPair

Part of the filename of the data and metafiles generated by nhppSimConstWindowGen, indicating the number of normal/tumor pairs: the number of tumor samples generated for each signal width, and likewise the number of normal samples generated.

nRepeat

Part of the filename of the data and metafiles generated by nhppSimConstWindowGen, indicating the number of times the simulation data generation was repeated.

statistic

The type of statistic to use for the analysis

grid.size

Argument to ScanCBS

takeN

Argument to ScanCBS

maxNCut

Argument to ScanCBS

minStat

Argument to ScanCBS

verbose

If TRUE, prints progress information as the algorithm runs

timing

If TRUE, times the ScanCBS runs

hasRun

If TRUE, reads the saved ScanCBS output files instead of running ScanCBS on these datasets again. Use this only when the same ScanCBS call has already been run through this function.

width

Width of the graph output file

height

Height of the graph output file

Details

This function is used together with nhppSimConstWindowGen. It reads the data and metafiles produced by that function and compares the performance of our algorithm (ScanCBS) with that of SegSeq. SegSeq must already have been run on the generated simulation datasets before this function is called.

Value

simCBS

The ScanCBS output structure

CBSMatchDist

The distances, in number of reads, between matched called and true change points after minimum-cost bipartite matching, for our algorithm (ScanCBS)

SegMatchDist

The distances, in number of reads, between matched called and true change points after minimum-cost bipartite matching, for SegSeq

CBSRecall, SegRecall

The recall rates of the two algorithms

CBSPrecision, SegPrecision

The precision rates of the two algorithms

CBSFMeasure, SegFMeasure

The F-measures of the two algorithms
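Given the matched distances, recall, precision and F-measure at each distMetric threshold can be computed as in the sketch below. evalCalls is a hypothetical helper, not the package's code. It assumes that a call counts as true when its matched distance is at most the threshold, that recall is the number of such calls divided by the number of true change points (compare nTrueTau), that precision is the same count divided by the number of calls (compare nCBSCall and nSegCall), and that the F-measure is the harmonic mean of the two.

```r
# Hypothetical helper (not part of the package): precision, recall and
# F-measure at each distance threshold, from the distances of matched
# (called, true) change-point pairs.
evalCalls <- function(matchDist, nCall, nTrue,
                      distMetric = c(20, 50, 100, 150, 200, 300, 500, 1000)) {
  # A matched call is true if it lies within d reads of its true partner
  nHit <- vapply(distMetric, function(d) sum(matchDist <= d), integer(1))
  recall    <- if (nTrue > 0) nHit / nTrue else rep(NA_real_, length(nHit))
  precision <- if (nCall > 0) nHit / nCall else rep(NA_real_, length(nHit))
  # Harmonic mean of precision and recall; 0 when both are 0
  fMeasure  <- ifelse(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0)
  data.frame(distMetric, recall, precision, fMeasure)
}

# 4 calls, 3 true change points, matched distances 2, 5 and 60 reads
print(evalCalls(c(2, 5, 60), nCall = 4, nTrue = 3, distMetric = c(20, 100)))
```

Loosening the threshold from 20 to 100 reads admits the pair matched at distance 60, so recall rises from 2/3 to 1 and precision from 0.5 to 0.75.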

trueTauMeanSigLen

The mean distance between true signal boundaries

nTrueTau

The number of true change points

nCBSCall, nSegCall

Number of change points called by the two algorithms

CBSTime

Mean computational time of ScanCBS for each signal length

Author(s)

Jeremy J. Shen

See Also

nhppSimConstWindowGen
