glad: Analysis of array CGH data



Description

This function detects breakpoints in genomic profiles obtained by array CGH technology and assigns a status (gain, normal or loss) to each clone.

Usage

## S3 method for class 'profileCGH'
glad(profileCGH, mediancenter=FALSE,
                smoothfunc="lawsglad", bandwidth=10, round=1.5,
                model="Gaussian", lkern="Exponential", qlambda=0.999,
                base=FALSE, sigma,
                lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                type="tricubic", param=c(d=6),
                alpha=0.001, msize=5,
                method="centroid", nmax=8, assignGNLOut=TRUE,
                breaksFdrQ = 0.0001, haarStartLevel = 1, haarEndLevel = 5,
                verbose=FALSE, ...)

Arguments

profileCGH

Object of class profileCGH

mediancenter

If TRUE, the LogRatio values are centered on their median.

smoothfunc

Type of algorithm used to smooth LogRatio by a piecewise constant function. Choose one of lawsglad, haarseg, or aws or laws from the aws package.

bandwidth

Sets the maximal bandwidth hmax in the aws or laws functions of the aws package. For example, if bandwidth=10 then the hmax value is set to 10*X_N, where X_N is the position of the last clone.

round

The smoothing results are rounded or not depending on the round argument. The round value is passed to the argument digits of the round function.

model

Determines the distribution type of the LogRatio. Always keep the model as "Gaussian" (see laws in aws package).

lkern

Determines the location kernel to be used (see aws or laws in aws package).

qlambda

Determines the scale parameter for the stochastic penalty (see aws or laws in aws package).

base

If TRUE, the position of each clone is its physical position on the chromosome; otherwise the rank position is used.

sigma

Value to be passed to either the argument sigma2 of the aws function or the argument shape of the laws function (see aws package). If NULL, sigma is calculated from the data.

lambdabreak

Penalty term (λ') used during the Optimization of the number of breakpoints step.

lambdacluster

Penalty term (λ*) used during the MSHR clustering by chromosome step.

lambdaclusterGen

Penalty term (λ*) used during the HCSR clustering throughout the genome step.

type

Type of kernel function used in the penalty term during the Optimization of the number of breakpoints step, the MSHR clustering by chromosome step and the HCSR clustering throughout the genome step.

param

Parameter of kernel used in the penalty term.

alpha

Risk alpha used for the Outlier detection step.

msize

The MAD outliers are calculated on regions with a cardinality greater than or equal to msize.

method

The agglomeration method to be used during the MSHR clustering by chromosome and the HCSR clustering throughout the genome steps.

nmax

Maximum number of clusters (N*max) allowed during the MSHR clustering by chromosome and the HCSR clustering throughout the genome steps.

assignGNLOut

If FALSE, the status (gain/normal/loss) is not assigned for outliers.

breaksFdrQ

breaksFdrQ for HaarSeg algorithm.

haarStartLevel

haarStartLevel for HaarSeg algorithm.

haarEndLevel

haarEndLevel for HaarSeg algorithm.

verbose

If TRUE, some information is printed.

...

...
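The mediancenter, base and bandwidth descriptions above amount to simple arithmetic on the profile. The following is a minimal base R sketch of that arithmetic, not the package's internal code; the toy data frame and the variable names (toy, centered, pos_rank, hmax_rank and so on) are hypothetical and chosen only for illustration.

```r
# Toy profile: hypothetical values, not taken from the GLAD data sets
toy <- data.frame(
  LogRatio = c(0.10, 0.05, 0.60, 0.55, -0.40),
  PosBase  = c(1500, 42000, 98000, 150000, 310000)
)

# mediancenter=TRUE: the LogRatio values are centered on their median
centered <- toy$LogRatio - median(toy$LogRatio)

# base=FALSE uses the rank position, base=TRUE the physical position
pos_rank <- rank(toy$PosBase)
pos_base <- toy$PosBase

# bandwidth=10: hmax is 10 * X_N, where X_N is the position of the last clone
bandwidth <- 10
hmax_rank <- bandwidth * max(pos_rank)
hmax_base <- bandwidth * max(pos_base)
```

With rank positions the maximal bandwidth scales with the number of clones, whereas with physical positions it scales with the length covered in bases, so the same bandwidth value is not comparable between base=FALSE and base=TRUE.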

Details

The function glad implements the methodology which is described in the article: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions (Hupé et al., Bioinformatics, 2004).

The principles of the GLAD algorithm: First, the detection of breakpoints is based on the estimation of a piecewise constant function with the Adaptive Weights Smoothing (AWS) procedure (Polzehl and Spokoiny, 2002). Alternatively, it is possible to use the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008). Then, a procedure based on penalized maximum likelihood optimizes the number of breakpoints and removes the undesirable ones. Finally, based on the regions previously identified, a two-step unsupervised classification (MSHR clustering by chromosome and HCSR clustering throughout the genome) with model selection criteria assigns a status to each region (gain, loss or normal).

Main parameters to be tuned:

qlambda: If you want the smoothing to fit some very local effects, choose a smaller qlambda.
bandwidth: Choose a bandwidth that is not too small; otherwise you will get many small discontinuities.
lambdabreak: The higher the parameter is, the more breakpoints are considered undesirable.
lambdacluster: The higher the parameter is, the more regions within a chromosome belong to the same cluster.
lambdaclusterGen: The higher the parameter is, the more regions over the whole genome are assumed to belong to the same cluster.
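One way to see the effect of a tuning parameter is to rerun glad over a small grid and count the rows of BkpInfo. The sketch below does this for lambdabreak, reusing the snijders data and the calls shown in the Examples section; the grid values and the names lambdabreak_values and n_breakpoints are hypothetical, and the block only runs the analysis when the GLAD package is installed.

```r
# Hypothetical grid of penalty values to compare (8 is the documented default)
lambdabreak_values <- c(6, 8, 10)
n_breakpoints <- NULL

# Run the comparison only when the GLAD package is available
if (requireNamespace("GLAD", quietly = TRUE)) {
  library(GLAD)
  data(snijders)
  gm13330$Clone <- gm13330$BAC
  profileCGH <- as.profileCGH(gm13330)

  # One glad run per penalty value; count the breakpoints listed in BkpInfo
  n_breakpoints <- sapply(lambdabreak_values, function(lb) {
    res <- glad(profileCGH, lambdabreak = lb)
    nrow(res$BkpInfo)
  })
  names(n_breakpoints) <- lambdabreak_values
}

n_breakpoints
```

The same pattern can be applied to qlambda, lambdacluster or lambdaclusterGen by changing the argument that varies inside the function.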

Value

An object of class "profileCGH" with the following attributes:

profileValues:

a data.frame with the following added information:

  • Smoothing: The smoothing values correspond to the median of each MSHR (i.e. Region).

  • Breakpoints: The last position of a region with an identical amount of DNA is flagged 1; all other positions are 0. Note that breakpoints removed during the "Optimization of the number of breakpoints" step are flagged -1.

  • Region: All positions between two breakpoints are labelled with the same integer value, starting from one. The label is incremented by one when a new breakpoint is found or when moving to the next chromosome. The variable Region is what we call MSHR.

  • Level: All positions with equal smoothing value are labelled with the same integer value, starting from one. The label is incremented by one when a new level is found or when moving to the next chromosome.

  • OutliersAws: AWS outliers are flagged -1 or 1; all other positions are 0.

  • OutliersMad: MAD outliers are flagged -1 (if in the α/2 lower tail of the distribution) or 1 (if in the α/2 upper tail of the distribution); all other positions are 0.

  • OutliersTot: OutliersAws + OutliersMad.

  • ZoneChr: Clusters identified after MSHR (i.e. Region) clustering by chromosome.

  • ZoneGen: Clusters identified after HCSR clustering throughout the genome.

  • ZoneGNL: Status of each clone: Gain is coded by 1, Loss by -1 and Normal by 0.
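The relationship between the Breakpoints, Region and Smoothing columns described above can be reproduced on a toy table. This is a minimal base R sketch under the stated definitions, not the code glad runs internally; the data values and the Status column are hypothetical and only illustrate the coding.

```r
# Toy profile: hypothetical values for two chromosomes
toy <- data.frame(
  Chromosome  = c(1, 1, 1, 1, 2, 2, 2),
  LogRatio    = c(0.02, -0.05, 0.61, 0.58, 0.01, 0.03, -0.02),
  Breakpoints = c(0, 1, 0, 0, 0, 0, 0),
  ZoneGNL     = c(0, 0, 1, 1, 0, 0, 0)
)

# A new region starts right after a position flagged 1
# or when moving to the next chromosome
n <- nrow(toy)
new_region <- c(TRUE,
                toy$Breakpoints[-n] == 1 |
                  toy$Chromosome[-1] != toy$Chromosome[-n])
toy$Region <- cumsum(new_region)

# Smoothing: median of the LogRatio within each region
toy$Smoothing <- ave(toy$LogRatio, toy$Region, FUN = median)

# Readable labels for the ZoneGNL coding (-1 loss, 0 normal, 1 gain)
toy$Status <- factor(toy$ZoneGNL, levels = c(-1, 0, 1),
                     labels = c("Loss", "Normal", "Gain"))

toy
```

On real output, the same labelling can be applied to the ZoneGNL column of the profileValues data.frame.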

BkpInfo:

a data.frame giving the list of breakpoints:

  • PosOrder: The rank position of each clone on the genome.

  • PosBase: The base position of each clone on the genome.

  • Chromosome: Chromosome name.

SigmaC:

a data.frame giving the estimate of the LogRatio standard deviation for each chromosome:

  • Chromosome: Chromosome name.

  • Value: The estimate is based on the interquartile range.
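As an illustration of an estimate based on the interquartile range, the sketch below uses the standard Gaussian relation IQR = 2 * qnorm(0.75) * sd (about 1.349 * sd). Whether glad applies exactly this scaling, or computes the IQR on the raw LogRatio values, is an assumption here; the toy data and the name sigma_sketch are hypothetical.

```r
# Toy data: hypothetical LogRatio values for two chromosomes
toy <- data.frame(
  Chromosome = rep(c(1, 2), each = 5),
  LogRatio   = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50) / 100
)

# For Gaussian data, IQR = 2 * qnorm(0.75) * sd, so dividing the IQR
# by that constant gives a robust standard deviation estimate
iqr_to_sd <- 2 * qnorm(0.75)
est <- tapply(toy$LogRatio, toy$Chromosome,
              function(x) IQR(x) / iqr_to_sd)

# Same layout as the SigmaC attribute: one row per chromosome
sigma_sketch <- data.frame(
  Chromosome = names(est),
  Value      = as.numeric(est)
)

sigma_sketch
```

An IQR-based estimate is less sensitive to outlying clones than sd(), which is a common reason to prefer it for noisy profiles.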

Note

People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.

Author(s)

Philippe Hupé, glad@curie.fr.

References

See Also

profileCGH, as.profileCGH, plotProfile.

Examples

data(snijders)

### Creation of "profileCGH" object
gm13330$Clone <- gm13330$BAC
profileCGH <- as.profileCGH(gm13330)



###########################################################
###
###  glad function as described in Hupé et al. (2004)
###
###########################################################


res <- glad(profileCGH, mediancenter=FALSE,
                smoothfunc="lawsglad", bandwidth=10, round=1.5,
                model="Gaussian", lkern="Exponential", qlambda=0.999,
                base=FALSE,
                lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                type="tricubic", param=c(d=6),
                alpha=0.001, msize=5,
                method="centroid", nmax=8,
                verbose=FALSE)

### cytoband data to plot chromosomes
data(cytoband)

### Genomic profile on the whole genome
plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
main="Breakpoints detection: GLAD analysis", cytoband = cytoband)

### Genomic profile for chromosome 1
plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
Smoothing="Smoothing", main="Chromosome 1: GLAD analysis", cytoband = cytoband)

### The standard deviations of LogRatio for each chromosome are:
res$SigmaC

### The list of breakpoints is:
res$BkpInfo

GLAD documentation built on Nov. 8, 2020, 11:10 p.m.
