Discrimination of samples using between group analysis as described by Culhane et al., 2002.


| Argument | Description |
|---|---|
| `dataset` | Training dataset. A |
| `classvec` | A |
| `type` | Character, "coa", "pca" or "nsc" indicating which data transformation is required. The default value is `type="coa"`. |
| `x` | An object of class |
| `arraycol, genecol` | Character, colour of points on plot. If arraycol is NULL, arraycol will obtain a set of contrasting colours using |
| `nlab` | Numeric. An integer indicating the number of variables (genes) at the end of axes to be labelled, on the gene plot. |
| `axis1` | Integer, the column number for the x-axis. The default is 1. |
| `axis2` | Integer, the column number for the y-axis. The default is 2. |
| `genelabels` | A vector of variable labels, if |
| `...` | Further arguments passed to or from other methods. |

`bga` performs a between group analysis on the input dataset. This function calls `bca`. The input format of the dataset is verified using `isDataFrame`.

Between group analysis (BGA) is a supervised method for sample discrimination and class prediction.
BGA is carried out by ordinating groups (sets of grouped microarray samples), that is,
groups of samples are projected into a reduced dimensional space. This is most easily
done using PCA or COA of the group means. The choice between PCA and COA is set by the parameter `type`.

The user must define microarray sample groupings in advance. These groupings are defined using
the input `classvec`, which is a `factor` or `vector`.
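As a rough illustration of the idea described above (ordinating the group means, then projecting the individual samples onto the resulting axes), the following base R sketch runs a PCA on group means of toy data. It is not the `bga` or `bca` implementation: the data, the names `centres`, `x`, `group_scores` and `sample_scores` are all hypothetical, samples are placed in rows (which may differ from the orientation `bga` expects), and the groups are of equal size so no group weighting is applied.

```r
# Toy data (hypothetical): 15 samples in rows, 4 variables in columns, 3 groups.
set.seed(1)
classvec <- factor(rep(c("A", "B", "C"), each = 5))
centres <- rbind(A = c(0, 0, 0, 0),
                 B = c(3, 0, 1, 0),
                 C = c(0, 3, 0, 1))
x <- centres[as.character(classvec), ] + matrix(rnorm(15 * 4, sd = 0.5), nrow = 15)
rownames(x) <- paste0("sample", seq_len(nrow(x)))

# Centre each variable, then compute the mean of every variable within each group.
x_centred <- scale(x, center = TRUE, scale = FALSE)
group_means <- apply(x_centred, 2, function(v) tapply(v, classvec, mean))

# Ordinate the group means with an ordinary (unweighted) PCA.
pca <- prcomp(group_means, center = TRUE, scale. = FALSE)

# Keep (number of classes - 1) axes.
n_axes <- nlevels(classvec) - 1
axes <- pca$rotation[, seq_len(n_axes), drop = FALSE]

# Coordinates of the group centroids and of the individual samples on those axes
# (similar in role to the $li and $ls components mentioned on this page).
group_scores <- group_means %*% axes
sample_scores <- x_centred %*% axes
```

Because the projection is linear, the mean of the sample coordinates within each group coincides with that group's centroid coordinates.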

**Cross-validation and testing of bga results:**

bga results should be validated using leave-one-out jack-knife cross-validation with
`bga.jackknife` and by projecting a blind test dataset onto the bga axes
using `suppl`.
`bga` and `suppl` are combined in `bga.suppl`, which requires input of both a training and a test dataset.
It is important to ensure that the selection of cases for the training and test sets is not biased, and
generally many cross-validations should be performed. The function `randomiser`
can be used to randomise the selection of training and test samples.
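To make the point about unbiased selection concrete, here is a minimal base R sketch of a random training/test split that samples within each class, so that every class is represented in both sets. It does not use `randomiser` and makes no claim about how that function works. The class sizes are read from the `$fac` output printed at the end of this page; `test_fraction`, `test_idx` and `train_idx` are hypothetical names.

```r
# Class labels with the group sizes shown in the printed $fac output.
set.seed(42)
classvec <- factor(rep(c("EWS", "BL-NHL", "NB", "RMS"), times = c(23, 8, 12, 21)),
                   levels = c("EWS", "BL-NHL", "NB", "RMS"))

# Fraction of each class to hold out as test samples (an arbitrary choice).
test_fraction <- 0.25

# Draw test samples separately within each class, at least one per class.
test_idx <- unlist(lapply(split(seq_along(classvec), classvec), function(idx) {
  n_test <- max(1, round(length(idx) * test_fraction))
  idx[sample.int(length(idx), n_test)]
}), use.names = FALSE)

# Everything not held out is used for training.
train_idx <- setdiff(seq_along(classvec), test_idx)

table(classvec[train_idx])
table(classvec[test_idx])
```

Repeating the split with different seeds gives the many cross-validation rounds recommended above.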

**Plotting and visualising bga results:**
*1D plots, show one axis only:*
1D graphs can be plotted using `between.graph` and `graph1D`.
`between.graph` is used for plotting the cases,
and requires both the co-ordinates of the cases (`$ls`) and their centroids (`$li`). It accepts an object of class `bga`.
`graph1D` can be used to plot either cases (microarrays) or variables (genes) and only requires
a vector of coordinates.

*2D plots:*
Use `plot.bga` to plot results from `bga`. `plot.bga` calls the function
`plotarrays` to draw an xy plot of cases (`$ls`).
`plotgenes` is used to draw an xy plot of the variables (genes).

*3D plots:*
3D graphs can be generated using `do3D` and `html3D`.
`html3D` produces a web page in which a 3D plot can be interactively rotated and zoomed,
and in which classes or groups of cases can be easily highlighted.

**Analysis of the distribution of variance among axes:**

It is important to know which cases (microarray samples) are discriminated by the axes.
The number of axes or principal components from a `bga` will equal the number of classes minus one,
that is, `length(levels(classvec)) - 1`.
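The rule can be checked with a few lines of base R. The factor below is rebuilt from the class labels and group sizes visible in the `$fac` output at the end of this page; the name `n_axes` is only illustrative.

```r
# Four classes with the group sizes shown in the printed $fac output (64 samples).
classvec <- factor(rep(c("EWS", "BL-NHL", "NB", "RMS"), times = c(23, 8, 12, 21)),
                   levels = c("EWS", "BL-NHL", "NB", "RMS"))

# Number of between-group axes: number of classes minus one.
n_axes <- length(levels(classvec)) - 1
n_axes
# [1] 3
```

This agrees with the `$nf (axis saved) : 3` line in the printed `$bet` component.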

The distribution of variance among axes is described by the eigenvalues (`$eig`) of the `bga` analysis.
These can be visualised with a scree plot using `scatterutil.eigen`, as is done in `plot.bga`.
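As a small worked example, the share of between-group variance carried by each axis can be computed directly from the eigenvalues. The values below are the rounded eigenvalues printed for the `$bet` component in the output at the end of this page, so the resulting proportions are approximate; `prop` and `cum_prop` are illustrative names.

```r
# Rounded between-group eigenvalues as printed in the example output.
eig <- c(0.1522, 0.1218, 0.08981)

# Proportion of the between-group variance on each axis, and the running total.
prop <- eig / sum(eig)
cum_prop <- cumsum(prop)

# Percentages, rounded for display.
round(100 * prop, 1)
round(100 * cum_prop, 1)
```

Passing `prop` to base R's `barplot()` would give a simple scree plot without relying on any package-specific helper.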
It is also useful to visualise the principal components from a `bga`,
a principal components analysis (`dudi.pca`), or a correspondence analysis (`dudi.coa`) using a
heatmap. In MADE4 the function `heatplot` will plot a heatmap with nicer default colours.

**Extracting list of top variables (genes):**

Use `topgenes` to get a list of variables or cases at the ends of axes. It will return a list
of the top n variables (by default n=5) at the positive, negative or both ends of an axis.
`sumstats` can be used to return the angle (slope) and distance from the origin of a list of
coordinates.
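The two ideas above can be sketched in base R on a made-up coordinate matrix. This is only an illustration of ranking variables along an axis and of computing a polar angle and distance; it is not the implementation of `topgenes` or `sumstats`, whose exact definitions (for example how the angle or slope is expressed) may differ. The matrix `coords` and all gene names are hypothetical.

```r
# Hypothetical coordinates of 20 variables (genes) on two axes.
set.seed(7)
coords <- matrix(rnorm(40), ncol = 2,
                 dimnames = list(paste0("gene", 1:20), c("Axis1", "Axis2")))

# Top n variables at the positive and negative ends of the first axis.
n <- 5
axis1 <- coords[, "Axis1"]
top_positive <- names(sort(axis1, decreasing = TRUE))[seq_len(n)]
top_negative <- names(sort(axis1, decreasing = FALSE))[seq_len(n)]

# Distance of each variable from the origin and its angle (in radians).
distance <- sqrt(rowSums(coords^2))
angle <- atan2(coords[, "Axis2"], coords[, "Axis1"])
```

Variables far from the origin along an axis are the ones that contribute most to the separation shown on that axis.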

For more details see Culhane et al., 2002 and http://bioinf.ucd.ie/research/BGA.

A list of class `bga` containing:

| Component | Description |
|---|---|
| `ord` | Results of initial ordination. A list of class "dudi" (see |
| `bet` | Results of between group analysis. A list of class "dudi" (see |
| `fac` | The input classvec, the |

Aedin Culhane

Culhane AC, et al., 2002 Between-group analysis of microarray data. Bioinformatics. 18(12):1600-8.

See Also `bga`, `suppl`, `suppl.bga`, `bca`, `bga.jackknife`


```
Loading required package: ade4
Loading required package: RColorBrewer
Loading required package: gplots
Attaching package: 'gplots'
The following object is masked from 'package:stats':
lowess
Loading required package: scatterplot3d
$ord
$ord
Duality diagramm
class: coa dudi
$call: dudi.coa(df = data.tr, scannf = FALSE, nf = ord.nf)
$nf: 63 axis-components saved
$rank: 63
eigen values: 0.1713 0.1383 0.1032 0.05995 0.04965 ...
vector length mode content
1 $cw 306 numeric column weights
2 $lw 64 numeric row weights
3 $eig 63 numeric eigen values
data.frame nrow ncol content
1 $tab 64 306 modified array
2 $li 64 63 row coordinates
3 $l1 64 63 row normed scores
4 $co 306 63 column coordinates
5 $c1 306 63 column normed scores
other elements: N
$fac
[1] EWS EWS EWS EWS EWS EWS EWS EWS EWS EWS
[11] EWS EWS EWS EWS EWS EWS EWS EWS EWS EWS
[21] EWS EWS EWS BL-NHL BL-NHL BL-NHL BL-NHL BL-NHL BL-NHL BL-NHL
[31] BL-NHL NB NB NB NB NB NB NB NB NB
[41] NB NB NB RMS RMS RMS RMS RMS RMS RMS
[51] RMS RMS RMS RMS RMS RMS RMS RMS RMS RMS
[61] RMS RMS RMS RMS
Levels: EWS BL-NHL NB RMS
attr(,"class")
[1] "coa" "ord"
$bet
Between analysis
call: bca.dudi(x = data.ord$ord, fac = classvec, scannf = FALSE, nf = nclasses -
1)
class: between dudi
$nf (axis saved) : 3
$rank: 3
$ratio: 0.3599779
eigen values: 0.1522 0.1218 0.08981
vector length mode content
1 $eig 3 numeric eigen values
2 $lw 4 numeric group weigths
3 $cw 306 numeric col weigths
data.frame nrow ncol content
1 $tab 4 306 array class-variables
2 $li 4 3 class coordinates
3 $l1 4 3 class normed scores
4 $co 306 3 column coordinates
5 $c1 306 3 column normed scores
6 $ls 64 3 row coordinates
7 $as 63 3 inertia axis onto between axis
$fac
[1] EWS EWS EWS EWS EWS EWS EWS EWS EWS EWS
[11] EWS EWS EWS EWS EWS EWS EWS EWS EWS EWS
[21] EWS EWS EWS BL-NHL BL-NHL BL-NHL BL-NHL BL-NHL BL-NHL BL-NHL
[31] BL-NHL NB NB NB NB NB NB NB NB NB
[41] NB NB NB RMS RMS RMS RMS RMS RMS RMS
[51] RMS RMS RMS RMS RMS RMS RMS RMS RMS RMS
[61] RMS RMS RMS RMS
Levels: EWS BL-NHL NB RMS
attr(,"class")
[1] "coa" "bga"
[1] "Data (original) range: -0.92 0.8"
[1] "Data (scale) range: -1.15 1.15"
[1] "Data scaled to range: -1.15 1.15"
```

