gosummaries.prcomp: Prepare gosummaries object based on PCA results
In GOsummaries: Word cloud summaries of GO enrichment analysis


The PCA results are converted into a gosummaries object, by extracting genes with the largest positive and negative weights from each component.

## S3 method for class 'prcomp'
gosummaries(x, annotation = NULL, components = 1:10,
  show_genes = FALSE, gconvert_target = "NAME",
  n_genes = ifelse(show_genes, 30, 500), organism = "hsapiens", ...)

`x`	an object of class `prcomp`
`annotation`	a `data.frame` describing the samples; its row names should match the row names (the samples) of the projection matrix `x$x`
`components`	numeric vector of components to include
`show_genes`	logical indicating whether the word clouds show actual gene names (`TRUE`) or GO categories (`FALSE`)
`gconvert_target`	specifies the gene ID format for genes shown in the word cloud. The name of the format is passed to `gconvert`; if NULL, the original IDs are shown.
`n_genes`	the number of genes used for annotating the component; if gene names are shown, it is the maximum number of genes shown in a word cloud
`organism`	the organism that the gene lists correspond to. The format should be as follows: "hsapiens", "mmusculus", "scerevisiae", etc
`...`	GO annotation filtering parameters as defined in `gosummaries.default`

The usual visualisation of PCA results displays the projections of the samples' expression profiles on the principal axes. It shows whether and how the samples cluster, but not why they behave that way. It is possible to go further and annotate the axes by studying the genes that have the largest influence in the linear combinations that define the principal components. For example, high expression of genes with large negative weights pushes a sample's projection towards the negative side of the principal axis, and high expression of genes with large positive weights pushes it towards the positive side. A sample with highly expressed genes in both groups will most probably stay in the middle. By functionally annotating the genes with the highest positive and negative weights for each principal axis, it is possible to say which biological processes drive the separation of samples along that axis.
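The relationship between weights and projections described above can be checked directly in base R. The following is a minimal sketch on simulated data; the sample and gene names are made up for illustration and nothing here is GOsummaries code.

```r
# Simulated expression data: 6 samples (rows) x 4 genes (columns).
# All names and values are invented for illustration.
set.seed(1)
expr <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:4)))

# prcomp centers each column by default and does not scale
pca <- prcomp(expr)

# The projections in pca$x are the centered data multiplied by the weights
centered <- scale(expr, center = pca$center, scale = FALSE)
projections <- centered %*% pca$rotation

# Per-gene contributions to the PC1 score of one sample: a gene with a
# negative weight and above-average expression contributes a negative term
contrib <- centered["sample1", ] * pca$rotation[, "PC1"]

# The contributions add up to the sample's position on the first axis
sum(contrib)
pca$x["sample1", "PC1"]
```

Each term of `contrib` is the product of a gene's centered expression and its weight, which is why genes with large absolute weights dominate a sample's position on the axis.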

This function creates a gosummaries object for such an analysis. It expects the output of the `prcomp` function. It assumes that the PCA was done on samples and, thus, that the row names of the rotation matrix can be interpreted as gene names. For each component, it annotates the `n_genes` elements with the highest positive weights and the `n_genes` elements with the highest negative weights.
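As an illustration of the selection step, the sketch below picks the genes at both ends of the weight vector of one component. The helper `top_weight_genes` is hypothetical and is not part of GOsummaries; the package's internal implementation may differ.

```r
# Hypothetical helper (not a GOsummaries function): return the names of the
# n_genes rows of the rotation matrix with the lowest and the highest weights
# for a given component.
top_weight_genes <- function(pca, component = 1, n_genes = 3) {
  w <- pca$rotation[, component]
  list(
    negative = names(sort(w))[seq_len(n_genes)],
    positive = names(sort(w, decreasing = TRUE))[seq_len(n_genes)]
  )
}

# Simulated data: 6 samples (rows) x 10 genes (columns), names are made up
set.seed(1)
expr <- matrix(rnorm(60), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:10)))
pca <- prcomp(expr)

# Gene lists for the first principal component
top <- top_weight_genes(pca, component = 1, n_genes = 3)
top$negative
top$positive
```

In a real analysis the two lists would each be passed on to functional annotation, one for the negative and one for the positive side of the axis.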

The function can also display genes instead of their GO annotations, in which case the sizes of the gene names correspond to the PCA loadings. The corresponding parameters are described in more detail in `gosummaries.MArrayLM`.

A gosummaries object.

Raivo Kolde <raivo.kolde@eesti.ee>

## Not run: 
data(tissue_example)

pcr = prcomp(t(tissue_example$exp))
gs_pca = gosummaries(pcr, annotation = tissue_example$annot)

plot(gs_pca, classes = "Tissue", components = 1:3, fontsize = 8)

## End(Not run)

# Read metabolomic data
data(metabolomic_example)

pca = prcomp(t(metabolomic_example$data))

# Turn off GO enrichment, since it does not work on metabolites
gs = gosummaries(pca, annotation = metabolomic_example$annot,
                 show_genes = TRUE, gconvert_target = NULL)
plot(gs, classes = "Tissue", components = 1:3, fontsize = 8)