Description Usage Arguments Details References See Also Examples
FunCluster performs a functional analysis of microarray expression data based on Gene Ontology & KEGG annotations. FunCluster is designed to build functional classes of putatively co-regulated biological processes through a specially designed clustering procedure relying on expression data and functional annotations.
1 2 3 4 5 |
wd |
sets the working directory where the expression data files are to be found and where results are to be stored. |
org |
indicates the biological species to which analyzable transcript expression data is related; currently only three possibilities are available with FunCluster: "HS" for human expression data, "MM" for mouse (Mus Musculus) expression data and "SC" for yeast (Saccharomyces Cerevisiae) expression data. Default value is "HS". |
two.lists |
possible values are TRUE if a discriminatory functional analysis of two lists of transcripts is required (e.g. significantly up-regulated transcripts versus down-regulated transcripts) or FALSE if only one list of transcripts is to be analyzed. In the case of differential analysis of two lists of transcripts, FunCluster expects to find within the working folder two tab separated text files containing the transcript expression data named "up.txt" and "down.txt" respectively (names are mandatory). In the case of only one list of transcripts FunCluster expects to find within its working directory a single tab separated text file named "genes.txt". Please see the example dataset for the format of the data files. The default value of this parameter is TRUE. |
restrict |
possible values are TRUE if a reference list of transcripts is provided for the statistical significance calculation of the transcript enrichment of the biological annotations or FALSE if such a restriction is not imposed and the transcript enrichment significance is therefore estimated with regards of the whole genome. The purpose of this parameter was to correct the enrichment significance calculations for those situations in which expression data is not available for the whole genome but only for a fraction of it, either because of microarray processing errors which limits the number of transcripts available for analysis, or for the case of dedicated microarrays, designed to scan only a fraction of the genome. For the case in which such a restriction is needed a tab separated text file named "ref.txt" should be provided, containing the list of all the transcripts initially available for the analysis (after filtering for missing data). The transcripts should be identified only by their LocusLink ID number or by their EntrezGene ID number. The default value for this parameter is FALSE. |
go.direct |
if TRUE it restricts the transcript enrichment calculations for the GO (Gene Ontology) annotations only to directly annotated transcripts, without taking into account the ontological lattice and the subsuming relations inside Gene Ontology. Default value is FALSE, which means that, when calculating the transcript enrichment significance of a GO functional annotation, directly annotated transcripts are considered together with transcripts annotated by the directly subsumed terms within the ontological lattice. |
compare |
refers to the approach used for clustering highly co-expressed transcripts from available data needed in order to identify, compare and group functional annotations sharing a significant number of highly co-expressed transcripts. The default value is "common.correl.genes" which implies that a detailed analysis in search for shared co-expressed transcripts is performed (very expensive computationally and requiring enough microarray samples for correlation calculations on transcript expression profiles). If "common.genes" is selected this means that when comparing two functional annotations only the transcripts commonly annotated by the two terms are considered, without taking into account transcripts expression. If "correl.mean.exp" is selected the comparison of two functional annotations is based only on the correlations of their "mean expression profile" computed for each annotation as a vector of mean expression levels of annotated transcripts for each available microarray sample. |
corr.th |
it allows varying the correlation threshold used to search for and build clusters of highly co-expressed transcripts with the greedy approach. Default value based on currently available literature data is 0.85 corresponding to a Spearman correlation coefficient Rs > = 0.85. |
corr.met |
indicates the procedure to be used to build transcript expression clusters. It counts only if "compare" is set to "common.correl.genes". Two values are possible: "hierarchical" will use a hierarchical agglomerative procedure combined with Silhouette computing; "greedy" will use an original greedy clustering procedure conditioned by a correlation threshold specified by "corr.th" to assure homogeneity of clusters. |
clusterm |
is related to the algorithm used to group terms (biological annotations), having significant transcript enrichment within the analyzed data, in order to build functional classes of putatively co-regulated biological functions. Default value is "cc" and it should not be modified. |
alpha |
signifies the threshold of p-values significance (alpha) resulting from statistical calculations concerning transcript enrichment of biological annotations. Default value is 0.05. |
location |
allows to perform an analysis of the transcript enrichment of genome locations based on available genome location data (chromosome and cytoband transcript locations). If TRUE is selected it provides two lists, one containing chromosome transcript enrichment data and the other cytoband transcript enrichment data, separately for each list of analyzed transcripts. Default value is FALSE. |
details |
specifies if intermediary results (detailed annotation data) has to be saved. |
FunCluster can be used with the currently available R distributions (tested with distributions posterior to 2.0.0), either with Microsoft Windows operating environments (tested with Windows XP) or, better, with a Linux operating environment (tested with Fedora Core 3 and 4 and Suse Linux 10.0). Please be aware that FunCluster analysis implies a lot of computations and therefore high processing power and good stability of the operating system are absolute requirements.
Together with the FunCluster algorithm this package provide also:
1. GO and KEGG annotations (as of June 2008) automatically extracted from their respective web resources
2. The routine for the automated extraction and update of the functional annotations from their respective
web resources. The use of this routine is simple: annotations(date.annot = "")
. Under common
circumstances these routine will provide up-to-date annotations, stored into environmental variables,
directly formatted for FunCluster's use. Some errors may be seen when using this routine related to a
lack of availability of the GO annotations for the current month. In case of extraction errors,
explained most usually by a delay in updating GO web servers, the release date can be expressly
indicated (see annotations
).
3. The two test data sets used for the JBCB paper (see examples below). The first data set is related to the
dichotomous functional analysis of the genes specifically expressed within adipocytes and stroma
vascular fraction (SVF) cells, extracted from adipose tissue of morbidly obese subjects (see
submitted paper and cited reference for further details). Two lists of transcripts significantly
expressed within adipocytes and SVF cells respectively are provided together with the list of all
initial transcripts available for the analysis (necessary for the accurate computation of transcript
enrichment during automated annotation of transcript expression data performed by FunCluster).
The second data set is structured in a similar way and is containing the hyperinsulinemic muscle
clamp expression data.
The format of the data files should be respected in order to perform a successful analysis. All the files are
tab separated text files which can be easily obtained from Excel data. The only transcript
identification system acceptable for FunCluster analysis is EntrezGene GeneID's. Please see more
details on this choice in the JBCB paper. The transcript expression data within the tab
separated text files is organized within rows, one for each transcript, and columns with the
first one containing the transcript identifiers for each transcript and the rest of them
containing the expression level of that transcript in each of the available microarray samples.
See test data and JBCB paper for more details.
The results of the FunCluster analysis of transcript expression data are stored as HTML files in the "Results" subfolder of the working folder. For each type of available biological annotations and for each list of transcript expression data to be analyzed (one or two), FunCluster provides a ranked list with the significant functional clusters observed, stored within a separate file. Detailed findings on the terminological composition and transcript enrichment significance of the resulting functional clusters are provided.
1. Henegar C, Cancello R, Rome S, Vidal H, Clement K, Zucker JD. Clustering biological annotations and gene expression data to identify putatively co-regulated biological processes. J Bioinform Comput Biol. 2006 Aug;4(4) :833-52.
2. Cancello R, Henegar C, Viguerie N, Taleb S, Poitou C, Rouault C, Coupaye M, Pelloux V, Hugol D, Bouillot JL, Bouloumie A, Barbatelli G, Cinti S, Svensson PA, Barsh GS, Zucker JD, Basdevant A, Langin D, Clement K. Reduction of macrophage infiltration and chemoattractant gene expression changes in white adipose tissue of morbidly obese subjects after surgery-induced weight loss. Diabetes 2005; 54(8):2277-86.
3. FunCluster website: http://corneliu.henegar.info/FunCluster.htm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | ## Not run:
## load adipose tissue data (see Diabetes and JBCB papers for details)
data(adipose)
## or load hyperinsulinemic muscle clamp data (see JBCB paper for details)
data(insulin)
## most common use
FunCluster(go.direct = FALSE, alpha = 0.05, clusterm = "cc",
org = "HS", location = FALSE, compare =
"common.correl.genes", corr.th = 0.85,
corr.met = "greedy", two.lists = TRUE,
restrict = TRUE)
## when only GO direct annotations are to be used and detailed
findings are needed
FunCluster(go.direct = TRUE, alpha = 0.05, clusterm = "cc",
org = "HS", location = FALSE, compare =
"common.correl.genes", corr.th = 0.85,
corr.met = "greedy", two.lists = TRUE,
restrict = TRUE, details = TRUE)
## hierarchical agglomerative clustering and Silhouette computations
can be used for the preliminary step of building clusters of
co-expressed transcripts
FunCluster(go.direct = TRUE, alpha = 0.05, clusterm = "cc",
org = "HS", location = FALSE, compare =
"common.correl.genes", corr.th = 0.85,
corr.met = "hierarchical",
two.lists = TRUE, restrict = TRUE)
## use only common annotated transcripts for the annotation clustering
FunCluster(go.direct = FALSE, alpha = 0.05, clusterm = "cc",
org = "HS", location = FALSE, compare =
"common.genes", two.lists = TRUE,
restrict = TRUE)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.