cluster_analysis: Function to perform Kmeans or Hierarchical clustering...

Description Usage Arguments Value Author(s) See Also Examples

View source: R/cluster_analysis.R

Description

Function to perform Kmeans or Hierarchical clustering analysis of the selected gene probe expression data.

Usage

1
2
3
4
5
cluster_analysis(sel.exp, cluster_type = "HClust", seed = NULL,
  distance = "euclidean", linkage_type = "ward.D2",
  gene_distance = "correlation", num_clusters, data_name,
  probe_rank = "SD_Rank", probe_num_selection = "Fixed_Probe_Num",
  cluster_num_selection = "Fixed_Clust_Num")

Arguments

sel.exp

Object containing the numeric selected gene expression matrix. This object is an output of the probe_ranking function.

cluster_type

String indicating the type of clustering method to use. "Kmeans" or "HClust" are the two options. The default is set to "HClust".

seed

A positive integer vector >1 indicating a random starting position for the centers of the clusters to be used for the k-means clustering algorithm. The default value is set to NULL and is not used when Hierarchical clustering is chosen.

distance

String describing the distance metric to use for the dist function during hierarchical clustering. dist uses a default distance metric of Euclidean distance. Options include one of "euclidean", "maximum", manhattan", "canberra", "binary", or "minkowski". Kmeans clustering does not use a distance metric. The default value is set to "euclidean".

linkage_type

String describing the linkage metric to be used for HClust. The default is set to "ward.D2", however other options include "average", "complete", "median", "centroid", "single", and "mcquitty".

gene_distance

String describing the distance measure to be used for the Dist function when performing hierarchical clustering of genes. Options include one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "abspearson", "correlation", "abscorrelation", "spearman" or "kendall". The default of gene_distance is set to "correlation". The deafult value is set to "correlation". The argument can be set to NULL when Kmeans clustering is used.

num_clusters

Positive integer to specify the number of clusters samples will be divided into. This number is determined by the number_clusters function.

data_name

String indicating the cancer type and name of the dataset being analyzed. This name will be used to label the sample dendrograms and heatmap files.

probe_rank

String indicating the feature selection method used in the probe_ranking function. Options include "CV_Rank", "CV_Guided", "SD_Rank", and "Poly".

probe_num_selection

String indicating the way in which probes were selected in the number_probes function. Options include "Fixed_Probe_Num", "Percent_Probe_Num", and "Adaptive_Probe_Num".

cluster_num_selection

String indicating how the number of clusters were determined in the number_clusters function. Options include "Fixed_Clust_Num" and "Gap_Statistic".

Value

Returns a vector containing the sample information and respective cluster number. In addition, this function outpus sample cluster dendrogams, average expression for each probe in each cluster, and heatmap images and Java TreeView files for HClust dendrograms.

Author(s)

Nathan Lawlor, Alec Fabbri

See Also

probe_ranking, number_clusters, number_probes, hclust, kmeans, dist

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# Example 1: HClust Analysis
# Load in a data file
data_file <- system.file("extdata", "GSE2034.normalized.expression.txt",
    package="multiClust")
data <- input_file(input=data_file)
# Choose 300 genes to select for
gene_num <- number_probes(input=data_file, data.exp=data, Fixed=300,
    Percent=NULL, Adaptive=NULL)
# Choose the "CV_Rank" Method for gene ranking
sel.data <- probe_ranking(input=data_file, probe_number=300,
    probe_num_selection="Fixed_Probe_Num", data.exp=data, method="CV_Rank")
# Choose a fixed cluster number of 3
clust_num <- number_clusters(data.exp=data, Fixed=3, gap_statistic=NULL)

# Call function using HClust parameters
hclust_analysis <- cluster_analysis(sel.exp=sel.data, cluster_type="HClust",
    seed = NULL, distance="euclidean", linkage_type="ward.D2",
    gene_distance="correlation", num_clusters=3,
    data_name="GSE2034 Breast", probe_rank="CV_Rank",
    probe_num_selection="Fixed_Probe_Num",
    cluster_num_selection="Fixed_Clust_Num")

# Example 2: Kmeans Analysis
# Call function for Kmeans parameters
kmeans_analysis <- cluster_analysis(sel.exp=sel.data, cluster_type="Kmeans",
    seed = 1, distance=NULL, linkage_type=NULL, gene_distance=NULL,
    num_clusters=3, data_name="GSE2034 Breast",
    probe_rank="CV_Rank", probe_num_selection="Fixed_Probe_Num",
    cluster_num_selection="Fixed_Clust_Num")

multiClust documentation built on Nov. 8, 2020, 5:23 p.m.