xina_clustering: xina_clustering


Source: R/clustering.R

Description

Clustering multiplexed time-series omics data to find co-abundance profiles

Usage

xina_clustering(f_names, data_column, out_dir = getwd(),
  nClusters = 20, norm = "sum_normalization", chosen_model = "")

Arguments

f_names

A vector containing input file (.csv) paths

data_column

A vector of the data-matrix column names, as given in the header (first row) of the input files

out_dir

A directory path for saving the clustering results (default: out_dir = getwd())

nClusters

The maximum number of clusters desired

norm

The normalization method. The default, "sum_normalization", divides each row of the data matrix by its row sum; for more about sum normalization, see https://www.ncbi.nlm.nih.gov/pubmed/19861354. "zscore" computes a Z score for each protein; see scale.

chosen_model

A specific covariance model to use instead of testing all the models available in mclust; see mclustModelNames for the model names. To use k-means clustering instead of model-based clustering, set chosen_model = "kmeans".
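How data_column and the two norm options act on an input file can be sketched in base R. This is a minimal illustration, not XINA's internal code: it assumes a hypothetical CSV with an ID column plus one numeric column per name in data_column. Sum normalization divides each row by its row sum; "zscore" standardizes each row (protein), and because scale() works column-wise, the matrix is transposed before and after.

```r
# Hypothetical input file: one row per protein, an ID column, three time points
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = c("P1", "P2"),
                     T0 = c(1, 10), T1 = c(2, 20), T2 = c(5, 70)),
          csv_path, row.names = FALSE)

# data_column names the header entries (first row) that form the data matrix
data_column <- c("T0", "T1", "T2")
input <- read.csv(csv_path, stringsAsFactors = FALSE)
data_matrix <- as.matrix(input[, data_column])
rownames(data_matrix) <- input$ID

# "sum_normalization": divide each row by its row sum, so every row sums to 1
sum_normalized <- data_matrix / rowSums(data_matrix)

# "zscore": scale() standardizes columns, so transpose to standardize each row
zscored <- t(scale(t(data_matrix)))
```

After sum normalization every protein's row sums to 1, so profiles are compared by shape rather than absolute abundance; after the Z-score transform each row has mean 0 and standard deviation 1.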

Value

A BIC plot saved in the current working directory, and a list containing the following items:

Item                  Description
clusters              XINA clustering results
aligned               XINA clustering results aligned by ID
data_column           Data matrix column names
out_dir               The directory path containing the XINA results
nClusters             The number of clusters desired by the user
max_cluster           The number of clusters optimized by BIC
chosen_model          The covariance model used for model-based clustering
optimal_BIC           BIC of the optimized covariance model
condition             Experimental conditions of the user input data
color_for_condition   Colors assigned to each experimental condition, used for the condition composition plot
color_for_clusters    Colors assigned to each cluster, used for the XINA clustering plot
norm_method           The normalization method used

Examples

library(XINA)

# Generate random multiplexed time-series data
random_data_info <- make_random_xina_data()

# Data files, one per experimental condition
data_files <- paste(random_data_info$conditions, ".csv", sep = "")

# Time points (column names) of the data matrix
data_column <- random_data_info$time_points

# mclust requires a fixed random seed to reproduce the clustering results
set.seed(0)

# Run model-based clustering to find co-abundance profiles
example_clusters <- xina_clustering(data_files, data_column = data_column,
                                    nClusters = 30)

# Run k-means clustering to find co-abundance profiles
example_clusters <- xina_clustering(data_files, data_column = data_column,
                                    nClusters = 30,
                                    chosen_model = "kmeans")
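To show conceptually what k-means clustering of co-abundance profiles involves, here is a base R sketch using stats::kmeans() on sum-normalized synthetic profiles. It is an illustration under stated assumptions, not XINA's implementation: the data are made up, the number of clusters is fixed at 3 rather than chosen, and nstart = 10 is an arbitrary choice. As with mclust, fixing the seed makes the cluster assignments reproducible.

```r
set.seed(0)

# Hypothetical data: 60 proteins x 4 time points following three abundance trends
trend <- rbind(c(1, 2, 4, 8),   # increasing
               c(8, 4, 2, 1),   # decreasing
               c(3, 3, 3, 3))   # flat
profiles <- trend[rep(1:3, each = 20), ] + matrix(runif(240, 0, 0.5), ncol = 4)
colnames(profiles) <- paste0("T", 0:3)

# Sum-normalize so clustering compares profile shapes rather than magnitudes
profiles <- profiles / rowSums(profiles)

# k-means with several random starts; cluster labels are arbitrary integers
fit <- kmeans(profiles, centers = 3, nstart = 10)
table(fit$cluster)
```

Each row of fit$centers is a mean sum-normalized profile, which is the kind of co-abundance pattern the clustering is meant to reveal.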

langholee/XINA documentation built on March 17, 2020, 5:23 p.m.