Linnorm-gene correlation network analysis.

Description

This function first applies the Linnorm transformation to the dataset, then performs correlation network analysis on the transformed data.

Usage

Linnorm.Cor(datamatrix, input = "Raw", method = "pearson",
  showinfo = FALSE, perturbation = 10, minZeroPortion = 2/3,
  sig.q = 0.05, plotNetwork = TRUE, plotNumPairs = 5000, plotdegree = 0,
  plotname = "networkplot", plotformat = "png", plotVertexSize = 1,
  plotFontSize = 1, plot.Pos.cor.col = "red", plot.Neg.cor.col = "green",
  vertex.col = "cluster", plotlayout = "kk",
  clusterMethod = "cluster_edge_betweenness")

Arguments

datamatrix

The matrix or data frame that contains your dataset. Each row is a feature (or gene) and each column is a sample (or replicate). Raw counts, CPM, RPKM, FPKM and TPM are supported. Undefined values such as NA are not supported, and log transformed datasets are not compatible. If a Linnorm transformed dataset is being used, set the "input" argument to "Linnorm".

input

Character. "Raw" or "Linnorm". If you have already transformed your dataset with Linnorm, set input to "Linnorm" so that the Linnorm transformed dataset can be passed to the "datamatrix" argument. Defaults to "Raw".

method

Character. "pearson", "kendall" or "spearman". Method used to calculate the correlation coefficients. Defaults to "pearson".
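
As a rough illustration of what the three method names mean, the base R function cor accepts the same values. The sketch below uses a small made-up matrix (toy_expr is hypothetical) with genes in rows, as in datamatrix; because cor correlates columns, the matrix is transposed first. This is not Linnorm.Cor's internal code, which also applies the Linnorm transformation before any correlation is computed.

```r
# Toy expression matrix: 3 genes (rows) x 5 samples (columns). Values are made up.
toy_expr <- rbind(
  geneA = c(1, 2, 3, 4, 5),
  geneB = c(2, 4, 6, 8, 10),
  geneC = c(5, 3, 4, 1, 2)
)

# cor() correlates columns, so transpose to obtain gene-by-gene correlations.
cor_pearson  <- cor(t(toy_expr), method = "pearson")
cor_spearman <- cor(t(toy_expr), method = "spearman")
cor_kendall  <- cor(t(toy_expr), method = "kendall")

# Each result is a symmetric 3 x 3 matrix with genes on both axes.
print(round(cor_pearson, 3))
```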

showinfo

Logical. Whether to show the calculated lambda value. Defaults to FALSE.

perturbation

Integer >=2. To search for an optimal minimal deviation parameter (please see the article), Linnorm uses the iterated local search algorithm, which perturbs away from the initial local minimum. The range of the area searched in each perturbation increases exponentially as the area gets farther from the initial local minimum, as determined by its index. This range is calculated as 10 * (perturbation ^ index). Defaults to 10.
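
To make the range formula concrete, the short sketch below evaluates 10 * (perturbation ^ index) for the default perturbation of 10. The indices 1 to 3 are an assumption chosen for illustration; this page does not state where the index starts or how many perturbations are performed.

```r
perturbation <- 10   # default value shown in the Usage section
index <- 1:3         # hypothetical indices, for illustration only

# Search range for each perturbation step, per the formula in the text.
search_range <- 10 * (perturbation ^ index)
print(search_range)  # 100 1000 10000
```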

minZeroPortion

Double >=0, <= 1. For example, setting minZeroPortion to 0.5 removes genes in which more than half of the values are zero from the calculation of the normalizing parameter. Because this analysis is based on correlation coefficients, which need more non-zero values, a larger value is suggested. Defaults to 2/3.
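
The filter described above can be pictured with the base R sketch below. It reads minZeroPortion as the minimum proportion of non-zero values a gene must have, which matches the 0.5 example in the text; the exact comparison (>= rather than >) and the toy matrix toy_counts are assumptions, not Linnorm's actual code.

```r
# Toy count matrix: 3 genes (rows) x 6 samples (columns). Values are made up.
toy_counts <- rbind(
  geneA = c(0, 0, 0, 4, 7, 2),  # 3 of 6 values are non-zero
  geneB = c(3, 0, 5, 1, 8, 6),  # 5 of 6 values are non-zero
  geneC = c(0, 0, 0, 0, 0, 9)   # 1 of 6 values is non-zero
)

minZeroPortion <- 2/3  # default value shown in the Usage section

# Proportion of non-zero values per gene.
nonzero_portion <- rowMeans(toy_counts != 0)

# Assumed rule: keep genes whose non-zero proportion reaches the threshold.
kept <- toy_counts[nonzero_portion >= minZeroPortion, , drop = FALSE]
print(rownames(kept))  # "geneB"
```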

sig.q

Double >=0, <= 1. Only gene pairs with q values less than this threshold will be included in the "Results" data frame. Defaults to 0.05.
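
The thresholding can be sketched in base R as follows. The p values are made up, and the use of the Benjamini-Hochberg adjustment (p.adjust with method = "BH") to obtain q values is an assumption made for illustration; this page does not say how Linnorm.Cor computes them.

```r
# Made-up p values for five hypothetical gene pairs.
p_values <- c(0.001, 0.01, 0.04, 0.30, 0.80)

# Assumption: q values obtained with the Benjamini-Hochberg adjustment.
q_values <- p.adjust(p_values, method = "BH")

sig.q <- 0.05  # default value shown in the Usage section

# Only pairs with q values below the threshold would be reported.
significant <- q_values < sig.q
print(data.frame(p_values, q_values, significant))
```

In this toy case the third pair has a p value below 0.05 but a q value above it, so it would be left out.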

plotNetwork

Logical. Should the program output the network plot to a file? An "igraph" object will be included in the output regardless. Defaults to TRUE.

plotNumPairs

Integer >= 50. Number of gene pairs to be used in the network plot. Defaults to 5000.

plotdegree

Integer >= 0. In the network plot, genes (vertices) with a degree lower than this number will be removed. Defaults to 0.

plotname

Character. File name of the network plot. The file extension will be appended to it. Defaults to "networkplot".

plotformat

Character. "pdf" or "png". Network plot output format. Defaults to "png".

plotVertexSize

Double >0. Controls the vertex size in the network plot. Defaults to 1.

plotFontSize

Double >0. Controls the font size in the network plot. Defaults to 1.

plot.Pos.cor.col

Character. Color of the edges of positively correlated gene pairs. Defaults to "red".

plot.Neg.cor.col

Character. Color of the edges of negatively correlated gene pairs. Defaults to "green".

vertex.col

Character. "cluster" or a color. This controls the color of the vertices. Defaults to "cluster".

plotlayout

Character. "kk" or "fr". "kk" uses the Kamada-Kawai algorithm in igraph to position vertices and edges. It scales edge length with correlation strength, but it can cause overlaps between vertices. "fr" uses the Fruchterman-Reingold algorithm in igraph to position vertices and edges. It prevents overlaps between vertices better than "kk", but edge lengths are not scaled to correlation strength. Defaults to "kk".

clusterMethod

Character. "cluster_edge_betweenness", "cluster_fast_greedy", "cluster_infomap", "cluster_label_prop", "cluster_leading_eigen", "cluster_louvain", "cluster_optimal", "cluster_spinglass" or "cluster_walktrap". These are clustering functions from the igraph package. Defaults to "cluster_edge_betweenness".

Details

This function performs gene correlation analysis on the dataset using the Linnorm transformation.

Value

This function will output a list with the following objects:

  • Results: A data frame containing the results of the analysis, showing only the significant results determined by "sig.q" (see below).

  • Cor.Matrix: The resulting matrix of correlation coefficients between each pair of genes.

  • q.Matrix: A matrix of q values, one for each correlation coefficient in Cor.Matrix.

  • Cluster: A data frame that shows which cluster each gene belongs to.

  • igraph: The igraph object for users who want to draw the network plot manually.

  • Linnorm: Linnorm transformed and filtered data matrix.

The "Results" data frame has the following columns:

  • Gene1: Name of gene 1.

  • Gene2: Name of gene 2.

  • XPM1: Average expression level of gene 1 in XPM. If the input is raw counts or CPM, this column is in CPM. If the input is RPKM, FPKM or TPM, this column is in TPM.

  • XPM2: Average expression level of gene 2 in XPM. If the input is raw counts or CPM, this column is in CPM. If the input is RPKM, FPKM or TPM, this column is in TPM.

  • Cor: Correlation coefficient between the two genes.

  • p.value: p value of the correlation coefficient.

  • q.value: q value of the correlation coefficient.

Examples

library(Linnorm)
# Load the Islam2011 dataset shipped with the package
data(Islam2011)
# Analysis on Islam2011 embryonic stem cells
results <- Linnorm.Cor(Islam2011[,1:48])