coocnet

Description

Read and preprocess data, and construct the co-occurrence network in one step.

Usage

coocnet(dataFile = "", dataType = "DNA", conservativeFilter = 0.95,
  cooccurFilter = NULL, networkFile = "cooccurNetwork", module = FALSE,
  moduleFile = "cooccurNetworkModule", property = FALSE,
  propertyFile = "cooccurNetworkProperty", siteCo = FALSE,
  siteCoFile = "siteCooccurr", sampleTimes = 100, debug = FALSE)

Arguments

dataFile

file name with full path of DNA, protein, SNP data or other kinds of data

dataType

the type of data. It could be 'DNA' (default), 'protein', 'SNP' or 'other'

conservativeFilter

A number in the range 0 to 1; 0.95 by default. Columns whose conservative score is greater than this value are filtered out of the later analyses.

cooccurFilter

A number in the range 0 to 1; NULL by default. It is the threshold that determines whether two columns are considered to be in perfect co-occurrence.

networkFile

File name with full path for storing the co-occurrence network of each row.

module

FALSE by default. If set to TRUE, the modules of each network in networkFile are calculated.

moduleFile

File name with full path for storing the modules of the co-occurrence networks.

property

FALSE by default. If set to TRUE, the properties of each network in networkFile, including the network diameter, connectivity, ConnectionEffcient and so on, are calculated.

propertyFile

Character; file name with full path for storing the properties of the co-occurrence networks.

siteCo

FALSE by default. If set to TRUE, the extent of co-occurrence (siteCo) between all pairs of columns is calculated. It is defined as the ratio of rows with perfect co-occurrence.

siteCoFile

File name with full path for storing the extent of co-occurrence (siteCo) between all pairs of columns, together with the related p-values. The p-values are calculated by simulation as follows. First, all columns in the data are randomly permuted. Then the pairwise siteCos are calculated. This process is repeated N times, where N is set by the parameter sampleTimes. For each pair of columns, the rank of the original siteCo among the N siteCos derived from the simulations is taken as the p-value of the original siteCo.

sampleTimes

An integer; the number of permutations used in the simulation when calculating the p-values. 100 by default.

debug

FALSE by default. Indicates whether debug messages are displayed.
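
The permutation procedure described under siteCoFile can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names site_co and perm_p_value are hypothetical, "perfect co-occurrence" is assumed to mean that the two residues found in a row only ever appear together, each column is assumed to be shuffled independently, and the rank is expressed as the share of simulated values at least as large as the observed one.

```r
# Hypothetical stand-in for siteCo (an assumption, not the cooccurNet code):
# the fraction of rows whose residue pair at columns i and j co-occurs perfectly,
# i.e. each of the two residues is only ever seen together with the other.
site_co <- function(x, i, j) {
  a <- as.character(x[, i])
  b <- as.character(x[, j])
  pair <- paste(a, b, sep = "|")
  n_pair <- as.integer(table(pair)[pair])  # rows sharing this exact pair
  n_a <- as.integer(table(a)[a])           # rows sharing the residue at column i
  n_b <- as.integer(table(b)[b])           # rows sharing the residue at column j
  mean(n_pair == n_a & n_pair == n_b)
}

# Permutation p-value for one pair of columns, repeated sample_times times.
perm_p_value <- function(x, i, j, sample_times = 100) {
  observed <- site_co(x, i, j)
  simulated <- replicate(sample_times, {
    shuffled <- apply(x, 2, sample)  # assumption: shuffle each column independently
    site_co(shuffled, i, j)
  })
  # Rank-based p-value: share of simulated values >= the observed value.
  list(siteCo = observed, p = mean(simulated >= observed))
}

set.seed(1)
# Toy data: columns 1 and 2 are perfectly linked, column 3 is unrelated noise.
col1 <- sample(c("A", "G"), 40, replace = TRUE)
col2 <- ifelse(col1 == "A", "T", "C")
col3 <- sample(c("A", "C", "G", "T"), 40, replace = TRUE)
aln <- cbind(col1, col2, col3)

linked <- perm_p_value(aln, 1, 2, sample_times = 100)
unlinked <- perm_p_value(aln, 1, 3, sample_times = 100)
print(linked)
print(unlinked)
```

With a larger sample_times the p-value is resolved more finely (its smallest non-zero value is 1/sample_times), at the cost of a proportionally longer run time.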

Value

A list in which all the output file paths are stored as attributes.
The attribute "networkFile" stores the co-occurrence network for each row.
The attribute "moduleFile" is optional. It is output when module is set to TRUE and stores the modules of the co-occurrence networks.
The attribute "propertyFile" is optional. It is output when property is set to TRUE and stores the properties of the co-occurrence networks.
The attribute "siteCoFile" is optional. It is output when siteCo is set to TRUE and stores all the pairwise siteCos between columns.

References

Du, X., Wang, Z., Wu, A., Song, L., Cao, Y., Hang, H., & Jiang, T. (2008). Networks of genomic co-occurrence capture characteristics of human influenza A (H3N2) evolution. Genome research, 18(1), 178-187. doi:10.1101/gr.6969007

Examples

cooccurNetwork = coocnet(dataFile = getexample(dataType = "protein"), dataType = "protein")
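
A slightly fuller call that also requests the optional module and property outputs might look like the sketch below. It uses only the arguments listed under Usage. Writing the outputs to a temporary directory and the variable names args, out_dir and result are choices made for this illustration. The call is skipped when the cooccurNet package is not installed.

```r
# Output files go to a temporary directory (an illustrative choice).
out_dir <- tempdir()

# Arguments taken from the Usage section; dataFile is filled in below.
args <- list(
  dataType = "protein",
  networkFile = file.path(out_dir, "cooccurNetwork"),
  module = TRUE,
  moduleFile = file.path(out_dir, "cooccurNetworkModule"),
  property = TRUE,
  propertyFile = file.path(out_dir, "cooccurNetworkProperty")
)

# Run only when the package is available.
if (requireNamespace("cooccurNet", quietly = TRUE)) {
  library(cooccurNet)
  args$dataFile <- getexample(dataType = "protein")
  result <- do.call(coocnet, args)
  str(result)  # inspect which output file paths were returned
}
```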