siteco

Description

Reads and preprocesses the data, then calculates the pairwise site co-occurrence, all in one step.

Usage

siteco(dataFile = "", dataType = "DNA", conservativeFilter = 0.95,
  cooccurFilter = NULL, siteCoFile = "siteCooccurr", sampleTimes = 100,
  debug = FALSE)

Arguments

dataFile

Full path to the input file, which may contain DNA, protein, SNP, or other kinds of data.

dataType

The type of data: 'DNA' (default), 'protein', 'SNP', or 'other'.

conservativeFilter

A number between 0 and 1 (0.95 by default). Columns whose conservative score is greater than this value are filtered out of the later analyses.

cooccurFilter

A number between 0 and 1 (NULL by default). It determines whether two columns are considered to be in perfect co-occurrence.

siteCoFile

Full path to the file that stores the extent of co-occurrence (siteCo) between all pairs of columns, together with the related p-values. The latter are calculated by simulation as follows: first, all columns in the data are randomly permuted; then the pairwise siteCos are calculated. This process is repeated N times, where N is set by the parameter sampleTimes. For each pair of columns, the rank of the original siteCo among the N siteCos derived from the simulations is taken as the p-value of the original siteCo.

sampleTimes

An integer giving the number of permutations used in the simulation when calculating the p-values (100 by default).

debug

FALSE by default. Indicates whether debug messages are displayed.
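
The permutation procedure described under siteCoFile can be illustrated with a minimal sketch for a single pair of columns. This is not the package's implementation. The score below (mutual information) is only a stand-in for siteCo, the helper names pair_score and permutation_pvalue are hypothetical, and reading "rank" as the fraction of simulated scores at least as large as the original is an assumption.

```r
# Stand-in association score for two columns (mutual information).
# Assumption: used only for illustration; it is NOT the siteCo metric.
pair_score <- function(x, y) {
  joint <- table(x, y) / length(x)                   # joint frequencies
  expected <- outer(rowSums(joint), colSums(joint))  # product of marginals
  nz <- joint > 0                                    # skip empty cells
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

# Permutation p-value for one pair of columns (hypothetical helper).
# Each round shuffles both columns independently and recomputes the score.
permutation_pvalue <- function(x, y, sample_times = 100) {
  observed <- pair_score(x, y)
  simulated <- replicate(sample_times, pair_score(sample(x), sample(y)))
  # Assumed rank-based p-value: share of simulated scores >= the original
  list(score = observed, p_value = mean(simulated >= observed))
}

set.seed(1)
site_a <- rep(c("A", "C"), each = 10)  # toy column 1
site_b <- rep(c("G", "T"), each = 10)  # toy column 2, perfectly linked to 1
res <- permutation_pvalue(site_a, site_b, sample_times = 100)
print(res)
```

A larger sample_times gives a finer-grained p-value, because the smallest nonzero value this scheme can report is 1 / sample_times, at the cost of proportionally more computation.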

Value

A list that includes the output file path, 'siteCoFile'. The file stores all the pairwise siteCos between columns.

References

Du, X., Wang, Z., Wu, A., Song, L., Cao, Y., Hang, H., & Jiang, T. (2008). Networks of genomic co-occurrence capture characteristics of human influenza A (H3N2) evolution. Genome research, 18(1), 178-187. doi:10.1101/gr.6969007

Examples

#pairwiseCooccur = siteco(dataFile=getexample(dataType="protein"), dataType="protein")
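
A slightly fuller call with the arguments spelled out might look like the sketch below. It assumes the package providing siteco and getexample is already attached, and it does nothing otherwise. Writing the output under tempdir() is an illustrative choice, and the structure of the returned list is not assumed beyond what the Value section states.

```r
# Illustrative output location; any writable full path should work.
out_file <- file.path(tempdir(), "siteCooccurr")

# Guarded so the sketch is harmless when the package is not attached.
if (exists("siteco") && exists("getexample")) {
  pairwiseCooccur <- siteco(
    dataFile = getexample(dataType = "protein"),
    dataType = "protein",
    conservativeFilter = 0.95,  # default shown in Usage
    siteCoFile = out_file,
    sampleTimes = 100,          # number of permutations for the p-values
    debug = FALSE
  )
  str(pairwiseCooccur)          # inspect the returned list
}
```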