siteco: siteco

Description

Reads and preprocesses the data, then calculates the pairwise site co-occurrence, all in one step.

Usage

siteco(dataFile = "", dataType = "protein", conservativeFilter = 0.95,
  cooccurFilter = NULL, siteCoFile = "siteCooccurr", sampleTimes = 100,
  debug = FALSE, parallel = FALSE, memory = NULL)

Arguments

dataFile

character, a FASTA data file name with full path.

dataType

character, 'protein' by default, the type of data to be processed. It can be 'DNA', 'RNA', 'protein', 'SNP' or 'other'.

conservativeFilter

numeric, a number in the range 0 to 1, 0.95 by default. It is used to filter out highly conserved columns, that is, columns in which the ratio of some residue is larger than conservativeFilter.

cooccurFilter

numeric, a number in the range 0 to 1. It determines whether two columns are considered to co-occur perfectly. By default, it is set to 0.9 for the 'protein' data type and to 1 for the other data types.

siteCoFile

character, 'siteCooccurr' by default. It is a file name with full path for storing the RCOS between all pairs of columns, and the related p-values.

sampleTimes

numeric, the integer number of permutations used in the simulation when calculating the p-values. It should be greater than 100.

debug

logical, FALSE by default, indicates whether debug messages are displayed.

parallel

logical, FALSE by default. Parallel mode is supported only on Unix/Mac systems (not Windows).

memory

character, the type of matrix, NULL by default. It can be 'memory' or 'sparse'. If set to 'memory', all data is manipulated in RAM using normal matrices and the package 'bigmemory'. If set to 'sparse', the package 'Matrix' is used to manipulate massive matrices in memory and to initialize huge sparse matrices, which can significantly reduce RAM consumption. By default it is NULL, so the system determines automatically whether all data is manipulated in RAM, according to the size of the input data and the RAM available to R.
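
As a rough illustration of what the conservativeFilter and cooccurFilter arguments above describe, the following base R sketch drops columns whose most frequent residue exceeds a threshold and resolves the documented default for cooccurFilter. The helper names filter_conserved and resolve_cooccur_filter and the toy alignment are hypothetical and are not part of the package. The package's internal implementation may differ, so this is only a sketch under those assumptions.

```r
# Hypothetical helper (not part of the package): drop highly conserved columns.
# `aln` is a character matrix with one sequence per row and one site per column.
filter_conserved <- function(aln, conservativeFilter = 0.95) {
  # For each column, the share of sequences carrying the most frequent residue.
  top_ratio <- apply(aln, 2, function(col) max(table(col)) / length(col))
  # Keep only columns whose dominant residue ratio is not larger than the threshold.
  keep <- top_ratio <= conservativeFilter
  aln[, keep, drop = FALSE]
}

# Hypothetical helper: the default described for cooccurFilter
# (0.9 for 'protein', 1 for every other data type) when it is left as NULL.
resolve_cooccur_filter <- function(cooccurFilter = NULL, dataType = "protein") {
  if (!is.null(cooccurFilter)) return(cooccurFilter)
  if (identical(dataType, "protein")) 0.9 else 1
}

# Toy alignment (made up for illustration): 4 sequences, 3 sites.
aln <- rbind(
  c("A", "K", "L"),
  c("A", "R", "L"),
  c("A", "K", "M"),
  c("A", "R", "M")
)

# The first site is identical in every sequence, so it is filtered out.
filtered <- filter_conserved(aln, conservativeFilter = 0.95)
```

In this toy case the first column has a dominant-residue ratio of 1 and is removed, while the other two columns (ratio 0.5) are kept.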

Value

list, with the output file path 'siteCoFile' recorded in it. The file stores all the pairwise siteCos between columns.

References

Du, X., Wang, Z., Wu, A., Song, L., Cao, Y., Hang, H., & Jiang, T. (2008). Networks of genomic co-occurrence capture characteristics of human influenza A (H3N2) evolution. Genome Research, 18(1), 178-187. doi:10.1101/gr.6969007

Examples

#pairwiseCooccur = siteco(dataFile=getexample(dataType="protein"), dataType="protein")
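
A slightly fuller sketch of a call is given below, using only the arguments documented above. It writes a small made-up FASTA file to a temporary directory and calls siteco only if the function is available in the current session. The sequences, file names and variable names are assumptions for illustration, and such a tiny alignment may be too small to yield meaningful results.

```r
# Write a toy protein alignment (made-up sequences) to a temporary FASTA file.
fasta_path <- file.path(tempdir(), "toy_alignment.fasta")
writeLines(c(
  ">seq1", "AKLM",
  ">seq2", "ARLM",
  ">seq3", "AKMM",
  ">seq4", "ARMM"
), fasta_path)

# Parallel mode is documented as unsupported on Windows, so enable it only
# when R reports a Unix-like platform (this includes macOS).
can_parallel <- .Platform$OS.type == "unix"

# Call siteco only if it is available, i.e. the package providing it is loaded.
if (exists("siteco", mode = "function")) {
  pairwiseCooccur <- siteco(
    dataFile   = fasta_path,
    dataType   = "protein",
    siteCoFile = file.path(tempdir(), "siteCooccurr"),
    parallel   = can_parallel
  )
}
```

Leaving cooccurFilter and memory at NULL lets the function apply the defaults described in the Arguments section.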
