knitr::opts_chunk$set(echo = TRUE)
library(NetFeaturePval)
library(data.table) # provides fread(), used to load the data
The example data are interactions between viral and human proteins, along with protein domain annotations. The aim of the analysis is to predict which domains in human proteins are likely to bind viral proteins. More details can be found in a paper.
data = fread("../data/viral_human_net_w_domains", sep = "\t", stringsAsFactors = FALSE)
First, let's use multi-core parallelisation. A correct analysis would require many more permutations, but this is just an example of how to use the package.
res = permutationPval(interactions2permute = IDs_interactor_viral ~ IDs_interactor_human,
                      associations2test = IDs_interactor_viral ~ IDs_domain_human,
                      # node_attr gives a way to add columns needed to compute statistic and filter the data,
                      # in this case only domain_count is needed to filter
                      # and node columns to compute statistic
                      node_attr = list(IDs_interactor_viral ~ IDs_interactor_viral_degree,
                                       IDs_domain_human ~ domain_count,
                                       IDs_interactor_viral + IDs_domain_human ~ domain_frequency_per_IDs_interactor_viral),
                      data = data,
                      # in this example statistic is just count of
                      # IDs_interactor_human for each IDs_interactor_viral / IDs_domain_human pair
                      statistic = IDs_interactor_viral + IDs_domain_human ~ .N,
                      select_nodes = IDs_domain_human ~ domain_count >= 1,
                      N = 1, # number of permutations
                      cores = 1, # how many cores to use on a local machine
                      # computations are split into inner and outer replicate
                      # to help manage memory load, here it is 50*2
                      # hint: you can use microbenchmark package to find optimal value
                      split_comp_inner_N = 1,
                      seed = 2) # seed for reproducible sampling (permutations)
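To make the idea behind the call above concrete, here is a minimal base R sketch of a permutation test on the same kind of statistic (the count of human interactors per viral protein / domain pair). It is an illustration under stated assumptions, not the package's implementation: the toy identifiers, the helper names `pair_counts` and `permute_once`, the choice to resample human partners separately for each viral protein, and the p-value definition (fraction of permutations with a count at least as large as the observed one) are all assumptions made for this example.

```r
# Toy data with hypothetical identifiers (not taken from the vignette's data set)
interactions <- data.frame(
  viral = c("V1", "V1", "V1", "V2", "V2"),
  human = c("H1", "H2", "H3", "H3", "H4"),
  stringsAsFactors = FALSE
)
domains <- data.frame(
  human  = c("H1", "H2", "H3", "H4", "H4"),
  domain = c("D1", "D1", "D2", "D2", "D3"),
  stringsAsFactors = FALSE
)

# Statistic: number of human interactors for each viral protein / domain pair.
# Factor levels are fixed so every table has the same shape across permutations.
pair_counts <- function(inter, dom) {
  merged <- merge(inter, dom, by = "human")
  table(
    viral  = factor(merged$viral,  levels = sort(unique(inter$viral))),
    domain = factor(merged$domain, levels = sort(unique(dom$domain)))
  )
}

# One permutation (assumed scheme): each viral protein keeps its number of
# partners but draws them at random, without replacement, from all human proteins.
permute_once <- function(inter) {
  pool <- unique(inter$human)
  for (v in unique(inter$viral)) {
    idx <- inter$viral == v
    inter$human[idx] <- sample(pool, sum(idx))
  }
  inter
}

observed <- pair_counts(interactions, domains)

set.seed(2)   # seed for reproducible sampling
n_perm <- 200 # a real analysis would need many more permutations
exceed <- matrix(0, nrow = nrow(observed), ncol = ncol(observed),
                 dimnames = dimnames(observed))
for (i in seq_len(n_perm)) {
  permuted <- pair_counts(permute_once(interactions), domains)
  exceed <- exceed + (permuted >= observed)
}

# Empirical p-value: fraction of permutations where the permuted count
# is at least as large as the observed count
p_value <- exceed / n_perm
```

Pairs with an observed count of zero always get a p-value of 1 under this definition, which is one reason to filter the tested pairs before interpreting the output.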
Next, let's try multi-node parallelisation (on a computing cluster). With a default installation of clustermq this will still work, but it will run multiple processes on the local machine.
res = permutationPval(interactions2permute = IDs_interactor_viral ~ IDs_interactor_human,
                      associations2test = IDs_interactor_viral ~ IDs_domain_human,
                      node_attr = list(IDs_domain_human ~ domain_count),
                      data = data,
                      statistic = IDs_interactor_viral + IDs_domain_human ~ .N,
                      select_nodes = IDs_domain_human ~ domain_count >= 1,
                      N = 30,
                      clustermq = TRUE, # use clustermq
                      clustermq_jobs = 3, # how many cluster jobs to start
                      # how much memory each job needs
                      clustermq_mem = 2000,
                      # caution: allocating more memory than needed is a waste of resources,
                      # but allocating too little will result in your jobs being killed,
                      # which in turn will freeze R and require a restart
                      # hint: try 2-3 jobs with realistic load and excess of memory
                      split_comp_inner_N = 2,
                      seed = 2)
res
plot(res)
Sys.Date. = Sys.Date()
Sys.Date.
session_info. = devtools::session_info()
session_info.