pandaPy: Run Python implementation PANDA in R

View source: R/PANDA.R

pandaPy R Documentation

Run Python implementation PANDA in R

Description

PANDA (Passing Attributes between Networks for Data Assimilation) is a message-passing model for reconstructing gene regulatory networks. It integrates multiple sources of biological data, including protein-protein interaction data, gene expression data, and transcription factor binding motif data, to reconstruct genome-wide, condition-specific regulatory networks (Glass et al. 2013). This function runs a PANDA implementation derived from the Python library "netZooPy".

Usage

pandaPy(
  expr_file,
  motif_file = NULL,
  ppi_file = NULL,
  computing = "cpu",
  precision = "double",
  save_memory = FALSE,
  save_tmp = TRUE,
  keep_expression_matrix = FALSE,
  modeProcess = "union",
  remove_missing = FALSE,
  with_header = FALSE
)

Arguments

expr_file

Character string indicating the file path of the expression values file, with genes in rows and samples in columns.

motif_file

An optional character string indicating the file path of a prior transcription factor binding motif dataset. When this argument is not provided, the analysis continues with a Pearson correlation matrix.

ppi_file

An optional character string indicating the file path of a protein-protein interaction edge dataset. This dataset can also be generated from a list of proteins of interest with sourcePPI.

computing

'cpu' uses the Central Processing Unit (CPU) to run PANDA; 'gpu' uses the Graphical Processing Unit (GPU) to run PANDA. The default value is 'cpu'.

precision

'double' computes the regulatory network in double precision (15 decimal digits); 'single' computes the regulatory network in single precision (7 decimal digits), which is faster and requires half the memory but is less accurate. The default value is 'double'.

save_memory

'TRUE' removes temporary results from memory; the result network is a weighted adjacency matrix of size (nTFs, nGenes). 'FALSE' keeps the temporary results in memory; the result network has 4 columns in the form gene - TF - weight in motif prior - PANDA edge. The indegree and outdegree of the PANDA network are returned only if save_memory = FALSE. The default value is 'FALSE'.

save_tmp

'TRUE' saves intermediate data such as the expression matrix and normalized networks; 'FALSE' deletes the intermediate data. The default value is 'TRUE'.

keep_expression_matrix

'TRUE' keeps the input expression matrix as an attribute in the resulting Panda object; 'FALSE' deletes the expression matrix attribute from the Panda object. The default value is 'FALSE'.

modeProcess

'legacy' refers to the processing mode in netZooPy<=0.5; 'union' takes the union of all TFs and genes across priors and fills the missing genes in the priors with zeros; 'intersection' intersects the input genes and TFs across priors and removes the missing TFs/genes. The default value is 'union'.
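
The difference between 'union' and 'intersection' can be illustrated with base R set operations. This is only a sketch of the idea on toy data, not the netZooPy implementation; the gene and TF names and the `motif` matrix below are invented for illustration.

```r
# Toy identifiers (hypothetical, not taken from any real dataset).
expr_genes  <- c("G1", "G2", "G3")   # genes found in the expression file
motif_genes <- c("G2", "G3", "G4")   # genes found in the motif prior

# 'union' keeps every gene seen in either source.
union_genes <- union(expr_genes, motif_genes)

# 'intersection' keeps only the genes present in every source.
common_genes <- intersect(expr_genes, motif_genes)

# Toy TF-by-gene motif prior covering only motif_genes.
motif <- matrix(c(1, 0, 1, 1, 0, 1), nrow = 2,
                dimnames = list(c("TF1", "TF2"), motif_genes))

# Under 'union', genes missing from the prior are filled with zeros.
motif_union <- matrix(0, nrow = nrow(motif), ncol = length(union_genes),
                      dimnames = list(rownames(motif), union_genes))
motif_union[, colnames(motif)] <- motif

# Under 'intersection', genes missing from any source are dropped.
motif_common <- motif[, common_genes, drop = FALSE]
```

Here `motif_union` gains an all-zero column for G1, while `motif_common` loses the column for G4.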

remove_missing

Used only when modeProcess='legacy': remove_missing='TRUE' removes all unmatched TFs and genes; remove_missing='FALSE' keeps all TFs and genes. The default value is 'FALSE'.

with_header

Boolean indicating whether the gene expression file is read with a header row of sample names. The default value is 'FALSE'.

Value

When save_memory=FALSE (the default), this function returns a list of three items. Use $panda to access the standard output of PANDA as a data frame, which consists of four columns: "TF", "Gene", "Motif" (0 or 1, indicating whether the edge belongs to the prior motif dataset), and "Score".

Use $indegree to access the indegree of the PANDA network as a data frame, which consists of two columns: "Gene", "Score".

Use $outdegree to access the outdegree of the PANDA network as a data frame, which consists of two columns: "TF", "Score".

When save_memory=TRUE, this function returns a weighted adjacency matrix of size (nTFs, nGenes); use $WAMpanda to access it.
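
The two return shapes describe the same network in different layouts. The sketch below uses a made-up edge list with the four documented columns to show how a long edge list relates to a TF-by-gene weighted adjacency matrix. It assumes that a node's degree is the sum of its edge weights; that assumption and the mock values are illustrative and are not taken from the pandaPy source.

```r
# Mock edge list with the four documented columns; all values are invented.
panda <- data.frame(
  TF    = c("TF1", "TF1", "TF2", "TF2"),
  Gene  = c("G1", "G2", "G1", "G2"),
  Motif = c(1, 0, 0, 1),
  Score = c(2.5, -0.3, 0.8, 1.9),
  stringsAsFactors = FALSE
)

# Pivot the long edge list into a (nTFs, nGenes) weighted adjacency matrix.
wam <- tapply(panda$Score, list(panda$TF, panda$Gene), sum)

# Assumed degree definition: the sum of edge weights per node.
indegree  <- colSums(wam)   # one value per gene
outdegree <- rowSums(wam)   # one value per TF
```

The matrix layout stores one number per TF-gene pair, which is why it is the lighter of the two forms.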

Examples

# Take the treated TB dataset as an example here.
# The dataset files are located in inst/extdata.

treated_expression_file_path <- system.file("extdata", "expr4_matched.txt", 
package = "netZooR", mustWork = TRUE)
motif_file_path <- system.file("extdata", "chip_matched.txt", package = "netZooR", mustWork = TRUE)
ppi_file_path <- system.file("extdata", "ppi_matched.txt", package = "netZooR", mustWork = TRUE)


# Run PANDA for the treated network

treated_all_panda_result <- pandaPy(expr_file = treated_expression_file_path, 
motif_file = motif_file_path, ppi_file = ppi_file_path, 
modeProcess="legacy", remove_missing = TRUE )

# access PANDA regulatory network
treated_net <- treated_all_panda_result$panda

# access PANDA regulatory indegree network.
indegree_net <- treated_all_panda_result$indegree

# access PANDA regulatory outdegree networks
outdegree_net <- treated_all_panda_result$outdegree
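
Once the $panda data frame is available, ordinary data frame operations can be used to inspect it. The sketch below runs on a mock data frame with the documented columns rather than on real pandaPy output, so the values and the name `mock_net` are hypothetical.

```r
# Mock stand-in for treated_all_panda_result$panda; all values are invented.
mock_net <- data.frame(
  TF    = c("TF1", "TF1", "TF2", "TF2"),
  Gene  = c("G1", "G2", "G1", "G2"),
  Motif = c(1, 0, 0, 1),
  Score = c(2.5, -0.3, 0.8, 1.9),
  stringsAsFactors = FALSE
)

# Rank edges from the highest to the lowest score and keep the top two.
ranked    <- mock_net[order(mock_net$Score, decreasing = TRUE), ]
top_edges <- head(ranked, 2)

# Keep only the edges that are present in the motif prior.
prior_edges <- mock_net[mock_net$Motif == 1, ]
```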



netZoo/netZooR documentation built on Oct. 16, 2024, 10:23 p.m.