This software provides an R implementation of the methodology proposed by Tasan et al. (2015). Specifically, the method seeks to identify Prix Fixe (i.e., fixed price) dense subnetworks of genes in linkage disequilibrium with disease-associated tagSNPs. These subnetworks are identified heuristically using a simple genetic algorithm.
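To make the idea of a heuristic genetic search concrete, here is a toy sketch in base R. It is not the PFFindR implementation: the loci, genes, and edges are invented, fitness is simply the number of edges among the selected genes, and the search uses only mutation and truncation selection. It assumes, following the prix fixe idea, that each candidate subnetwork selects exactly one gene per locus.

```r
set.seed(42)

# Hypothetical toy data: three loci with candidate genes, plus a few edges.
loci <- list(
  locus1 = c("A1", "A2"),
  locus2 = c("B1", "B2", "B3"),
  locus3 = c("C1", "C2")
)
edges <- data.frame(
  from = c("A1", "A1", "B2", "A2"),
  to   = c("B2", "C1", "C1", "B1"),
  stringsAsFactors = FALSE
)

# A candidate picks exactly one gene per locus (assumed prix fixe constraint).
random_candidate <- function() vapply(loci, sample, character(1), size = 1)

# Fitness (an assumption for this sketch): number of edges whose two
# endpoints are both among the selected genes.
fitness <- function(candidate) {
  sum(edges$from %in% candidate & edges$to %in% candidate)
}

# Mutation: with probability `rate`, re-draw the gene chosen at each locus.
mutate <- function(candidate, rate) {
  for (i in seq_along(loci)) {
    if (runif(1) < rate) candidate[i] <- sample(loci[[i]], 1)
  }
  candidate
}

# Evolve: keep the fitter half, refill with mutated copies of the survivors.
population <- replicate(10, random_candidate(), simplify = FALSE)
for (generation in 1:20) {
  scores <- vapply(population, fitness, numeric(1))
  survivors <- population[order(scores, decreasing = TRUE)][1:5]
  children <- lapply(survivors, mutate, rate = 0.2)
  population <- c(survivors, children)
}

scores <- vapply(population, fitness, numeric(1))
best <- population[[which.max(scores)]]
print(best)
print(max(scores))
```

Because the survivors are carried over unchanged, the best score in this sketch can never decrease from one generation to the next.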
This package is available for download on GitHub and was built under R version 3.6.1. The easiest way to install it is from the R console:
library(devtools)
install_github("princeew/PFFindR")
Alternatively, you can clone the repository and manually install the package yourself.
cd <location/to/store/package>
git clone https://github.com/princeew/PFFindR.git
R -e 'install.packages("./PFFindR", repos = NULL, type = "source")'
NOTE: If you only want to run the example script to test the software, the script automatically installs, runs, and uninstalls the PFFindR package. This behavior can be modified using the command line options (see Command Line Usage Example below).
To uninstall PFFindR, run remove.packages("PFFindR") from your R console.
PFFindR requires two separate input files. The first is a GMT file containing the loci information. The first column is expected to be a metalabel, the second column a locus ID, and every subsequent column a gene belonging to that locus. The file is expected to have no header.
The second file contains the interaction information. It is expected to be a tab-delimited file in which the first two columns give the interacting gene pair and the third column gives the weight of the interaction. This file is also expected to have no header.
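As a rough illustration of the two layouts described above, the sketch below writes a toy loci file and a toy interaction file to temporary files and reads them back with base R. The labels, gene names, and weights are made up for illustration and are not part of the package's example data.

```r
# Toy loci (GMT-style) file: metalabel, locus ID, then one gene per column.
# All labels and gene names here are hypothetical.
loci_lines <- c(
  paste("toy disease locus 0", "Locus A", "GENE1", "GENE2", sep = "\t"),
  paste("toy disease locus 1", "Locus B", "GENE3", "GENE4", "GENE5", sep = "\t")
)
loci_file <- tempfile(fileext = ".txt")
writeLines(loci_lines, loci_file)

# Toy interaction file: gene, gene, weight; tab-delimited, no header.
edges <- data.frame(
  gene_a = c("GENE1", "GENE2", "GENE4"),
  gene_b = c("GENE3", "GENE5", "GENE1"),
  weight = c(0.9, 0.4, 0.7)
)
cfn_file <- tempfile(fileext = ".txt")
write.table(edges, cfn_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the loci file back: rows differ in length, so split each line by tab.
fields <- strsplit(readLines(loci_file), "\t", fixed = TRUE)
loci <- lapply(fields, function(x) x[-(1:2)])  # drop metalabel and locus ID
names(loci) <- vapply(fields, function(x) x[2], character(1))

# Read the interaction file back as a three-column data frame.
cfn <- read.table(cfn_file, sep = "\t", header = FALSE,
                  col.names = c("gene_a", "gene_b", "weight"),
                  stringsAsFactors = FALSE)

print(loci)
print(cfn)
```

Files laid out like loci_file and cfn_file are the kind of paths that PFDataLoader takes as its two arguments in the API example.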
usage: ./PFFindR/scripts/PFFind.R [-h] [--use_FA_data] [--loci_data LOCI_DATA]
[--cfn_data CFN_DATA] [--population_size POPULATION_SIZE]
[--weighted_network] [--mutation_rate MUTATION_RATE]
[--optimizer_stop_threshold OPTIMIZER_STOP_THRESHOLD]
[--plot_fitness PLOT_FITNESS]
[--plot_fitness_path PLOT_FITNESS_PATH]
[--save_gene_scores SAVE_GENE_SCORES]
[--gene_scores_path GENE_SCORES_PATH]
[--null_trials NULL_TRIALS] [--num_to_save NUM_TO_SAVE]
[--save_network SAVE_NETWORK] [--network_path NETWORK_PATH]
[--random_seed RANDOM_SEED] [--uninstall_after]
optional arguments:
-h, --help show this help message and exit
--use_FA_data Use the example PF_FanconiAnemia dataset. Overrides
--loci_data and --cfn_data [default "False"]
--loci_data LOCI_DATA
Path to tab-delimited loci data (i.e., a GMT file).
--cfn_data CFN_DATA Path to tab-delimited co-function network data.
--population_size POPULATION_SIZE
The number of subnetworks to consider in a population.
[default "10"]
--weighted_network Use weighted edges instead of binary. [default
"False"]
--mutation_rate MUTATION_RATE
The percent (in decimal form) for which to mutate loci
in networks. [default "0.05"]
--optimizer_stop_threshold OPTIMIZER_STOP_THRESHOLD
The minimum percent improvement to continue
optimization. [default "0.05"]
--plot_fitness PLOT_FITNESS
Whether to plot the fitness during evolution.
--plot_fitness_path PLOT_FITNESS_PATH
File path for where to store optimizer performance
plot. [default "./PFFindR_Search_Performance.png"]
--save_gene_scores SAVE_GENE_SCORES
Whether to save gene scores or not. [default "True"]
--gene_scores_path GENE_SCORES_PATH
File path for where to store gene scores as GMT file.
[default "./Day3_Output.gmt"]
--null_trials NULL_TRIALS
The number of random trials to evaluate for the null
distribution to determine population significance.
[default "50"]
--num_to_save NUM_TO_SAVE
The number of networks to save to disk. [default "10"]
--save_network SAVE_NETWORK
Boolean whether to save network findings. [default
"True"]
--network_path NETWORK_PATH
File path for where to save networks. Note: using the
template *_Network_pval.txt will result in a modified
file name with network ID number and p-value for
population. [default "./Day3_Output_Network_pval.txt"]
--random_seed RANDOM_SEED
random number generator seed (for reproducibility).
[default "42"]
--uninstall_after Uninstall PFFindR after running this example. [default
"True"]
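The options above can also be assembled from within R. The following is a minimal sketch, assuming the repository has been cloned so that ./PFFindR/scripts/PFFind.R exists and that Rscript is on the PATH. The option names come from the help text, while the chosen values are arbitrary examples. The script is only invoked if the file is actually present.

```r
# Path from the usage message above; assumes a local clone of the repository.
script <- "./PFFindR/scripts/PFFind.R"

# Option names come from the help text; the values are arbitrary examples.
args <- c(
  script,
  "--use_FA_data",
  "--population_size", "100",
  "--mutation_rate", "0.05",
  "--null_trials", "50",
  "--random_seed", "42"
)

# Show the command that would be run.
cat("Rscript", args, "\n")

# Only invoke the script when it exists, so the sketch is safe to source.
if (file.exists(script)) {
  status <- system2("Rscript", args)
  cat("exit status:", status, "\n")
}
```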
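There are three types of outputs generated by PFFindR:
1. A plot of the optimizer's fitness during evolution (see --plot_fitness_path).
2. A GMT file of gene scores (see --gene_scores_path). For example:
Fanconi anemia locus 0 Locus for PALB2 NUPR1 NA CTB-134H23.2 NA
Fanconi anemia locus 2 Locus for RAD51C, BRIP1 PRKCA 3 FTSJ3 NA TBX4 NA
3. Network files, one per saved network (the number is determined by the num_to_save variable above). For example:
ZNF788 TSSK4 0.182120
RUVBL2 RRAGC 0.667000
SNRNP70 BRIX1 0.286122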
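A network output like the example above can be loaded back into R for inspection. The sketch below parses the three example rows with base R, assuming whitespace-delimited columns with no header. The column names are labels chosen for this sketch, not names used by PFFindR.

```r
# Example rows copied from the network output above.
network_text <- "ZNF788 TSSK4 0.182120
RUVBL2 RRAGC 0.667000
SNRNP70 BRIX1 0.286122"

# Column names are labels chosen for this sketch, not names used by PFFindR.
network <- read.table(text = network_text, header = FALSE,
                      col.names = c("gene_a", "gene_b", "weight"),
                      stringsAsFactors = FALSE)

# Sort interactions from the largest to the smallest weight.
network <- network[order(network$weight, decreasing = TRUE), ]

# List the distinct genes that appear in the subnetwork.
genes <- sort(unique(c(network$gene_a, network$gene_b)))

print(network)
print(genes)
```

To read a saved file instead, replace the text argument with the path to that file.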
library(PFFindR)
# Load a PFData object
pf_data <- PFDataLoader(
  "../app/preloaded_data/FA_loci.txt",
  "../app/preloaded_data/Day3_STRING.txt")
# Run Optimizer
## This example will save all values to files.
population <- findPFNetworks(pf_data,
                             population_size = 500,
                             binary = FALSE,
                             mutation_rate = 0.05,
                             optimizer_stop_threshold = 0.05,
                             plot_fitness = TRUE,
                             plot_fitness_path = "./pffinder_search_performance.png",
                             save_gene_scores = TRUE,
                             gene_scores_path = "./Day3_Output.gmt",
                             p_val_trials = 100,
                             num_to_save = 10,
                             save_network = TRUE,
                             network_path = "./Day3_Output_Network_pval.txt")
# Save the workspace
save.image()
Rich documentation exists for all major functions in the PFFindR package and can
be accessed using the native R methodology. For example, the initializePopulation
function documentation can be obtained using either of the following two
(equivalent) commands within the R console:
help(initializePopulation)
?initializePopulation
This package comes preloaded with an example dataset built using 12 loci, each
comprising up to 46 genes, that are associated with the disease Fanconi anemia,
along with a co-function network built using the STRING database.
It can be accessed by the handle PF_FanconiAnemia.
Please report any issues, comments, or questions to Eric Prince by email at Eric.Prince@CUAnschutz.edu, or file an issue.