get_matched_data: Filtering data to matched predictions
In pKSEA: Prediction-Based Kinase-Substrate Enrichment Analysis

This function reformats summary statistic phosphoproteomic data into single observations for each phosphorylation site, duplicating the other fields when a peptide carries multiple sites. It then looks up predictions for each phosphorylation site in the provided database, matching on HUGO gene name and phosphorylated residue. It returns only the observations (phosphorylation sites) for which a prediction is found in the database.

get_matched_data(datafull, predictionDB)

datafull

Statistical summary data with an entry for each phosphopeptide. Required columns: GN = gene name identifier that will be matched with the prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = comma-separated phosphorylation sites (e.g. "T102,S105"), pval = pairwise test p-value, fc = mean fold change, t = pairwise test t-statistic. pval and fc are used for results reporting only; all other columns are important for database searching, calculation, and permutation testing.

predictionDB

Input database whose prediction scores will be used for calculations. Required columns: substrate_name = name of substrate corresponding to GN in datafull, kinase_id = identifiers for kinase predictors, position = phosphorylated residue number, score = numeric score for strength of prediction.
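
The reformat-and-match behavior described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy gene names, kinase names, and values are made up, and the sketch assumes that the residue number is the numeric part of each site label (e.g. 102 for "T102") and that matching is an exact join on gene name and position. The structure of the object actually returned by get_matched_data may differ.

```r
# Toy input with the columns required for datafull (all values are made up).
datafull <- data.frame(
  GN = c("GENE1", "GENE2"),
  Peptide = c("pepA", "pepB"),
  Phosphosites = c("T102,S105", "Y50"),
  pval = c(0.01, 0.20),
  fc = c(1.8, -0.4),
  t = c(3.2, -1.1),
  stringsAsFactors = FALSE
)

# Toy prediction database with the columns required for predictionDB.
predictionDB <- data.frame(
  substrate_name = c("GENE1", "GENE1", "GENE3"),
  kinase_id = c("KIN_A", "KIN_B", "KIN_A"),
  position = c(102, 102, 9),
  score = c(0.9, 0.7, 0.8),
  stringsAsFactors = FALSE
)

# Step 1: one row per phosphorylation site, duplicating the other fields.
sites <- strsplit(datafull$Phosphosites, ",", fixed = TRUE)
expanded <- datafull[rep(seq_len(nrow(datafull)), lengths(sites)), ]
expanded$site <- trimws(unlist(sites))

# Assumption: the residue number is the numeric part of the site label.
expanded$position <- as.numeric(gsub("[^0-9]", "", expanded$site))

# Step 2: keep only sites that have a prediction, joining on gene name
# and residue position (an inner join drops unmatched sites).
matched <- merge(expanded, predictionDB,
                 by.x = c("GN", "position"),
                 by.y = c("substrate_name", "position"))
```

In this toy case the two peptides expand to three site-level rows, and only the GENE1 site at position 102 is retained, once per predicted kinase.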

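# Load the package so that get_matched_data() is available
# (NetworKINPred_db is assumed to be provided by pKSEA once it is loaded)
library(pKSEA)

# Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

# Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)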