getPPIs: Find Protein-Protein Interactions


Description

getPPIs identifies proteins in PubMed titles and abstracts and, if a match is found, returns the matching information as a data frame. For each sentence that contains both proteins, the function searches for matches against a user-provided list of keywords. If a keyword is found, the sentence and the matched keywords are appended to the data frame.
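The sentence-level search described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper name find_keyword_sentences, the sentence-splitting rule, and the whole-word matching are all assumptions made for the example.

```r
# Hypothetical helper, not part of nlpUtilityBelt.
find_keyword_sentences <- function(abstract, symbol_a, symbol_b, keywords) {
  # Split into sentences at whitespace that follows '.', '!' or '?'
  sentences <- unlist(strsplit(abstract, "(?<=[.!?])\\s+", perl = TRUE))

  # TRUE for each sentence containing the symbol as a whole word;
  # \Q...\E makes PCRE treat the symbol as literal text
  has_symbol <- function(symbol) {
    grepl(paste0("\\b\\Q", symbol, "\\E\\b"), sentences, perl = TRUE)
  }

  # Keep only sentences that mention both proteins
  matched <- sentences[has_symbol(symbol_a) & has_symbol(symbol_b)]

  # For each kept sentence, collect the keywords it contains
  # (case-insensitive substring match, so "bind" also hits "binds")
  found <- lapply(matched, function(s) {
    keywords[vapply(keywords, grepl, logical(1), x = tolower(s), fixed = TRUE)]
  })
  keep <- lengths(found) > 0

  data.frame(
    int_sentences = matched[keep],
    int_keywords = vapply(found[keep], paste, character(1), collapse = "|"),
    stringsAsFactors = FALSE
  )
}

# Made-up abstract for illustration only
abstract <- paste(
  "BRCA1 binds and interacts with BARD1 in the nucleus.",
  "BRCA1 is a tumor suppressor.",
  "BARD1 and BRCA1 were studied."
)
res <- find_keyword_sentences(abstract, "BRCA1", "BARD1", c("bind", "interact"))
print(res)
```

Only the first sentence mentions both symbols and contains a keyword, so the sketch returns a single row. The third sentence mentions both symbols but no keyword and is dropped here, whereas getPPIs itself records a placeholder string in that case.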

Usage

getPPIs(data, regex, getInteractionMatches, keywords)

Arguments

data

A data frame containing PubMed Ids, gene A and B symbols, synonyms list (each symbol separated by '|'), gene A and B names, and article title and abstract.

regex

A large data frame of symbols and their regular expression patterns.

getInteractionMatches

The matching function; see the documentation for nlpUtilityBelt::getInteractionMatches.

keywords

A list of keywords used to identify PPIs.
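The exact layout of the regex data frame is not shown on this page. As a rough sketch of what a table of symbols and regular expression patterns could look like, the following builds one whole-word pattern per gene from its symbol and its '|'-separated synonyms. The column names symbol, synonyms and pattern, and the helper build_pattern, are hypothetical and may not match what getPPIs expects.

```r
# Hypothetical input: one row per gene, synonyms separated by '|'
symbols <- data.frame(
  symbol = c("TP53", "MDM2"),
  synonyms = c("P53|LFS1", "HDM2"),
  stringsAsFactors = FALSE
)

# Build a whole-word alternation such as \b(\QTP53\E|\QP53\E|\QLFS1\E)\b
build_pattern <- function(symbol, synonyms) {
  terms <- c(symbol, unlist(strsplit(synonyms, "|", fixed = TRUE)))
  paste0("\\b(", paste0("\\Q", terms, "\\E", collapse = "|"), ")\\b")
}

patterns_sketch <- data.frame(
  symbol = symbols$symbol,
  pattern = mapply(build_pattern, symbols$symbol, symbols$synonyms,
                   USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# The pattern matches a synonym as a whole word, ignoring case,
# but not a longer symbol that merely contains it
hit <- grepl(patterns_sketch$pattern[1], "Loss of p53 function",
             perl = TRUE, ignore.case = TRUE)
miss <- grepl(patterns_sketch$pattern[1], "TP53BP1 was detected",
              perl = TRUE, ignore.case = TRUE)
```

Word boundaries keep a short symbol from matching inside a longer one, which matters for gene symbols that are prefixes of other symbols.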

Details

This function requires the stringr package. It also uses the getInteractionMatches and extractPOS functions.

Value

getPPIs returns a data frame with eight columns. If the matched sentences do not contain any of the provided keywords, the cell will contain "No keywords found in sentence" and/or "No keywords". The data frame contains the following columns:

pmid: PubMed ID.
article_title: PubMed article title.
article_abstract: PubMed article abstract, with sentences separated by '|'.
matched_symbols: Identified protein symbols, separated by '|'.
matched_sentences: Sentences from the abstract containing a matched symbol, separated by '|'.
int_sentences: Sentences from the abstract containing a matched symbol and one or more of the keywords, separated by '|'.
int_keywords: Matched keywords from int_sentences, separated by '|'.
match: A '0' or '1' indicating whether gene A and gene B were both identified in the article.
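Because several columns pack multiple values into one '|'-separated string, downstream code usually needs to split them. The sketch below does this on a small mock data frame that reuses the column names documented above. The rows are invented for illustration and are not real getPPIs output, and which column carries which placeholder string is an assumption.

```r
# Mock result with made-up rows; only a few documented columns are included
ppi_mock <- data.frame(
  pmid = c("0000001", "0000002", "0000003"),
  int_keywords = c("bind|interact", "No keywords", "bind"),
  match = c("1", "1", "0"),
  stringsAsFactors = FALSE
)

# Placeholder strings that the documentation says may appear in a cell
placeholders <- c("No keywords found in sentence", "No keywords")

# Keep articles where both genes were found and real keywords were matched;
# comparing with == 1 works whether match is stored as a number or as text
hits <- ppi_mock[ppi_mock$match == 1 &
                   !(ppi_mock$int_keywords %in% placeholders), ]

# Split the '|'-separated keywords; fixed = TRUE treats '|' literally
keyword_list <- strsplit(hits$int_keywords, "|", fixed = TRUE)

# Tally how often each keyword occurs across the retained articles
keyword_counts <- sort(table(unlist(keyword_list)), decreasing = TRUE)
print(keyword_counts)
```

Passing fixed = TRUE matters because '|' is the alternation operator in a regular expression, so splitting on it as a pattern would not split on the literal character.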

Examples

## interaction keywords
keywords <- c("bind", 
             "interact",
             "associate",
             "regulation",
             "bound",
             "localize",
             "stimulation",
             "regulate",
             "effect",
             "target",
             "component",
             "member",
             "mediate")

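## loop over abstracts - get subset of the data for testing
## merged_biogrid_pubmed_results (the input data) and patterns (the regex
## data frame) must already exist in the session
PPI_results <- getPPIs(merged_biogrid_pubmed_results, patterns, getInteractionMatches, keywords)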

## write the results to a tab-delimited file
write.table(PPI_results, "PPI_BIOGRID_results.txt", quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE)

andreysoares/nlpUtilityBelt documentation built on May 6, 2019, 8:57 p.m.