getPPIs
Identifies proteins in PubMed titles and abstracts and, if a match is found, returns the matching information as a data frame. The function searches each sentence that contains two proteins for matches against a user-provided list of keywords. If a match is found, the sentences containing the keywords and the matched keywords are appended to the data frame.
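The sentence-level search described above can be pictured with a minimal base R sketch. This is not the package's implementation: the abstract text, the two symbols, the naive sentence splitter, and the word-boundary patterns are all assumptions made for illustration, and the real function takes its patterns from the regex argument.

```r
# Hypothetical inputs, invented for this sketch (not package data)
abstract <- paste(
  "BRCA1 binds BARD1 in the nucleus.",
  "BRCA1 is a tumor suppressor.",
  "BARD1 and BRCA1 were both detected."
)
symbols  <- c("BRCA1", "BARD1")
keywords <- c("bind", "interact", "associate")

# Naive sentence split: break on whitespace that follows ., ! or ?
sentences <- unlist(strsplit(abstract, "(?<=[.!?])\\s+", perl = TRUE))

# Keep only sentences that mention every symbol as a whole word
mentions_all <- Reduce(`&`, lapply(symbols, function(s) {
  grepl(paste0("\\b", s, "\\b"), sentences, perl = TRUE)
}))
matched_sentences <- sentences[mentions_all]

# Among those, keep sentences containing at least one keyword
keyword_pattern <- paste(keywords, collapse = "|")
has_keyword <- grepl(keyword_pattern, matched_sentences, ignore.case = TRUE)
int_sentences <- matched_sentences[has_keyword]

# Collect the distinct keywords that actually matched
int_keywords <- unique(unlist(regmatches(
  int_sentences,
  gregexpr(keyword_pattern, int_sentences, ignore.case = TRUE)
)))

# Join with '|' to mirror the separator used in the returned data frame
paste(int_sentences, collapse = "|")
paste(int_keywords, collapse = "|")
```

Because the keywords are matched as substrings, a stem such as "bind" also catches "binds" and "binding", which is presumably why the example keyword list uses short stems.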
getPPIs(data, regex, getInteractionMatches, keywords)
data: A data frame containing PubMed IDs, gene A and B symbols, synonym lists (each symbol separated by '|'), gene A and B names, and the article title and abstract.
regex: A large data frame of symbols and regular expression patterns.
getInteractionMatches: See the getInteractionMatches function documentation.
keywords: A list of keywords used to identify PPIs.
This function requires the stringr package. It also uses the getInteractionMatches and extractPOS functions.
The function "getPPIs" returns a data frame with eight columns: pmid, article_title, article_abstract, matched_symbols, matched_sentences, int_sentences, int_keywords, and match. If the matched sentences do not contain any of the provided keywords, the cell will contain "No keywords found in sentence" and/or "No keywords". The columns are:
pmid: PubMed IDs
article_title: PubMed article title
article_abstract: PubMed article abstract sentences, separated by '|'
matched_symbols: A list of identified protein symbols, separated by '|'
matched_sentences: Sentences from the abstract containing a matched symbol, separated by '|'
int_sentences: Sentences from the abstract containing a matched symbol and one or more of the keywords, separated by '|'
int_keywords: Matched keywords from int_sentences, separated by '|'
match: A '0' or '1' to indicate articles where gene A and B were identified
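Since several of the returned columns pack multiple values into one '|'-separated string, downstream code usually needs to split them again. The sketch below shows one way to do that in base R. The data frame is a hand-built stand-in with placeholder values, not real getPPIs output, and it assumes that a match value of 1 marks articles where both genes were identified.

```r
# Hypothetical stand-in for a getPPIs result; all values are placeholders
PPI_results <- data.frame(
  pmid          = c("00000001", "00000002"),
  int_sentences = c("A binds B.|A interacts with B.",
                    "No keywords found in sentence"),
  int_keywords  = c("bind|interact", "No keywords"),
  match         = c("1", "0"),
  stringsAsFactors = FALSE
)

# Keep rows flagged as matches (assumption: 1 means both genes were found).
# The comparison works whether match is stored as character or numeric.
hits <- PPI_results[PPI_results$match == 1, ]

# Split the '|'-separated fields; fixed = TRUE treats '|' literally,
# not as the regular expression alternation operator
int_sentence_list <- strsplit(hits$int_sentences, "|", fixed = TRUE)
int_keyword_list  <- strsplit(hits$int_keywords, "|", fixed = TRUE)

# Number of keyword-bearing sentences per matched article
n_int_sentences <- lengths(int_sentence_list)
```

Passing fixed = TRUE matters here: without it, strsplit interprets "|" as a regular expression and splits the string into single characters.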
## interaction keywords
keywords <- c("bind",
"interact",
"associate",
"regulation",
"bound",
"localize",
"stimulation",
"regulate",
"effect",
"target",
"component",
"member",
"mediate")
## loop over abstracts - get subset of the data for testing
PPI_results <- getPPIs(merged_biogrid_pubmed_results, patterns, getInteractionMatches, keywords)
## write out sentences
write.table(PPI_results, "PPI_BIOGRID_results.txt", quote = FALSE, sep = '\t', col.names = TRUE, row.names = FALSE)
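The example above writes the results with quote = FALSE, so any quote characters in titles or abstracts land in the file unescaped. A hedged sketch of reading such a file back follows. The small data frame and the temporary file are placeholders for illustration only; the point is the pairing of quote = "" and colClasses = "character" on the reading side.

```r
# Hypothetical results with a double quote inside a sentence and
# leading zeros in the IDs; values are placeholders
results <- data.frame(
  pmid          = c("00000001", "00000002"),
  int_sentences = c("A binds the \"B\" domain.|A interacts with B.",
                    "No keywords found in sentence"),
  match         = c("1", "0"),
  stringsAsFactors = FALSE
)

# Write the file the same way as in the example above
out_file <- tempfile(fileext = ".txt")
write.table(results, out_file, quote = FALSE, sep = "\t",
            col.names = TRUE, row.names = FALSE)

# quote = "" reads quote characters literally, matching quote = FALSE on write;
# colClasses = "character" keeps IDs as text so leading zeros survive
restored <- read.delim(out_file, quote = "", colClasses = "character",
                       stringsAsFactors = FALSE)

unlink(out_file)
```

This only round-trips cleanly as long as no field contains a tab or a newline, since those are the separators that write.table leaves unprotected when quoting is off.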