README.md

WARNING: repository is outdated

This repository was published for review purposes and is now only useful for replicating the published results. Please see http://github.com/vanatteveldt/rsyntax for an updated version of the module.

Installation

You can install the package directly from GitHub:

library(devtools)
install_github("anon-author/clauses")

Usage

The functions in this module assume that you have a list of tokens in a data frame. A simple example is provided with the module:

library(rsyntax)
data(example_tokens)
tokens
word parent sentence coref pos entity lemma relation offset       aid id pos1 attack
John      2        1     1 NNP PERSON  John    nsubj      0 156884180  1    M  FALSE
says     NA        1    NA VBZ          say               5 156884180  2    V  FALSE
that      5        1    NA  IN         that     mark     10 156884180  3    P  FALSE
Mary      5        1    NA NNP PERSON  Mary    nsubj     15 156884180  4    M  FALSE
 hit      2        1    NA VBD          hit    ccomp     20 156884180  5    V  FALSE
 him      5        1     1 PRP           he     dobj     24 156884180  6    O  FALSE
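To try the functions on your own data, it may help to see the example rebuilt by hand. The sketch below recreates the printed token list in base R. Here `my_tokens` is a hypothetical name, empty cells in the printout are assumed to be empty strings, and this README does not say which of these columns the package actually requires.

```r
# Hand-built copy of the example token list (values copied from the printout).
# Assumption: blank entity/relation cells are empty strings, not NA.
my_tokens = data.frame(
  word     = c("John", "says", "that", "Mary", "hit", "him"),
  parent   = c(2, NA, 5, 5, 2, 5),        # id of the syntactic parent; NA for the root
  sentence = 1,                           # scalar is recycled to all six rows
  coref    = c(1, NA, NA, NA, NA, 1),
  pos      = c("NNP", "VBZ", "IN", "NNP", "VBD", "PRP"),
  entity   = c("PERSON", "", "", "PERSON", "", ""),
  lemma    = c("John", "say", "that", "Mary", "hit", "he"),
  relation = c("nsubj", "", "mark", "nsubj", "ccomp", "dobj"),
  offset   = c(0, 5, 10, 15, 20, 24),
  aid      = 156884180,
  id       = 1:6,
  pos1     = c("M", "V", "P", "M", "V", "O"),
  attack   = FALSE,
  stringsAsFactors = FALSE
)
```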

Get the text of a sentence, optionally specifying which column(s) to use:

get_text(tokens)

## [1] "John says that Mary hit him"

get_text(tokens, word.column = c("lemma", "pos"))

## [1] "John/NNP say/VBZ that/IN Mary/NNP hit/VBD he/PRP"
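As a rough mental model of the output format, the same strings can be produced in base R by joining the chosen columns with "/" per token and then joining the tokens with spaces. This is only an illustration and not the package's implementation; `text_from_columns` and `toy` are hypothetical names.

```r
# Toy data with just the columns needed for this illustration.
toy = data.frame(
  word  = c("John", "says", "that", "Mary", "hit", "him"),
  lemma = c("John", "say", "that", "Mary", "hit", "he"),
  pos   = c("NNP", "VBZ", "IN", "NNP", "VBD", "PRP"),
  stringsAsFactors = FALSE
)

# Join the selected columns with "/" within each token, then join tokens with spaces.
text_from_columns = function(tokens, columns = "word") {
  cols = unname(as.list(tokens[columns]))   # drop names so they cannot clash with paste() arguments
  per_token = do.call(paste, c(cols, sep = "/"))
  paste(per_token, collapse = " ")
}

text_from_columns(toy)
text_from_columns(toy, c("lemma", "pos"))
```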

Plot the syntactic structure of a sentence. Note that if a token list contains multiple sentences, you should either filter it down to a single sentence or provide a sentence= argument:

g = graph_from_sentence(tokens)
plot(g)
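For the filtering option mentioned in the note above, plain data frame subsetting is enough. The sketch assumes that the sentence column seen in the example tokens identifies the sentence; `two_sentences` is made-up data for illustration.

```r
# Made-up token list spanning two sentences.
two_sentences = data.frame(
  sentence = c(1, 1, 2, 2, 2),
  id       = 1:5,
  word     = c("John", "sleeps", "Mary", "hit", "him"),
  stringsAsFactors = FALSE
)

# Keep only the tokens of sentence 1; %in% treats NA as "no match",
# so rows with a missing sentence number are dropped rather than kept as NA rows.
first_only = two_sentences[two_sentences$sentence %in% 1, ]
```

The filtered data frame can then be passed to graph_from_sentence in place of the full token list.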

Figure: Syntactic structure of the example sentence

Clauses and Sources

You can use the get_quotes function to extract quotes and paraphrases from the sentences. This requires the token ids to be globally unique. If they are not, you can use the unique_ids function to make them unique:

tokens = unique_ids(tokens)
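To make "globally unique" concrete: if ids restart at 1 in every sentence, both the id and the parent columns have to be renumbered together so that the tree structure survives. The following base R sketch shows one way this could be done; it is not the package's unique_ids implementation, and `renumber_ids` and `toy` are hypothetical names.

```r
# Made-up token list in which ids restart in each sentence.
toy = data.frame(
  sentence = c(1, 1, 2, 2),
  id       = c(1, 2, 1, 2),
  parent   = c(2, NA, 2, NA),
  word     = c("John", "sleeps", "Mary", "runs"),
  stringsAsFactors = FALSE
)

# Assign each row a new id and remap parent references within the same sentence.
renumber_ids = function(tokens) {
  old_key    = paste(tokens$sentence, tokens$id)
  parent_key = paste(tokens$sentence, tokens$parent)
  new_id     = seq_len(nrow(tokens))
  tokens$parent = new_id[match(parent_key, old_key)]  # root tokens (parent NA) stay NA
  tokens$id     = new_id
  tokens
}

fixed = renumber_ids(toy)
```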

You can get the quotes from the tokens with get_quotes:

quotes = get_quotes(tokens)
quotes
quote_id key quote_role id
       1   2     source  1
       1   2      quote  3
       1   2      quote  4
       1   2      quote  6
       1   2      quote  5

A single quote was found, with node 2 ("say") as the key, node 1 ("John") as the source, and nodes 3 through 6 ("that Mary hit him") as the quote.
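Because the quotes table only holds token ids, reading it usually means joining it back to the tokens. A minimal base R sketch, using hand-copied versions of the two tables shown in this README (`quote_table`, `token_words`, and `role_text` are hypothetical names):

```r
# Hand-copied from the get_quotes output shown in this README.
quote_table = data.frame(
  quote_id   = 1,
  key        = 2,
  quote_role = c("source", "quote", "quote", "quote", "quote"),
  id         = c(1, 3, 4, 6, 5),
  stringsAsFactors = FALSE
)

# Only the id and word columns of the example tokens are needed here.
token_words = data.frame(
  id   = 1:6,
  word = c("John", "says", "that", "Mary", "hit", "him"),
  stringsAsFactors = FALSE
)

# Join on the token id, restore word order, and paste the words per role.
merged = merge(quote_table, token_words, by = "id")
merged = merged[order(merged$id), ]
role_text = sapply(split(merged$word, merged$quote_role), paste, collapse = " ")
role_text
```

The same join should work for the output of get_clauses, grouping by clause_role instead of quote_role.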

To find the clauses, you can use the get_clauses function, which takes the quotes as an optional argument to make sure that speech actions are not listed as clauses:

clauses = get_clauses(tokens, quotes=quotes)
clauses
clause_id clause_role id
        1     subject  4
        1   predicate  3
        1   predicate  6
        1   predicate  5

Finally, you can also provide the quotes and clauses to the graph_from_sentence function. This fills the clause nodes with desaturated rainbow colours, drawing the subject as a circle and the predicate as a rectangle. Quotes are shown with a bright node for the source and a border in the same colour around the quote nodes.

g = graph_from_sentence(tokens, quotes = quotes, clauses = clauses)
plot(g)

Figure: Syntactic structure of the example sentence with clauses and quotes marked



anon-author/clauses documentation built on May 10, 2019, 11:52 a.m.