In agricolamz/lingcorpora.R: Linguistic Corpora API

knitr::opts_chunk$set(echo = TRUE, message=F, warning = F)

About lingcorpora

The lingcorpora package provides R with an API to several linguistic corpora. A tutorial for this package is available on the GitHub wiki. The package includes APIs for corpora of the following languages:

library(lingtypology)
map.feature(c("Abkhaz", "Avar", "Polish", "Russian"))

Installation {.tabset .tabset-fade .tabset-pills}

R version

Get the latest version from GitHub:

install.packages("devtools")
devtools::install_github("agricolamz/lingcorpora.R", dependencies = TRUE)

Load the library:

library(lingcorpora)

Python version

To install the package, run the following command in the terminal:

```{bash, eval = F}
pip3 install git+https://github.com/alexeykosh/lingcorpora.py
```

To import it into your project, use:

```{python, engine.path = '/usr/bin/python3'}
import lingcorpora
```


Usage

Most functions in lingcorpora follow the same naming pattern: the first part is the language's ISO 639-3 code (for example, abk for Abkhaz) and the second part is _corpus.
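The naming pattern can be sketched in a few lines of base R. This only assembles function names as strings; the helper corpus_function_name is hypothetical and is not part of lingcorpora.

```r
# Hypothetical helper (not part of lingcorpora): build a function name
# from an ISO 639-3 language code, following the pattern described above.
corpus_function_name <- function(iso) {
  paste0(tolower(iso), "_corpus")
}

# Codes of the three functions documented in this vignette
corpus_function_name(c("abk", "ava", "pol"))
```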

Abkhaz Text Corpus {.tabset .tabset-fade .tabset-pills}

R version

The basic function for searching the Abkhaz Text Corpus is abk_corpus. It returns a data frame with the results from the corpus. abk_corpus takes several arguments (as with any R function, arguments can be passed by position, so naming them is optional):

query: the only required argument, containing your query. The examples below use the DT package to display data frames, but it is not required.

df <- abk_corpus(query = "бызшәа")
head(df)

df <- abk_corpus(query = "бызшәа")
library(DT)
datatable(head(df), options = list(dom = 'tip'))

kwic (key word in context) sets the format of the result lines. If TRUE, the function returns a data frame with the query match in the middle and its left and right contexts. If FALSE, each result is returned as a single string. The default is TRUE.

df <- abk_corpus(query = "бызшәа", kwic = FALSE)
head(df)
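To make the two layouts concrete, here is a toy sketch in base R that does not query any corpus. The sentence is made up, and the column names (left, center, right, text) are assumptions for illustration; the data frame returned by the package may be labelled differently.

```r
# Toy illustration of the two result layouts. The column names
# (left, center, right, text) are assumptions, not the package's output.
sentence <- c("the", "quick", "brown", "fox", "jumps")
hit <- 3  # position of the matched word

# kwic = TRUE style: the match in the middle, contexts on either side
kwic_row <- data.frame(
  left   = paste(sentence[seq_len(hit - 1)], collapse = " "),
  center = sentence[hit],
  right  = paste(sentence[(hit + 1):length(sentence)], collapse = " "),
  stringsAsFactors = FALSE
)

# kwic = FALSE style: the whole result in a single string
plain_row <- data.frame(
  text = paste(sentence, collapse = " "),
  stringsAsFactors = FALSE
)

kwic_row
plain_row
```

The KWIC layout is convenient for sorting or filtering on the left or right context, while the single-string layout is easier to read as running text.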

write controls whether the results are written to a file in the working directory. If FALSE, the function creates a data frame in the Global Environment. Otherwise, it writes a .tsv file named after the argument value. The default is FALSE.

abk_corpus(query = "бызшәа", write = "myquery")
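A tab-separated file can be read back into R with read.delim. The sketch below is a round trip on a made-up data frame in a temporary file; it does not call the package, and the exact file name and layout that the write argument produces are not assumed here.

```r
# Toy data frame standing in for a corpus result (contents are made up)
toy <- data.frame(left = "a", center = "b", right = "c",
                  stringsAsFactors = FALSE)

# Write it as a tab-separated file in a temporary location
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back into a data frame
restored <- read.delim(path, stringsAsFactors = FALSE)
restored
```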

The query argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:

df <- abk_corpus(query = "бызшәа*")
head(df)

df <- abk_corpus(query = "бызшәа*")
datatable(head(df), options = list(dom = 'tip'))

Python version

Avar Text Corpus {.tabset .tabset-fade .tabset-pills}

R version

The basic function for searching the Avar Text Corpus is ava_corpus. It returns a data frame with the results from the corpus. ava_corpus takes several arguments (as with any R function, arguments can be passed by position, so naming them is optional):

query: the only required argument, containing your query.

df <- ava_corpus(query = "шагьар")
head(df)

df <- ava_corpus(query = "шагьар")
datatable(head(df), options = list(dom = 'tip'))

kwic (key word in context) sets the format of the result lines. If TRUE, the function returns a data frame with the query match in the middle and its left and right contexts. If FALSE, each result is returned as a single string. The default is TRUE.

df <- ava_corpus(query = "вацазе", kwic = FALSE)
head(df)

write controls whether the results are written to a file in the working directory. If FALSE, the function creates a data frame in the Global Environment. Otherwise, it writes a .tsv file named after the argument value. The default is FALSE.

ava_corpus(query = "васазе", write = "myquery")

The query argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:

df <- ava_corpus(query = "магIарул*")
head(df)

df <- ava_corpus(query = "магIарул*")
datatable(head(df), options = list(dom = 'tip'))

Python version

National Corpus of Polish {.tabset .tabset-fade .tabset-pills}

R version

The basic function for searching the National Corpus of Polish is pol_corpus. It returns a data frame with the results from the corpus. pol_corpus takes several arguments (as with any R function, arguments can be passed by position, so naming them is optional):

query: the only required argument, containing your query.

df <- pol_corpus(query = "tata")
head(df)

df <- pol_corpus(query = "tata")
datatable(head(df), options = list(dom = 'tip'))

tag: if TRUE, every word in the results carries its morphological tags.

df <- pol_corpus(query = "tata", tag = TRUE)
head(df)

df <- pol_corpus(query = "tata", tag = TRUE)
datatable(head(df), options = list(dom = 'tip'))

n_results sets the number of examples returned from the corpus. The default is 10.

df <- pol_corpus(query = "tata", n_results = 6)
df

df <- pol_corpus(query = "tata", n_results = 6)
datatable(df, options = list(dom = 'tip'))

corpus: a character vector naming the corpus to search. Possible values are "nkjp300", "nkjp1800", "nkjp1M", "ipi250", "ipi030", "frequency-dictionary".

df <- pol_corpus(query = "tata", corpus = "nkjp1M")
head(df)

df <- pol_corpus(query = "tata", corpus = "nkjp1M")
datatable(head(df), options = list(dom = 'tip'))
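Because the corpus identifiers are easy to mistype, it can help to check a value before sending a query. The sketch below uses base R's match.arg against the identifiers listed above; the helper check_corpus is hypothetical, and how pol_corpus itself validates this argument is not assumed.

```r
# Corpus identifiers as listed above
known_corpora <- c("nkjp300", "nkjp1800", "nkjp1M",
                   "ipi250", "ipi030", "frequency-dictionary")

# Hypothetical helper (not part of lingcorpora): stop early on a value
# that matches none of the known identifiers. Note that match.arg()
# also accepts an unambiguous prefix and returns the full identifier.
check_corpus <- function(corpus) {
  match.arg(corpus, known_corpora)
}

check_corpus("nkjp1M")
```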

kwic (key word in context) sets the format of the result lines. If TRUE, the function returns a data frame with the query match in the middle and its left and right contexts. If FALSE, each result is returned as a single string. The default is TRUE.

df <- pol_corpus(query = "tata", kwic = FALSE)
head(df)

write controls whether the results are written to a file in the working directory. If FALSE, the function creates a data frame in the Global Environment. Otherwise, it writes a .tsv file named after the argument value. The default is FALSE.

pol_corpus(query = "tata", write = "myquery")

The query argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:

df <- pol_corpus("An*a")
head(df)

df <- pol_corpus("An*a")
datatable(head(df), options = list(dom = 'tip'))

df <- pol_corpus("[base = 'strzyc']")
head(df)

df <- pol_corpus("[base = 'strzyc']")
datatable(head(df), options = list(dom = 'tip'))
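When searching for several lemmas, the bracketed query shown above can be assembled programmatically. This is a minimal sketch; the helper base_query is hypothetical and only reproduces the string form used in the example, without checking it against the corpus query syntax.

```r
# Hypothetical helper (not part of lingcorpora): build a lemma query
# string of the same form as the example above.
base_query <- function(lemma) {
  sprintf("[base = '%s']", lemma)
}

base_query("strzyc")
```

The resulting string can then be passed as the query argument, for example pol_corpus(base_query("strzyc")).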

Python version

```{python, engine.path = '/usr/bin/python3'}
import lingcorpora
print(lingcorpora.pol_search("tata"))
```

National Corpus of Russian Language {.tabset .tabset-fade .tabset-pills}

R version

Python version

agricolamz/lingcorpora.R documentation built on May 10, 2019, 7:34 a.m.
