knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

About lingcorpora

The lingcorpora package gives R access to the APIs of several linguistic corpora. A tutorial for this package is available on the GitHub wiki. The package includes APIs for corpora of the following languages:

library(lingtypology)
map.feature(c("Abkhaz", "Avar", "Polish", "Russian"))

Installation {.tabset .tabset-fade .tabset-pills}

R version

Get the latest version from GitHub:

install.packages("devtools")
devtools::install_github("agricolamz/lingcorpora.R", dependencies = TRUE)

Load the library:

library(lingcorpora)

Python version

To install the package, run the following command in a terminal:

```{bash, eval = F}
pip3 install git+https://github.com/alexeykosh/lingcorpora.py
```

To import it in your project, run:
```{python, engine.path = '/usr/bin/python3'}
import lingcorpora
```


Usage

Most functions in lingcorpora follow the same naming pattern: the first part is the language's ISO code and the second part is _corpus (for example, abk_corpus for Abkhaz).
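The naming pattern can be sketched in a few lines of Python. This is only an illustration: the helper corpus_function_name is hypothetical and not part of lingcorpora, and the pattern describes the R functions named in this document (the Python package may use different names).

```python
# Hypothetical helper, not part of lingcorpora: it only illustrates
# the "<ISO code>_corpus" naming pattern of the R functions.
def corpus_function_name(iso_code: str) -> str:
    """Return the R function name for a language's ISO code."""
    return f"{iso_code.strip().lower()}_corpus"


# ISO codes of three languages covered in this document.
iso_codes = ["abk", "ava", "pol"]
function_names = [corpus_function_name(code) for code in iso_codes]

for code, name in zip(iso_codes, function_names):
    print(f"{code} -> {name}")
```

The helper normalizes case and surrounding whitespace, so " POL " and "pol" give the same name.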

Abkhaz Text Corpus {.tabset .tabset-fade .tabset-pills}

R version

The basic function for searching the Abkhaz Text Corpus is abk_corpus. It returns a data frame with the results from the corpus. The function abk_corpus takes a number of arguments (as with any R function, the argument names can be omitted when the arguments are passed by position):

df <- abk_corpus(query = "бызшәа")
head(df)
library(DT)
datatable(head(df), options = list(dom = 'tip'))
df <- abk_corpus(query = "бызшәа", kwic = FALSE)
head(df)
abk_corpus(query = "бызшәа", write = "myquery")

The query argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:

df <- abk_corpus(query = "бызшәа*")
head(df)
datatable(head(df), options = list(dom = 'tip'))

Python version

Avar Text Corpus {.tabset .tabset-fade .tabset-pills}

R version

The basic function for searching the Avar Text Corpus is ava_corpus. It returns a data frame with the results from the corpus. The function ava_corpus takes a number of arguments (as with any R function, the argument names can be omitted when the arguments are passed by position):

df <- ava_corpus(query = "шагьар")
head(df)
datatable(head(df), options = list(dom = 'tip'))
df <- ava_corpus(query = "вацазе", kwic = FALSE)
head(df)
ava_corpus(query = "васазе", write = "myquery")

The query argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:

df <- ava_corpus(query = "магIарул*")
head(df)
datatable(head(df), options = list(dom = 'tip'))

Python version

National Corpus of Polish {.tabset .tabset-fade .tabset-pills}

R version

The basic function for searching the National Corpus of Polish is pol_corpus. It returns a data frame with the results from the corpus. The function pol_corpus takes a number of arguments (as with any R function, the argument names can be omitted when the arguments are passed by position):

df <- pol_corpus(query = "tata")
head(df)
datatable(head(df), options = list(dom = 'tip'))
df <- pol_corpus(query = "tata", tag = TRUE)
head(df)
datatable(head(df), options = list(dom = 'tip'))
df <- pol_corpus(query = "tata", n_results = 6)
df
datatable(df, options = list(dom = 'tip'))
df <- pol_corpus(query = "tata", corpus = "nkjp1M")
head(df)
datatable(head(df), options = list(dom = 'tip'))
df <- pol_corpus(query = "tata", kwic = FALSE)
head(df)
pol_corpus(query = "tata", write = "myquery")

The query argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:

df <- pol_corpus("An*a")
head(df)
datatable(head(df), options = list(dom = 'tip'))
df <- pol_corpus("[base = 'strzyc']")
head(df)
datatable(head(df), options = list(dom = 'tip'))

Python version

```{python, engine.path = '/usr/bin/python3'}
import lingcorpora
print(lingcorpora.pol_search("tata"))
```
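The R examples for this corpus use an attribute query of the form [base = 'strzyc']. As a minimal sketch, such a query string could be assembled in Python as shown here. The helper cql_attribute is hypothetical and not part of lingcorpora, and whether the Python search function accepts CQL queries is not confirmed in this document.

```python
# Hypothetical helper, not part of lingcorpora: builds a CQL-style
# attribute query string such as [base = 'strzyc'].
def cql_attribute(attribute: str, value: str) -> str:
    """Return a single-attribute CQL query string."""
    # Reject values that would break the single-quoted literal.
    if "'" in value:
        raise ValueError("value must not contain a single quote")
    return f"[{attribute} = '{value}']"


# Rebuild the query used in the R examples for this corpus.
query = cql_attribute("base", "strzyc")
print(query)
```

Building the string in one place keeps the quoting consistent when the same attribute is queried with several values.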

National Corpus of Russian Language {.tabset .tabset-fade .tabset-pills}

R version

Python version



agricolamz/lingcorpora.R documentation built on May 10, 2019, 7:34 a.m.