knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
The `lingcorpora` package provides R with an API for several linguistic corpora. A tutorial for this package is available on the GitHub wiki. This package includes APIs for:
library(lingtypology)
map.feature(c("Abkhaz", "Avar", "Polish", "Russian"))
Install the latest version from GitHub:
install.packages("devtools")
devtools::install_github("agricolamz/lingcorpora.R", dependencies = TRUE)
Load the library:
library(lingcorpora)
To install our package for Python, run the following command in a terminal:
```{bash, eval = F}
pip3 install git+https://github.com/alexeykosh/lingcorpora.py
```
To import it into your project, use:
```{python, engine.path = '/usr/bin/python3'}
import lingcorpora
```
Most functions in `lingcorpora` share the same naming pattern: the first part is a language ISO code and the second part is `_corpus`, as in `abk_corpus`.
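The naming pattern can be sketched in a few lines of base R. The helper `corpus_function_name()` is hypothetical and not part of `lingcorpora`; it only pastes the two parts together and does not check that a function with that name exists in the package.

```r
# Hypothetical helper (not part of lingcorpora): build the expected
# function name from a language ISO code.
corpus_function_name <- function(iso_code) {
  stopifnot(is.character(iso_code), length(iso_code) == 1)
  paste0(tolower(iso_code), "_corpus")
}

# ISO codes of the three corpora whose functions are described in this document
function_names <- sapply(c("abk", "ava", "pol"), corpus_function_name,
                         USE.NAMES = FALSE)
print(function_names)
```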
The basic function for searching the Abkhaz Text Corpus is `abk_corpus`. This function creates a data frame with results from the corpus. `abk_corpus` takes several arguments (as with any R function, writing the argument names is optional):
`query` is the sole obligatory argument; it holds your query. I will use the `DT` library to display data frames, but it is not necessary.
df <- abk_corpus(query = "бызшәа")
head(df)
df <- abk_corpus(query = "бызшәа")
library(DT)
datatable(head(df), options = list(dom = 'tip'))
`kwic` (key word in context) sets the format of the result lines. If TRUE, the function returns a data frame with the query in the middle and the left and right contexts on either side. If FALSE, it returns each result as a single string. The default is TRUE.
df <- abk_corpus(query = "бызшәа", kwic = FALSE)
head(df)
`write` controls whether the results are written to a file in the working directory. If FALSE, the function creates a data frame in the Global Environment. Otherwise it writes a .tsv file named after the argument value. The default is FALSE.
abk_corpus(query = "бызшәа", write = "myquiry")
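To make the two `kwic` layouts concrete, here is a toy sketch in base R. It is not the package's implementation: `toy_kwic()`, the column names `left`, `center` and `right`, and the English sentences are all invented for illustration, and the real column names returned by the corpus functions may differ.

```r
# Toy illustration of a key-word-in-context layout (not lingcorpora code).
# Each sentence is split around the first literal match of `keyword`;
# sentences without a match are dropped.
toy_kwic <- function(sentences, keyword) {
  pos <- as.integer(regexpr(keyword, sentences, fixed = TRUE))
  hit <- pos > 0
  sentences <- sentences[hit]
  pos <- pos[hit]
  data.frame(
    left   = substr(sentences, 1, pos - 1),
    center = rep(keyword, length(sentences)),
    right  = substr(sentences, pos + nchar(keyword), nchar(sentences)),
    stringsAsFactors = FALSE
  )
}

sentences <- c("the cat sat on the mat", "a dog barked", "my cat sleeps")

# Layout comparable to kwic = TRUE: one row per hit, three columns
kw <- toy_kwic(sentences, "cat")
print(kw)

# Layout comparable to kwic = FALSE: each hit as one string
one_string <- paste0(kw$left, kw$center, kw$right)
print(one_string)
```

The three-column layout is convenient for sorting by the left or right context, while the single-string layout is easier to read as running text.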
The `query` argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:
df <- abk_corpus(query = "бызшәа*")
head(df)
df <- abk_corpus(query = "бызшәа*")
datatable(head(df), options = list(dom = 'tip'))
The basic function for searching the Avar Text Corpus is `ava_corpus`. This function creates a data frame with results from the corpus. `ava_corpus` takes several arguments (as with any R function, writing the argument names is optional):
`query` is the sole obligatory argument; it holds your query.
df <- ava_corpus(query = "шагьар")
head(df)
df <- ava_corpus(query = "шагьар")
datatable(head(df), options = list(dom = 'tip'))
`kwic` (key word in context) sets the format of the result lines. If TRUE, the function returns a data frame with the query in the middle and the left and right contexts on either side. If FALSE, it returns each result as a single string. The default is TRUE.
df <- ava_corpus(query = "вацазе", kwic = FALSE)
head(df)
`write` controls whether the results are written to a file in the working directory. If FALSE, the function creates a data frame in the Global Environment. Otherwise it writes a .tsv file named after the argument value. The default is FALSE.
ava_corpus(query = "васазе", write = "myquiry")
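A file produced through the `write` argument is tab-separated, so it can be read back with base R. The sketch below does not call `ava_corpus`; it writes a small stand-in data frame to a temporary .tsv file and reads it again. The column names and values are invented, and whether the package appends the `.tsv` extension to the name you pass is not stated here, so check the actual file name in your working directory.

```r
# Stand-in for a search result; column names and values are invented.
results <- data.frame(
  left   = c("the", "my"),
  center = c("cat", "cat"),
  right  = c("sat on the mat", "sleeps"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file to a temporary location ...
path <- tempfile(fileext = ".tsv")
write.table(results, path, sep = "\t", quote = FALSE, row.names = FALSE)

# ... and read it back as a data frame.
restored <- read.delim(path, stringsAsFactors = FALSE)
print(restored)
```

For Cyrillic results such as the Avar examples, passing `fileEncoding = "UTF-8"` to `read.delim()` is a sensible precaution on systems whose default encoding is not UTF-8.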
The `query` argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:
df <- ava_corpus(query = "магIарул*")
head(df)
df <- ava_corpus(query = "магIарул*")
datatable(head(df), options = list(dom = 'tip'))
The basic function for searching the National Corpus of Polish is `pol_corpus`. This function creates a data frame with results from the corpus. `pol_corpus` takes several arguments (as with any R function, writing the argument names is optional):
`query` is the sole obligatory argument; it holds your query.
df <- pol_corpus(query = "tata")
head(df)
df <- pol_corpus(query = "tata")
datatable(head(df), options = list(dom = 'tip'))
`tag`: if TRUE, all the words in a result carry morphological tags.
df <- pol_corpus(query = "tata", tag = TRUE)
head(df)
df <- pol_corpus(query = "tata", tag = TRUE)
datatable(head(df), options = list(dom = 'tip'))
`n_results` sets the number of examples returned from the corpus. The default is 10.
df <- pol_corpus(query = "tata", n_results = 6)
df
df <- pol_corpus(query = "tata", n_results = 6)
datatable(df, options = list(dom = 'tip'))
`corpus` is a character vector naming the corpus to search: "nkjp300", "nkjp1800", "nkjp1M", "ipi250", "ipi030", "frequency-dictionary".
df <- pol_corpus(query = "tata", corpus = "nkjp1M")
head(df)
df <- pol_corpus(query = "tata", corpus = "nkjp1M")
datatable(head(df), options = list(dom = 'tip'))
`kwic` (key word in context) sets the format of the result lines. If TRUE, the function returns a data frame with the query in the middle and the left and right contexts on either side. If FALSE, it returns each result as a single string. The default is TRUE.
df <- pol_corpus(query = "tata", kwic = FALSE)
head(df)
`write` controls whether the results are written to a file in the working directory. If FALSE, the function creates a data frame in the Global Environment. Otherwise it writes a .tsv file named after the argument value. The default is FALSE.
pol_corpus(query = "tata", write = "myquiry")
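Because `corpus` only makes sense with one of the listed names, it can help to check a value locally before sending a query. The helper `check_corpus()` below is hypothetical and not part of `lingcorpora`; it relies on base R's `match.arg()`, and how `pol_corpus` itself reacts to an unknown name is not covered here.

```r
# Hypothetical helper (not part of lingcorpora): validate a `corpus`
# value against the names listed in this document.
check_corpus <- function(corpus = c("nkjp300", "nkjp1800", "nkjp1M",
                                    "ipi250", "ipi030",
                                    "frequency-dictionary")) {
  match.arg(corpus)
}

checked <- check_corpus("nkjp1M")  # a listed name is returned unchanged
print(checked)

# An unlisted name makes match.arg() stop with an error; try() captures
# it here so that the script keeps running.
bad <- try(check_corpus("nkjp2000"), silent = TRUE)
print(inherits(bad, "try-error"))
```

Note that `match.arg()` accepts unambiguous abbreviations, so `check_corpus("ipi2")` would resolve to "ipi250".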
The `query` argument accepts regular expressions or CQL (Corpus Query Language); read more on the help page:
df <- pol_corpus("An*a")
head(df)
df <- pol_corpus("An*a")
datatable(head(df), options = list(dom = 'tip'))
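A pattern such as "An*a" can be read in two ways, and which reading a given corpus applies is not stated here, so consult the help page before relying on either. The sketch below compares the two readings locally with base R on a made-up word list; it does not query any corpus.

```r
# Made-up word list for a local comparison
words <- c("Aa", "Ana", "Anna", "Anka", "Andrzej")

# Reading 1: regular expression, where `n*` means "zero or more n"
as_regex <- grepl("^An*a$", words)

# Reading 2: shell-style wildcard, where `*` means "any characters";
# utils::glob2rx() translates the wildcard into an anchored regex
as_glob <- grepl(utils::glob2rx("An*a"), words)

print(data.frame(word = words, as_regex, as_glob))
```

The two readings disagree on "Aa" and "Anka", which is why it is worth testing a pattern on a few known words first.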
df <- pol_corpus("[base = 'strzyc']")
head(df)
df <- pol_corpus("[base = 'strzyc']")
datatable(head(df), options = list(dom = 'tip'))
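When several lemmas need the same CQL pattern, the query strings can be built programmatically. The helper `cql_base()` is hypothetical and not part of `lingcorpora`; it only reproduces the `[base = '...']` form used in the example above and rejects lemmas that contain a single quote, since escaping rules are not covered here.

```r
# Hypothetical helper (not part of lingcorpora): wrap each lemma in the
# [base = '...'] pattern shown in the example above.
cql_base <- function(lemma) {
  stopifnot(is.character(lemma), !grepl("'", lemma, fixed = TRUE))
  sprintf("[base = '%s']", lemma)
}

queries <- cql_base(c("strzyc", "tata"))
print(queries)
```

Each element of `queries` could then be passed as the `query` argument, in the same way as the literal string in the example above.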
{python, engine.path = '/usr/bin/python3'}
import lingcorpora
print(lingcorpora.pol_search("tata"))