Extra-AnalysingOtherLanguages
In finnsurveytext: Analyse Open-Ended Survey Responses in Finnish

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(finnsurveytext)
library(udpipe)
library(stopwords)

How to Use `finnsurveytext` in another language!

Despite the package's name, finnsurveytext can be used to analyse surveys in many different languages. This vignette explains how to use finnsurveytext in another language with as little additional effort as possible.

finnsurveytext works with other languages because the packages it uses to process the raw survey data, udpipe and stopwords, support multiple languages. We have the developers of those packages to thank!

The package includes an English survey called english_sample_survey, which we use to demonstrate the package in a language other than Finnish.

knitr::kable(head(english_sample_survey, 5))

1. Essential: Your language has a language model available for `udpipe`

The udpipe package is available from CRAN. The relevant udpipe function is udpipe::udpipe_download_model. You can see the list of available models in the udpipe manual.

At the time of writing this vignette, these were:

afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, chinese-gsdsimp, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, german-hdt, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-twittiro, japanese-gsd, kazakh-ktb, korean-gsd, korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, persian-seraji, polish-lfg, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, sanskrit-ufal, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb

Alternatively, you can list the available models by running fst_print_available_models(). If you provide a search term, the list is filtered to models whose names contain it:

fst_print_available_models()

fst_print_available_models(search = 'estonian')

fst_print_available_models('sami')
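
Once you have picked a model name, you may want to check that it downloads and annotates text in your language before running a full survey through the package. The following is a minimal sketch that calls udpipe directly; it is not taken from the finnsurveytext documentation, and the sample sentence is made up for illustration. It assumes an internet connection and stores the model in a temporary directory.

```r
library(udpipe)

# Download the chosen model into a temporary directory (needs internet access).
dl <- udpipe::udpipe_download_model(language = "english-ewt",
                                    model_dir = tempdir())

# Only continue if the download succeeded.
if (!dl$download_failed) {
  model <- udpipe::udpipe_load_model(file = dl$file_model)

  # Annotate one made-up response and convert the result to a data frame.
  annotated <- as.data.frame(
    udpipe::udpipe_annotate(model, x = "I enjoy answering open-ended surveys.")
  )

  # Show the columns most relevant for checking the model: token, lemma, part of speech.
  print(annotated[, c("token", "lemma", "upos")])
}
```

If the lemmas and part-of-speech tags look sensible for your language, the same model name can be used as described below.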

How to use:

The relevant model, e.g. "swedish-talbanken", should be passed as the model input to fst_format() or fst_prepare().

Demonstration:

We find an English model and format our English data below:

fst_print_available_models("english")

en_df <- fst_format(data = english_sample_survey,
                    question = 'text',
                    id = 'id',
                    model = 'english-ewt')

knitr::kable(head(en_df, 5))

2. Recommended: Your language has a stopwords list available in the `stopwords` package

The stopwords package is available from CRAN. The relevant stopwords functions are stopwords::stopwords, stopwords::stopwords_getsources and stopwords::stopwords_getlanguages. We recommend that you first identify the two-letter ISO code for your language. You can see the available sources and languages in the stopwords manual, or by running stopwords_getsources() and stopwords_getlanguages():

stopwords_getsources()
stopwords::stopwords_getlanguages(source = 'nltk')
stopwords('da', source = 'nltk')
stopwords('da') # The default source is 'snowball'
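
To see at a glance which sources cover a given language, the two lookup functions can be combined. The sketch below is one way to do this; sources_for_language is a hypothetical helper written for this example and is not part of stopwords or finnsurveytext. Note that a source is only matched if it lists the language under exactly the code you supply.

```r
library(stopwords)

# Hypothetical helper: return the stopwords sources that list a language code.
sources_for_language <- function(language) {
  sources <- stopwords::stopwords_getsources()

  # For each source, check whether the language code appears in its language list.
  has_language <- vapply(
    sources,
    function(src) language %in% stopwords::stopwords_getlanguages(source = src),
    logical(1)
  )

  sources[has_language]
}

# Example: which sources provide a Danish ('da') list?
sources_for_language("da")
```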

Alternatively, you can use our function fst_find_stopwords to simplify this process. It returns a table of the lists available through the stopwords package for a language, together with their contents, so that you can compare them if there are several options. To run it, you need the two-letter ISO language code:

knitr::kable(fst_find_stopwords(language = 'lv'))
fst_find_stopwords(language = 'no')

How to use:

The relevant language and stopword list ('source'), e.g. "sv" and "nltk", should be passed as the language and stopword_list inputs respectively to fst_prepare() (or to fst_rm_stop_punct(), which fst_prepare() calls automatically).

Demonstration:

We can find and compare English stopwords lists as below. Once we have chosen a stopwords list, we can run fst_prepare() to format the data and remove the stopwords:

knitr::kable(head(fst_find_stopwords(language = 'en'), 5))

en_df2 <- fst_prepare(data = english_sample_survey,
                      question = 'text',
                      id = 'id',
                      model = 'english-ewt',
                      stopword_list = 'smart', 
                      language = 'en')

knitr::kable(head(en_df2, 5))

2b. Optional: Provide your own list of stopwords

If a stopword list is not available for your language, or you would like to provide your own, you can use the manual_list option within fst_prepare() (or fst_rm_stop_punct()), making sure to also set either manual = TRUE or stopword_list = "manual".

You can also choose not to remove stopwords, but removing them usually gives more meaningful results.

If you provide a manual list, you can leave language at its default value.

Demonstration:

# Example of providing a manual list
# Either as a character vector...
manualList <- c('and', 'the', 'of', 'you', 'me', 'ours', 'mine', 'them', 'theirs')
# ...or as a single comma-separated string
manualList2 <- "to, the, I"

df1 <- fst_prepare(data = english_sample_survey,
                  question = 'text',
                  id = 'id',
                  model = 'english-ewt',
                  manual_list = manualList,
                  stopword_list = 'manual'
                  )

knitr::kable(head(df1, 5))

df2 <- fst_prepare(data = english_sample_survey,
                  question = 'text',
                  id = 'id',
                  model = 'english-ewt',
                  manual = TRUE,
                  manual_list = manualList2
                  )

knitr::kable(head(df2, 5))
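
A manual list can also extend an existing list rather than replace it. The sketch below is not taken from the package documentation: it combines the 'snowball' English list from the stopwords package with a few survey-specific words and passes the result as a manual list. The extra words and the names extra_words, combined_list and df3 are hypothetical.

```r
library(finnsurveytext)
library(stopwords)

# Hypothetical survey-specific words to remove in addition to a standard list.
extra_words <- c("survey", "question", "answer")

# Combine the 'snowball' English list with the extra words, dropping duplicates.
combined_list <- unique(c(stopwords::stopwords("en", source = "snowball"),
                          extra_words))

# Pass the combined character vector as a manual list, as in the demonstration above.
df3 <- fst_prepare(data = english_sample_survey,
                   question = 'text',
                   id = 'id',
                   model = 'english-ewt',
                   manual_list = combined_list,
                   stopword_list = 'manual')

knitr::kable(head(df3, 5))
```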

The remainder of the package works the same way regardless of the language of the survey responses.

unlink('english-ewt-ud-2.5-191206.udpipe')

finnsurveytext documentation built on April 4, 2025, 5:07 a.m.
