Words used in Portuguese Wikipedia
This data package contains a dataset of the words used in a random sample of about 15,000 pages from the Portuguese Wikipedia.
It can be installed using:
devtools::install_github("dfalbel/ptwikiwords")
After installing the package, you can load the dataset using:
library(ptwikiwords)
data(ptwikiwords)
head(ptwikiwords)
#> # A tibble: 6 × 3
#> word count check
#> <chr> <int> <lgl>
#> 1 de 210954 TRUE
#> 2 a 109652 TRUE
#> 3 e 100028 TRUE
#> 4 o 87839 TRUE
#> 5 em 67040 TRUE
#> 6 do 59489 TRUE
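The dataset contains 3 columns: word (the word itself), count (an integer count for that word), and check (a logical flag, used below to filter the words before plotting).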
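As a small sketch of how the three columns fit together, the snippet below rebuilds the six rows printed above as a plain data frame (a stand-in, so that it runs without the package installed), keeps the rows where check is TRUE, and computes each word's share of the total count among those rows. The names top_words, kept and share are illustrative and not part of the package.

```r
# Stand-in for head(ptwikiwords): the six rows printed above, in base R.
top_words <- data.frame(
  word  = c("de", "a", "e", "o", "em", "do"),
  count = c(210954L, 109652L, 100028L, 87839L, 67040L, 59489L),
  check = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the rows flagged by `check`.
kept <- top_words[top_words$check, ]

# Share of each word within these rows only (not within the full dataset).
kept$share <- kept$count / sum(kept$count)

# Print the words ordered from most to least frequent.
print(kept[order(-kept$share), c("word", "share")])
```

With the package installed, the same steps should work on ptwikiwords itself in place of top_words.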
Here is a wordcloud of those words:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(wordcloud))
words_filter <- ptwikiwords %>%
  filter(check) %>%
  slice(1:300)
wordcloud(words_filter$word, words_filter$count)
Here is a wordcloud of the 2-grams.
data(ngrams)
words_filter <- ngrams %>%
  slice(1:100)
wordcloud(words_filter$ngrams, words_filter$count)
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): com o could
#> not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): o primeiro
#> could not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): é um could
#> not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): para a could
#> not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): de um could
#> not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): janeiro de
#> could not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): é uma could
#> not be fit on page. It will not be plotted.
#> Warning in wordcloud(words_filter$ngrams, words_filter$count): setembro de
#> could not be fit on page. It will not be plotted.
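The warnings above mean that some 2-grams were too large to place at the default font sizes. One possible way to reduce them is to pass a smaller scale to wordcloud(), whose default is c(4, 0.5). The sketch below wraps that in a hypothetical helper, plot_cloud_compact; the scale values are assumptions to tune by eye, not settings taken from this package.

```r
library(wordcloud)

# Hypothetical helper: draws a word cloud with a narrower font-size range
# than wordcloud()'s default scale = c(4, 0.5), so long terms are more
# likely to fit on the page. The default values here are guesses.
plot_cloud_compact <- function(words, counts, scale = c(2.5, 0.4)) {
  wordcloud(
    words, counts,
    scale = scale,        # c(largest, smallest) font size
    random.order = FALSE  # place the most frequent terms first, near the centre
  )
}

# Usage with the package data, assuming ptwikiwords is installed:
# data(ngrams, package = "ptwikiwords")
# top <- head(ngrams, 100)
# plot_cloud_compact(top$ngrams, top$count)
```

Enlarging the plotting device or plotting fewer 2-grams are other options if terms are still dropped.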