Make a word cloud
In petro.One: Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

Load the metadata of 2918 papers

library(petro.One)
library(tm)
library(tibble)

use_example(1)

p1 <- onepetro_page_to_dataframe("neural_network-s0000r1000.html")
p2 <- onepetro_page_to_dataframe("neural_network-s1000r1000.html")
p3 <- onepetro_page_to_dataframe("neural_network-s2000r1000.html")
p4 <- onepetro_page_to_dataframe("neural_network-s3000r1000.html")

nn_papers <- rbind(p1, p2, p3, p4)
nn_papers

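# check the paper count reported by each of the four result pages
get_papers_count("neural_network-s0000r1000.html")
get_papers_count("neural_network-s1000r1000.html")
get_papers_count("neural_network-s2000r1000.html")
get_papers_count("neural_network-s3000r1000.html")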

Convert and clean the documents for text mining

vdocs <- VCorpus(VectorSource(nn_papers$book_title))
vdocs <- tm_map(vdocs, content_transformer(tolower))      # to lowercase
vdocs <- tm_map(vdocs, removeWords, stopwords("english")) # remove stopwords
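To see what these two cleaning steps do, the following minimal sketch applies them to a small toy corpus. The two titles are made up for illustration and are not taken from the OnePetro metadata; the extra stripWhitespace step is an assumption added here only to tidy the gaps that removeWords leaves behind.

```r
library(tm)

# Two invented titles, used only for illustration (not from the OnePetro data)
titles <- c("Prediction of Permeability Using a Neural Network",
            "A New Approach to the Analysis of Well Logs")

toy <- VCorpus(VectorSource(titles))
toy <- tm_map(toy, content_transformer(tolower))       # to lowercase
toy <- tm_map(toy, removeWords, stopwords("english"))  # remove stopwords
toy <- tm_map(toy, stripWhitespace)                    # collapse leftover gaps

# Pull the cleaned text back out as a plain character vector
cleaned <- unname(sapply(toy, as.character))
print(cleaned)
```

Words such as "of", "a", "to" and "the" should disappear, while "using", "new" and "approach" remain because they are not part of the standard English stopword list.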

Summary table of word frequencies

tdm <- TermDocumentMatrix(vdocs)

tdm.matrix <- as.matrix(tdm)
tdm.rs <- sort(rowSums(tdm.matrix), decreasing=TRUE)
tdm.df <- data.frame(word = names(tdm.rs), freq = tdm.rs, stringsAsFactors = FALSE)
as_tibble(tdm.df)                          # prevent long printing of dataframe
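The frequency table is just the row sums of the term-document matrix, sorted in decreasing order. A minimal base R sketch with a hand-made matrix shows the idea; the terms and counts below are invented for illustration.

```r
# Toy term-document matrix: rows are terms, columns are documents.
# The counts are invented for illustration only.
toy.matrix <- matrix(c(2, 0, 1,
                       1, 1, 0,
                       0, 3, 1),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("neural", "network", "reservoir"),
                                     c("doc1", "doc2", "doc3")))

# Total occurrences of each term across all documents, most frequent first
toy.rs <- sort(rowSums(toy.matrix), decreasing = TRUE)
toy.df <- data.frame(word = names(toy.rs), freq = toy.rs, stringsAsFactors = FALSE)
print(toy.df)
```

Here "reservoir" comes first with 4 occurrences, followed by "neural" with 3 and "network" with 2.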

The table has one row per distinct word, so nrow(tdm.df) gives the number of words under analysis. We will focus our attention on the words that occur at least 50 times.
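As a rough illustration of that cutoff, the sketch below filters a toy frequency table at a threshold of 50. The words and counts are made up, and min_freq is a name introduced here only for the example.

```r
# Toy frequency table; words and counts are invented for illustration
toy.df <- data.frame(word = c("neural", "network", "using", "fuzzy"),
                     freq = c(120, 95, 50, 12),
                     stringsAsFactors = FALSE)

min_freq <- 50                                  # hypothetical cutoff for this sketch
frequent <- toy.df[toy.df$freq >= min_freq, ]   # keep words at or above the cutoff

nrow(toy.df)     # number of distinct words in the table
nrow(frequent)   # number of words that pass the cutoff
print(frequent)
```

A word with exactly 50 occurrences is kept, since the comparison is "at least 50" rather than "more than 50".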

Word cloud with words that occur at least 50 times

library(wordcloud)

set.seed(1234)
wordcloud(words = tdm.df$word, freq = tdm.df$freq, min.freq = 50,
          max.words=200, random.order=FALSE, rot.per=0.35,
          colors=brewer.pal(8, "Dark2"))

Note that the word cloud contains words of common use such as using, use, new, approach and case. These words are not technical enough to show what the papers we are analyzing focus on. In the next example, we will build our own custom stopwords to prevent these words from showing.
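One way such a custom stopword list might look is sketched below. This is an assumption about the approach, not the package's own next example: custom_stopwords is a hypothetical name, the titles are invented, and the list contains only the five words mentioned above.

```r
library(tm)

# Hypothetical custom stopword list built from the words named in the text
custom_stopwords <- c("using", "use", "new", "approach", "case")

# Invented titles, used only for illustration
titles <- c("A New Approach Using Neural Networks",
            "Case Study on the Use of Fuzzy Logic")

toy <- VCorpus(VectorSource(titles))
toy <- tm_map(toy, content_transformer(tolower))
# Remove the standard English stopwords together with the custom ones
toy <- tm_map(toy, removeWords, c(stopwords("english"), custom_stopwords))

toy.tdm <- TermDocumentMatrix(toy)
terms <- Terms(toy.tdm)   # the words that survive the cleaning
print(terms)
```

The surviving terms should no longer include any of the custom stopwords, so they would not appear in a word cloud built from this matrix.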


petro.One documentation built on May 2, 2019, 3:10 p.m.
