View source: R/Create.semantic.twitter.R
Create.semantic.twitter (R Documentation)
Creates a semantic network from tweets returned by a twitter search query. Semantic networks describe the semantic relationships between concepts. In this network the concepts are significant words and hashtags extracted from the tweet text. Network edges are weighted and represent the co-occurrence of words and hashtags in the same tweets.
The creation of twitter semantic networks requires text processing and the tokenization of tweets, so this function additionally requires the tidyr and tidytext packages to be installed.
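The function performs this tokenization internally; for orientation only, the following is a minimal sketch of the kind of word tokenization and stopword removal involved, written with tidytext and dplyr. The tweets data frame, its text column, and the counting step are illustrative assumptions, not the package's internal code.

# illustrative sketch only, not the internal code of Create.semantic.twitter
# the "tweets" data frame and its "text" column are assumptions
library(dplyr)
library(tidytext)

tweets <- tibble::tibble(
  text = c("semantic networks from #rstats tweets",
           "text mining of tweets with #rstats")
)

term_counts <- tweets |>
  unnest_tokens(word, text) |>   # default word tokenizer drops the "#" prefix
  anti_join(get_stopwords(language = "en", source = "smart"), by = "word") |>
  count(word, sort = TRUE)

term_counts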
## S3 method for class 'semantic.twitter'
Create(
  datasource,
  type,
  removeRetweets = TRUE,
  removeTermsOrHashtags = NULL,
  stopwords = TRUE,
  stopwordsLang = "en",
  stopwordsSrc = "smart",
  removeNumbers = TRUE,
  removeUrls = TRUE,
  termFreq = 5,
  hashtagFreq = 50,
  assoc = "limited",
  verbose = TRUE,
  ...
)
datasource
Collected social media data with "datasource" and "twitter" class names.
type
Character string. Type of network to be created, set to "semantic".
removeRetweets
Logical. Removes detected retweets from the tweet data. Default is TRUE.
removeTermsOrHashtags
Character vector. Words or hashtags to remove from the semantic network. For example, this parameter could be used to remove the search term or hashtag that was used to collect the data by removing any nodes with a matching name. Default is NULL.
stopwords
Logical. Removes stopwords from the tweet data. Default is TRUE.
stopwordsLang
Character string. Language of stopwords to use. Refer to the stopwords package for further information on supported languages. Default is "en".
stopwordsSrc
Character string. Source of the stopwords list. Refer to the stopwords package for further information on supported sources. Default is "smart". See the sketch after this argument list for a quick way to inspect language and source combinations.
removeNumbers
Logical. Removes whole numerical tokens from the tweet text, for example a standalone year value. Default is TRUE.
removeUrls
Logical. Removes twitter shortened URL tokens from the tweet text. Default is TRUE.
termFreq
Numeric integer. Specifies the percentage of most frequent words to include. For example, a value of 2 includes only the top 2% most frequently occurring words as nodes. Default is 5.
hashtagFreq
Numeric integer. Specifies the percentage of most frequent hashtags to include. For example, a value of 10 includes only the top 10% most frequently occurring hashtags as nodes. Default is 50.
assoc
Character string. Association of nodes. Default is "limited".
verbose
Logical. Output additional information about the network creation. Default is TRUE.
...
Additional parameters passed to the function. Not used in this method.
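As noted in the stopwordsLang and stopwordsSrc entries above, the supported language and source combinations come from the stopwords package. The calls below use the stopwords package directly and are only for exploration; the function applies the equivalent list internally.

# inspect available stopword sources and languages, and preview the default
# "en"/"smart" combination used by this function
library(stopwords)

stopwords_getsources()
stopwords_getlanguages(source = "smart")
head(stopwords(language = "en", source = "smart"))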
Network as a named list of two dataframes containing $nodes and $edges.
The words and hashtags passed to the function in the removeTermsOrHashtags parameter are removed before word frequencies are calculated, and are therefore excluded entirely from the top percentage of most frequent terms rather than simply filtered out of the final network.
The top percentages of frequently occurring hashtags (hashtagFreq) and words (termFreq) are converted into a minimum frequency, and all terms with a frequency equal to or greater than that minimum are included in the network as nodes. For example, among the unique hashtags of varying frequencies in a dataset, the top 50% most common hashtags may work out to be the first 20 hashtags. The frequency of the 20th hashtag is then used as the minimum, and all hashtags of equal or greater frequency are included as part of the top 50% most frequently occurring hashtags. The number of top hashtags can therefore be greater than 20 if more than one hashtag has a frequency matching the minimum. The exception is when the minimum frequency is 1 and hashtagFreq is set to less than 100; in this case only the first 20 hashtags are included.
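As a rough illustration of this thresholding, the sketch below mirrors the description above on an invented set of hashtag counts. It is not the package's internal code; the data and column names are assumptions for illustration only.

# illustrative sketch of the minimum-frequency threshold described above
# (invented data; not the package's internal code)
library(dplyr)

set.seed(1)
hashtag_counts <- tibble::tibble(
  hashtag = paste0("#tag", 1:40),
  n = sort(rpois(40, lambda = 5), decreasing = TRUE)  # already ranked by frequency
)

hashtagFreq <- 50                                    # include the top 50% of hashtags
cutoff_rank <- ceiling(nrow(hashtag_counts) * hashtagFreq / 100)  # here: 20
min_freq <- hashtag_counts$n[cutoff_rank]            # frequency of the 20th hashtag

# every hashtag meeting the minimum is kept, so ties at the minimum can push
# the result past cutoff_rank rows (the exception for a minimum frequency of 1
# noted above is not reproduced here)
top_hashtags <- filter(hashtag_counts, n >= min_freq)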
Hashtags and words in the top percentages are included in the network as isolates if they do not occur in any tweet text together with other top-percentage terms.
## Not run: 
# twitter semantic network creation additionally requires the tidytext
# and stopwords packages for working with text data
# install.packages(c("tidytext", "stopwords"))

# create a twitter semantic network graph removing the hashtag "#auspol"
# and using the top 2% frequently occurring words and 10% most frequently
# occurring hashtags as nodes
net_semantic <- collect_tw |>
  Create("semantic",
         removeTermsOrHashtags = c("#auspol"),
         termFreq = 2,
         hashtagFreq = 10,
         verbose = TRUE)

# network
# net_semantic$nodes
# net_semantic$edges

## End(Not run)
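The returned $nodes and $edges dataframes can also be handed to other graph tools. As one possibility, the sketch below builds an igraph object from them; it assumes the first columns of $edges and $nodes identify the vertices, which may not match the actual column layout of the returned dataframes.

## Not run: 
# build an undirected igraph object from the returned dataframes
# (column layout of $nodes and $edges is an assumption here)
library(igraph)

g <- graph_from_data_frame(d = net_semantic$edges,
                           directed = FALSE,
                           vertices = net_semantic$nodes)

# most connected words and hashtags
head(sort(degree(g), decreasing = TRUE))
## End(Not run)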