knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 6
)
This tutorial walks through the use of the finnsurveytext package to analyse a single survey question, in detail and from start to finish. In this example we split the responses by gender, but in your own analysis you would likely look at several different splits of your data.
Once the package is installed, you can load the finnsurveytext package as below:
(Other required packages such as dplyr and stringr will also be installed if they are not currently installed in your environment.)
library(finnsurveytext)
We will look at the Q11_3 responses from the Development Cooperation 2012 survey, which are included as sample data with the package. The specific question we're looking at is as follows:
The prepared data can be found in data/fst_dev_coop.rda.
This tutorial covers functions from throughout the package. For further details on the functions, see the previous tutorials.
The following is an excerpt from our prepared datasets:
knitr::kable(head(fst_dev_coop, 10))
First, let's consider all the data to get some initial information about the responses to this question. We will run the following functions:
fst_summarise()
fst_pos()
fst_length_summary()
fst_wordcloud()
For further information on these functions, please see "InDetail2-DataExploration".
knitr::kable(fst_summarise(fst_dev_coop))
knitr::kable(fst_pos(fst_dev_coop))
knitr::kable(fst_length_summary(fst_dev_coop))
Remarks:
fst_wordcloud(fst_dev_coop)
Remarks:
Next, we will look at the most common words (unigrams), bigrams and trigrams in that data.
fst_freq(fst_dev_coop, 10, strict = FALSE, norm = "number_resp")
fst_ngrams(fst_dev_coop, 10, ngrams = 2, strict = FALSE, norm = "number_resp")
fst_ngrams(fst_dev_coop, 10, ngrams = 3, strict = FALSE, norm = "number_resp")
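For readers new to n-grams, the following is a minimal base R sketch of how bigrams and trigrams can be formed from a sequence of words. The tokens and the helper `make_ngrams()` are made up for illustration; this is not how the package builds its n-grams internally.

```r
# Toy tokens (made-up example, not taken from the survey data)
tokens <- c("puhdas", "vesi", "puute", "ja", "sota")

# Hypothetical helper: paste together every run of n consecutive tokens
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) {
    return(character(0))
  }
  starts <- seq_len(length(tokens) - n + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

bigrams <- make_ngrams(tokens, 2)
trigrams <- make_ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

A response with five words therefore yields four bigrams and three trigrams, which is why longer n-grams are usually rarer in short survey answers.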
Remarks:
We set norm = 'number_resp' as we're interested in the proportion of responses that list a specific word.
Next, let's look at some Concept Networks before we look into gender. (For further detail on the Concept Network functions, see "InDetail3-ConceptNetworkOverview".) Our Concept Network function can be used to visualise the words that occur around our words of interest.
Again, we will set norm = 'number_resp'.
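As a rough illustration of what normalising by the number of responses means, the base R sketch below computes, for made-up data, the share of responses that mention each word. It assumes each word is counted at most once per response; the package's own calculation may differ, so treat this as a sketch of the idea only.

```r
# Made-up responses: one character vector of words per respondent
responses <- list(
  c("sota", "vesi"),
  c("vesi", "puute"),
  c("sota"),
  c("ihminen", "vesi")
)

# For each distinct word, the proportion of responses that contain it
all_words <- unique(unlist(responses))
prop_resp <- vapply(
  all_words,
  function(w) mean(vapply(responses, function(r) w %in% r, logical(1))),
  numeric(1)
)

print(sort(prop_resp, decreasing = TRUE))
```

Normalising this way makes values comparable between groups with different numbers of respondents, which matters later when responses are split by gender.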
fst_concept_network(fst_dev_coop,
  concepts = "köyhyys, nälänhätä, sota",
  title = "Q11_3 - No threshold",
  norm = "number_resp"
)
fst_concept_network(fst_dev_coop,
  concepts = "köyhyys, nälänhätä, sota",
  threshold = 5,
  title = "Q11_3 - Threshold = 5",
  norm = "number_resp"
)
fst_concept_network(fst_dev_coop,
  concepts = "köyhyys, nälänhätä, sota",
  threshold = 3,
  title = "Q11_3 - Threshold = 3",
  norm = "number_resp"
)
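To give a feel for how a threshold thins out a network, here is a small base R sketch using plain co-occurrence counts on made-up data. This is a simplified stand-in, not the package's algorithm, and it assumes the threshold is a minimum count, which may not match the package's exact definition.

```r
# Made-up responses: one character vector of words per respondent
responses <- list(
  c("puhdas", "vesi", "puute"),
  c("vesi", "puute"),
  c("sota", "ihminen"),
  c("puhdas", "vesi"),
  c("vesi", "sota")
)
concept <- "vesi"

# Collect the words that share a response with the concept word
co_words <- unlist(lapply(responses, function(r) {
  if (concept %in% r) setdiff(r, concept) else character(0)
}))
co_counts <- table(co_words)

# Keep only words that co-occur with the concept at least `threshold` times
threshold <- 2
kept <- names(co_counts)[co_counts >= threshold]
print(co_counts)
print(kept)
```

In this toy data, raising the threshold from 1 to 2 drops the word that appears alongside the concept only once, in the same way that a higher threshold removes weaker links from the plots.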
Remarks:
Now, let's add more words in:
fst_concept_network(fst_dev_coop,
  concepts = "köyhyys, nälänhätä, sota, ilmastonmuutos, puute, ihminen, vesi, epätasa-arvo",
  title = "Q11_3 - Lots of words",
  norm = "number_words"
)
fst_concept_network(fst_dev_coop,
  concepts = "köyhyys, nälänhätä, sota, ilmastonmuutos, puute, ihminen, vesi, epätasa-arvo",
  title = "Q11_3 - Lots of words, threshold = 5",
  threshold = 5,
  norm = "number_resp"
)
Remarks:
Now, let's see what people who talk about water also tend to say in their responses:
fst_concept_network(fst_dev_coop, concepts = "vesi", title = "Q11_3 - Vesi")
fst_concept_network(fst_dev_coop, concepts = "puute, puhdas, vesi", title = "Q11_3 - ")
Remarks:
Now that we've looked at the responses as a whole, we consider whether gender affects word choice. For further details on the comparison functions, please refer to "InDetail4-ComparisonFunctions".
We will start the comparison by looking at the summary functions:
fst_summarise_compare()
fst_pos_compare()
fst_length_compare()
knitr::kable(fst_summarise_compare(fst_dev_coop, field = 'gender'))
knitr::kable(fst_pos_compare(fst_dev_coop, field = 'gender'))
knitr::kable(fst_length_compare(fst_dev_coop, field = 'gender'))
Remarks:
Now let's consider the comparison cloud. (We will weight responses using the weight column included in our formatted data, and exclude responses without a gender listed.)
fst_comparison_cloud( fst_dev_coop, field = 'gender', use_column_weights = TRUE, exclude_nulls = TRUE, max = 50 )
Based on the comparison cloud, it looks like:
Now we look at common words and n-grams. From here on, we will always weight the data and continue to exclude the responses with 'NA' for gender, as there are very few of these.
fst_freq_compare(
  fst_dev_coop,
  field = 'gender',
  number = 10,
  norm = NULL,
  pos_filter = NULL,
  unique_colour = "indianred",
  strict = TRUE,
  use_column_weights = TRUE,
  exclude_nulls = TRUE
)
fst_ngrams_compare(
  fst_dev_coop,
  field = 'gender',
  number = 10,
  ngrams = 2,
  pos_filter = NULL,
  unique_colour = "indianred",
  strict = TRUE,
  use_column_weights = TRUE,
  exclude_nulls = TRUE
)
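The two options used throughout this comparison, excluding respondents with no gender recorded and weighting the counts, can be pictured with the base R sketch below. The data frame and its column names are made up for illustration, and the code only mirrors the idea behind exclude_nulls = TRUE and use_column_weights = TRUE rather than the package's implementation.

```r
# Made-up data: one row per word, with the respondent's gender and survey weight
toy <- data.frame(
  word = c("sota", "sota", "vesi", "vesi", "puute", "sota"),
  gender = c("F", "M", "F", NA, "M", "F"),
  weight = c(1.2, 0.8, 1.0, 1.5, 0.9, 1.1),
  stringsAsFactors = FALSE
)

# Drop rows with no gender recorded (the idea behind exclude_nulls = TRUE)
toy <- toy[!is.na(toy$gender), ]

# Sum the weights instead of counting rows
# (the idea behind use_column_weights = TRUE)
weighted <- aggregate(weight ~ word + gender, data = toy, FUN = sum)
print(weighted)
```

With weights, a word's score in each group is the total weight of the respondents who used it, so a few heavily weighted respondents can count for more than several lightly weighted ones.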
Remarks:
Here we see that male and female respondents have the same top 4 words, but that 'nälänhätä' is more frequent for females than for males, whereas 'ilmastonmuutos' is more frequent for males. This may prompt further research into whether male survey participants show more concern about climate change than females in the remainder of the survey.
We also note that the most frequent words among female respondents occur in nearly 30% of responses, whereas for males the figure is just over 20%. Perhaps there is less consistency in what males are mentioning?
For the bigrams, we can identify that
Now, we'll look at the Concept Network for the 4 top words in female and male responses. These words are 'sota', 'köyhyys', 'nälänhätä', and 'ilmastonmuutos'.
fst_concept_network_compare( fst_dev_coop, field = 'gender', concepts = "köyhyys, nälänhätä, sota, ilmastonmuutos", threshold = 1, exclude_nulls = TRUE )
Remarks:
Interestingly, our Concept Network plots of these top 4 words show that a much larger variety of terms is used by female respondents who mention these 4 common words.
Only part of this is explained by the fact that we have nearly 4 times as many responses from female participants, since each of these plots has its vertex weights determined from only its own cohort's responses. Therefore, there may be other subgroups within the male respondents who discuss themes independent of these.
This tutorial demonstrates the use of finnsurveytext to look at a single question and then consider whether gender impacts response. Gender is just one of many ways you could split the data and is used as an example in this context. It is likely an analyst would also want to split the data in other ways such as age, education level, etc.
Following the use of finnsurveytext, we have a number of hypotheses, such as "Male survey respondents are more concerned about climate change than females", which could inform further analysis of the survey as a whole.
Finnish Children and Youth Foundation: Young People's Views on Development Cooperation 2012 [dataset]. Version 2.0 (2019-01-22). Finnish Social Science Data Archive [distributor]. http://urn.fi/urn:nbn:fi:fsd:T-FSD2821