opi_impact: Statistical assessment of impacts of a specified theme from a...
In opitools: Analyzing the Opinions in a Big Text Document


This function assesses the impact of a theme (or subject) on the overall opinion computed for a DTD. Different themes in a DTD can be identified by the keywords used in the DTD. These keywords can be extracted by any analytical means available to the user, e.g. the word_imp function. The keywords must be collated and supplied to this function through the theme_keys argument (see below).


opi_impact(textdoc, theme_keys=NULL, metric = 1,
fun = NULL, nsim = 99, alternative="two.sided",
quiet=TRUE)

`textdoc`	An `n` x `1` list (dataframe) of individual text records, where `n` is the total number of individual records.
`theme_keys`	(a list) A one-column dataframe (of any length) containing the keywords relating to the theme or secondary subject to be investigated. The keywords can also be supplied as a character vector.
`metric`	(an integer) Specify the metric to utilize for the calculation of opinion score. Default: `1`. See detailed documentation in the `opi_score` function.
`fun`	A user-defined function, used when the `metric` parameter (above) is set to `5`. See detailed documentation in the `opi_score` function.
`nsim`	(an integer) Number of replicas (ESD) to generate. See detailed documentation in the `opi_sim` function. Default: `99`.
`alternative`	(a character) Default: `"two.sided"`, indicating a two-tailed test. A user can override this default by specifying `"less"` or `"greater"` to run the analysis as a one-tailed test when the observed score is located in the lower or upper region of the expectation distribution, respectively. Note: for `metric = 1`, the `alternative` parameter should be set to `"two.sided"` because the opinion score is bounded by both positive and negative values. For an opinion score bounded by positive values, such as when `metric` is `2`, `3` or `4`, the `alternative` parameter should be set to `"greater"`, and to `"less"` otherwise. If the `metric` parameter is set to `5`, with a user-defined opinion score function (i.e. `fun` is not `NULL`), the user is required to determine the limits of the opinion scores and set the `alternative` argument appropriately.
`quiet`	(TRUE or FALSE) To suppress processing messages. Default: `TRUE`.
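
The guidance given for `alternative` can be written down as a small lookup. The sketch below is illustrative only: `choose_alternative` is a hypothetical helper, not part of opitools, and it simply encodes the rule stated for `metric` values `1` to `4`. For `metric = 5` it stops, because the bounds of a user-defined score are not known in advance.

```r
# Hypothetical helper (NOT an opitools function): maps a built-in metric
# to the `alternative` value suggested in the argument description above.
choose_alternative <- function(metric) {
  if (metric == 1) {
    "two.sided"   # score bounded by both positive and negative values
  } else if (metric %in% 2:4) {
    "greater"     # score bounded by positive values
  } else {
    # metric = 5 (user-defined `fun`): the user must decide from the score's limits
    stop("For a user-defined score, choose `alternative` from the score's bounds.")
  }
}

choose_alternative(1)
choose_alternative(3)
```

The returned string could then be passed as the `alternative` argument of `opi_impact`.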

This function calculates the statistical significance value (p-value) of an opinion score by comparing the observed score (from the opi_score function) with the distribution of expected scores (from the opi_sim function). The formula is p = (S.beat + 1)/(S.total + 1), where S.total is the total number of replicas (nsim) specified and S.beat is the number of replicas whose expected scores are more extreme than the observed score (see further details in Adepeju and Jimoh, 2021).
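
To make the formula concrete, here is a minimal sketch of the calculation for the two one-tailed cases. It is not the package's implementation: `pseudo_p` is a hypothetical name, the toy `expected` vector stands in for the output of opi_sim, and counting ties as "beating" the observed score is an assumption of this sketch.

```r
# Hypothetical helper (NOT an opitools function): p = (S.beat + 1) / (S.total + 1)
pseudo_p <- function(observed, expected, alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  s_total <- length(expected)            # number of replicas (nsim)
  s_beat <- if (alternative == "greater") {
    sum(expected >= observed)            # replicas at or above the observed score
  } else {
    sum(expected <= observed)            # replicas at or below the observed score
  }
  (s_beat + 1) / (s_total + 1)
}

# Toy data: 99 made-up replica scores, 4 of which exceed the observed score
expected <- c(rep(0.10, 95), rep(0.90, 4))
pseudo_p(observed = 0.50, expected = expected, alternative = "greater")
# (4 + 1) / (99 + 1) = 0.05
```

With the default `nsim = 99`, the smallest p-value this formula can return is 1/100 = 0.01, which is why the number of replicas limits the resolution of the test.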

Details of statistical significance of impacts of a secondary subject B on the opinion concerning the primary subject A.

(1) Adepeju, M. and Jimoh, F. (2021). An Analytical Framework for Measuring Inequality in the Public Opinions on Policing – Assessing the impacts of COVID-19 Pandemic using Twitter Data. https://doi.org/10.31235/osf.io/c32qh

# Application in marketing:

#`textdoc` -> 'reviews_dtd'
#`theme_keys` -> 'refreshment_theme'

#RQ2a: "Do the refreshment outlets impact customers'
#opinion of the services at the Piccadilly train station?"

##execute function
output <- opi_impact(textdoc = reviews_dtd,
          theme_keys=refreshment_theme, metric = 1,
          fun = NULL, nsim = 99, alternative="two.sided",
          quiet=TRUE)

#To print results
print(output)

#extracting the pvalue in order to answer RQ2a
output$pvalue