
---
title: "R-Opitools – An Opinion Analytical Tool for Big Digital Text Document (DTD)"
date: "30th July 2021"
bibliography: paper.bib
output: pdf_document
affiliations:
  - name: Crime and Well-being Big Data Centre, Manchester Metropolitan University, United Kingdom
    index: 1
tags:
  - digital text document
  - sentiment analysis
  - opinion mining
  - randomization testing
authors:
  - name: Monsuru Adepeju
    orcid: 0000-0002-9006-4934
    affiliation: 1
---

Statement of Need

Since the year 2000, various computational intelligence techniques have been developed in the field of natural language processing (NLP) for analyzing the sentiments of users. To date, the majority of these techniques, as deployed across fields including the social sciences [@Somasundaran:2010; @Nikolovska:2020; @Ansari:2020] and market research [@Feldman:2011; @Otaibi:2018], have focused largely on detecting subjectivity and/or extracting and classifying the sentiments and opinions in a text document. Building on this existing work, the current paper advances an opinion impact analytical tool, named Opitools, that not only extracts inherent themes from a digital text document (DTD), but also evaluates the extent to which a specified theme may have contributed to the overall opinions expressed by the document. This capability widens the tool's applicability within the aforementioned fields. For example, in law enforcement, the package can be deployed to understand the factors (themes) that drive public perception of police services [@Adepeju:2021]; and in product marketing, to identify the factors that underlie customer satisfaction with a product.

Implementation

Having extracted a set of thematic keywords from a digital text document, the goal is to computationally classify the sentiment expressed in each text record as positive, negative, or neutral, using a lexicon-based classification approach [@Nielsen:2011; @Adepeju:2021]. The resulting sentiment scores are combined to estimate the overall opinion score of the document. To assess the impact of a selected theme (or subject) on the estimated opinion score, we ask the question: if expected opinion scores were generated under the null hypothesis, how likely would we be to find a score higher than the estimated score? The question is answered by employing a non-parametric randomization testing strategy [@Fisher:1935; @Good:2006], which involves randomly re-assigning the sentiment labels of the original text document to derive the expectation distribution. This distribution is then compared with the observed score to obtain the statistical significance of the impact.
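The procedure above can be illustrated with a minimal, language-agnostic sketch in Python. It is not the package's R implementation: the toy lexicon, the function names (`classify`, `opinion_score`, `randomization_test`), the polarity formula `(P - N) / (P + N) * 100`, and the choice of test statistic (the opinion score of the records that mention a theme keyword) are all assumptions made for illustration.

```python
import random
import re

# Toy valence lexicon (hypothetical); real lexicons contain thousands of entries.
LEXICON = {"good": 1, "helpful": 1, "safe": 1, "bad": -1, "rude": -1, "slow": -1}


def tokenize(record):
    """Lower-case a text record and split it into word tokens."""
    return re.findall(r"[a-z']+", record.lower())


def classify(record):
    """Label a record by the sign of its summed word valences."""
    total = sum(LEXICON.get(word, 0) for word in tokenize(record))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def opinion_score(labels):
    """Assumed polarity score in percent: (P - N) / (P + N) * 100."""
    p = labels.count("positive")
    n = labels.count("negative")
    return 0.0 if p + n == 0 else 100.0 * (p - n) / (p + n)


def randomization_test(records, theme_keywords, n_perm=999, seed=0):
    """Return (observed score, one-sided p-value) for a theme.

    Assumed statistic: the opinion score of records mentioning the theme.
    The expectation distribution comes from shuffling sentiment labels
    across all records, which breaks any link between theme and sentiment.
    """
    keywords = {k.lower() for k in theme_keywords}
    labels = [classify(r) for r in records]
    in_theme = [bool(keywords & set(tokenize(r))) for r in records]

    def theme_score(lbls):
        return opinion_score([l for l, t in zip(lbls, in_theme) if t])

    observed = theme_score(labels)
    rng = random.Random(seed)
    shuffled = labels[:]
    at_least_as_high = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if theme_score(shuffled) >= observed:
            at_least_as_high += 1
    # Adding 1 to numerator and denominator counts the observed labelling
    # itself as one permutation, so the p-value is never exactly zero.
    p_value = (at_least_as_high + 1) / (n_perm + 1)
    return observed, p_value
```

In this sketch, a small p-value would indicate that the theme's records carry a more positive score than random label assignment tends to produce. Counting ties (`>=`) rather than strictly higher scores is a conservative choice made here, not one stated in the text above.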

Key Functionalities

The package contains exploratory text functions for extracting themes from a digital text document. To conduct an impact analysis, a user can draw on a number of interrelated functions that compute the required measures: the observed opinion score, the expectation distribution, and the statistical significance of the impact. Whilst several types of opinion score function are built into the package, there is also a provision that allows users to integrate their own user-defined score function. This feature is intended to facilitate uptake of the package in further application fields.
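The idea of a pluggable score function can be sketched as follows. This is an illustrative Python sketch, not the package's R interface: the names `score_document`, `BUILT_IN`, and `neutral_as_positive`, and the convention that a score function receives the counts of positive, negative, and neutral records, are assumptions.

```python
from collections import Counter


def polarity_percent(p, n, o):
    """Assumed built-in score: (P - N) / (P + N) * 100, ignoring neutrals."""
    return 0.0 if p + n == 0 else 100.0 * (p - n) / (p + n)


def positive_share(p, n, o):
    """Assumed built-in score: share of positive records among all records."""
    total = p + n + o
    return 0.0 if total == 0 else p / total


# Hypothetical registry of built-in score functions, looked up by name.
BUILT_IN = {"polarity": polarity_percent, "positive_share": positive_share}


def score_document(labels, metric="polarity"):
    """Score sentiment labels with a named built-in or a user-supplied callable."""
    counts = Counter(labels)
    score_fn = BUILT_IN[metric] if isinstance(metric, str) else metric
    return score_fn(counts["positive"], counts["negative"], counts["neutral"])


# A user-defined score (hypothetical) that treats neutral records as positive.
def neutral_as_positive(p, n, o):
    total = p + n + o
    return 0.0 if total == 0 else (p + o - n) / total


labels = ["positive", "positive", "negative", "neutral"]
built_in_result = score_document(labels, "positive_share")
custom_result = score_document(labels, neutral_as_positive)
```

Passing the three counts rather than the raw labels keeps a user-defined function small, and it lets the same function be reused when scores are recomputed for each randomized labelling.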

Acknowledgment

We gratefully acknowledge the Economic and Social Research Council (ESRC), who funded the Understanding Inequalities project (Grant Reference ES/P009301/1) through which this research was conducted.