This exploratory data analysis (EDA) comprises three parts: (1) a univariate analysis, which examines the frequencies and proportions of the categorical variables and the central tendency, variability, spread, and shape of the distributions of the quantitative variables; (2) a bivariate analysis, which explores the relationships between the independent variables and audience scores; and (3) an association and correlation analysis, which screens for potential collinearity among the independent variables.
`r kfigr::figr(label = "films_summary", prefix = TRUE, link = TRUE, type="Figure")` shows the counts and percentages for three summary variables: the drama, feature film, and MPAA R rating indicators. Dramas constituted nearly half of the observations in the sample. A plurality of films were rated R, and nearly all films surveyed were feature films.
edaUni$qual$summary$plot
`r kfigr::figr(label = "films_summary", prefix = TRUE, link = TRUE, type="Figure")`: Drama, Feature, and R-Rated Films
`r kfigr::figr(label = "performance", prefix = TRUE, link = TRUE, type="Figure")` summarizes the counts and proportions of films in the sample that achieved notability at the box office or with the Academy. Best Picture winners, top 200 box office earners, and Best Picture nominees accounted for 1%, 2%, and 3% of the films, respectively. Films earning Best Director, Best Actress, and Best Actor Oscars were somewhat less rare, at 7%, 11%, and 14% of the sample, respectively.
edaUni$qual$performance$plot
`r kfigr::figr(label = "performance", prefix = TRUE, link = TRUE, type="Figure")`: Oscar Awards and Top 200 Box Office Class
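The counts and percentages reported for binary indicators such as these can be reproduced in base R along the following lines. This is a minimal sketch: the `films` data frame and its values are hypothetical stand-ins, and `indicator_summary()` is an illustrative helper, not the function that produced `edaUni`.

```r
# Hypothetical 0/1 indicator columns standing in for the real film data
films <- data.frame(
  drama         = c(1, 0, 1, 1, 0, 0, 1, 0),
  feature_film  = c(1, 1, 1, 1, 1, 0, 1, 1),
  mpaa_rating_R = c(0, 1, 1, 0, 0, 1, 0, 1)
)

# Illustrative helper: count and percentage of films coded 1 for each indicator
indicator_summary <- function(df) {
  counts <- colSums(df == 1)
  data.frame(
    Variable = names(df),
    N        = unname(counts),
    Percent  = round(100 * unname(counts) / nrow(df), 1),
    stringsAsFactors = FALSE
  )
}

indicator_summary(films)
# drama: 4 (50%), feature_film: 7 (87.5%), mpaa_rating_R: 4 (50%)
```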
The Oscar season, which runs from October through 31 December, is the period in which Hollywood studios typically release their more critically acclaimed films. As indicated in `r kfigr::figr(label = "season", prefix = TRUE, link = TRUE, type="Figure")`, approximately 30% of the films in the dataset were released during the Oscar season.
edaUni$qual$season$plot
`r kfigr::figr(label = "season", prefix = TRUE, link = TRUE, type="Figure")`: Oscar and Summer Season Releases
The summer season, which runs from the first weekend of May through Labour Day, accounts for a disproportionate share of Hollywood studios' annual box office revenue. Roughly 32% of the feature films in the dataset were released during the summer months.
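library(dplyr)  # provides %>%, filter() and summarize()
years <- edaUni$qual$year$data                    # films per release year
low  <- years %>% filter(N == min(N))             # fewest films
ave  <- years %>% summarize(Mean = mean(N))
med  <- years %>% summarize(Median = median(N))
high <- years %>% filter(N == max(N))             # most films
last <- years %>% filter(Category == "2014")      # final year in the sample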
The dataset contained `r nrow(preprocessed)` feature films released between 1970 and 2014. As presented in `r kfigr::figr(label = "year", prefix = TRUE, link = TRUE, type="Figure")`, the number of films in the sample per release year grew roughly linearly from `r low$N` film in 1970 to a peak of approximately `r high$N[1]` films in 2006 and 2007, then declined, settling at `r last$N` films in 2014. The mean and median number of films per year were `r round(ave$Mean,1)` and `r med$Median`, respectively.
edaUni$qual$year$plot
`r kfigr::figr(label = "year", prefix = TRUE, link = TRUE, type="Figure")`: Theatrical Releases by Year
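The year-level summary computed in the chunk above (fewest, most, mean, and median films per year) can be sketched in base R as follows. The release years are hypothetical, and `year_counts_summary()` is an illustrative helper. Note that `table()` only counts years that actually occur, so a year with no films would not pull the mean down unless it were added as an explicit factor level.

```r
# Hypothetical release years; the real analysis uses each film's release year
release_year <- c(1970, 1985, 1985, 2006, 2006, 2006, 2007, 2007, 2007, 2014, 2014)

# Illustrative helper: summarize the number of films per release year
year_counts_summary <- function(release_year) {
  counts <- table(release_year)   # films per year (only years that occur)
  n   <- as.vector(counts)
  yrs <- names(counts)
  list(
    low    = yrs[n == min(n)],    # year(s) with the fewest films
    high   = yrs[n == max(n)],    # year(s) with the most films
    mean   = mean(n),             # mean films per year
    median = median(n)            # median films per year
  )
}

year_counts_summary(release_year)
```

With these inputs, 1970 has the fewest films, 2006 and 2007 tie for the most, and the mean and median are 2.2 and 2 films per year.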
Moving on to the quantitative variables, the critics score, ranging from 1 to 100, was obtained from the Rotten Tomatoes website; its summary statistics are reported in `r kfigr::figr(label = "critics_score_stats", prefix = TRUE, link = TRUE, type="Table")`.
`r kfigr::figr(label = "critics_score_stats", prefix = TRUE, link = TRUE, type="Table")`: Critics score summary statistics
knitr::kable(edaUni$quant$critics_score$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
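Summary statistics of the kind tabulated for each quantitative variable can be computed with base R alone. The exact columns of the `stats` tables are not shown here, so the helper `describe()` and its column set are assumptions for illustration.

```r
# Illustrative helper: centrality, spread, and range of a numeric vector
describe <- function(x) {
  x <- x[!is.na(x)]                                 # drop missing values
  q <- quantile(x, c(0.25, 0.75), names = FALSE)    # first and third quartiles
  data.frame(
    N = length(x), Min = min(x), Q1 = q[1], Median = median(x),
    Mean = mean(x), Q3 = q[2], Max = max(x),
    SD = sd(x), IQR = q[2] - q[1]
  )
}

# Hypothetical critics scores
describe(c(10, 20, 30, 40, 50))
```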
The distribution of critics scores shown in `r kfigr::figr(label = "critics_score_dist", prefix = TRUE, link = TRUE, type="Figure")` and further supported by `r kfigr::figr(label = "critics_score_box", prefix = TRUE, link = TRUE, type="Figure")` departs substantially from normality. That said, Bayesian regression does not assume that the predictors are normally distributed; its distributional assumptions concern the residuals.
gridExtra::grid.arrange(edaUni$quant$critics_score$hist, edaUni$quant$critics_score$qq, ncol = 2)
`r kfigr::figr(label = "critics_score_dist", prefix = TRUE, link = TRUE, type="Figure")`: Critics score histogram and QQ Plot
edaUni$quant$critics_score$box
`r kfigr::figr(label = "critics_score_box", prefix = TRUE, link = TRUE, type="Figure")`: Critics score box plot
Central Tendency: `r kfigr::figr(label = "critics_score_stats", prefix = TRUE, link = TRUE, type="Table")` reports that `r edaUni$quant$critics_score$central`
Dispersion: `r edaUni$quant$critics_score$disp`
Shape of Distribution: `r edaUni$quant$critics_score$skew` `r edaUni$quant$critics_score$kurt`
The histogram and QQ plot in `r kfigr::figr(label = "critics_score_dist", prefix = TRUE, link = TRUE, type="Figure")` reveal a left-skewed distribution that departs from normality, which, as noted above, is not a concern for a predictor in a Bayesian regression.
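The skewness and kurtosis descriptions can be checked with simple moment estimators. This sketch assumes the population-moment definitions (often written g1 and g2); some R packages provide variants with small-sample adjustments whose values differ slightly, and which definition `edaUni` uses is not shown. A negative skewness corresponds to a left skew such as the one described above.

```r
# Moment-based shape statistics (population-moment definitions)
moment_shape <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m3 <- mean(d^3)   # third central moment
  m4 <- mean(d^4)   # fourth central moment
  c(skewness        = m3 / m2^1.5,     # negative: longer left tail
    excess_kurtosis = m4 / m2^2 - 3)   # 0 for a normal distribution
}

# Hypothetical scores with a long left tail
moment_shape(c(12, 58, 71, 80, 84, 87, 90, 93))
```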
Outliers: The box plot in `r kfigr::figr(label = "critics_score_box", prefix = TRUE, link = TRUE, type="Figure")`, which depicts the median, the interquartile range (IQR), and whiskers extending to the most extreme non-outlying values, suggested that `r ifelse(nrow(edaUni$quant$critics_score$outliers) == 0, "no", "some")` outliers were present. `r edaUni$quant$critics_score$out`
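Box-plot outliers are conventionally the points beyond Tukey's fences, 1.5 times the IQR below the first quartile or above the third. A minimal sketch in base R, assuming that convention; `tukey_outliers()` is a hypothetical helper. Because it uses `quantile()`, its fences can differ slightly from those of `boxplot.stats()`, which is based on Tukey's hinges.

```r
# Hypothetical helper: values beyond Tukey's fences (k = 1.5 by convention)
tukey_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr   <- q[2] - q[1]
  lower <- q[1] - k * iqr   # lower fence
  upper <- q[2] + k * iqr   # upper fence
  x[x < lower | x > upper]
}

tukey_outliers(c(1:10, 100))   # 100 lies above the upper fence of 16
tukey_outliers(1:10)           # no outliers: an empty vector
```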
This variable, obtained from the IMDb website, represents the number of IMDb votes cast for each film.
`r kfigr::figr(label = "imdb_votes_stats", prefix = TRUE, link = TRUE, type="Table")`: IMDb votes summary statistics
knitr::kable(edaUni$quant$imdb_num_votes$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
gridExtra::grid.arrange(edaUni$quant$imdb_num_votes$hist, edaUni$quant$imdb_num_votes$qq, ncol = 2)
`r kfigr::figr(label = "imdb_votes_dist", prefix = TRUE, link = TRUE, type="Figure")`: IMDb votes histogram and QQ Plot
edaUni$quant$imdb_num_votes$box
`r kfigr::figr(label = "imdb_votes_box", prefix = TRUE, link = TRUE, type="Figure")`: IMDb votes box plot
Central Tendency: The summary statistics (`r kfigr::figr(label = "imdb_votes_stats", prefix = TRUE, link = TRUE, type="Table")`) show that `r edaUni$quant$imdb_num_votes$central`
Dispersion: `r edaUni$quant$imdb_num_votes$disp`
Shape of Distribution: `r edaUni$quant$imdb_num_votes$skew` `r edaUni$quant$imdb_num_votes$kurt`
The histogram and QQ plot in `r kfigr::figr(label = "imdb_votes_dist", prefix = TRUE, link = TRUE, type="Figure")` reveal a distribution that departs markedly from normality.
Outliers: The box plot in `r kfigr::figr(label = "imdb_votes_box", prefix = TRUE, link = TRUE, type="Figure")`, which depicts the median, the interquartile range (IQR), and whiskers extending to the most extreme non-outlying values, suggested that `r ifelse(nrow(edaUni$quant$imdb_num_votes$outliers) == 0, "no", "some")` outliers were present. `r edaUni$quant$imdb_num_votes$out`
This variable is a log transformation of the IMDb votes variable, a transformation commonly used to reduce right skew.
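The log transformation compresses the long right tail of the vote counts. The sketch below uses hypothetical vote counts and the natural logarithm (the base used in the original transformation is not stated); `log1p()` would be needed instead if any film could have zero votes.

```r
# Hypothetical, heavily right-skewed IMDb vote counts
votes <- c(180, 950, 2400, 5100, 12000, 31000, 88000, 250000, 893000)

# Moment-based skewness, as a quick before/after check
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

log_votes <- log(votes)   # natural log; use log1p() if zero counts are possible

c(raw = skewness(votes), log = skewness(log_votes))
```

Here the raw counts have a skewness above 1, whereas the logged counts are nearly symmetric.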
`r kfigr::figr(label = "imdb_votes_log_stats", prefix = TRUE, link = TRUE, type="Table")`: Log IMDb votes summary statistics
knitr::kable(edaUni$quant$imdb_num_votes_log$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
gridExtra::grid.arrange(edaUni$quant$imdb_num_votes_log$hist, edaUni$quant$imdb_num_votes_log$qq, ncol = 2)
`r kfigr::figr(label = "imdb_votes_log_dist", prefix = TRUE, link = TRUE, type="Figure")`: Log IMDb votes histogram and QQ Plot
edaUni$quant$imdb_num_votes_log$box
`r kfigr::figr(label = "imdb_votes_log_box", prefix = TRUE, link = TRUE, type="Figure")`: Log IMDb votes box plot
Central Tendency: The summary statistics (`r kfigr::figr(label = "imdb_votes_log_stats", prefix = TRUE, link = TRUE, type="Table")`) report that `r edaUni$quant$imdb_num_votes_log$central`
Dispersion: `r edaUni$quant$imdb_num_votes_log$disp`
Shape of Distribution: `r edaUni$quant$imdb_num_votes_log$skew` `r edaUni$quant$imdb_num_votes_log$kurt`
The histogram and QQ plot in `r kfigr::figr(label = "imdb_votes_log_dist", prefix = TRUE, link = TRUE, type="Figure")` reveal a nearly normal distribution.
Outliers: The box plot in `r kfigr::figr(label = "imdb_votes_log_box", prefix = TRUE, link = TRUE, type="Figure")`, which depicts the median, the interquartile range (IQR), and whiskers extending to the most extreme non-outlying values, suggested that `r ifelse(nrow(edaUni$quant$imdb_num_votes_log$outliers) == 0, "no", "some")` outliers were present. `r edaUni$quant$imdb_num_votes_log$out`
This variable captures the IMDb rating of each film.
`r kfigr::figr(label = "imdb_rating_stats", prefix = TRUE, link = TRUE, type="Table")`: IMDb rating summary statistics
knitr::kable(edaUni$quant$imdb_rating$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
gridExtra::grid.arrange(edaUni$quant$imdb_rating$hist, edaUni$quant$imdb_rating$qq, ncol = 2)
`r kfigr::figr(label = "imdb_rating_dist", prefix = TRUE, link = TRUE, type="Figure")`: IMDb rating histogram and QQ Plot
edaUni$quant$imdb_rating$box
`r kfigr::figr(label = "imdb_rating_box", prefix = TRUE, link = TRUE, type="Figure")`: IMDb rating box plot
Central Tendency: The summary statistics (`r kfigr::figr(label = "imdb_rating_stats", prefix = TRUE, link = TRUE, type="Table")`) show that `r edaUni$quant$imdb_rating$central`
Dispersion: `r edaUni$quant$imdb_rating$disp`
Shape of Distribution: `r edaUni$quant$imdb_rating$skew` `r edaUni$quant$imdb_rating$kurt`
The histogram and QQ plot in `r kfigr::figr(label = "imdb_rating_dist", prefix = TRUE, link = TRUE, type="Figure")` reveal a nearly normal distribution.
Outliers: The box plot in `r kfigr::figr(label = "imdb_rating_box", prefix = TRUE, link = TRUE, type="Figure")`, which depicts the median, the interquartile range (IQR), and whiskers extending to the most extreme non-outlying values, suggested that `r ifelse(nrow(edaUni$quant$imdb_rating$outliers) == 0, "no", "some")` outliers were present. `r edaUni$quant$imdb_rating$out`
This section examines movie runtimes.
`r kfigr::figr(label = "runtime_stats", prefix = TRUE, link = TRUE, type="Table")`: Runtime summary statistics
knitr::kable(edaUni$quant$runtime$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
gridExtra::grid.arrange(edaUni$quant$runtime$hist, edaUni$quant$runtime$qq, ncol = 2)
`r kfigr::figr(label = "runtime_dist", prefix = TRUE, link = TRUE, type="Figure")`: Runtime histogram and QQ Plot
edaUni$quant$runtime$box
`r kfigr::figr(label = "runtime_box", prefix = TRUE, link = TRUE, type="Figure")`: Runtime box plot
Central Tendency: The summary statistics (`r kfigr::figr(label = "runtime_stats", prefix = TRUE, link = TRUE, type="Table")`) show that `r edaUni$quant$runtime$central`
Dispersion: `r edaUni$quant$runtime$disp`
Shape of Distribution: `r edaUni$quant$runtime$skew` `r edaUni$quant$runtime$kurt`
The histogram and QQ plot in `r kfigr::figr(label = "runtime_dist", prefix = TRUE, link = TRUE, type="Figure")` reveal a left-skewed distribution that nonetheless appears reasonably normal.
Outliers: The box plot in `r kfigr::figr(label = "runtime_box", prefix = TRUE, link = TRUE, type="Figure")`, which depicts the median, the interquartile range (IQR), and whiskers extending to the most extreme non-outlying values, suggested that `r ifelse(nrow(edaUni$quant$runtime$outliers) == 0, "no", "some")` outliers were present. `r edaUni$quant$runtime$out`
Finally, the dependent variable, audience_score, is examined.
`r kfigr::figr(label = "audience_score_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Score Summary Statistics
knitr::kable(edaUni$quant$audience_score$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
gridExtra::grid.arrange(edaUni$quant$audience_score$hist, edaUni$quant$audience_score$qq, ncol = 2)
`r kfigr::figr(label = "audience_score_dist", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score Histogram and QQ Plot
edaUni$quant$audience_score$box
`r kfigr::figr(label = "audience_score_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score Box Plot
Central Tendency: The summary statistics (`r kfigr::figr(label = "audience_score_stats", prefix = TRUE, link = TRUE, type="Table")`) show that `r edaUni$quant$audience_score$central`
Dispersion: `r edaUni$quant$audience_score$disp`
Shape of Distribution: `r edaUni$quant$audience_score$skew` `r edaUni$quant$audience_score$kurt`
The histogram and QQ plot in `r kfigr::figr(label = "audience_score_dist", prefix = TRUE, link = TRUE, type="Figure")` reveal a slightly right-skewed distribution that approximates normality.
Outliers: The box plot in `r kfigr::figr(label = "audience_score_box", prefix = TRUE, link = TRUE, type="Figure")`, which depicts the median, the interquartile range (IQR), and whiskers extending to the most extreme non-outlying values, suggested that `r ifelse(nrow(edaUni$quant$audience_score$outliers) == 0, "no", "some")` outliers were present. `r edaUni$quant$audience_score$out`
Next, the relationships between the independent variables and audience scores are studied. The analysis begins with the categorical variables and then turns to the quantitative variables.
The summary statistics in `r kfigr::figr(label = "bivariate_best_actor_stats", prefix = TRUE, link = TRUE, type="Table")` show very similar distributions of audience scores for films that won the Best Actor Oscar and those that did not.
`r kfigr::figr(label = "bivariate_best_actor_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Best Actor Oscar Win Summary Statistics
knitr::kable(edaBi$best_actor_win$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_best_actor_box", prefix = TRUE, link = TRUE, type="Figure")` supports the initial impression that a Best Actor Oscar win has little or no association with audience scores.
edaBi$best_actor_win$boxPlot
`r kfigr::figr(label = "bivariate_best_actor_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Best Actor Oscar Win
`r edaBi$best_actor_win$statement`
As such, the data do not support an association between a Best Actor Oscar win and audience score.
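The `statement` fields report the outcome of a two-group comparison whose exact test is not shown here. As one plausible version, a Welch two-sample t-test (the default in R's `t.test()`) compares mean audience scores between two groups. The scores below are hypothetical, and `wilcox.test()` would be a rank-based alternative if the normality of the scores were in doubt.

```r
# Hypothetical audience scores for films with and without a Best Picture nomination
films <- data.frame(
  audience_score = c(62, 55, 71, 48, 66, 59, 85, 90, 82, 88),
  best_pic_nom   = factor(c(rep("no", 6), rep("yes", 4)))
)

# Welch two-sample t-test (t.test() does not assume equal variances by default)
res <- t.test(audience_score ~ best_pic_nom, data = films)

res$estimate   # group means ("no" first, then "yes")
res$p.value    # a small p-value suggests the group means differ
```

A large p-value, as reported for the Best Actor comparison, means the data provide little evidence of a difference; it does not prove that no difference exists.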
Similarly, the summary statistics in `r kfigr::figr(label = "bivariate_best_actress_stats", prefix = TRUE, link = TRUE, type="Table")` reveal almost identical distributions of audience scores for both groups.
`r kfigr::figr(label = "bivariate_best_actress_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Best Actress Oscar Win Summary Statistics
knitr::kable(edaBi$best_actress_win$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
Again, the box plot in `r kfigr::figr(label = "bivariate_best_actress_box", prefix = TRUE, link = TRUE, type="Figure")` supports the assertion of little to no association between a Best Actress Oscar win and audience scores.
edaBi$best_actress_win$boxPlot
`r kfigr::figr(label = "bivariate_best_actress_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Best Actress Oscar Win
`r edaBi$best_actress_win$statement`
The summary statistics in `r kfigr::figr(label = "bivariate_best_director_stats", prefix = TRUE, link = TRUE, type="Table")` suggest that films awarded the Best Director Oscar are associated with higher average audience scores.
`r kfigr::figr(label = "bivariate_best_director_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Best Director Oscar Win Summary Statistics
knitr::kable(edaBi$best_dir_win$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_best_director_box", prefix = TRUE, link = TRUE, type="Figure")` reveals a slightly higher center for audience scores among films with Best Director acclaim.
edaBi$best_dir_win$boxPlot
`r kfigr::figr(label = "bivariate_best_director_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Best Director Oscar Win
`r edaBi$best_dir_win$statement`
Although these films tended to have slightly higher audience scores, the data do not indicate a statistically significant difference.
An association between Oscar recognition and audience scores first emerges with the Best Picture nomination. The summary statistics in `r kfigr::figr(label = "bivariate_best_pic_nom_stats", prefix = TRUE, link = TRUE, type="Table")` reveal a sizable difference in mean and median audience scores between nominated films and those that were not nominated.
`r kfigr::figr(label = "bivariate_best_pic_nom_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Best Picture Nomination Summary Statistics
knitr::kable(edaBi$best_pic_nom$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_best_pic_nom_box", prefix = TRUE, link = TRUE, type="Figure")` supports the a priori hypothesis that Best Picture nominations are associated with higher audience scores.
edaBi$best_pic_nom$boxPlot
`r kfigr::figr(label = "bivariate_best_pic_nom_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Best Picture Nomination
`r edaBi$best_pic_nom$statement`
Indeed, the data support an association between Best Picture nomination and audience scores.
As one might expect given the prior results, the summary statistics in `r kfigr::figr(label = "bivariate_best_pic_win_stats", prefix = TRUE, link = TRUE, type="Table")` suggest an association between winning Best Picture and higher audience scores.
`r kfigr::figr(label = "bivariate_best_pic_win_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Best Picture Oscar Summary Statistics
knitr::kable(edaBi$best_pic_win$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
Similarly, the box plot in `r kfigr::figr(label = "bivariate_best_pic_win_box", prefix = TRUE, link = TRUE, type="Figure")` clarifies the potential association between Oscar performance and audience scores.
edaBi$best_pic_win$boxPlot
`r kfigr::figr(label = "bivariate_best_pic_win_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Best Picture Oscar
`r edaBi$best_pic_win$statement`
The data do indeed support an association between winning Best Picture and audience scores.
The summary statistics in `r kfigr::figr(label = "bivariate_drama_stats", prefix = TRUE, link = TRUE, type="Table")` report a slight difference in the central audience scores between dramas and non-drama films.
`r kfigr::figr(label = "bivariate_drama_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Genre Summary Statistics
knitr::kable(edaBi$drama$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_drama_box", prefix = TRUE, link = TRUE, type="Figure")` also shows a slight tendency toward higher audience scores for dramas, but is it significant?
edaBi$drama$boxPlot
`r kfigr::figr(label = "bivariate_drama_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Genre
`r edaBi$drama$statement`
Dramas are indeed associated with slightly higher audience scores.
The summary statistics in `r kfigr::figr(label = "bivariate_feature_film_stats", prefix = TRUE, link = TRUE, type="Table")` expose a sizable difference in audience scores between feature films and other film types: feature films appear to be associated with markedly lower average audience scores.
`r kfigr::figr(label = "bivariate_feature_film_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by Film Type Summary Statistics
knitr::kable(edaBi$feature_film$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_feature_film_box", prefix = TRUE, link = TRUE, type="Figure")` visually characterizes the difference in the distribution of audience scores between the film types.
edaBi$feature_film$boxPlot
`r kfigr::figr(label = "bivariate_feature_film_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by Film Type
`r edaBi$feature_film$statement`
The data show that non-feature films are associated with higher audience scores.
The summary statistics in `r kfigr::figr(label = "bivariate_mpaa_rating_R_stats", prefix = TRUE, link = TRUE, type="Table")` suggest very similar distributions of audience scores for R-rated films and other films.
`r kfigr::figr(label = "bivariate_mpaa_rating_R_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores by MPAA Rating Summary Statistics
knitr::kable(edaBi$mpaa_rating_R$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_mpaa_rating_R_box", prefix = TRUE, link = TRUE, type="Figure")` shows a slightly lower central audience score for R-rated films.
edaBi$mpaa_rating_R$boxPlot
`r kfigr::figr(label = "bivariate_mpaa_rating_R_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores by MPAA Rating
`r edaBi$mpaa_rating_R$statement`
As such, the data do not support an association between audience scores and MPAA R ratings.
Again, the summary statistics in `r kfigr::figr(label = "bivariate_Oscar_season_stats", prefix = TRUE, link = TRUE, type="Table")` report nearly identical distributions of audience scores between films released during the Oscar season and those that were not.
`r kfigr::figr(label = "bivariate_Oscar_season_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores and Oscar Season Release Summary Statistics
knitr::kable(edaBi$oscar_season$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_Oscar_season_box", prefix = TRUE, link = TRUE, type="Figure")` echoes the summary statistics.
edaBi$oscar_season$boxPlot
`r kfigr::figr(label = "bivariate_Oscar_season_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores and Oscar Season Release
`r edaBi$oscar_season$statement`
As such, the data do not support an association between audience scores and Oscar season release dates.
Similarly, the summary statistics in `r kfigr::figr(label = "bivariate_summer_season_stats", prefix = TRUE, link = TRUE, type="Table")` report nearly identical distributions of audience scores between films released during the summer season and those that were not.
`r kfigr::figr(label = "bivariate_summer_season_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores and Summer Season Release Summary Statistics
knitr::kable(edaBi$summer_season$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_summer_season_box", prefix = TRUE, link = TRUE, type="Figure")` is consistent with the summary statistics.
edaBi$summer_season$boxPlot
`r kfigr::figr(label = "bivariate_summer_season_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores and Summer Season Release
`r edaBi$summer_season$statement`
Therefore, the data do not indicate an association between audience scores and summer release dates.
One might conjecture that the top 200 box office films are associated with higher audience scores, almost by definition. The summary statistics in `r kfigr::figr(label = "bivariate_top200_box_stats", prefix = TRUE, link = TRUE, type="Table")` lend some support to this idea, showing markedly higher central audience scores for the highest-grossing films.
`r kfigr::figr(label = "bivariate_top200_box_stats", prefix = TRUE, link = TRUE, type="Table")`: Audience Scores and Top 200 Box Office Status Summary Statistics
knitr::kable(edaBi$top200_box$stats, digits = 2) %>% kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"), full_width = T, position = "center")
The box plot in `r kfigr::figr(label = "bivariate_top200_box_box", prefix = TRUE, link = TRUE, type="Figure")` illuminates this difference.
edaBi$top200_box$boxPlot
`r kfigr::figr(label = "bivariate_top200_box_box", prefix = TRUE, link = TRUE, type="Figure")`: Audience Scores and Top 200 Box Office Status
`r edaBi$top200_box$statement`
Nevertheless, the data do not support the assertion that the highest-grossing films are more popular in terms of audience score.
The scatter plot (`r kfigr::figr(label = "bivariate_critics_score", prefix = TRUE, link = TRUE, type="Figure")`) indicates a positive correlation between critics score and audience score.
edaBi$critics_score$scatterPlot
`r kfigr::figr(label = "bivariate_critics_score", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score and Critics Score
`r edaBi$critics_score$statement`
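The strength and significance of the linear relationship seen in a scatter plot can be quantified with a Pearson correlation test. The sketch below uses hypothetical critics and audience scores; whether the `statement` fields use Pearson's r or another coefficient is not shown here.

```r
# Hypothetical critics and audience scores for eight films
critics  <- c(20, 35, 50, 62, 70, 81, 90, 95)
audience <- c(38, 45, 52, 60, 71, 70, 85, 88)

# Pearson correlation with a test of H0: rho = 0
ct <- cor.test(critics, audience, method = "pearson")

ct$estimate   # sample correlation r
ct$p.value
ct$conf.int   # 95% confidence interval for rho
```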
The scatter plot (`r kfigr::figr(label = "bivariate_imdb_num_votes", prefix = TRUE, link = TRUE, type="Figure")`) indicates a moderate positive correlation between the number of IMDb votes and audience score.
edaBi$imdb_num_votes$scatterPlot
`r kfigr::figr(label = "bivariate_imdb_num_votes", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score and IMDb Number of Votes
`r edaBi$imdb_num_votes$statement`
The scatter plot (`r kfigr::figr(label = "bivariate_imdb_num_votes_log", prefix = TRUE, link = TRUE, type="Figure")`) reveals a slight positive correlation between the log of the number of IMDb votes and audience score.
edaBi$imdb_num_votes_log$scatterPlot
`r kfigr::figr(label = "bivariate_imdb_num_votes_log", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score and IMDb Number of Votes (Log)
`r edaBi$imdb_num_votes_log$statement`
The scatter plot (`r kfigr::figr(label = "bivariate_imdb_rating", prefix = TRUE, link = TRUE, type="Figure")`) suggests a strong positive correlation between IMDb rating and audience score.
edaBi$imdb_rating$scatterPlot
`r kfigr::figr(label = "bivariate_imdb_rating", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score and IMDb Rating
`r edaBi$imdb_rating$statement`
The scatter plot (`r kfigr::figr(label = "bivariate_runtime", prefix = TRUE, link = TRUE, type="Figure")`) suggests a weak positive correlation between runtime and audience score.
edaBi$runtime$scatterPlot
`r kfigr::figr(label = "bivariate_runtime", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score and Runtime
`r edaBi$runtime$statement`
The scatter plot (`r kfigr::figr(label = "bivariate_thtr_rel_year", prefix = TRUE, link = TRUE, type="Figure")`) suggests little or no correlation between the year of theatrical release and audience score.
edaBi$thtr_rel_year$scatterPlot
`r kfigr::figr(label = "bivariate_thtr_rel_year", prefix = TRUE, link = TRUE, type="Figure")`: Audience Score and Year of Theatrical Release
`r edaBi$thtr_rel_year$statement`
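A trend in audience scores over time can be estimated as the slope of a simple linear regression of score on release year, with the slope's p-value indicating whether the trend is distinguishable from zero. The years and scores below are hypothetical, and this is a sketch rather than the test used to produce the `statement` above.

```r
# Hypothetical release years and audience scores
films <- data.frame(
  thtr_rel_year  = c(1970, 1980, 1990, 2000, 2010),
  audience_score = c(71, 67, 67, 63, 62)
)

# Simple linear regression of score on year; the slope is the change per year
fit <- lm(audience_score ~ thtr_rel_year, data = films)

coef(fit)[["thtr_rel_year"]]                            # slope (points per year)
summary(fit)$coefficients["thtr_rel_year", "Pr(>|t|)"]  # p-value for the slope
```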
Next, the significance and strength of the associations among the categorical variables were examined. Pairwise chi-squared tests of independence were conducted to assess the significance (p-value) of each association, and Cramér's V [@Cramer1946] was computed to measure its strength. The chi-squared results summarized in `r kfigr::figr(label = "chi", prefix = TRUE, link = TRUE, type="Table")` reveal several associations that could present collinearity issues for regression. Among the regressors most strongly associated with audience score, the Academy Award indicators were significantly associated with one another. There was also a strong association between films that won Best Picture and those that were nominated. Significant and strong associations were also observed between theatrical releases in the Oscar and summer seasons.
`r kfigr::figr(label = "chi", prefix = TRUE, link = TRUE, type="Table")`: Chi-squared test of independence between categorical variables.
# Pairwise chi-squared p-values from the project helper x2() (not shown here)
chi <- x2(preprocessed)
vars <- data.frame(Terms = as.character(rownames(chi)))
# Bold p-values below 0.05
chi <- lapply(chi, function(x) kableExtra::cell_spec(x, "html", bold = x < 0.05))
chi <- cbind(vars, chi)
knitr::kable(chi, escape = FALSE) %>%
  kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"),
                            full_width = TRUE, position = "center")
`r kfigr::figr(label = "cramer", prefix = TRUE, link = TRUE, type="Table")`: Cramér's V measure of association between categorical variables.
# Cramér's V from the project helper cramers() (not shown here)
cv <- cramers(preprocessed)
vars <- data.frame(Terms = as.character(rownames(cv)))
# Bold associations with V above 0.3
cv <- lapply(cv, function(x) kableExtra::cell_spec(x, "html", bold = x > 0.3))
cv <- cbind(vars, cv)
knitr::kable(cv, escape = FALSE) %>%
  kableExtra::kable_styling(bootstrap_options = c("hover", "condensed", "responsive"),
                            full_width = TRUE, position = "center")
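Cramér's V rescales the chi-squared statistic to lie between 0 (no association) and 1 (perfect association): V = sqrt(chi-squared / (n * (k - 1))), where n is the number of observations and k is the smaller of the number of rows and columns of the contingency table. The helpers `x2()` and `cramers()` are not shown, so the sketch below is an assumption about how such values can be computed, not their implementation; `chi_assoc()` is a hypothetical name. It disables Yates' continuity correction, which `chisq.test()` applies to 2 x 2 tables by default.

```r
# Illustrative helper: chi-squared p-value and Cramér's V for two categorical vectors
chi_assoc <- function(x, y) {
  tab <- table(x, y)
  # correct = FALSE: no Yates continuity correction, matching the textbook V;
  # warnings about small expected counts are suppressed for this tiny example
  test <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n <- sum(tab)
  k <- min(dim(tab))   # smaller of the number of rows and columns
  c(p.value   = test$p.value,
    cramers_v = unname(sqrt(test$statistic / (n * (k - 1)))))
}

# Hypothetical indicators: winning Best Picture implies a nomination
best_pic_nom <- c("yes", "yes", "no", "no", "no", "no", "yes", "no")
best_pic_win <- c("yes", "no",  "no", "no", "no", "no", "yes", "no")
chi_assoc(best_pic_nom, best_pic_win)   # V = sqrt(5/9), about 0.75
```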
As shown in `r kfigr::figr(label = "cp", prefix = TRUE, link = TRUE, type="Figure")`, the correlations among the quantitative variables were unsurprising. As expected, a moderate correlation between critics scores and IMDb ratings was observed.
# Pearson correlation plot from the project helper pearsons() (not shown here)
cp <- pearsons(preprocessed)
cp
`r kfigr::figr(label = "cp", prefix = TRUE, link = TRUE, type="Figure")`: Correlations among quantitative predictors
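Pairs of quantitative predictors with large absolute Pearson correlations are candidates for collinearity. The sketch below flags pairs above a threshold; the 0.7 cutoff, the helper `flag_collinear()`, and the data are illustrative assumptions, and `pearsons()` itself is not shown here.

```r
# Illustrative helper: list predictor pairs whose |Pearson r| exceeds a threshold
flag_collinear <- function(df, threshold = 0.7) {
  r <- cor(df, method = "pearson")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
}

# Hypothetical quantitative predictors
predictors <- data.frame(
  critics_score = c(20, 35, 50, 62, 70, 81, 90, 95),
  imdb_rating   = c(4.1, 5.0, 5.6, 6.3, 6.8, 7.0, 7.9, 8.2),
  runtime       = c(95, 120, 88, 110, 101, 92, 130, 99)
)

flag_collinear(predictors)   # flags only critics_score and imdb_rating
```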
Whereas the Academy's recognition of individual achievement had no statistically significant association with audience scores, films that were nominated for or won Best Picture were associated with significantly higher audience scores. It would appear that audiences prefer teamwork. Non-feature films were also associated with higher audience scores. As one might expect, strong positive correlations were observed between critics scores, IMDb ratings, and audience scores. Weak positive correlations between IMDb number of votes and runtime were observed, and finally, audience scores showed a slight but statistically significant downward trend over the period observed. The association and correlation analyses revealed potential collinearity among the Oscar award variables, as well as a moderate to strong correlation between critics scores and IMDb ratings.