library(papaja)
library(rmarkdown)
library(kableExtra)
library(here)
source(here("R/stevens_etal_2021_rcode.R"))
r_refs(file = "r-references.bib")
my_citations <- cite_r(file = "r-references.bib", pkgs = c("bayestestR", "C50", "caret", "e1071", "foreach", "ggbeeswarm", "ggcorrplot", "here", "lme4", "papaja", "patchwork", "psych", "randomForest", "rpart", "tidymodels", "tidyverse", "vip"), withhold = FALSE)

Introduction

Imagine you are running late for work and you need to walk your dog quickly so it can relieve itself. Would you rather your dog bark at passersby and strain to lunge at anything that moves, or walk calmly on the leash, complete its business, and come back inside when called? Most would choose the latter, especially those who know the perils of dealing with an unruly dog. But in practice, we see the full range of training levels in dogs in public spaces. Can these differences in training be attributed to individual differences among dogs, their owners, or their relationship? The aim of this study is to investigate what characteristics of dogs and their owners predict dog training success.

The benefits of having a trained dog are numerous to both the dog and owner. Trained dogs have better life outcomes, including fewer behavioral problems [@Jagoe.Serpell.1996; @Bennett.Rohlf.2007; @Kobelt.etal.2003a], less separation anxiety [@Clark.Boyer.1993; @Jagoe.Serpell.1996], and less competitive aggression towards other dogs [@Jagoe.Serpell.1996]. Training also increases how connected owners feel towards their dogs [@Clark.Boyer.1993]. Behavioral issues are one of the leading reasons for surrender to United States shelters [@Kwan.Bain.2013], and many of those animals are euthanized [@Rowan.Kartal.2018]. Training not only benefits dogs, their owners, and the human-animal relationship, but it is also a critical welfare issue that can decrease the number of dogs in shelters.

To improve success in training, we aimed to understand what characteristics of dogs and their owners are associated with training success. Previous research has investigated how the age of the dog, age of acquisition, prior experience with dogs, breed, and dog personality types influence training success [@Bennett.Rohlf.2007; @Kubinyi.etal.2009; @Hsu.Serpell.2003]. Much of the work in this area focuses on demographic (i.e., characteristics of populations such as age, sex/gender) or behavioral (i.e., characteristics that describe a dog or person's behavior) characteristics. Yet many other characteristics have not been well studied. In addition to standard dog and owner behavioral and demographic characteristics, we conducted an exploratory analysis to investigate which characteristics of dogs and their owners have the largest impact on training success.

Training success requires the interaction of three distinct components---dogs, their owners, and the interconnection between the two---and we assessed each component's effect on success. For the dog characteristics, we assessed owner-rated behavioral characteristics, including aggression, destructiveness, disobedience, excitability, and nervousness. These measures are important because higher levels of owner-reported disobedience and destructiveness are associated with lower training engagement [@Bennett.Rohlf.2007]. Furthermore, increased aggression and excitability correlate with more frequent use of punishment from owners [@Arhant.etal.2010]. Given the association between these characteristics and training engagement and methods, we investigated whether these characteristics extend to predicting training success.

Owner characteristics may also be an important part of predicting training success, and we explored both behavioral and cognitive characteristics. The behavioral characteristics include owner stress levels, optimism, and personality traits (i.e., extraversion, agreeableness, conscientiousness, stability, openness). Previous work has examined how owner personality relates to dog aggression [@Podberscek.Serpell.1997; @Daye.2011], dog behavior problems [@OFarrell.1995; @OFarrell.1997; @Dodman.etal.2018], dog separation anxiety [@Konok.etal.2015], and dog-human relationships [@Cavanaugh.etal.2008; @Schoberl.etal.2012; @Curb.etal.2013; @Chopik.Weaver.2019; reviewed in @Payne.etal.2015]. @Kis.etal.2012 explored the connection between owner personality and aspects of dog training and obedience. They found that owner personality (specifically neuroticism) was related to latency to follow commands. Conscientiousness could also be important for training success as it is associated with self-control, industriousness, responsibility, and reliability, all of which could be important in dog training. However, to our knowledge, no one has examined the effect of owner personality on training success.

Though owner behavioral characteristics are a common metric in many studies of dog behavior, few studies have assessed owners' cognitive abilities. Yet many aspects of cognitive ability are critical to good decision making, which could be relevant to training. In particular, cognitive reflection [the flexibility to inhibit an impulsive "wrong" decision to arrive at a correct solution; @Frederick.2005] and numeracy [the ability to comprehend numbers and assess risk; @Cokely.etal.2012] predict superior decision making across a range of contexts [@Sobkow.etal.2020]. This improved decision making may influence how people with high cognitive ability interact with and train their dogs. Therefore, we tested whether aspects of owner cognitive ability predict dog training success.

Finally, we assessed dog-owner characteristics including behavioral measures such as latency to complete a sit and a down command, amount of training prior to class, and the strength of the dog-owner relationship. A dog-owner characteristic can be distinguished from a dog or a human characteristic because both parties contribute. For example, the behavioral qualification of latency to sit requires the human to ask the dog for a cue but cannot be completed until the dog obeys. Training-related characteristics such as command-following and time spent training are likely predictors of training success. In addition, we investigated the quality of the dog-human relationship because it is related to many important components of success such as dog cognitive performance [@Topal.etal.1997], dog quality of life [@Marinelli.etal.2007], ownership satisfaction [@Herwijnen.etal.2018], and some elements of dog training such as class attendance and type of training aid [@Herwijnen.etal.2018].

To accompany our behavioral data, we collected saliva samples from the dogs to measure their levels of cortisol. Cortisol is a hormone released from the hypothalamic-pituitary-adrenal system that can indicate stress response in dogs [@Dreschel.Granger.2009], but not perfectly [@Cobb.etal.2016]. Although we intended to include cortisol levels and reactivity as predictors in our models, our low sampling success prevented this analysis (see Methods).

For our study, we partnered with a local trainer who taught Canine Good Citizen training classes (JM). The American Kennel Club's Canine Good Citizen program consists of 10 behaviors that dogs must exhibit to pass a national standardized behavioral qualification. The Canine Good Citizen test is meant to assess social skills with dogs and humans, responses to simple obedience commands, as well as touch tolerance. To our knowledge, no other studies have used Canine Good Citizen training success for their primary training measure. But given the widespread and fairly consistent use of this test, it offers a promising and potentially reliable measure of training. Upon enrolling in the Canine Good Citizen course, owners completed an online survey about themselves and their dog. Immediately before the first class meeting and after saliva collection, we video recorded each dog completing a sit and down command. The week after the final class meeting, owner and dog pairs were invited to take the Canine Good Citizen test.

Because this was an exploratory study with many potential predictors, we employed a machine-learning approach to data analysis. Machine learning is a powerful set of tools that can classify data by using predictors to predict responses [@Hastie.etal.2009]. We compared machine-learning algorithms to the standard statistical technique of regression analysis. These comparisons provide complementary approaches to measuring the importance of the dog, owner, and dog-owner characteristics as predictors of training success.

Methods

Participants

We recruited participants through the Prairie Skies Dog Training Canine Good Citizen classes from January 2018 to October 2019. This resulted in data from r dim(all_data)[1] dogs. Of those, we collected complete survey data on r dim(survey_data)[1] dogs (r dog_sex_nums[2] male, r dog_sex_nums[1] female, ranging in age from less than 1 year old to r max(survey_data$dog_age_num, na.rm = TRUE) years old) and owners (r owner_gender_nums[2] male, r owner_gender_nums[1] female). Of the r dim(all_data)[1] dogs, we collected saliva samples and assayed measurable levels of cortisol at least once from r cort_nums dogs and for all four samples in r all_cort_nums dogs. Of the r dim(survey_data)[1] dogs with survey data, r survey_cort_nums had at least one measurable sample of cortisol, and r all_survey_cort_nums had all four samples. Of all r dim(all_data)[1] dogs, r all_cgc_test took the Canine Good Citizen test during the study, while r survey_cgc_test of the r dim(survey_data)[1] dogs with survey data took the test.

Procedures

The class instructor (JM) recruited students in her Canine Good Citizen classes to participate in the study by completing a survey prior to the start of the first class. Most participants did so, but seven completed the survey after the first or second class. Research assistants attended the first and final (sixth) weekly class to record behavioral observations and collect saliva samples to assay cortisol. The week after the final class, the instructor scheduled a Canine Good Citizen test with an independent examiner.

Surveys

Participants completed an online Qualtrics survey at home that consisted of dog and owner demographics; questions about time spent with dog, training practices, and feeding/exercise; and a number of published scales (all questions are available as Supplementary Materials). Some of these scales included subscales for individual components (e.g., the personality scale included subscales for extraversion, agreeableness, etc.). Each scale or subscale was composed of multiple questions. To calculate an aggregated score for each scale and subscale, we calculated the mean response over all of the questions for that scale/subscale. We calculated Revelle's omega total ($\omega_{T}$) as our measure of internal consistency reliability of scales [@Revelle.Zinbarg.2008; @McNeish.2018].
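The aggregation step can be sketched as follows. This is a minimal illustration, assuming a hypothetical data frame with one column per question; the column names and values are invented for the example, and dropping skipped items with na.rm = TRUE is an assumption rather than a documented choice.

```r
# Hypothetical responses: three participants, three items from one subscale
responses <- data.frame(
  item_1 = c(4, 2, 5),
  item_2 = c(3, 2, NA),
  item_3 = c(5, 1, 4)
)

# Aggregate score = mean response over all items in the scale/subscale
# (na.rm = TRUE is an assumption about how skipped items are handled)
scale_score <- rowMeans(responses, na.rm = TRUE)
scale_score
```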

@Bennett.Rohlf.2007 assessed dog behavior problems with 24 questions on a seven-point scale. Our first 24 participants were mistakenly tested on a five-point scale, so we z-transformed both the five- and seven-point scale data to analyze all participants on a similar scale. The scale included five subscales: disobedience (Revelle's $\omega_{T}$ = r round(dog_behavior_disobedient_reliability$omega.tot, 2)), aggression (Revelle's $\omega_{T}$ = r round(dog_behavior_aggressive_reliability$omega.tot, 2)), nervousness (Revelle's $\omega_{T}$ = r round(dog_behavior_nervous_reliability$omega.tot, 2)), destructiveness (Revelle's $\omega_{T}$ = r round(dog_behavior_destructive_reliability$omega.tot, 2)), and excitability (Revelle's $\omega_{T}$ = r round(dog_behavior_excitable_reliability$omega.tot, 2)).
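One way to place the two response formats on a similar scale is sketched below. It assumes that the z-transformation was applied separately within the five-point and seven-point groups; the data frame and its column names are hypothetical.

```r
# Hypothetical subscale scores collected with two response formats
scores <- data.frame(
  score = c(2, 3, 4, 5, 1, 3, 5, 7),
  scale_points = c(5, 5, 5, 5, 7, 7, 7, 7)
)

# z-transform within each response format (assumed grouping):
# subtract the group mean and divide by the group standard deviation
scores$z_score <- ave(
  scores$score, scores$scale_points,
  FUN = function(x) (x - mean(x)) / sd(x)
)
scores
```

After this step, both groups have a mean of 0 and a standard deviation of 1, so scores express each dog's position relative to others rated on the same format.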

@Hiby.etal.2004 assessed obedience and problem behaviors in dogs. Obedience was assessed on a five-point scale with seven specific tasks and an overall obedience score (Revelle's $\omega_{T}$ = r round(dog_obedience_reliability$omega.tot, 2)). Behavioral problems were assessed by participants indicating whether their dogs had never, previously, or currently shown 13 behavioral problems (Revelle's $\omega_{T}$ = r round(dog_problematic_behavior_reliability$omega.tot, 2)).

The Dog Impulsivity Assessment Scale [@Wright.etal.2011] assessed impulsivity in dogs using a five-point scale (plus "don’t know/not applicable"). The scale included 18 questions divided over three subscales (two questions are used in more than one subscale): behavioral regulation (Revelle's $\omega_{T}$ = r round(dias_behavioral_regulation_reliability$omega.tot, 2)), aggression (Revelle's $\omega_{T}$ = r round(dias_aggression_reliability$omega.tot, 2)), and responsiveness (Revelle's $\omega_{T}$ = r round(dias_responsiveness_reliability$omega.tot, 2)). Because we added this scale after we started collecting data, we only have impulsivity data on r nrow(survey_data_dias) of the r nrow(survey_data) dogs for which we have survey data, so we did not include this measure in the analysis.

The Monash Dog Owner Relationship Scale [@Dwyer.etal.2006] assessed human-dog relationships by measuring how frequently owners engage in nine activities with their dogs using a seven-point scale (Revelle's $\omega_{T}$ = r round(mdors_reliability$omega.tot, 2)).

The brief Big-Five personality scale [@Gosling.etal.2003] assessed owner personality using a five-point scale. The scale included 10 questions divided over five subscales: extraversion (Cronbach's $\alpha$ = r round(extraversion_reliability$total$raw_alpha, 2)), agreeableness (Cronbach's $\alpha$ = r round(agreeableness_reliability$total$raw_alpha, 2)), conscientiousness (Cronbach's $\alpha$ = r round(conscientiousness_reliability$total$raw_alpha, 2)), emotional stability (Cronbach's $\alpha$ = r round(stability_reliability$total$raw_alpha, 2)), and openness to experience (Cronbach's $\alpha$ = r round(openness_reliability$total$raw_alpha, 2)). While some of the internal consistency reliability values were low, (1) there were only two items per subscale (which forced us to calculate Cronbach's $\alpha$ for reliability, as we could not compute Revelle's $\omega_{T}$), (2) our values are similar to the original study, and (3) the test-retest reliability and convergent correlations with a ten-item inventory were quite high in the original study.

The Life Orientation Test Revised scale [@Scheier.etal.1994] assessed optimism in owners with 10 questions using a five-point scale (Revelle's $\omega_{T}$ = r round(lotr_reliability$omega.tot, 2)).

The Perceived Stress Scale [@Cohen.etal.1983] assessed owner stress with 10 questions using a five-point scale (Revelle's $\omega_{T}$ = r round(pss_reliability$omega.tot, 2)).

The Cognitive Reflection Task [@Frederick.2005] assessed cognitive reflection in owners with three multiple-choice questions (Revelle's $\omega_{T}$ = r round(crt_reliability$omega.tot, 2)). The Berlin Numeracy Test [@Cokely.etal.2012] assessed owner numeracy with four multiple-choice questions (Revelle's $\omega_{T}$ = r round(numeracy_reliability$omega.tot, 2)). Scores for both tests were calculated by summing the number of correct responses. Because many participants skipped some of these questions, we coded missing responses as incorrect when calculating reliability. We summed the scores from these two tests to generate an index of cognitive ability.
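The scoring rule can be illustrated with a short sketch. The answer options, answer keys, and the helper score_test are hypothetical placeholders, not the actual test items; the sketch only shows skipped responses (NA) being counted as incorrect before summing.

```r
# Hypothetical multiple-choice answers (NA = question skipped)
crt_answers <- data.frame(
  q1 = c("b", NA, "a"),
  q2 = c("c", "c", NA),
  q3 = c("a", "b", "a")
)
numeracy_answers <- data.frame(
  q1 = c("a", "d", NA),
  q2 = c("b", "b", "c"),
  q3 = c("c", NA, "c"),
  q4 = c("d", "a", "d")
)

# Hypothetical answer keys, one entry per question
crt_key <- c("b", "c", "a")
numeracy_key <- c("a", "b", "c", "d")

# Number of correct responses per participant; skipped items count as incorrect
score_test <- function(answers, key) {
  correct <- sweep(as.matrix(answers), 2, key, FUN = "==")
  correct[is.na(correct)] <- FALSE
  unname(rowSums(correct))
}

crt_score <- score_test(crt_answers, crt_key)
numeracy_score <- score_test(numeracy_answers, numeracy_key)

# Cognitive ability index = sum of the two test scores
cognitive <- crt_score + numeracy_score
cognitive
```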

Behavioral data collection and reliability

At the beginning of the first and last class session, we video recorded the dogs' responses to their owners giving the sit and down commands to assess their initial training levels. Coders recorded the time at which the owner gave each command and the time that each command was completed. For sit, that occurred when the dog's rear end was flush with the ground. For down, that occurred when the dog's chest was flush with the ground. We then subtracted these two times and rounded to the nearest whole second to calculate the latency for each command. If the latency was less than 1 s, it was scored as 0. We scored the session as missing data if any of the following occurred: the dog was already in the correct position when the command was given or the video did not allow for the determination of whether the command was given or completed ($N_{sit}$ = r survey_na_sit; $N_{down}$ = r survey_na_down). If the dog did not attempt to complete the command or attempted but failed to complete the command during the video, we scored that as a maximum time of 30 s.
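The latency rule can be sketched as a small function. The function name and the convention of marking an uncompleted command with NA are assumptions made for illustration; sessions scored as missing data would be excluded before this step.

```r
# Latency in whole seconds from command and completion times (in seconds).
# Assumption: completion_time is NA when the dog never completed the command.
score_latency <- function(command_time, completion_time) {
  raw <- completion_time - command_time
  ifelse(
    is.na(raw), 30,                  # not completed: maximum time of 30 s
    ifelse(raw < 1, 0, round(raw))   # under 1 s scored as 0, otherwise rounded
  )
}

score_latency(
  command_time = c(10, 5, 3),
  completion_time = c(12.4, 5.6, NA)
)
# returns 2, 0, 30
```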

We selected 15 of the r nrow(behavioral_data_sit3) videos that we recorded for five raters (including LW) to score. None of the raters were aware of the response variable outcomes for any dogs when they scored the videos. From their ratings, we assessed inter-rater reliability by calculating the intraclass correlation using a two-way random effects model for the average of five raters (ICC2k). Based on interpretations from @Koo.Li.2016, the ICC demonstrated excellent reliability for both sit (r round(sit_icc1$results$ICC[which(sit_icc1$results$type == "ICC2k")], 2) $\pm$ r round(sit_icc1$results$ICC[which(sit_icc1$results$type == "ICC2k")] - sit_icc1$results$'lower bound'[which(sit_icc1$results$type == "ICC2k")], 2)) and down (r round(down_icc1$results$ICC[which(down_icc1$results$type == "ICC2k")], 2) $\pm$ r round(down_icc1$results$ICC[which(down_icc1$results$type == "ICC2k")] - down_icc1$results$'lower bound'[which(down_icc1$results$type == "ICC2k")], 2)). LW then provided additional training to the other four raters and had them score another 15 videos. The reliability increased for both sit (r round(sit_icc2$results$ICC[which(sit_icc2$results$type == "ICC2k")], 2) $\pm$ r round(sit_icc2$results$ICC[which(sit_icc2$results$type == "ICC2k")] - sit_icc2$results$'lower bound'[which(sit_icc2$results$type == "ICC2k")], 2)) and down (r round(down_icc2$results$ICC[which(down_icc2$results$type == "ICC2k")], 2) $\pm$ r round(down_icc2$results$ICC[which(down_icc2$results$type == "ICC2k")] - down_icc2$results$'lower bound'[which(down_icc2$results$type == "ICC2k")], 2)). To score the videos for analysis, we split the 146 videos up among the four raters (not LW), each of whom rated between 66-80 videos. Every video (including the 30 used to calibrate ratings) was scored by two raters. 
We achieved good reliability for sit (r round(sit_icc3$results$ICC[which(sit_icc3$results$type == "ICC2k")], 2) $\pm$ r round(sit_icc3$results$ICC[which(sit_icc3$results$type == "ICC2k")] - sit_icc3$results$'lower bound'[which(sit_icc3$results$type == "ICC2k")], 2)) and excellent reliability for down (r round(down_icc3$results$ICC[which(down_icc3$results$type == "ICC2k")], 2) $\pm$ r round(down_icc3$results$ICC[which(down_icc3$results$type == "ICC2k")] - down_icc3$results$'lower bound'[which(down_icc3$results$type == "ICC2k")], 2)). However, if the scored latencies differed by more than 1 s between raters or if only one of the two raters scored a session as missing data, LW scored that session and replaced the most divergent score with her own. This occurred 39 times out of the r nrow(behavioral_data3) scoring events. We then calculated the mean (in seconds) of the two raters' scores as our measure of latency. We only used latencies from the first class session for our analyses.
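For readers unfamiliar with ICC2k, the following base R sketch computes it from the mean squares of a two-way ANOVA, using the standard formula (MS_subject - MS_error) / (MS_subject + (MS_rater - MS_error) / n), where n is the number of rated subjects. The ratings matrix is the classic worked example from Shrout and Fleiss (1979), not our data, and the helper icc2k is a hypothetical name; it yields only the point estimate, without confidence bounds.

```r
# Rows are rated subjects (e.g., videos); columns are raters
ratings <- matrix(
  c(9, 2, 5, 8,
    6, 1, 3, 2,
    8, 4, 6, 8,
    7, 1, 2, 6,
    10, 5, 6, 9,
    6, 2, 4, 7),
  ncol = 4, byrow = TRUE
)

# ICC2k: two-way random effects, average of k raters (point estimate only)
icc2k <- function(ratings) {
  n <- nrow(ratings)
  long <- data.frame(
    score = as.vector(ratings),
    subject = factor(as.vector(row(ratings))),
    rater = factor(as.vector(col(ratings)))
  )
  # Mean squares in order: subject, rater, residual
  ms <- anova(lm(score ~ subject + rater, data = long))[["Mean Sq"]]
  (ms[1] - ms[3]) / (ms[1] + (ms[2] - ms[3]) / n)
}

icc2k(ratings)  # approximately 0.62 for these example ratings
```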

Saliva collection and cortisol assays

We collected saliva samples immediately before (prior to collecting behavioral data on training levels) and after the first and last class meeting (6-9pm) at the class location. Our team used the SalivaBio Children’s Swab (Salimetrics LLC, State College, PA), a synthetic swab specifically designed to improve volume collection and increase participant compliance, and validated for use with salivary cortisol. We did not use any salivary stimulants or flavorings to induce salivation [@Dreschel.Granger.2009]. We placed the swab across the dog's tongue in front of their molars and had the dog chew on the swab for at least 30 s. We then placed the swab in a Salimetrics polypropylene swab storage tube and stored the tube in a storage box that was transported in a cooler with ice packs to a $-20^\circ$C freezer before assaying. The samples were analyzed in three batches, about two months apart, using the High Sensitivity Salivary Cortisol Enzyme Immunoassay Kit (Salimetrics LLC, State College, PA) for quantitative determination of salivary cortisol levels (in $\mu$g/dL) without modification to the manufacturer’s protocols. On the day of assaying, saliva samples were thawed and centrifuged at 3500 rpm for 15 minutes to remove mucins. Samples were assayed in duplicate. Intra- and inter-assay coefficients of variation were 4.97% and 5.22%, respectively, and assay sensitivity was 0.007 $\mu$g/dL. Of the 314 saliva samples collected, r cort_samples_nums were successfully analyzed (unsuccessful assays were primarily due to insufficient quantity of saliva, with three due to excessively high assay values). Given that we aimed to collect r 99*4 samples, our sample pool resulted in many missing data values. Because many machine-learning algorithms cannot work with missing data, we were not able to include any cortisol predictors in our analyses.

Ethics

All procedures were conducted in an ethical and responsible manner, in full compliance with all relevant codes of experimentation and legislation, and were approved by the UNL Institutional Review Board (protocol # 17922) and Institutional Animal Care and Use Committee (protocol # 1621). All participants provided consent to participate, and they acknowledged that de-identified data could be published publicly.

Data Analysis

This project used r my_citations for all of the analyses (package usage is described in the R script found in Supplementary Materials). The manuscript was created using rmarkdown [Version r packageVersion("rmarkdown"); @R-rmarkdown_a] and papaja [Version r packageVersion("papaja"); @R-papaja]. Data, analysis scripts, supplementary tables and figures, and the reproducible research materials are available in Supplementary Materials and at the Open Science Framework (https://osf.io/3p5vx/).

Response variables

We were interested in characteristics of dogs and their owners as well as dog-owner characteristics as predictors of success on the Canine Good Citizen test. Our response variable was test success and we scored two outcomes: passing (N=r survey_cgc_pass) and failing/not taking the test. We combined failing (N=r survey_cgc_fail) and not taking the test (N=r survey_cgc_notest) because few dogs failed the test, preventing a proper analysis of the data. Thus, our response variable is best interpreted as successful participation in the Canine Good Citizen test, though we shorten this to training success.

Machine-learning analysis

Prediction here means that models are fit to a subset of the data, then model parameters are fixed and used to predict new (out-of-sample) data [@Yarkoni.Westfall.2017]. A wide range of machine-learning algorithms is available for classifying responses [@Hastie.etal.2009]. We chose to work with four algorithms, plus regression, based on their (1) frequent use in the machine-learning literature, (2) ability to extract predictor importance (see below), and (3) implementation in tidymodels, the R package we used to conduct the analysis. For clarity, we use algorithms to refer to the machine-learning algorithms only and models to refer to the algorithms plus regression.

We selected three decision-tree algorithms (CART, C5.0, random forest) and a neural network algorithm. CART (Classification and Regression Trees) is an algorithm that builds decision trees [@Furnkranz.2010] by starting with the predictor that best splits the data into the responses and then adds additional splits with other predictors that further divide the data until it classifies all cases [@Breiman.etal.1984]. C5.0 uses a related but different method for creating decision trees [@Quinlan.1993; @R-C50]. Random forest algorithms generate a large group of decision trees built on random subsets of predictors and aggregate predictions across those trees [@Breiman.2001a; @Sammut.Webb.2010]. Finally, neural networks are layers of nodes that link predictors to responses via weighted connections [@Laine.2003].

Predictor selection

We analyzed aggregated scores for scales or subscales and demographic information, resulting in r ncol(preprocessed_predictors_cgc) predictors (Table \ref{tab:cgc-table}). We did not include breed as a predictor because we had survey data on so few dogs (N=r dim(survey_data)[1]) and so many breeds (N=27, plus many mixed breeds). We analyzed the r ncol(numeric_predictors) numeric predictors (everything except dog sex and neuter status) for skewness (Figure S1) and log-transformed five predictors that were highly skewed (dog age, dog aggression, dog sit latency, dog down latency, and time spent training). We also imputed missing numeric values using the predictor mean (owner stress), converted factor values to dummy variables (dog sex and neuter status), and checked for near zero variance in all predictors. Because multicollinearity (highly correlated predictors) can be a problem for some machine-learning algorithms [@Kuhn.Johnson.2013], we computed pairwise correlations for all predictors to find predictors that were highly correlated with other predictors ($r$ > 0.7). The cognitive ability index was highly correlated with its two constituent scores (cognitive reflection and numeracy), so we removed the constituent scores since the index had more possible score values. Similarly, we removed the overall score for dog problem behaviors (Bennett & Rohlf) because it correlated with its subscale scores.
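The preprocessing steps described above can be sketched in base R. The data frame, its values, and its column names are hypothetical, and the sketch covers only the log transformation, mean imputation, and the correlation screen with the 0.7 threshold.

```r
# Hypothetical predictors (values invented for illustration)
predictors <- data.frame(
  dog_age = c(0.5, 1, 2, 8, 3, 1.5),
  owner_stress = c(2.1, NA, 3.0, 1.8, 2.5, 2.2),
  crt = c(0, 1, 3, 2, 1, 3),
  cognitive = c(1, 2, 7, 4, 3, 6)
)

# Log-transform a highly skewed predictor
predictors$dog_age <- log(predictors$dog_age)

# Impute missing numeric values with the predictor mean
missing_stress <- is.na(predictors$owner_stress)
predictors$owner_stress[missing_stress] <-
  mean(predictors$owner_stress, na.rm = TRUE)

# Flag predictor pairs with |r| > 0.7 (each pair reported once)
cors <- cor(predictors)
high <- which(abs(cors) > 0.7 & upper.tri(cors), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]]
)
flagged
```

In this toy example only the pair crt and cognitive is flagged, which mirrors the situation in which an index correlates strongly with one of its constituent scores.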

The r ncol(trimmed_predictors_cgc) remaining predictors were still too many to analyze, so we used a simple filter as a feature selection criterion to further restrict the set of predictors used in our analysis. Simple filters "screen the predictors to see if any have a relationship with the outcome prior to including them in a model" [@Kuhn.Johnson.2019, section 11.2]. While these filters are often a series of frequentist statistical tests (e.g., t-tests), we conducted a logistic regression for each predictor because our response variable (training success) is binary (we used the glm function from R's stats package). We then estimated the Bayes factor for that predictor. A Bayes factor (BF) compares the weight of evidence for an alternative model relative to the null [@Wagenmakers.2007]. Specifically, we compared each model containing the predictor to an intercept-only model. We estimated Bayes factors by converting each model's Bayesian Information Criterion (BIC) using BF = $e^{(BIC_{null}-BIC_{alternative}) / 2}$ [@Wagenmakers.2007]. We only included predictors with BF > 0.33 because BF < 0.33 indicates at least moderate evidence for the null hypothesis (intercept-only model) over the alternative hypothesis (model with predictor). Thus, we kept all predictors that the regression analysis did not eliminate as having the potential to influence the response. In addition to the machine-learning analyses, we conducted a traditional multiple regression analysis on these predictors and calculated the Bayes factors for these predictors to account for the multiple testing problem associated with computing separate regressions for each predictor.
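The BIC-to-Bayes-factor conversion and the 0.33 filter can be sketched as follows. The simulated data, the variable names, and the helper bf_from_bic are assumptions made for illustration; only the formula and the threshold come from the text above.

```r
# Convert two BIC values into a Bayes factor for the alternative over the null
bf_from_bic <- function(bic_null, bic_alternative) {
  exp((bic_null - bic_alternative) / 2)
}

# Simulated data (hypothetical): one predictor and a binary outcome
set.seed(2021)
n <- 80
disobedience <- rnorm(n)
success <- rbinom(n, 1, plogis(-0.8 * disobedience))

# Intercept-only (null) model versus single-predictor (alternative) model
null_model <- glm(success ~ 1, family = binomial)
alt_model <- glm(success ~ disobedience, family = binomial)

bf <- bf_from_bic(BIC(null_model), BIC(alt_model))

# Simple filter: retain the predictor unless BF indicates evidence for the null
keep_predictor <- bf > 0.33
```

A BF above 1 favors the model containing the predictor, a BF below 1 favors the intercept-only model, and values between 0.33 and 3 are usually treated as inconclusive.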

predictor_order <- c("dog_age_num", "dog_sex_Male", "dog_neutered_Yes", "dog_behavior_bennett_aggressive_score", "dog_behavior_bennett_destructive_score", "dog_behavior_bennett_disobedient_score", "dog_behavior_bennett_excitable_score", "dog_behavior_bennett_nervous_score", "dog_behavior_bennett_overall_score", "dog_obedience_hiby_score", "dog_problematic_behaviors_hiby_score", "latency_sit_mean", "latency_down_mean", "lotr_score", "pss_score", "personality_agreeableness_score", "personality_conscientiousness_score", "personality_extraversion_score", "personality_openness_score", "personality_stability_score", "crt", "numeracy", "cognitive", "mdors_score", "time_train_dog_weekly_num")
predictor_bf_table <- left_join(data.frame(predictor = predictor_order), predictor_bf_cgc, by = "predictor") %>%  # order CGC BFs by predictor_order
  # left_join(predictor_bf_impulsivity, by = "predictor") %>%  # add impulsivity BFs
  mutate(predictor = fct_recode(predictor, "Dog age" = "dog_age_num", "Dog sex" = "dog_sex_Male", "Dog neutered" = "dog_neutered_Yes", "Dog aggression (Bennett \\& Rohlf)" =  "dog_behavior_bennett_aggressive_score", "Dog destructiveness (Bennett \\& Rohlf)" = "dog_behavior_bennett_destructive_score", "Dog disobedience (Bennett \\& Rohlf)" = "dog_behavior_bennett_disobedient_score", "Dog excitability (Bennett \\& Rohlf)" = "dog_behavior_bennett_excitable_score", "Dog nervousness (Bennett \\& Rohlf)" = "dog_behavior_bennett_nervous_score", "Dog problematic behaviors overall (Bennett \\& Rohlf)*" = "dog_behavior_bennett_overall_score", "Dog obedience (Hiby)" =  "dog_obedience_hiby_score", "Dog problematic behaviors (Hiby)" = "dog_problematic_behaviors_hiby_score", "Dog sit latency" = "latency_sit_mean", "Dog down latency" = "latency_down_mean", "Owner optimism" = "lotr_score", "Owner stress" = "pss_score", "Owner agreeableness" = "personality_agreeableness_score", "Owner conscientiousness" = "personality_conscientiousness_score", "Owner extraversion" = "personality_extraversion_score", "Owner openness" = "personality_openness_score", "Owner stability" = "personality_stability_score", "Owner cognitive reflection*" = "crt", "Owner numeracy*" = "numeracy",  "Owner cognitive ability" = "cognitive", "Dog-owner relationship" = "mdors_score", "Time spent training" = "time_train_dog_weekly_num"),  # recode predictors
         # bf.x = printnum(bf.x, digits = 2),
         bf = printnum(bf, digits = 2))  # round BFs

knitr::kable(predictor_bf_table, col.names = c("Predictor", "CGC BF"), booktabs = TRUE, linesep = "", caption = "Canine Good Citizen test Bayes factors for predictors", align = 'lr', escape = FALSE) %>%
  kable_styling(position = "center", font_size = 9)  %>% 
  add_footnote("These predictors were not used in the analysis.", notation = "symbol")

Model prediction

To calculate predictive accuracy and predictor importance, we applied a series of steps for all machine-learning algorithms and regression. We first split the data into training and testing sets via 10-fold cross-validation [@deRooij.Weeda.2020], using stratified sampling. This partitioned the data into 10 subsets with comparable distributions of the response variable across all subsets. Numeric predictors were then scaled and centered within the splits. Each model was fitted on 9 of the 10 subsets and then the fitted parameters were used to predict the 10th subset. This analysis rotated through the other nine subsets such that each subset was used as a testing set once. We repeated this 10-fold cross-validation 10 times, randomly re-partitioning the data set (with stratified responses) each time. From these repetitions, we calculated the mean predictive accuracy as the proportion of testing set responses correctly predicted by models fit on training sets.
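The resampling scheme can be sketched in base R for the logistic regression case. This is an illustration under stated assumptions, not our tidymodels pipeline: the data are simulated, the helpers stratified_folds and cv_accuracy are hypothetical, and a 0.5 probability cutoff is assumed for classification.

```r
set.seed(42)

# Simulated data (hypothetical): one numeric predictor, binary response
n <- 80
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(dat$x))

# Assign fold labels separately within each response class (stratification)
stratified_folds <- function(y, k = 10) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

# One round of k-fold cross-validation; returns out-of-sample accuracy
cv_accuracy <- function(dat, k = 10) {
  folds <- stratified_folds(dat$y, k)
  correct <- logical(nrow(dat))
  for (f in seq_len(k)) {
    train <- dat[folds != f, ]
    test <- dat[folds == f, ]
    # Center and scale using training-set statistics only
    mu <- mean(train$x)
    sigma <- sd(train$x)
    train$x <- (train$x - mu) / sigma
    test$x <- (test$x - mu) / sigma
    fit <- glm(y ~ x, data = train, family = binomial)
    prob <- predict(fit, newdata = test, type = "response")
    correct[folds == f] <- as.integer(prob > 0.5) == test$y
  }
  mean(correct)
}

# Repeat the 10-fold procedure 10 times with fresh partitions
accuracies <- replicate(10, cv_accuracy(dat))
mean(accuracies)
```

Scaling inside each split keeps information from the testing fold out of the fitted model, which is why the centering and scaling constants are taken from the training folds alone.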

We also fit each model on the full data set to generate estimates for predictor importance ["relative contribution of each input variable in predicting the response"; @Hastie.etal.2009] for each model using the vi function from the vip package [@R-vip]. Because each model has a different metric for importance, we scaled importance values, with the most important variable's importance set to 100. Thus, for each model and predictor, we had importance measures scaled similarly across models.
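The rescaling can be illustrated with a short sketch. It assumes that scaling means dividing each raw importance value by the largest value in that model and multiplying by 100; the raw values and the helper scale_importance are hypothetical.

```r
# Rescale raw importance so that the largest value within a model equals 100
# (assumed form of the scaling; raw values below are invented)
scale_importance <- function(importance) {
  100 * importance / max(importance)
}

raw_importance <- c(
  cognitive_ability = 0.42,
  disobedience = 0.30,
  training_time = 0.12
)

scaled_importance <- scale_importance(raw_importance)
round(scaled_importance, 1)
```

Because each model's values are expressed relative to its own top predictor, scaled values can be averaged across models even when the raw metrics are on different scales.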

Results

We collected survey and behavioral data on r nrow(survey_data) dogs: r survey_cgc_pass dogs passed the Canine Good Citizen test, r survey_cgc_fail dogs failed the test, and r survey_cgc_notest dogs did not take the test. We combined the dogs who failed or did not take the test into an 'unsuccessful' category to investigate which dog and owner characteristics best predicted successful completion of the Canine Good Citizen test. We first examined the pairwise relationships between predictors and training success using a series of single-factor logistic regressions (Table \ref{tab:cgc-table}). This resulted in r length(selected_predictors_cgc) predictors with Bayes factors greater than 0.33, meaning that there was not at least moderate evidence supporting the null hypothesis of no relationship with success (Figure \ref{fig:cgc-predictors}). These predictors included dog characteristics (disobedience), owner characteristics (cognitive ability, perceived stress, and extraversion), and dog-owner characteristics (time spent training, relationship quality). Based on their Bayes factors, there was moderate evidence that dog disobedience and owner cognitive ability predicted the dog's success in the Canine Good Citizen test when tested with pairwise logistic regressions (Figure \ref{fig:cgc-predictors}). Combining the predictors into a multiple logistic regression indicated that training success was predicted by dog disobedience (Table S2; r apa_print(cgc_glm_fit)$full_result$Dog_disobedience), owner cognitive ability (r apa_print(cgc_glm_fit)$full_result$Owner_cognitive_ability), and owner stress (r apa_print(cgc_glm_fit)$full_result$Owner_stress).

(ref:cgc-predictors-cap) Effects of predictors on Canine Good Citizen training success. We conducted logistic regression analyses for each predictor. Open circles represent individual data points, curves represent fitted logistic regression lines, and bands represent 95% confidence intervals for the regression curves.

knitr::include_graphics(path = c(here("figures/cgc_plots.png")))

Though multiple regression is the standard model for investigating factors that influence response variables, we also used machine-learning techniques to further explore these factors. We had four machine-learning algorithms and logistic regression predict training success using the six predictors from the pairwise analysis. First, we examined the predictive accuracy of the models and found considerable differences across models (Figure S2), with C5.0 producing the highest accuracy (r printnum(cgc_model_accuracy$mean[which(cgc_model_accuracy$model == "C5.0")]*100, digits = 1)±r printnum(cgc_model_accuracy$ci[which(cgc_model_accuracy$model == "C5.0")]*100, digits = 1)%). Logistic regression (r printnum(cgc_model_accuracy$mean[which(cgc_model_accuracy$model == "Regression")]*100, digits = 1)±r printnum(cgc_model_accuracy$ci[which(cgc_model_accuracy$model == "Regression")]*100, digits = 1)%), random forest (r printnum(cgc_model_accuracy$mean[which(cgc_model_accuracy$model == "Random forest")]*100, digits = 1)±r printnum(cgc_model_accuracy$ci[which(cgc_model_accuracy$model == "Random forest")]*100, digits = 1)%), and neural networks (r printnum(cgc_model_accuracy$mean[which(cgc_model_accuracy$model == "Neural network")]*100, digits = 1)±r printnum(cgc_model_accuracy$ci[which(cgc_model_accuracy$model == "Neural network")]*100, digits = 1)%) yielded intermediate accuracy and CART (r printnum(cgc_model_accuracy$mean[which(cgc_model_accuracy$model == "CART")]*100, digits = 1)±r printnum(cgc_model_accuracy$ci[which(cgc_model_accuracy$model == "CART")]*100, digits = 1)%) performed worst.

With the regression and machine-learning models, we can calculate predictor importance, which offers a continuous measure of the contribution of each predictor to the predictive accuracy of the models. Figure \ref{fig:cgc-predictor-imp} shows the mean importance of each predictor for Canine Good Citizen training success as well as predictor importance for each model. When aggregating across the models, owner cognitive ability was the most important predictor of training success. Dog disobedience was the second most important predictor, followed by training time. Some machine-learning algorithms found important predictors that regression did not favor (e.g., training time), and regression favored predictors not strongly favored by all algorithms (e.g., owner stress).
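Aggregating importance across models amounts to averaging the scaled scores for each predictor and ranking the means. The sketch below uses a made-up matrix `importance`; the numbers are hypothetical and chosen only to show the computation, not to reproduce the values in the figure.

```r
# Hypothetical scaled importance (0-100): rows are predictors, columns are models
importance <- matrix(
  c(100,  80,  60,
     70, 100,  90,
     20,  60, 100),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Dog_disobedience", "Owner_cognitive_ability", "Training_time"),
    c("Regression", "C5.0", "Random forest")
  )
)

# Mean importance of each predictor across models, highest first
mean_importance <- sort(rowMeans(importance), decreasing = TRUE)
print(round(mean_importance, 1))
```

A predictor can rank first on average without being the top predictor in every model, which is why the per-model panels are informative alongside the mean.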

(ref:cgc-predictor-imp-cap) Predictor importance for Canine Good Citizen training success. The first panel represents the mean importance over all models (predictors ordered by mean importance). The remaining panels show importance for each model (panels ordered by predictive accuracy). Closed circles represent importance scores for each model and predictor.

knitr::include_graphics(path = c(here("figures/cgc_predictor_importance_algorithm.png")))

Discussion

Using logistic regression models and machine-learning algorithms, we found that characteristics of the dog (low levels of disobedience), the owner (high levels of certain cognitive abilities), and dog-owner interactions (more time spent training) were all important in predicting Canine Good Citizen training success. In terms of dog characteristics, disobedience (Bennett & Rohlf's (2007) disobedience subscale) predicted passing the test. This is perhaps not surprising as the Canine Good Citizen test focuses on simple obedience behaviors including sit, down, and stay. The Bennett & Rohlf disobedience subscale asks about good manners, sit, stay, come, and soiling in the house. Demographic information about the dog, including sex, age, and neuter status did not predict training success.

We found no strong predictive power of any owner personality dimensions. Surprisingly, the diligence required to consistently and successfully train a dog was not captured in the owner personality trait of conscientiousness. Also, neuroticism---a trait linked to command-following [@Kis.etal.2012]---was not related to training success. Perhaps the brief personality scale used here did not provide the most reliable measure of owner personality.

One of the strongest owner characteristics that predicted training success was cognitive ability. We combined the scores from two tests of cognitive ability: the Cognitive Reflection Test and the Berlin Numeracy Test. The Cognitive Reflection Test [@Frederick.2005] assesses the cognitive flexibility to resist an obvious but incorrect solution to a problem and instead reflect deeply to find the correct solution. Cognitive reflection is associated with high-level reasoning, reduced cognitive biases, and superior decision making [@Sobkow.etal.2020]. The Berlin Numeracy Test assesses understanding and processing of probabilistic and statistical information, which is crucial for interpreting risk and superior decision making [@Cokely.etal.2012; @Skagerlund.etal.2018]. Both measures capture cognitive performance above and beyond traditional measures of cognitive ability [@Sobkow.etal.2020]. We combined these two measures additively and found the combined score to be one of the strongest predictors of training success, potentially due to enhanced decision making. Owner cognitive ability may have a direct effect on training success if owners high in these aspects of cognitive ability make better decisions about selecting dogs that are likely to succeed in the test. They may research and select breed types or specific breeders that tend to have well-behaved or easily trainable dogs. If they adopt dogs, they may take more time to observe the dog's behavior or simply be better at selecting trainable dogs. Alternatively, higher cognitive abilities may not directly result in training success. Instead, these cognitive abilities may be correlated with other characteristics that have a more direct influence on training success. For instance, owners with high cognitive ability may foresee the value of a well-trained dog and be more consistent and exert more time and effort in their training than owners with lower cognitive ability.
Relatedly, the cognitive ability scores may capture the participants' amount of effort exerted on the survey (rather than actual cognitive ability), which could also relate to the effort that they are willing to invest in training. Unfortunately, we do not have a measure of training time during the training class, but we would predict that cognitive ability would correlate with training effort, which, in turn, would predict training success. This result, however, was not predicted based on a theoretically driven framework. Thus, this exploratory result must be replicated to validate the findings.

Finally, we consider dog-owner interactions, that is, characteristics that require both dog and owner. The quality of the dog-human relationship is related to many important outcomes such as dog cognitive performance [@Topal.etal.1997], dog quality of life [@Marinelli.etal.2007], ownership satisfaction [@Herwijnen.etal.2018], and some elements of dog training such as class attendance and type of training aid [@Herwijnen.etal.2018]. Though dog-owner relationship quality was included in the potential predictors for our study, it ranked second to last in terms of predictor importance, suggesting it did not strongly predict training success. However, another dog-owner interaction was an important predictor: the amount of time spent training the dog before the first class period. Dogs whose owners spent more time training them before the course started were more likely to pass the test. Training before the class began likely also resulted in more time spent training during the course, which would increase the likelihood of passing the test. This finding suggests that exposing dogs to training before formal classes could go a long way toward improving their training success.

For this analysis, we combined dogs who failed the Canine Good Citizen test with those who did not take the test as our unsuccessful training outcome. While ideally we would exclude dogs who did not take the test, only r survey_cgc_fail dogs failed the test, which did not provide enough unsuccessful responses to properly analyze our data. We combined these two outcomes for this analysis because it is likely that some owners did not take the test because their dogs were not trained sufficiently to pass the test. However, owners may have avoided the test for other reasons, for example, because they dropped out of the course, their dogs were not quite prepared for the test, or their schedule did not allow it. Therefore, we should interpret these results cautiously, and further larger-scale studies should replicate these methods to confirm the findings.

As a note of caution, this study only included the training success of dog/owner pairs who took a single trainer's class and were evaluated by a single examiner. Trainers likely vary dramatically in the philosophies and techniques used to teach owners how to work with their dogs [@Feng.etal.2018]. Also, examiners likely vary in the criteria used to establish test success. Our sample of dog owners was primarily female (r owner_gender_nums[1] of r dim(survey_data)[1] owners were female), which could potentially bias our results. Further, our sample of dogs and owners may be biased due to the self-selection of volunteers. This variability, coupled with the exploratory nature of this study, suggests that we should be cautious about generalizing these findings beyond the study sample, and further work should attempt to replicate the findings. The Canine Good Citizen program, however, can provide some degree of standardization in terms of a consistent measure of training. Given the large number of dogs participating in the program, this offers an interesting avenue for future research on dog training.

Machine learning

Machine learning is a powerful set of tools that can apply across a wide range of data [@Hastie.etal.2009; @Kuhn.Johnson.2013]. While it is commonly used in other fields, comparative psychology has been slow to pick up machine-learning methods, though they have been introduced in the field of animal behavior more generally [@Valletta.etal.2017]. Our field has traditionally relied on various forms of linear models [@Lindelov.2019] for our statistical analyses.

Machine learning opens up new ways of thinking about our data analysis. For instance, machine learning highlights the notion of prediction over explanation [@Yarkoni.Westfall.2017]. That is, most psychological studies attempt to explain patterns of data by fitting statistical models to them. However, often we really want to predict new data---we want to see if our models generalize beyond the data that we collected. Machine-learning approaches do this by training models to a subset of the data, fixing the parameters of the models based on the training data, using the trained models to predict new data, and measuring how well the trained models predict the test data. One key benefit to prediction over fitting is that it reduces bias in our conclusions [@Brighton.Gigerenzer.2015]. Typically, though, we do not have very large data sets, so the training and testing subsets may be rather small, thus creating a lot of variance. To reduce the variance, we can use cross-validation, where we repeatedly partition the data into training and testing subsets, fit the models, predict new data, and calculate a mean predictive accuracy over all of the repetitions [@deRooij.Weeda.2020]. Therefore, we reduce bias error by predicting new data and variance error by repeatedly sampling from our data.
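The train-predict-evaluate cycle with cross-validation can be sketched in base R. This is a minimal illustration using simulated data and a logistic regression; the number of folds, the 0.5 classification threshold, and the names `toy`, `folds`, and `fold_accuracy` are assumptions for the example rather than the settings used in this study.

```r
set.seed(42)

# Simulated data: one predictor and a binary outcome
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(1.2 * toy$x))

# Randomly assign each case to one of k folds
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: train on the other folds, predict the held-out fold
fold_accuracy <- vapply(seq_len(k), function(i) {
  train <- toy[folds != i, ]
  test <- toy[folds == i, ]
  fit <- glm(y ~ x, family = binomial, data = train)
  predicted <- as.integer(predict(fit, newdata = test, type = "response") > 0.5)
  mean(predicted == test$y)  # proportion of held-out cases predicted correctly
}, numeric(1))

# Mean predictive accuracy across the held-out folds
mean_accuracy <- mean(fold_accuracy)
```

Because every accuracy value comes from cases the model did not see during fitting, `mean_accuracy` estimates how well the model generalizes rather than how well it fits.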

Of course, prediction and cross-validation can be used with regression models as well as machine-learning algorithms [@deRooij.Weeda.2020]. So does machine learning offer more than regression? Our results suggest that it can. Because machine-learning models use completely different methods for classifying responses compared to regression, they can generate completely different results, which provides two benefits. First, machine-learning algorithms can predict responses better than regression. For example, we found that the decision-tree algorithm C5.0 dramatically outpredicted regression for Canine Good Citizen training success. If model prediction accuracy is important, some machine-learning models may outperform regression. Second, different methods in machine-learning models can allow them to discover distinct predictors that regression may overlook. By using exclusively regression analyses, we are limiting our understanding of the relationship between predictors and responses by focusing on a single set of assumptions and analytical techniques. Machine learning breaks us out of the constraints imposed by regression. This can be important in both confirming existing theories and developing new hypotheses.

There can be drawbacks to machine-learning approaches, however [@Adjerid.Kelley.2018; @Jacobucci.Grimm.2020]. Unlike linear models, many core machine-learning algorithms cannot handle missing data. Therefore, researchers must discard cases with missing data or impute missing values, and both of these strategies can bias results. Also, some machine-learning algorithms (along with linear models) perform poorly if predictors are highly correlated, or multicollinear [@Kuhn.Johnson.2013], so some predictors need to be removed to minimize multicollinearity. Further, some models do not perform well with a large number of predictors, so filters must be used to remove extra predictors [@Kuhn.Johnson.2019]. We used a simple filter based on regression analyses, so our results could have been different if we had not used that filter. Finally, there are many machine-learning algorithms available, so it can be difficult to choose which algorithms to include in the analysis. Fortunately, there are a number of core algorithms that are used frequently and are well understood mathematically [@Hastie.etal.2009; @Valletta.etal.2017]. Because of their usefulness and common use, they are relatively easy to implement in statistical packages such as R [@R-base] and JASP [@JASPTeam.2020].

Conclusion

In the present study, we conducted an exploration of dog, owner, and dog-owner characteristics that predict training success. We found that certain aspects of owner cognitive ability, dog disobedience, and time spent training were the most important factors in predicting training success. Therefore, dog, owner, and dog-owner characteristics were all important for completion of the Canine Good Citizen training program. Though dogs with owner-perceived problem behaviors and disobedience issues struggle, owners who put forth time, energy, and effort towards their goals are most likely to succeed in training. Assessing characteristics of dogs and owners can provide important insights into potential interventions and training techniques that may cater to the specific characteristics of dog-owner pairs for pet dogs and potentially working dogs.

Acknowledgments

This research was funded by a University of Nebraska-Lincoln College of Arts and Sciences Partnership Seed Grant. We would like to thank Kylie Hughes, Elise Thayer, Toria Biancalana, and McKenna Yohe for collecting saliva samples, Tierney Lorenz for her expertise on dog saliva collection, Jessica Calvi and the University of Nebraska-Lincoln Salivary Bioscience Laboratory for assaying saliva samples, and Billy Lim for comments on an early draft.

Declarations

Funding

This research was funded by a University of Nebraska-Lincoln College of Arts and Sciences Partnership Seed Grant.

Conflicts of interest

JRS and his dog completed one of the Prairie Skies Dog Training courses in this study starting with the second class session. He did not complete any of the surveys or behavioral tasks, so he has no data included in this analysis. JRS received donations for research and outreach activities from Prairie Skies Dog Training (owned by JM), Arnie's Pet Food Store, Green Spot, Kenl Inn, Nature's Logic, Nature's Variety, Nebraska Animal Medical Center, Norland Pure, Pepsi, Raising Cane's, Sirius Veterinary Orthopedic Center, Smarty Dog Training, Stubbs Chiropractic, and private donors. JM is the owner of Prairie Skies Dog Training and received payments for training services from the participants in this study.

Ethics approval

All procedures were conducted in an ethical and responsible manner, in full compliance with all relevant codes of experimentation and legislation and were approved by the UNL Internal Review Board (protocol # 17922) and Institutional Animal Care and Use Committee (protocol # 1621).

Consent to participate and for publication

All participants offered consent to participate, and they acknowledged that de-identified data could be published publicly.

Availability of data and material

All data files, data analysis scripts, and supplementary materials (surveys, tables, figures) are available at https://osf.io/3p5vx/.

Authors' contributions

The authors made the following contributions. Jeffrey R. Stevens: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Visualization, Writing - Original Draft Preparation, Writing - Review & Editing; London M. Wolff: Formal Analysis, Investigation, Methodology, Writing - Original Draft Preparation, Writing - Review & Editing; Megan Bosworth: Investigation, Methodology, Writing - Review & Editing; Jill Morstad: Conceptualization, Resources, Writing - Review & Editing.

References


