Journal Information
Journal ID (publisher-id): jgi
ISSN: 1910-7595
Publisher: Centre for Addiction and Mental Health
Article Information
Article Categories: Original Research
Publication date: Spring 2020
Publisher Id: jgi.2020.44.6
DOI: 10.4309/jgi.2020.44.6

Evaluating the Reliability and Validity of the Short Gambling Harm Screen: Are Binary Scales Worse Than Likert Scales at Capturing Gambling Harm?

James McLauchlan Experimental Gambling Research Laboratory, School of Health, Medical and Applied Sciences, CQUniversity, Bundaberg, Queensland, Australia
Matthew Browne Experimental Gambling Research Laboratory, School of Health, Medical and Applied Sciences, CQUniversity, Bundaberg, Queensland, Australia
Alex M. T. Russell Experimental Gambling Research Laboratory, School of Health, Medical and Applied Sciences, CQUniversity, Sydney, New South Wales, Australia
Matthew Rockloff Experimental Gambling Research Laboratory, School of Health, Medical and Applied Sciences, CQUniversity, Bundaberg, Queensland, Australia


Gambling-related harm has become a key metric for measuring the adverse consequences of gambling on a population level. Yet, despite this renewed understanding in contemporary research, little exploration has been conducted to evaluate which instrument is best suited to capture the harmful consequences of gambling. This study was designed with the aim of determining whether Likert scales were better suited to capture gambling harm than binary scales. We hypothesized that the Short Gambling Harm Screen (SGHS), initially scored using a binary scale, would perform similarly to the alternate form that was Likertized for the purpose of this study. A corresponding comparison in the reverse direction was executed for the Problem Gambling Severity Index. The SGHS’s performance was assessed via a repeated-measures design in combination with three other measures of validity administered at the conclusion of the survey. In the end, we found that changing the scoring format (i.e., from binary to Likert) had negligible impact on the SGHS’s psychometric performance. We conclude that the original scoring method of the SGHS is not only appropriate but also no less suitable than Likert scales in measuring gambling harm.

Keywords: gambling harm, Short Gambling Harm Screen (SGHS), forced-choice binary, dichotomous scale, binary scale, Likert scale comparison, Problem Gambling Severity Index (PGSI)




Contemporary research has focused on gambling-related harm as a key metric of the negative impacts of gambling at the population level (Blaszczynski, 2009; Browne, Greer, Rawat, & Rockloff, 2017; Rodgers, Caldwell, & Butterworth, 2009; Sproston, Erens, & Orford, 2000). The emphasis on harm, rather than gambling disorders, recognizes that traditional measures such as the Problem Gambling Severity Index (PGSI; Ferris & Wynne, 2001) are not well suited to measuring the impact of harm at the population level. The need for a new measure of harm was met by a 10-item screen dedicated to measuring harm, the Short Gambling Harm Screen (SGHS; Browne, Goodwin, & Rockloff, 2017). However, Delfabbro and King (2017) have raised concerns regarding the binary scoring of each of the harm symptomology indicators. This dispute raises the question of whether a count of the presence of symptoms, as used by the SGHS, is inferior to measures that elicit degree of frequency or intensity with respect to gambling harm. The present study aimed to evaluate this question via a repeated-measures design in which the performance of the two response formats is compared on several psychometric criteria.

Harm-centred measurement approaches

A population health approach to gambling problems implies that harm, understood as a decrement to health and wellbeing, is the key outcome to be addressed. A corollary to this is that harm can occur on a continuum from mild to severe; and a practical observation is that prevalence is much lower at the severe end of the spectrum (Browne, Greer, et al., 2017). For instance, Raisamo, Mäkelä, Salonen, and Lintonen (2014) found that considerable harms were reported even at the lower end of gambling frequency and expenditure levels. A population study conducted in the UK revealed similar trends, reporting that individuals experiencing harms were most prevalent in the lower gambling consumption groups (Canale, Vieno, & Griffiths, 2016). In Australia, Browne and Rockloff (2018) conducted a study assessing the prevalence of harmful consequences across four problem-gambling risk categories: no-risk, low-risk, moderate-risk, and problem gamblers. The data, again, showed that most gambling-related harms are much more common in the combined lower-risk categories than among high-risk problem gamblers. Together, the evidence suggests there is merit in gauging population-level impact across the spectrum of harm, rather than relying solely on prevalence of problem gamblers as a proxy for harm.

Binary scales or Likert scales?

It is perhaps intuitively appealing to suppose that Likert scales are generally more reliable and accurate than a binary response format because of their potential for capturing more information. However, the extant research suggests this is not generally the case. Grassi et al.’s (2007) study provides a useful illustration. The authors replaced the Likert scales in the 36-item short-form health survey (SF-36) with forced-choice binary scales, and found that the answering format had “no substantial effect” on the test-retest reliability or internal consistency. In another study, Geldhof et al. (2015) compared the responses collected using both binary and Likert formats of the Selection, Optimisation and Compensation (SOC) questionnaire and concluded that the answering formats were practically interchangeable. Further, in a study by Litong-Palima, Albers, and Glückstad (2018), the binary format outperformed its Likert counterparts on measures of reliability. In the marketing context, binary scales have consistently demonstrated similar reliability to Likert scales (Dolnicar & Grün, 2013a; Dolnicar & Grün, 2013b; Dolnicar, Grün, & Leisch, 2011; Dolnicar & Leisch, 2012). A common thread running through these studies is the finding that binary scales do not perform significantly differently from their Likert counterparts.

The lack of evidence for the superiority of Likert over binary response formats is counterintuitive considering the greater potential for informational content in an interval scale. Likert scales provide participants with the opportunity to choose from a range of responses to denote a degree of agreement, frequency, or severity. These ordered responses, typically between four and seven points (Adelson & McCoach, 2010), provide the potential for participants to indicate a more precise response to the probe. Nevertheless, there is an absence of guidelines on the way in which Likert scales are to be designed. For instance, there are several options for answer stems (e.g., likely, agree, most of the time, etc.). There is also no definitive way by which the resulting scores should be aggregated. For example, certain researchers advocate for the use of neutral mid-points (Raaijmakers, Van Hoof, ’t Hart, Verbogt, & Vollebergh, 2000; Velez & Ashworth, 2007), while others warn against them (Guy & Norvell, 1977; Wakita, Ueshima, & Noguchi, 2012). The recommended number of rating categories varies from two (McCallum, Keith, & Wiebe, 1988) to eleven (Cummins & Gullone, 2000; Leung, 2011). Certain researchers argue that reliability increases with the number of scale points (Lozano, García-Cueto, & Muñiz, 2008; Weng, 2004), while others have found evidence suggesting that reliability is largely independent of the number of scale points (Bendig, 1954; Komorita, 1963; Matell & Jacoby, 1971).

Theoretical considerations may also explain why Likert response formats do not, in practice, tend to perform better than their binary counterparts for many applications. Given that Likert items typically yield scores (e.g., 0, 1, 2, 3) that are then summed across items to create a scale score, this format requires the strong (item-response theoretic) assumption that each step in the ordered response represents an identical difference of degree on the hypothesized latent construct (Michell, 2012). Binary scales involve only the weaker assumption that the various items are similarly related to, or load onto, the underlying construct. It is also worth considering the higher degree of cognitive effort employed by respondents in answering with a Likert scale, and the degree to which differences in ordered responses might therefore reflect either noise, or a systematic bias in terms of minimising or maximising responses. Binary responses, such as reporting whether an event happened or alternatively whether a symptom is present, are arguably inherently more concrete and less ambiguous, and may therefore be less vulnerable to these forms of error.

Despite the heavy reliance on surveys as the main method for data collection on gambling harm, the question of response format has not yet been explored within gambling research. Even though the SGHS and the FocaL Adult Gambling Screen (FLAGS; Schellinck, Schrans, Schellinck, & Bliemel, 2015) are both scored using a binary response format, neither has been subject to a similar analysis in response to the concerns raised by Delfabbro and King (2017). The aim of the present study is, therefore, to examine the influence that different response formats have on the psychometric properties of the SGHS. More specifically, the research objective is to compare the reliability of the SGHS, initially scored using a binary scale, against a Likert version of the SGHS to determine which scale format is better suited to capturing gambling harm. The present study hypothesizes that the psychometric performance of the binary SGHS will not differ substantially (i.e., any difference will not reach statistical significance at the p < .05 threshold), in both reliability and validity, from the alternate Likert form.



Adult gamblers (n = 618) who gambled at least two to four times a month were recruited for this study through TurkPrime, a North American online research panel recruitment service. Participants who had missing answers (n = 42), showed pattern responding (n = 17), or whose responses differed by more than 2 standard deviations between the repeated measures (n = 4) were excluded. Additional multivariate outliers (n = 23) were identified using Mahalanobis distance with a p < .05 threshold, and subsequently removed from the sample. A total of 532 (female = 204) participants aged from 18 to 87 (M = 42.07, SD = 13.13) were included for analysis. See Table 1 for a summary of participant demographic characteristics.
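The outlier screen described above can be illustrated with a minimal sketch. The text does not state which variables entered the Mahalanobis calculation, so the example below assumes just two hypothetical scores per participant; the function names and data are illustrative only. With two variables, the chi-square cutoff at a given alpha has the closed form -2 ln(alpha), which avoids any external dependency.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d using the closed-form 2x2 inverse.
        distances.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_critical_df2(alpha):
    """Chi-square critical value for 2 degrees of freedom (survival function is exp(-x/2))."""
    return -2.0 * math.log(alpha)


def flag_outliers(points, alpha=0.05):
    """Indices of points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_critical_df2(alpha)
    return [i for i, d2 in enumerate(mahalanobis_sq_2d(points)) if d2 > cutoff]


# Hypothetical two-variable scores; the last participant is far from the rest.
scores = [(1, 2), (2, 1), (2, 3), (3, 2), (1, 1), (3, 3),
          (2, 2), (1, 3), (3, 1), (2, 2), (2, 2), (20, 20)]
print(flag_outliers(scores))
```

With more than two variables the same logic applies, but the covariance inverse and the chi-square quantile would normally come from a numerical library rather than the closed forms used here.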

Table 1 Ethnicity, Marital status, Income, and Employment Status Summary


Participants completed two tests with alternative forms of the SGHS and PGSI over a one-week test-retest interval. They were randomly allocated to complete either the same form (i.e., Likert-Likert or Binary-Binary) or the alternative forms (i.e., Likert-Binary or Binary-Likert) at the one-week follow-up. Although participants might complete different forms of the SGHS and PGSI across time one and time two, the forms did not differ within a testing session (i.e., a participant who received the binary SGHS at time one also received the binary PGSI at time one). See Table 2 for a summary of the different permutations. Approximately 44% (n = 234) completed repeat assessment of the same form, while the remaining 56% (n = 298) completed the alternate form at follow-up. Participants also completed several other validation measures (described below) at the end of the one-week follow-up survey.

Table 2 Measures used in each permutation and the number of participants randomly allocated into each set


Participants were recruited to participate in two online surveys. They were compensated in the form of either reward points, cash or gift cards of their choice. This study was approved by the Central Queensland University Ethics Committee (approval number 0000021464), and informed consent was obtained at the outset of the first survey.


Each measure’s internal consistency was assessed using Cronbach’s alpha, calculated on either the tetrachoric (for binary) or polychoric (for Likert) item correlation matrix. The SGHS’s test-retest reliability, alternate-form reliability, convergent validity, and discriminant validity were computed using Spearman correlations. Comparisons between the two forms of the SGHS were made using Fisher’s Z test (Myers & Sirois, 2006), which provides a test of significance between nonparametric correlation coefficients by converting them into standardized (z) scores (Zar, 2005).
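As an illustration of the comparison just described, the following is a minimal sketch of Fisher's r-to-z test for two correlations obtained from independent groups. It assumes the conventional standard error based on 1/(n - 3) for each group and a two-tailed p value; the exact variant used in the study is not specified in the text, and the group sizes in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two independent correlations via Fisher's r-to-z transform.

    Returns the z statistic and a two-tailed p value.
    """
    # Fisher transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between two transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: correlations of .86 and .88 in groups of 120 and 114.
z, p = fisher_z_compare(0.86, 120, 0.88, 114)
print(f"Z = {z:.2f}, p = {p:.2f}")
```

Because the statistic depends on the group sizes, the values printed here will not match those reported in the Results unless the actual group sizes are supplied.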


In addition to the SGHS and PGSI, several measures were included to assess external validity for each version of the scale. The Kessler Psychological Distress Scale (K6; Kessler et al., 2002) and the Personal Wellbeing Index (PWI; Cummins, 1997) served as outcome-oriented validity measures; that is, harms and problems should be associated with increased distress and lower wellbeing. Additionally, impulsivity is a known risk factor for both gambling problems and harm (Browne et al., 2019; Russell, Hing, Li, & Vitartas, 2019), and therefore trait impulsivity was measured using the Barratt Impulsiveness Scale-Brief (BIS-Brief; Steinberg, Sharp, Stanford, & Tharp, 2013).

Gambling Harm

The SGHS (Browne, Goodwin et al., 2017) is designed to assess a respondent’s degree of gambling-related harm. The original SGHS is scored on a forced-choice binary scale and contains 10 items derived from a comprehensive 72-item checklist (Browne, Goodwin et al., 2017; Langham et al., 2015). The SGHS includes harmful consequences that are more prevalent (e.g., “decreased savings” or “sold personal items”). It has been shown to be a good proxy of the full checklist (r = .94) and has a negative linear relationship with wellbeing (Browne, Goodwin et al., 2017). This scale was compared with a four-point Likert version of the same measure with 0 being “never,” 1 denoting “sometimes,” 2 representing “most of the time,” and 3 indicating “almost always.” Alpha reliability in the present study for the original binary SGHS was .95 at both time one and time two. The Likert-scored SGHS alpha reliability was also identical across surveys: .97 at both time one and time two. Refer to Table 6 for the combined response distribution of both forms of the SGHS.

Problem Gambling

Participants completed the PGSI (Ferris & Wynne, 2001), a standard tool for assessing degree of gambling problems in surveys. The nine-item PGSI contains questions such as “has gambling caused you any health problems, including stress or anxiety?” It is scored on a four-point Likert scale, with 0 representing “never” and 3 indicating “almost always.” An alternate binary form of the PGSI was also calculated for this study. Alpha reliability in this study for the standard (Likert) PGSI was .96 at time one and .98 at time two; the alternative binary version had a reliability of .89 at both time one and time two. See Table 5 for the response distribution of the PGSI.

Psychological Distress

The Kessler Screening Scale for Psychological Distress (K6; Kessler et al., 2002) was chosen to measure the presence of distress among the participants. The K6 consists of six items scored on a five-point Likert scale from 0 (“none of the time”) to 4 (“all the time”). Coefficient alpha in the current study was .94. As noted above, the SGHS should predict greater psychological distress (Brown, Oldenhof, Allen, & Dowling, 2016).

Personal Wellbeing

The PWI, adapted from the Comprehensive Quality of Life Scale (Cummins, 1997), was used to measure general life satisfaction. It is an eight-item questionnaire designed to measure multiple domains associated with quality of life, including living standards, health, achievements, safety, sense of belonging, and future prospects. These items are scored on an 11-point scale, with 0 denoting “no satisfaction at all” and 10 indicating “completely satisfied.” Alpha reliability in this study was .93. The SGHS should predict lower wellbeing (Blackman, Browne, Rockloff, Hing, & Russell, 2019).


Impulsivity

Impulsiveness was assessed via the Barratt Impulsiveness Scale-Brief (BIS-Brief; Steinberg et al., 2013), an abbreviated version of its 30-item predecessor (Barratt, 1959). The BIS-Brief consists of eight items and uses a four-point Likert scale to capture the degree to which the participants agreed with statements such as “I don’t pay attention” or “I act on the spur of the moment.” Cronbach’s alpha in the present study was .64. Behavioural impulsivity is a risk factor for gambling problems and harm (Russell et al., 2019).


Test-retest reliability of the SGHS

The test-retest results are summarized and presented in Table 3. Both forms of SGHS showed strong test-retest reliability, being strongly correlated between time one and time two (all p < .001): .86 for the binary form, and .88 for the Likert form. These correlations were not significantly different, Z = -1.27, p = .10.

Table 3 Test-retest reliability, means and standard deviation of the SGHS and PGSI in each form

Table 4 Correlation between SGHS, PGSI, BIS-Brief, PWI and K6

Alternate-form reliability of the SGHS

Alternate-form reliability was assessed by comparing the correlations between different forms of the SGHS across time one and time two. The same-form test-retest correlations reported above (.86 and .88) formed the benchmark against which to evaluate the alternate forms. When comparing alternate forms across time one and time two, the correlations were .75 and .74 for the Binary-Likert and Likert-Binary administrations, respectively. Comparing test-retest reliability across forms (approximately .745) and within forms (approximately .87) allows us to estimate the variance uniquely attributable to varying the form alone. Squaring the within-form correlation to approximate the proportion of shared variance gives approximately 76% shared variance, leaving 24% of the variance attributable to random effects over time. In the case of alternate forms, in which error is attributable to both random time effects and differing response formats, approximately 56% of variance was shared. Thus, approximately 20% of the variance in responses can be attributed to the differing forms.
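The arithmetic behind these percentages can be reproduced with a short sketch. It averages the two within-form and the two across-form correlations reported above, squares each average to approximate shared variance, and takes the difference; the function and key names are illustrative only.

```python
def form_variance_breakdown(within_rs, across_rs):
    """Approximate shared variance within and across forms from test-retest correlations."""
    r_within = sum(within_rs) / len(within_rs)
    r_across = sum(across_rs) / len(across_rs)
    shared_within = r_within ** 2   # same form at both time points
    shared_across = r_across ** 2   # different forms at the two time points
    return {
        "shared_within": shared_within,
        "time_effects": 1.0 - shared_within,
        "shared_across": shared_across,
        "form_effects": shared_within - shared_across,
    }


# Correlations reported in the text: .86 and .88 within forms, .75 and .74 across forms.
result = form_variance_breakdown([0.86, 0.88], [0.75, 0.74])
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

This treats squared correlations as proportions of shared variance and subtracts them directly, which is the same approximation the paragraph above relies on.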

Convergent validity of the SGHS

The correlations between each form of the SGHS and the K6, PWI, and BIS-Brief are summarized in Table 4.

Table 5 Combined response distribution (T1 and T2) for Likert and binary PGSI

Table 6 Combined Response distribution (T1 and T2) for binary and Likert format of SGHS

Time one

There were no significant differences between the correlations of the binary and Likert response formats with external measures. The correlation with the K6 was .57 for the binary form and .67 for the Likert form. Although the Likert form performed better, these correlations were not significantly different, Z = 1.87, p = .06. With regard to the BIS, the correlation was .22 for the binary scale and .23 for the Likert scale, Z = -.12, p = .90. The correlation between the SGHS and the PWI was -.12 for the binary scale and -.04 for the Likert scale. This difference in correlation between the two scales was also not statistically significant, Z = .92, p = .36.

Time two

At time two, the correlation between SGHS and K6 was .60 for the binary scale and .68 for the Likert scale. This difference in the size of the correlation between the two scales was not statistically significant, Z = 1.56, p = .12. The correlation with the BIS was .17 for the binary scale and .28 for the Likert scale. There was no significant difference between the two correlations, Z = 1.33, p = .18. As for the PWI, the correlation was .10 for the binary scale and .11 for the Likert scale. This difference in the size of the correlation between the two scales was again not statistically significant, Z = .12, p = .90.

Concurrent Validity of the SGHS

At time one, the correlation between the SGHS and PGSI was .84 for the binary form (binary SGHS-binary PGSI) and .94 for the Likert form (Likert SGHS-Likert PGSI). This difference in the size of the correlation between the two scales was statistically significant, Z = -6.29, p < .001. At time two, the correlation between the SGHS and PGSI was .84 for the binary form and .91 for the Likert form. This difference in the size of the correlation between the two scales was also statistically significant, Z = -3.51, p < .001.


This study investigated the psychometric properties of the Short Gambling Harm Screen and examined whether scoring methods had an influence on those properties. We hypothesized that the performance of the binary SGHS would not differ significantly from that of its alternate, Likertized form in terms of reliability and validity. As predicted, the type of scale employed did not have substantial effects on the SGHS’s internal consistency, test-retest, and alternate-form reliability. We found that both the binary and Likert formats produced similar results with respect to convergent validity and discriminant validity. We note a general pattern of the Likert scale performing slightly better than the binary scale, but also note that differences between the performance of the scales were not statistically significant for any of these analyses, and that effect sizes were small.

In terms of concurrent validity, however, our data did suggest that the Likert format correlated significantly more highly with the PGSI than the binary version, although the difference in correlations was not large. This may be because the PGSI is designed to detect problem gamblers, who lie at the extreme end of the spectrum of harm, while the SGHS was intended to capture gambling harm across a broader scale. As such, it is possible that the Likert version correlated better with the PGSI because it can distinguish more extreme states of harm from harm that is experienced only occasionally.

Binary scales come with several other practical advantages beyond reliability and validity estimates. As discussed above, forced-choice questionnaires, such as the SGHS, are quicker to administer and less ambiguous than Likert responses. Their concise nature helps to mitigate some of the common artefacts observed in health surveys, particularly respondent fatigue (O’Reilly-Shah, 2017). Moreover, unlike Likert scales, which involve strong psychometric assumptions (Michell, 2012), binary scales involve the more limited assumption that each item should provide independent information about the underlying construct it is measuring. Finally, the interpretation of binary scales in this context is more straightforward: the score reflects the number of distinct symptoms the individual is presenting.

Our study is primarily limited by a relatively small sample size. Our conclusion suggests that there are no differences in psychometric properties between the two forms, but our significance tests are sensitive to sample size. That is, small samples tend to produce null results. A larger sample could detect more subtle differences in item performance between the two forms. Nevertheless, our current results suggest that differences between the forms, if they exist, are likely to be small. Another limitation of our results is that we only compared two response scales: forced-choice binary and four-point Likert scales. Future studies could explore other formats to establish whether these findings generalize to other contexts.

Overall, our findings suggest that response format did not yield a major impact on measures of reliability and validity. These results appear to resolve Delfabbro and King’s (2017) concern that the binary scoring might have a negative impact on the validity of the SGHS. Finally, the present study also offers some practical advice for the use of forced-choice binary response scales in psychological testing in general; that is, at least in the context of measuring gambling harm, there is no reason to assume that a binary response format is any less suited than other answering scales at capturing participant responses.


The question of whether binary scales are suited to measuring gambling harm was raised as a key concern for the use of the SGHS in population surveys. In this study, we hypothesized that the scoring format should not have a substantial impact on the SGHS’s reliability or validity. Our data demonstrated that while there was one slight difference in concurrent validity, the binary version of the SGHS did not generally perform significantly differently from the Likert version on several measures of scale performance. Consequently, we tentatively conclude that the binary format used to score the SGHS is just as effective as Likert-type scales.


Adelson, J. L., & McCoach, D. B. (2010). Measuring the mathematical attitudes of elementary students: The effects of a 4-point or 5-point Likert-type scale. Educational and Psychological Measurement, 70, 796–807.

Barratt, E. S. (1959). Anxiety and impulsiveness related to psychomotor efficiency. Perceptual and Motor Skills, 9, 191–198.

Bendig, A. W. (1954). Reliability and the number of rating-scale categories. Journal of Applied Psychology, 38, 38–40.

Blackman, A., Browne, M., Rockloff, M., Hing, N., & Russell, A. M. T. (2019). Contrasting effects of gambling consumption and gambling problems on subjective wellbeing. Journal of Gambling Studies, 35, 773–792.

Blaszczynski, A. (2009). Problem gambling: We should measure harm rather than “cases.” Addiction, 104, 1072–1074.

Brown, M., Oldenhof, E., Allen, J. S., & Dowling, N. A. (2016). An empirical study of personality disorders among treatment-seeking problem gamblers. Journal of Gambling Studies, 32, 1079–1100.

Browne, M., Goodwin, B. C., & Rockloff, M. J. (2017). Validation of the Short Gambling Harm Screen (SGHS): A tool for assessment of harms from gambling. Journal of Gambling Studies, 34, 499–512.

Browne, M., Greer, N., Rawat, V., & Rockloff, M. (2017). A population-level metric for gambling-related harm. International Gambling Studies, 17, 163–175.

Browne, M., Hing, N., Rockloff, M., Russell, A. M. T., Greer, N., Nicoll, F., & Smith, G. (2019). A multivariate evaluation of 25 proximal and distal risk-factors for gambling-related harm. Journal of Clinical Medicine, 8, 509–524.

Browne, M., & Rockloff, M. J. (2018). Prevalence of gambling-related harm provides evidence for the prevention paradox. Journal of Behavioral Addictions, 7, 410–422.

Canale, N., Vieno, A., & Griffiths, M. D. (2016). The extent and distribution of gambling-related harms and the prevention paradox in a British population survey. Journal of Behavioral Addictions, 5, 204–212.

Cummins, R. A. (1997). Comprehensive quality of life scale: Adult: Manual (5th ed., pp. 1–51). Melbourne, Australia: School of Psychology, Deakin University.

Cummins, R. A., & Gullone, E. (2000, March). Why we should not use 5-point Likert scales: The case for subjective quality of life measurement. Paper presented at the Proceedings, second international conference on quality of life in cities.

Delfabbro, P., & King, D. (2017). Prevention paradox logic and problem gambling: Does low-risk gambling impose a greater burden of harm than high-risk gambling? Journal of Behavioral Addictions, 6, 163–167.

Dolnicar, S., & Grün, B. (2013a). Translating between survey answer formats. Journal of Business Research, 66, 1298–1306.

Dolnicar, S., & Grün, B. (2013b). Validly measuring destination image in survey studies. Journal of Travel Research, 52, 3–14.

Dolnicar, S., Grün, B., & Leisch, F. (2011). Quick, simple and reliable: Forced binary survey questions. International Journal of Market Research, 53, 231–252.

Dolnicar, S., Rossiter, J. R., & Grün, B. (2012). Pick Any measures contaminate brand image studies. International Journal of Market Research, 54, 821–834.

Ferris, J. A., & Wynne, H. J. (2001). The Canadian Problem Gambling Index: Final report. Ottawa, ON: Canadian Centre on Substance Abuse.

Geldhof, G. J., Gestsdottir, S., Stefansson, K., Johnson, S. K., Bowers, E. P., & Lerner, R. M. (2015). Selection, optimization, and compensation: The structure, reliability, and validity of forced-choice versus Likert-type measures in a sample of late adolescents. International Journal of Behavioral Development, 39, 171–185.

Grassi, M., Nucera, A., Zanolin, E., Omenaas, E., Anto, J. M., & Leynaert, B. (2007). Performance comparison of Likert and binary formats of SF-36 version 1.6 across ECRHS II adults populations. Value in Health, 10, 478–488.

Guy, R. F., & Norvell, M. (1977). The neutral point on a Likert scale. The Journal of Psychology, 95, 199–204.

Kessler, R. C., Andrews, G., Colpe, L. J., Hiripi, E., Mroczek, D. K., Normand, S. L. T., Walters, E. E., & Zaslavsky, A. M. (2002). Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychological Medicine, 32, 959–976.

Komorita, S. S. (1963). Attitude content, intensity, and the neutral point on a Likert scale. The Journal of Social Psychology, 61, 327–334.

Langham, E., Thorne, H., Browne, M., Donaldson, P., Rose, J., & Rockloff, M. (2015). Understanding gambling related harm: A proposed definition, conceptual framework, and taxonomy of harms. BMC Public Health, 16, 80.

Leung, S.-O. (2011). A comparison of psychometric properties and normality in 4-, 5-, 6-, and 11-Point Likert scales. Journal of Social Service Research, 37, 412–421.

Litong-Palima, M., Albers, K. J., & Glückstad, F. K. (2018, June). Stability and similarity of clusters under reduced response data [Paper presentation]. 32nd Annual Conference of the Japanese Society for Artificial Intelligence, Kagoshima, Japan.

Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4, 73–79.

Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Study I: Reliability and validity. Educational and Psychological Measurement, 31, 657–674.

McCallum, D. M., Keith, B. R., & Wiebe, D. J. (1988). Comparison of response formats for Multidimensional Health Locus of Control Scales: Six levels versus two levels. Journal of Personality Assessment, 52, 732–736.

Michell, J. (2012). The constantly recurring argument: Inferring quantity from order. Theory & Psychology, 22, 255–271.

Myers, L., & Sirois, M. (2006). Spearman correlation coefficients, differences between. In S. Kotz, C. B. Read, N. Balakrishnan, & B. Vidakovic (Eds.), Encyclopedia of statistical sciences (2nd ed., pp. 7901–7903). Hoboken, NJ: Wiley-Interscience.

Raaijmakers, Q. A. W., Van Hoof, J. T. C., ’t Hart, H., Verbogt, T. F. M. A., & Vollebergh, W. A. M. (2000). Adolescents’ midpoint responses on Likert-type scale items: Neutral or missing values? International Journal of Public Opinion Research, 12, 208–216.

Raisamo, S. U., Mäkelä, P., Salonen, A. H., & Lintonen, T. P. (2014). The extent and distribution of gambling harm in Finland as assessed by the Problem Gambling Severity Index. The European Journal of Public Health, 25, 716–722.

O’Reilly-Shah, V. (2017). Factors influencing healthcare provider respondent fatigue answering a globally administered in-app survey. Journal of Life and Environmental Sciences, 5, 1–17.

Rodgers, B., Caldwell, T., & Butterworth, P. (2009). Measuring gambling participation. Addiction, 104, 1065–1069.

Russell, A. M. T., Hing, N., Li, E., & Vitartas, P. (2019). Gambling risk groups are not all the same: Risk factors amongst sports bettors. Journal of Gambling Studies, 35, 225–246.

Schellinck, T., Schrans, T., Schellinck, H., & Bliemel, M. (2015). Construct development for the FocaL Adult Gambling Screen (FLAGS): A risk measurement for gambling harm and problem gambling associated with electronic gambling machines. Journal of Gambling Issues, 30, 140–173.

Sproston, K., Erens, B., & Orford, J. (2000). Gambling behaviour in Britain: Results from the British gambling prevalence survey. London, UK: National Centre for Social Research.

Steinberg, L., Sharp, C., Stanford, M. S., & Tharp, A. T. (2013). New tricks for an old measure: The development of the Barratt Impulsiveness Scale-Brief (BIS-Brief). Psychological Assessment, 25, 216–226.

Velez, P., & Ashworth, S. D. (2007). The impact of item readability on the endorsement of the midpoint response in surveys. Survey Research Methods, 1, 69–74.

Wakita, T., Ueshima, N., & Noguchi, H. (2012). Psychological distance between categories in the Likert scale: Comparing different numbers of options. Educational and Psychological Measurement, 72, 533–546.

Weng, L.-J. (2004). Impact of the number of response categories and anchor labels on coefficient alpha and test-retest reliability. Educational and Psychological Measurement, 64, 956–972.

Zar, J. H. (2005). Spearman rank correlation: Overview. In P. Armitage & T. Colton (Eds.), Encyclopedia of biostatistics (2nd ed., Vol. 7, pp. 5095–5101). Chichester, UK: John Wiley & Sons.


Submitted May 27, 2020; accepted June 8, 2020. This article was peer reviewed. All URLs were available at the time of submission.

For correspondence: James Robert Bell McLauchlan, BPsych, Experimental Gambling Research Laboratory, 38 Regent Avenue, Springvale, 3171, Victoria, Australia. E-mail:

Competing interests: None declared (all authors).

Ethics approval: This study was approved by the Central Queensland University Ethics Committee (Approval number - 0000021464) on May 10, 2019.

Acknowledgements: This research was supported in part by grants from the 2018 Summer Research Scholarship program at CQUniversity. I would like to express my deepest gratitude to my supervisor, Professor Matthew Browne, for his patience, unwavering support, and constructive feedback. His willingness to offer his time so generously has been very much appreciated. My grateful thanks also extend to Dr Alex Russell for assisting with data analysis and to Professor Matthew Rockloff, who has helped with the design of the study and editing the manuscript.
