- Research article
- Open Access
Psychometric properties and measurement invariance of the Beck hopelessness scale (BHS): results from a German representative population sample
BMC Psychiatry volume 18, Article number: 110 (2018)
The Beck Hopelessness Scale (BHS) has been the most frequently used instrument for measuring hopelessness over the past 40 years. Only recently has it been officially translated into German. The psychometric properties and factor structure of the BHS have been the subject of intense debate.
Based on a representative sample of the German population (N = 2450), an item analysis including item sensitivity, item-total correlation and item difficulty was performed. Confirmatory factor analyses (CFA) were performed for several factor solutions from the literature. Multiple-group factor analysis was used to assess measurement invariance. Construct validity was assessed by replicating well-established correlations with concurrently assessed measures.
Most items exhibited adequate properties. Items #4, #8 and #13 exhibited poor item characteristics; each of these items had previously received negative evaluations in international studies. A one-dimensional factor solution, which supports the calculation and interpretation of a sum score, was regarded as adequate.
A bi-factor model with one content factor and two method factors (defined by positive/negative item coding) resulted in an excellent model fit. Cronbach’s alpha in the current sample was .87. Hopelessness, as measured by the BHS, significantly correlated in the expected direction with suicidal ideation (r = .36), depression (r = .53) and life satisfaction (r = −.53). Strict measurement invariance could be established regarding gender and depression status. Due to limited research regarding the interpretation of fit indices with dichotomous data, interpretation of CFA results needs to remain tentative.
The BHS is a valid measure of hopelessness in various subgroups of the general population. Future research could aim to replicate these findings using item response theory and cross-cultural samples. A one-dimensional bi-factor model seems appropriate even in a non-clinical population.
Hopelessness as a psychological construct is relevant to various psychological disorders and related symptoms, e.g. depression, suicide, schizophrenia, alcoholism and sociopathy. Owing to its role in the etiology of depression, hopelessness became a focus of Aaron T. Beck's research group. In his cognitive theory, Beck conceptualizes hopelessness as a system of cognitive schemas characterized by a generalized negative expectation about the future. A hopeless individual overestimates the likelihood of unfortunate events while underestimating the likelihood of fortunate and positive events; positive outcomes are regarded as very unlikely, if not impossible. These dysfunctional beliefs about the future are a cornerstone of Beck's cognitive triad of depression. The other two elements of this triad are a negative view of the self and a negative view of the world. Although hopelessness and depression correlate very highly, and hopelessness is often observed in depressed individuals, it is not a necessary component of depression. This is taken into account in Abramson's theory of hopelessness depression, a special subtype of depression. In this theory, hopelessness is a proximal cause of the symptoms of hopelessness depression, and suicidality is a symptom of this subtype but not of other forms of depression.
Over more than 30 years of research, Beck and his colleagues established that hopelessness is more strongly related to suicidality than depression is. In their cognitive model of suicidal behavior, Wenzel and Beck acknowledge the crucial role of hopelessness in the context of suicide. They place cognitive schemas at the center of suicidal behavior, one of these being state hopelessness. Furthermore, the World Health Organization recognizes hopelessness as an important suicide risk factor and recommends its assessment in the context of suicidal behavior. Besides its crucial role in suicidality, hopelessness is an important construct in the context of life satisfaction, compliance and recovery in medical care, as well as in forensic psychology [7, 8].
Since its development in 1974, the Beck Hopelessness Scale (BHS) has become the most popular measure of the hopelessness construct in international studies. It has been translated into dozens of languages and used in hundreds of studies worldwide. The BHS is a 20-item self-report questionnaire. All items are answered in a true-false format. After the positively worded (reverse-keyed) items are recoded, the items endorsed in the hopeless direction are summed to form a sum score. Based on their own research, the authors of the English original suggest the following classification of the BHS sum score: 0–3 minimal, 4–8 mild, 9–14 moderate, 15–20 severe. Most researchers investigating the predictive power of the BHS suggest a cut-off of 9 as indicative of suicidal intent [4, 9,10,11]. However, final interpretations should be left to trained clinicians.
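As an illustration, the severity bands and the cut-off of 9 can be written as a small lookup. This is a minimal Python sketch with hypothetical function names. Reading "a cut-off of 9" as "a score of 9 or higher" is our assumption. The flag is only a screening aid and does not replace clinical judgment.

```python
def bhs_severity(sum_score: int) -> str:
    """Map a BHS sum score (0-20) to the severity bands of the English original."""
    if not 0 <= sum_score <= 20:
        raise ValueError("BHS sum score must lie between 0 and 20")
    if sum_score <= 3:
        return "minimal"
    if sum_score <= 8:
        return "mild"
    if sum_score <= 14:
        return "moderate"
    return "severe"


def bhs_screen_positive(sum_score: int, cutoff: int = 9) -> bool:
    """Assumed rule: flag scores at or above the cut-off discussed for suicidal intent."""
    return sum_score >= cutoff


if __name__ == "__main__":
    for score in (2, 5, 9, 17):
        print(score, bhs_severity(score), bhs_screen_positive(score))
```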
Although the BHS is well established, it had not been officially translated into German. Furthermore, despite its widespread use in diverse samples, its measurement invariance, and hence the appropriateness of comparisons between groups in different populations, had never been tested. The aim of the present study was therefore to assess the validity, reliability, factor structure, construct validity and factorial invariance of the German version of the BHS. The data stem from a sample representative of the population of the Federal Republic of Germany. To the authors' knowledge, only four other studies report the use of the BHS in a population sample [13,14,15,16]. Only Tanaka and colleagues report psychometric characteristics of the scale, for a Japanese sample of N = 154 community residents. Psychometric properties of the scale have so far not been reported for a large, representative Western community sample. (Iliceto and Fino have used the scale in a large Italian population; however, the sample was not representative. Furthermore, the researchers used the BHS with a 5-point Likert format.)
Szabó and colleagues recently investigated a bi-factor model with one content-related factor and two method factors in a clinical population from Hungary. Building on the work of Szabó and colleagues, Boduszek and Dhingra recently investigated the factor structure in a large student sample and suggested a three-dimensional multitrait-multimethod (MTMM) model. In addition to replicating the results of other large European studies and validating the German version of the BHS, this study is the first to combine the recently favored methodological approach (i.e. using method factors that account for item wording within a bi-factor or MTMM framework) with an analysis of measurement invariance in a large representative population sample.
Sample and sampling procedure
From February to June 2014, the University of Leipzig conducted a survey in a population sample representative of the Federal Republic of Germany. A total of N = 2527 individuals were interviewed by 206 interviewers, corresponding to a response rate of 54.8% of the initially contacted households (N = 4607). The interviews were conducted by professionals from an independent institute for opinion and social research (USUMA Berlin). Sampling used a threefold random selection procedure covering the entire inhabited territory of the Federal Republic of Germany. In a first step, 258 non-overlapping regional areas in Germany were defined using Cox allocations. Next, target households were randomly selected within these areas through random route procedures. Finally, the interviewers identified the target person within each household with the help of a Kish selection grid. Each target person was interviewed individually at home by a trained interviewer and was asked to complete several self-report questionnaires. The proper conduct of the interviews was verified by sending postcards with prepaid postage to 38.7% of the participants. Approximately 53% of these postcards were returned, and all of them confirmed that the interviewers had worked as expected. As indicated in the original BHS manual, the BHS was only presented to participants who were at least 18 years old. Hence, a subsample of 2450 individuals is the basis of the present study. The participants' mean age was M = 50.51 years (SD = 17.0), with a range of 18–95 years; 88 (3.6%) had nationalities other than German; 54% were female. All participants had adequate knowledge of written German and completed the survey entirely in German. Further sample details can be obtained from Table 1. Written informed consent was obtained from each participant.
The survey was approved by the ethics committee of the medical faculty of Leipzig University (AZ: 063–14-10,032,014).
The Beck hopelessness scale (BHS)
The BHS is a 20-item self-report instrument for the measurement of hopelessness. Abbreviated items are presented in Table 2. Respondents evaluate each of the 20 statements and decide whether it describes their attitude during the previous week (including the day of assessment). Nine items are reverse-scored to counteract acquiescence. After inversion of these positively worded items, a sum score is calculated. The total score can range from 0 to 20 and indicates the number of items endorsed in the hopeless direction. Translation of the scale into German was carried out by Pearson Assessment on the basis of WHO guidelines for questionnaire translation (including forward and back-translations). Results from the same survey using the German BHS and BSS have also been published in German by Gunzelmann and colleagues.
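A minimal Python sketch of the sum-score computation follows. It assumes that answers are stored as True/False per item number. It takes the nine reverse-keyed items to be the positively worded items 1, 3, 5, 6, 8, 10, 13, 15 and 19 listed in the factor-analysis section. The function name is hypothetical.

```python
# Positively worded items, keyed in the "false" direction (taken from the
# article's list of positively worded items).
REVERSE_KEYED = frozenset({1, 3, 5, 6, 8, 10, 13, 15, 19})


def bhs_sum_score(answers: dict[int, bool]) -> int:
    """Return the BHS sum score (0-20) from true/false answers keyed by item number.

    A positively worded item counts towards hopelessness when answered False;
    every other item counts when answered True.
    """
    if set(answers) != set(range(1, 21)):
        raise ValueError("expected answers for items 1 to 20")
    score = 0
    for item, endorsed in answers.items():
        hopeless = (not endorsed) if item in REVERSE_KEYED else endorsed
        score += int(hopeless)
    return score
```

Answering "true" to every statement therefore yields 11 rather than 20. The reverse keying is meant to catch exactly this kind of response pattern.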
The Beck scale for suicide ideation (BSS)
The BSS contains 21 statement groups, each consisting of three sentences that differ in the intensity of suicidal ideation. Each statement is scored 0, 1 or 2. Participants choose the statement in each group that describes them best. The total BSS score can range from 0 to 38, with higher values indicating an increasing risk of suicide. The first five items of the BSS serve as a screening tool for suicidal ideation during the previous week and can be summed to form the BSS-Screen score. Subsequent items are only presented if either item #4 or item #5 has been endorsed. These 14 items allow for an assessment of the severity of existing suicidal ideation. The last two statement groups address the frequency and intensity of former suicide attempts and are answered by all participants, independent of their responses to the filter questions; they are, however, not included in the total BSS score. Because only a small number of individuals completed the whole questionnaire, only the screening part of the BSS is used to establish construct validity. Details on the psychometric properties of the German version of the BSS can be obtained from Kliem and colleagues.
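The screening and filter logic can be sketched as follows. These are hypothetical Python helpers, not part of any published scoring software. They rest on three assumptions: item scores are passed in questionnaire order, "endorsed" means a score above 0, and items 6 to 19 skipped by the filter are coded 0.

```python
VALID = (0, 1, 2)


def bss_screen_score(items: list[int]) -> int:
    """BSS-Screen: sum of items 1-5 (each scored 0-2), range 0-10."""
    screen = items[:5]
    if len(screen) != 5 or any(s not in VALID for s in screen):
        raise ValueError("need five screening items scored 0, 1 or 2")
    return sum(screen)


def bss_needs_followup(items: list[int]) -> bool:
    """Items 6-19 are presented only if item 4 or item 5 is endorsed (score > 0)."""
    return items[3] > 0 or items[4] > 0


def bss_total_score(items: list[int]) -> int:
    """Total over items 1-19 (range 0-38); items 20 and 21 (past attempts) are excluded.

    Assumption: items 6-19 skipped by the filter are passed in as 0.
    """
    core = items[:19]
    if len(core) != 19 or any(s not in VALID for s in core):
        raise ValueError("need 19 items scored 0, 1 or 2")
    return sum(core)
```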
The patient health questionnaire 2 (PHQ-2)
The PHQ-2 is a brief instrument for the assessment of depressive symptoms. It consists of only two items, namely the first two items of the depression module of the PHQ-9 (little interest or pleasure; feeling down, depressed or hopeless). It assesses the frequency of depressive symptoms over the previous 2 weeks. The response options for each item are 0 = not at all, 1 = several days, 2 = more than half the days and 3 = nearly every day; PHQ-2 scores can therefore range from 0 to 6. A cut-off score of 3 proved most suitable regarding sensitivity and specificity for the tentative diagnosis of major depressive disorder (sensitivity: 87%, specificity: 78%) as well as other depressive disorders (sensitivity: 79%, specificity: 86%). Cronbach's alpha was found to be α = .83.
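A minimal sketch of PHQ-2 scoring with the cut-off of 3 follows. This is the same threshold later used to form the depression groups for the invariance analysis. Function and parameter names are hypothetical.

```python
PHQ2_CUTOFF = 3  # score at or above which a depressive disorder is tentatively suspected


def phq2_score(interest: int, mood: int) -> int:
    """Sum of the two PHQ-2 items, each rated 0 (not at all) to 3 (nearly every day)."""
    for value in (interest, mood):
        if value not in (0, 1, 2, 3):
            raise ValueError("PHQ-2 items are rated 0-3")
    return interest + mood


def phq2_positive(score: int, cutoff: int = PHQ2_CUTOFF) -> bool:
    """True for the 'possibly depressed' group (score of 3 or higher)."""
    return score >= cutoff
```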
Life satisfaction questionnaire (FLZ)
The FLZ-8, a shortened version of the Fragebogen zur Lebenszufriedenheit (FLZ) [Life Satisfaction Questionnaire] by Brähler, Fahrenberg, Myrtek, and Schumacher, was used to assess global life satisfaction. It assesses individual satisfaction in eight areas of life (friends/acquaintances, leisure time/hobbies, health, income/financial security, job/work, living situation, family life/children, relationship/sexuality). Participants rate their satisfaction in each area on a 5-point scale ranging from 1 = dissatisfied to 5 = very satisfied. Individual items can be aggregated to form a global score. The sum score can range from 7 to 35, with higher values indicating higher life satisfaction. Cronbach's alpha for the original subscales ranged from α = .82 to α = .95.
Imputation of missing data
Between 0.0% and 1.4% of the item-level data were missing for the questionnaires examined in this paper. Missing data were imputed using nonparametric recursive partitioning as implemented in the R package missForest. Unlike other established approaches to missing data (e.g., multiple imputation or full information maximum likelihood), the missForest algorithm makes no assumptions about the distribution of the estimated variables. This makes it particularly suitable for mixed-type data containing variables measured at different levels. It also models non-linear relationships and higher-order interactions more adequately. The random forest algorithm constructs a multitude of decision trees based on the observed values. Once trained in this way, it predicts the missing values of each variable from the other variables in the data set. This is done for each variable in turn and then repeated iteratively until a stopping criterion is met. The random forest was trained on the variables age, gender, BHS items and BSS-Screen items. The FLZ-8 and the PHQ-2 were not imputed at the item level, because the original survey data available to the authors only included their sum scores; hence only the sum score could be imputed if missing.
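The iterative procedure described above can be sketched in Python with scikit-learn. This is a simplified re-implementation of the missForest idea for 0/1 items, not the R package itself. It assumes all columns are binary and starts from a column-mode fill. It then visits columns from fewest to most missing values and stops when the share of changed imputations either grows (returning the previous imputation) or reaches zero. The function name and default settings are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def miss_forest_binary(X, max_iter=10, n_estimators=100, random_state=0):
    """Impute NaNs in a matrix of 0/1 items with iterated random forests."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if not missing.any():
        return X.copy()
    Ximp = X.copy()
    # Step 0: initial guess = most frequent observed value in each column
    for j in range(X.shape[1]):
        observed = X[~missing[:, j], j]
        Ximp[missing[:, j], j] = 1.0 if observed.mean() >= 0.5 else 0.0
    # Visit columns from fewest to most missing values
    order = np.argsort(missing.sum(axis=0), kind="stable")
    prev_change = np.inf
    for _ in range(max_iter):
        previous = Ximp.copy()
        for j in order:
            rows = missing[:, j]
            if not rows.any():
                continue
            predictors = np.delete(Ximp, j, axis=1)
            forest = RandomForestClassifier(
                n_estimators=n_estimators, random_state=random_state
            )
            forest.fit(predictors[~rows], Ximp[~rows, j])  # train on observed rows
            Ximp[rows, j] = forest.predict(predictors[rows])  # fill missing rows
        # Stopping rule for categorical data: share of imputed cells that changed
        change = np.mean(Ximp[missing] != previous[missing])
        if change > prev_change:
            return previous  # the difference grew: keep the previous imputation
        if change == 0:
            break
        prev_change = change
    return Ximp
```

Mixed-type data as used in the study (age, gender, items) would additionally need a regressor for continuous columns, which this sketch omits.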
The BHS is dichotomously scored and can hence be categorized as an ordered-categorical measure whose values are both discrete and ordinal. Ordered-categorical measures require special consideration. For example, a dichotomous variable can by definition never be normally distributed, so methods that assume normality are not applicable. Whenever properties at the item level are assessed, particular attention has to be paid to methodology.
Internal consistency reliability
Reliability for dichotomous measures can be computed using the Kuder-Richardson Formula 20 (KR-20). Cronbach's alpha is a generalization of the KR-20 to non-dichotomous items; for dichotomous items, the two coefficients coincide.
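The KR-20 formula uses k items, the proportion p_j endorsing item j, q_j = 1 − p_j and the variance σ²_X of the total score: KR-20 = k/(k − 1) · (1 − Σ p_j q_j / σ²_X). The Python sketch below computes it for a persons × items matrix of 0/1 scores. It also checks that Cronbach's alpha gives the same value when population variances are used throughout. The toy data are invented for illustration.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson 20 for a persons x items matrix of 0/1 scores."""
    n, k = len(responses), len(responses[0])
    total_var = pvariance([sum(row) for row in responses])
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion endorsing item j
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / total_var)


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha with population variances; equals KR-20 for 0/1 items."""
    k = len(responses[0])
    item_vars = sum(pvariance([row[j] for row in responses]) for j in range(k))
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_vars / total_var)


if __name__ == "__main__":
    toy = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
    print(round(kr20(toy), 3), round(cronbach_alpha(toy), 3))
```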
For dichotomous items, item difficulty is calculated as the percentage of individuals who endorsed the item, after all items have been recoded so that endorsement indicates higher levels of hopelessness. Item-total correlations refer to the correlation of a single item with the sum of the remaining items of the scale (corrected item-total correlation).
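A minimal sketch of both statistics follows. It assumes responses have already been recoded so that 1 always denotes the hopeless answer. The helper names are hypothetical. The correlation is a plain Pearson (point-biserial) correlation between each item and the sum of the remaining items.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; NaN when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_statistics(responses: list[list[int]]) -> list[tuple[float, float]]:
    """Return (difficulty in %, corrected item-total r) for each item."""
    n, k = len(responses), len(responses[0])
    stats = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # scale score without item j
        difficulty = 100 * sum(item) / n
        stats.append((difficulty, pearson(item, rest)))
    return stats
```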
CFA was performed using the lavaan package for R. As recommended for ordered-categorical measures, weighted least squares mean- and variance-adjusted (WLSMV) estimation was used. Model fit was assessed using the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA). It is common practice to evaluate the goodness of fit of ordered-categorical measures with the same criteria as for continuous ones, with Hu and Bentler as the canonical reference. While some authors recommend this procedure for lack of an alternative, its adequacy has not yet been established. Because χ2 has been found to be overestimated even with robust WLS estimation, the interpretation of model fit does not rely heavily on χ2. In the absence of alternatives, we therefore rely on the Hu and Bentler criteria, with CFI and TLI > .95 and RMSEA < .08 indicating good model fit. Robust test statistics were used. However, because this interpretation has not been validated for WLSMV estimation with ordered-categorical variables, especially dichotomous ones, all interpretation of model fit has to remain tentative.
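For orientation, the RMSEA point estimate can be recomputed from a χ² statistic, its degrees of freedom and the sample size as RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))). In this sketch, plugging in the robust χ² values reported in the results section (N = 2450) reproduces the reported RMSEA values to three decimals. This is a consistency check that assumes the standard formula applies. It is not a description of how lavaan computes its robust RMSEA. The cut-offs are those adopted in this article.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a chi-square statistic, its df and sample size."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_criteria(cfi: float, tli: float, rmsea_value: float) -> bool:
    """Cut-offs used in this article: CFI and TLI > .95, RMSEA < .08."""
    return cfi > 0.95 and tli > 0.95 and rmsea_value < 0.08


if __name__ == "__main__":
    # One-dimensional model as reported in the results
    print(round(rmsea(2205.127, 170, 2450), 3))
```

Under these criteria the one-dimensional model passes on RMSEA but not on CFI or TLI. This is why the text calls its fit acceptable rather than good.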
First, we analyzed the original model by Beck and colleagues [a three-factor model: Feelings about the future (1, 5, 6, 13, 15, 19), Loss of motivation (2, 3, 9, 11, 12, 16, 17, 20), Future expectations (4, 7, 8, 10, 14, 18)] and the one-dimensional model (all items loading on one overall hopelessness factor), as these are the models most discussed in the BHS literature. Recently, the influence of method factors (optimism and pessimism, due to item wording) has been introduced into the discussion of the BHS factor structure. We therefore also tested a one-dimensional bi-factor model as suggested by Szabó and colleagues [one content factor (all items) and two method factors: negatively worded items (2, 4, 7, 9, 11, 12, 14, 16, 17, 18, 20) and positively worded items (1, 3, 5, 6, 8, 10, 13, 15, 19)]. We also tested a multitrait-multimethod model suggested by Boduszek and Dhingra [three correlated trait factors (as suggested by Beck and colleagues) and two correlated method factors].
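The four model specifications can be written down as item sets. The sketch below assumes factor variances fixed to 1 (e.g. std.lv = TRUE in lavaan) and all thresholds freely estimated; this is how it counts free parameters. Under these assumptions, the degrees of freedom for 20 binary items reproduce those reported in the results (170, 167, 150, 146). The bi-factor count of 150 only works out if item 14 loads on the negatively worded method factor (item 14 is keyed in the hopeless direction). Factor names and item names (i1 to i20) are hypothetical. The string builder emits lavaan's =~ and ~~ operators but is not the authors' script.

```python
from itertools import combinations

N_ITEMS = 20

BECK = {  # three correlated content factors (Beck and colleagues)
    "feelings": [1, 5, 6, 13, 15, 19],
    "motivation": [2, 3, 9, 11, 12, 16, 17, 20],
    "expectations": [4, 7, 8, 10, 14, 18],
}
POSITIVE = [1, 3, 5, 6, 8, 10, 13, 15, 19]
NEGATIVE = [2, 4, 7, 9, 11, 12, 14, 16, 17, 18, 20]  # assumes item 14 belongs here
GENERAL = {"hopelessness": list(range(1, N_ITEMS + 1))}

# Each model: (factor -> items, list of factor pairs whose correlation is free)
MODELS = {
    "one-dimensional": (GENERAL, []),
    "Beck three-factor": (BECK, list(combinations(BECK, 2))),
    "bi-factor": ({**GENERAL, "positive": POSITIVE, "negative": NEGATIVE}, []),
    "MTMM": (
        {**BECK, "positive": POSITIVE, "negative": NEGATIVE},
        list(combinations(BECK, 2)) + [("positive", "negative")],
    ),
}


def wlsmv_df(factors, correlations, p=N_ITEMS):
    """df for binary items: p(p-1)/2 tetrachoric correlations minus free loadings
    and free factor correlations (thresholds are saturated by the p proportions)."""
    loadings = sum(len(items) for items in factors.values())
    return p * (p - 1) // 2 - loadings - len(correlations)


def lavaan_syntax(factors, correlations):
    """Build a lavaan-style model string; unlisted factor pairs are fixed orthogonal."""
    lines = [f"{f} =~ " + " + ".join(f"i{i}" for i in items) for f, items in factors.items()]
    free = set(correlations)
    for a, b in combinations(factors, 2):
        if (a, b) not in free:
            lines.append(f"{a} ~~ 0*{b}")
    return "\n".join(lines)


if __name__ == "__main__":
    for name, (factors, corrs) in MODELS.items():
        print(f"{name}: df = {wlsmv_df(factors, corrs)}")
```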
Because the factor structure of the BHS has been fiercely debated and different research groups have suggested various factor solutions, we additionally tested several other models from the BHS literature. Details regarding those models can be found in Additional file 1. The models described above are reported in detail in the results section and critically evaluated in the discussion section.
However, the BHS sum score is the main statistic interpreted in everyday practice, so unidimensionality can be regarded as a major objective. To justify endorsing a three-factor solution instead of a more parsimonious one-factor model, the subfactors have to add explanatory value (for example, differential predictive validity). To examine differential relationships of the three conceivable hopelessness factors (“feelings about the future”, “loss of motivation” and “future expectations”), correlations were computed with life satisfaction (FLZ), suicidal ideation (BSS-Screen) and depression (PHQ-2), as well as with two single items indicating death wish and suicide attempt.
To further test the reliability of the bi-factor model (one general factor) and of the MTMM model (three factors), McDonald's coefficient omega (ω) was computed for each of the subscales as well as for the overall sum score.
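The article does not spell out which omega variant was computed. As a hedged illustration, the sketch below implements the common formulas for a bi-factor model with orthogonal factors. It computes omega total, ω = [(Σg)² + Σ_k (Σs_k)²] / [(Σg)² + Σ_k (Σs_k)² + Σθ], with θ_i = 1 − g_i² − s_i². It also computes omega hierarchical for the general factor, which keeps only (Σg)² in the numerator. With all specific loadings set to 0, it reduces to omega for a one-factor model. For dichotomous items, these loadings refer to the latent response variables. The loadings in the test are invented, not those of Table 3.

```python
def omega_bifactor(general, specific, groups):
    """Return (omega total, omega hierarchical) for an orthogonal bi-factor model.

    general:  standardized loadings on the general factor, one per item
    specific: standardized loadings on each item's own group/method factor
    groups:   label of that group/method factor for each item
    """
    g_sq = sum(general) ** 2
    group_sums = {}
    for s, grp in zip(specific, groups):
        group_sums[grp] = group_sums.get(grp, 0.0) + s
    s_sq = sum(v ** 2 for v in group_sums.values())
    resid = 0.0
    for g, s in zip(general, specific):
        theta = 1 - g ** 2 - s ** 2  # residual variance of a standardized item
        if theta < 0:
            raise ValueError("loadings imply a negative residual variance")
        resid += theta
    denom = g_sq + s_sq + resid
    return (g_sq + s_sq) / denom, g_sq / denom
```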
Measurement invariance can be assumed if a construct’s factorial structure does not differ between groups (e.g. gender), i.e., the factor structure is invariant across these groups. There are different levels of measurement invariance, which determine the comparability of the analyzed groups. Configural invariance, the weakest form of measurement invariance, refers to the equivalence of the factorial structure. It is assumed when the latent constructs (i.e., the factors) show the same dimensionality and in addition, the indicators (i.e., the items) can be identically assigned to the latent constructs in both groups. This type of measurement invariance is necessary but not sufficient for allowing an unbiased comparison of measurement between groups. If this prerequisite is not empirically supported, stricter tests of invariance and comparisons between groups are not appropriate, as this indicates that the analyzed items measure different constructs in the examined groups.
More restrictive forms of measurement invariance refer to a) the factor loadings (i.e., metric or weak invariance) and b) the intercepts of the indicators (i.e., scalar or strong invariance). If weak invariance is empirically supported, structural relationships among latent constructs (e.g., correlation coefficients) can be compared between groups. If strong invariance is also confirmed, between-group differences in the constructs' means can be assessed. Finally, error variance invariance (or strict invariance) can be tested by examining whether the indicators' residual variances are equal across groups. Different residual variances across groups can result in different error rates (e.g., sensitivity, specificity) and thereby affect screening decisions or the calculation of critical differences.
Measurement invariance was assessed for the following groups: gender (as indicated by participants) and depression status (PHQ-2 score below 3 vs. PHQ-2 score of 3 or higher). This categorization of depression status reflects the suggested cut-off for a tentative diagnosis of depressive illness. Measurement invariance was tested using multiple-group factor analysis, again performed with the lavaan package for R and WLSMV estimation. Following the procedure suggested by Millsap and Yun-Tein for ordered-categorical variables, we tested the following models in sequence:
- configural invariance (no constraints apart from those necessary for model identification);
- weak invariance (all loadings constrained to be equal);
- strong invariance (thresholds constrained to be equal; this constraint was already necessary for model identification, so the strong model is identical to the weak model);
- strict invariance (unique variances constrained to 1).
Chen suggests the following cut-off criteria: a decrease in CFI of at least .01 together with an increase in RMSEA of at least .015 indicates non-invariance. Chen furthermore points out that among these indices CFI is the most reliable and that RMSEA tends to be more affected by sample size and model complexity.
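The sequential comparison can be sketched as follows. Under the identification used here, the strong model is identical to the weak model. The sketch therefore compares a sequence of configural, weak and strict models step by step. The fit values in the test are invented for illustration and are not those of Table 5. Rounding the differences guards against floating-point noise at the exact cut-offs. Function names are hypothetical.

```python
def noninvariant(less, more, cfi_drop=-0.01, rmsea_rise=0.015):
    """Flag non-invariance between a less and a more constrained model.

    less, more: (CFI, RMSEA) tuples. Non-invariance is flagged when CFI changes
    by -.01 or less AND RMSEA rises by .015 or more (the joint rule in the text).
    """
    delta_cfi = round(more[0] - less[0], 6)  # rounding avoids float noise at cut-offs
    delta_rmsea = round(more[1] - less[1], 6)
    return delta_cfi <= cfi_drop and delta_rmsea >= rmsea_rise


def check_sequence(fits):
    """fits: list of (label, CFI, RMSEA), ordered from configural to strict."""
    return [
        (curr[0], noninvariant(prev[1:], curr[1:]))
        for prev, curr in zip(fits, fits[1:])
    ]
```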
Construct validity was established by assessing the correlations of the BHS sum score with the Fragebogen zur Lebenszufriedenheit (FLZ-8) [Life Satisfaction Questionnaire], with a brief assessment of depressive symptoms (Patient Health Questionnaire, PHQ-2) and with the Beck Scale for Suicide Ideation (BSS).
Sample characteristics including descriptive statistics of demographic variables can be obtained from Table 1 in the methods section.
Table 2 presents the BHS item characteristics. The BHS mean in the whole sample was M = 4.87 (SD = 4.33). Item difficulty (P_i) ranged from 8 (#3 “bad things won’t stay forever”) to 55 (#8 “particularly lucky”). Item-total correlations (r_it) ranged from .03 (#13 “future will be happier”) to .64 (#7 “dark future”). Item means were virtually identical between men and women, with effect sizes between d = .00 and d = .09.
Confirmatory factor analysis of the one-dimensional model showed acceptable model fit (χ2 = 2205.127***, df = 170, CFI = 0.926, TLI = 0.917, RMSEA = 0.070, 95% CI [0.067, 0.073]). The bi-factor model recently suggested by Szabó and colleagues, which accounts for optimism and pessimism method factors, showed excellent model fit (χ2 = 562.577***, df = 150, CFI = 0.985, TLI = 0.981, RMSEA = 0.034, 95% CI [0.031, 0.036]). Composite reliability in the form of McDonald's coefficient omega was ω = 0.90 for the general factor. Standardized factor loadings for this model can be obtained from Table 3.
Acceptable to good model fit was confirmed for several multi-factor models. One example is the model suggested by Beck and colleagues: Feelings about the future (1, 5, 6, 13, 15, 19), Loss of motivation (2, 3, 9, 11, 12, 16, 17, 20), Future expectations (4, 7, 8, 10, 14, 18); χ2 = 1499.105***, df = 167, CFI = 0.951, TLI = 0.945, RMSEA = 0.057, 95% CI [0.054, 0.060]. Excellent model fit was obtained for the MTMM model recently suggested by Boduszek and Dhingra, with three correlated trait factors and two correlated method factors (χ2 = 474.106***, df = 146, CFI = 0.988, TLI = 0.984, RMSEA = 0.030, 95% CI [0.027, 0.033]). Composite reliability in the form of McDonald's coefficient omega was ω = 0.96 for the total score, ω = 0.88 for factor 1 (Feelings about the future), ω = 0.94 for factor 2 (Loss of motivation) and ω = 0.90 for factor 3 (Future expectations).
To evaluate the rationale for choosing between the different models, we investigated whether the original Beck factors (as also included in the MTMM model) provide additional explanatory value regarding construct validity. Table 4 shows the correlations of the three BHS factors as originally suggested by Beck and colleagues. As the correlations differed only marginally between factors, we continued further analyses using the one-dimensional approach.
Table 5 shows the fit measures obtained in the measurement invariance analysis for the one-factor and the bi-factor model. Chen's cut-off criteria are exceeded only for the strict bi-factor model regarding depression status. Robust fit statistics are reported. Group sizes were as follows: gender, female n = 1320 and male n = 1130; depression status, non-depressed n = 2227 and possibly depressed n = 223.
Table 4 contains the Pearson correlations of the BHS sum scores with measures of depression (PHQ-2), suicidal ideation (BSS-Screen) and life satisfaction (FLZ-8). The directions of the correlations were in accordance with theoretical expectations.
This study examined the psychometric properties of the German version of the Beck Hopelessness Scale in a large sample representative of the Federal Republic of Germany. It is the first study to report psychometric properties of the BHS in a large representative Western community sample and the first investigation of the scale's measurement invariance. Although several authors have questioned the suitability of the BHS for general population samples [7, 35], the German version of the Beck Hopelessness Scale demonstrated mostly sound psychometric properties. The item characteristics can be evaluated as satisfactory, with a few noteworthy exceptions: items #4, #8 and #13 cannot be interpreted as adequately capturing the construct. These specific items have also been a cause for concern in previous psychometric evaluations. Niméus et al. performed a principal component analysis and found that item #4 loaded on a component of its own. Kao, Liu, and Lu found that item #4 was endorsed 1.5 times more frequently by non-suicidal patients than by suicidal patients. No other item showed such inverse behavior, at least not to that extent. Durham observed similar item behavior, finding that item #13 had the same item difficulty in a student sample as in psychiatric samples. One or more of the problematic items exhibited low item-total correlations or very low factor loadings in the following studies:
- Aloba, Ajao, Alimi and Esan;
- Fisher and Overholser;
- Perczel Forintos, Sallai and Rózsa;
- Pompili, Tatarelli, Rogers and Lester;
- Steed;
- Szabó et al.;
- Tanaka, Sakamoto, Ono, Fujihara and Kitamura;
- Young, Halper, Clark, and Scheftner.
However, the undesired properties of these items are not universal. In fact, in some studies they ranked among the best. For the Yoruba version of the BHS, Aloba et al. even suggested two of these items (namely #8 and #13) for inclusion in a 4-item short form. Tanaka et al. report a negative item-total correlation for item #17 in a Japanese population sample, whereas this item ranked among the best in the German sample, suggesting potential cultural differences.
The zeitgeist should also be considered. The BHS was developed in the 1970s, and the items that turned out to be problematic in this analysis in particular could be interpreted as historically sensitive. Being able to imagine one's life in 10 years (item #4) is probably harder in 2018 than it was in 1974 (at least in Germany). People in industrialized countries today change careers, remarry or move house more frequently, which does not necessarily make them more hopeless. Viewing oneself as particularly lucky (item #8) might be rooted in the notion of fate, a concept that was arguably more pronounced in the 1970s. Expecting to be happier in the future (item #13) possibly depends heavily on the current situation: someone who is currently very happy might have a hard time imagining an even more positive future. This notion is supported by a recent study by Szabó and colleagues, who report that item #13 tapped into both optimism and pessimism. The current state of happiness might also fluctuate with recent historic events.
Confirmatory factor analysis of the one-factor model provided acceptable model fit, which allows the calculation (and meaningful interpretation) of a BHS sum score. When two method factors (one for positively worded items and one for negatively worded items) are included, the one-dimensional bi-factor model suggested by Szabó et al. shows excellent model fit.
Regarding multi-factor models, most of the three-factor solutions (e.g. Beck) fit the data well. When the partially inverted item wording is taken into account, the fit of the three-factor MTMM model suggested by Boduszek and Dhingra was excellent and almost identical to that of the bi-factor model. However, given that we could not reproduce differential relationships of the three hopelessness factors, we would opt for the more parsimonious one-dimensional bi-factor model. Furthermore, some methodological arguments support the reasoning that the good fit of several three-factor models might be due to method effects. For example, Woods pointed out that dichotomous items tend to produce spurious factors, as items with similar thresholds tend to group together in factors. Thus, the well-fitting three-factor solutions could in fact be an artifact. Another explanation for the good fit of several models is the observation by Flora and Curran that the positive bias of several fit measures increases with model size under WLSMV estimation. A final recommendation, however, cannot be provided; further studies based on large samples seem necessary to reach a final conclusion. It also seems possible that different factor solutions are to be favored in different sample compositions. Several BHS publications have discussed the notion that in non-clinical samples hopelessness is best interpreted as a unidimensional construct, whereas in clinical samples a multi-factor approach is more appropriate (since a certain degree of hopelessness is necessary to bring out the differences). This could also explain why the subfactors had different explanatory value in the present study than in the recent study by Boduszek and Dhingra, as the BHS mean in their student sample is considerably higher than in the present sample.
Measurement invariance analysis using multiple-group CFA supported invariance across gender and depression status. Cultural differences, however, could not be tested and should be taken into account in future research. Cross-cultural assessment using the BHS is scarce (the authors know of only one study, which included American and Turkish university students). Establishing measurement invariance in cross-cultural samples might therefore be an adequate next step.
Construct validity could be established by replicating correlations from the existing body of research. Correlations of hopelessness and depression (measured with the PHQ-2) are comparable in magnitude to those found in previous research with various measures of depressive symptoms. However, it should be noted that the second of the two PHQ-2 items explicitly assesses feelings of hopelessness, which could inflate the correlation. The correlation of the BHS with the measure of life satisfaction (FLZ) is in the expected direction and exceeds previously found correlations in magnitude. Correlations with suicide-related measures (BSS-Screen, suicide attempt and death wish) are highly significant but lower than expected based on previous findings. This might be due to the unusually small percentage of suicidal individuals despite the large sample size. One of the most relevant characteristics of the hopelessness construct is that hopelessness correlates more strongly with suicidality than depression does; this pattern is less pronounced in this sample. This is most likely due to the wording of the second PHQ-2 item, which explicitly assesses feelings of hopelessness. It thus likely overrepresents the hopelessness aspect of depression compared with other depression measures.
Because three items exhibited insufficient item-total correlations, they were excluded from some analyses, which renders any generalization of the obtained results tentative. The large number of models tested entails the risk of a coincidental fit to the data rather than confirmation of the “true” underlying model, especially as the fit indices of several different models were virtually identical. Furthermore, literature on the interpretation of model fit for ordered categorical variables is scarce. Hence, the interpretation relies heavily on findings generated from continuous data, whose generalizability is questionable. Interpretation of the results therefore has to remain tentative. Additional analyses applying item response theory could have provided further insight into item functioning.
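The item-level criteria referred to above can be sketched for dichotomous items: the corrected item-total correlation correlates each item with the sum of the remaining items, and KR-20 (the special case of Cronbach's alpha for 0/1 items) summarizes internal consistency. The toy response matrix is invented for illustration and is not BHS data; the .30 cut-off is a common rule of thumb assumed here, since the threshold applied in the study is not given in this section.

```python
import math
from statistics import pvariance

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corrected_item_total(responses):
    """Correlation of each item with the sum of the OTHER items."""
    totals = [sum(row) for row in responses]
    result = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]  # total score without item i
        result.append(pearson(item, rest))
    return result

def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items."""
    n, k = len(responses), len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]  # item difficulties
    sum_pq = sum(pi * (1 - pi) for pi in p)
    var_total = pvariance([sum(row) for row in responses])  # population variance, matching p*q
    return k / (k - 1) * (1 - sum_pq / var_total)

# Toy 0/1 data (5 respondents x 5 items), NOT BHS responses.
toy = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
r_it = corrected_item_total(toy)
flagged = [i for i, r in enumerate(r_it) if r < 0.30]  # 0.30: assumed rule-of-thumb cut-off
print([round(r, 2) for r in r_it], flagged, round(kr20(toy), 3))
```

In the toy data the fifth item correlates negatively with the rest of the scale and the first item falls just below the cut-off, so both are flagged.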
To the authors' knowledge, this is the first attempt to establish measurement invariance for the BHS by means of multiple-group CFA. (Iliceto, Fino, Sabatello, and Candilera established measurement invariance across age in a larger model including the BHS administered with a Likert-type response format, and Iliceto and Fino tested for general model invariance in two random subsamples.) Measurement bias can lead to erroneous application and interpretation of cut-off scores, denying individuals in distress proper treatment. Empirical findings could furthermore be erroneously generalized across groups. Establishing cross-cultural measurement invariance should hence become a priority to ensure comparability of results. Qualitative interviews on the subjective interpretation of items #4, #8 and #13, which did not seem to tap the construct well in the German sample, could help explain the poor psychometric properties of these items. Given the overlap in item wording between the BHS and the PHQ-2, a validation in a German sample using a different depression measure might be appropriate.
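The consequence of measurement bias for cut-off scores can be illustrated with a small simulation in the spirit of Millsap and Kwok (not an analysis from this study). Two groups with the same latent distribution answer ten hypothetical dichotomous items, but in the focal group two items have lower thresholds. Both calls reuse the same random seed, so each simulated respondent has the same latent trait and errors in both conditions and only the thresholds differ. The item count, loading, thresholds and cut-off are all assumptions.

```python
import math
import random

def share_above_cutoff(thresholds, cutoff, n=20000, loading=0.7, seed=7):
    """Share of simulated respondents whose 0/1 sum score reaches `cutoff`.

    Every call draws respondents from the SAME latent distribution N(0, 1);
    only the item thresholds differ. All parameter values are hypothetical.
    """
    rng = random.Random(seed)  # same seed -> same latent scores and errors per call
    unique_sd = math.sqrt(1 - loading**2)
    hits = 0
    for _ in range(n):
        eta = rng.gauss(0, 1)  # latent hopelessness
        # one error draw per item; the item is endorsed above its threshold
        score = sum(1 for tau in thresholds
                    if loading * eta + unique_sd * rng.gauss(0, 1) > tau)
        hits += score >= cutoff
    return hits / n

reference_thresholds = [0.5] * 10            # 10 hypothetical items
focal_thresholds = [0.5] * 8 + [-0.5, -0.5]  # two items easier to endorse in the focal group
cutoff = 6                                   # hypothetical screening cut-off
share_ref = share_above_cutoff(reference_thresholds, cutoff)
share_focal = share_above_cutoff(focal_thresholds, cutoff)
print(f"flagged, reference group: {share_ref:.3f}")
print(f"flagged, focal group:     {share_focal:.3f}")
```

Because the latent distribution is identical in both conditions, any difference in the flagged proportion comes from the two non-invariant thresholds alone.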
Table A in the electronic Additional file 1 shows the fit of the factorial models proposed in the extensive literature on the BHS factor structure.
Beck Hopelessness Scale
Beck Scale for Suicide Ideation
Confirmatory factor analysis
Comparative Fit Index
Fragebogen zur Lebenszufriedenheit (Life satisfaction questionnaire)
Kuder-Richardson Formula 20
Multiple-group confirmatory factor analysis
Multitrait-multimethod
Patient Health Questionnaire-2
Root Mean Square Error of Approximation
Standardized Root Mean Square Residual
Tucker Lewis Index
World Health Organization
Weighted least squares mean and variance adjusted estimation
Beck AT. Depression: causes and treatment. Philadelphia: University of Pennsylvania Press; 1972.
Hanna D, White R, Lyons K, McParland MJ, Shannon C, Mulholland C. The structure of the Beck hopelessness scale: a confirmatory factor analysis in UK students. Personal Individ Differ. 2011;51:17–22. https://doi.org/10.1016/j.paid.2011.03.001.
Abramson LY, Metalsky GI, Alloy LB. Hopelessness depression: a theory-based subtype of depression. Psychol Rev. 1989;96:358.
Beck AT, Steer RA, Kovacs M, Garrison B. Hopelessness and eventual suicide: a 10-year prospective study of patients hospitalized with suicidal ideation. Am J Psychiatry. 1985;142:559–63. https://doi.org/10.1176/ajp.142.5.559.
Wenzel A, Beck AT. A cognitive model of suicidal behavior: theory and treatment. Appl Prev Psychol. 2008;12:189–201. https://doi.org/10.1016/j.appsy.2008.05.001.
World Health Organization. Preventing suicide: A global imperative. Geneva; 2014. http://www.who.int/mental_health/suicide-prevention/world_report_2014/en/.
Durham TW. Norms, reliability, and item analysis of the hopelessness scale in general psychiatric, forensic psychiatric, and college populations. J Clin Psychol. 1982;38:597–600. https://doi.org/10.1002/1097-4679(198207)38:3<597::aid-jclp2270380321>3.0.co;2-6.
Ivanoff A, Jang SJ. The role of hopelessness and social desirability in predicting suicidal behavior: a study of prison inmates. J Consult Clin Psychol. 1991;59:394. https://doi.org/10.1037/0022-006x.59.3.394.
Brown GK, Beck AT, Steer RA, Grisham JR. Risk factors for suicide in psychiatric outpatients: a 20-year prospective study. J Consult Clin Psychol. 2000;68:371.
Goldston DB, Daniel SS, Reboussin BA, Reboussin DM, Frazier PH, Harris AE. Cognitive risk factors and suicide attempts among formerly hospitalized adolescents: a prospective naturalistic study. J Am Acad Child Adolesc Psychiatry. 2001;40:91–9. https://doi.org/10.1097/00004583-200101000-00021.
Niméus A, Träskman-Bendz L, Alsén M. Hopelessness and suicidal behavior. J Affect Disord. 1997;42:137–44. https://doi.org/10.1016/S0165-0327(96)01404-8.
Beck AT, Steer RA. Beck hopelessness scale (BHS) manual. Pearson: San Antonio; 1993.
Greene SM. Levels of measured hopelessness in the general population. Br J Clin Psychol. 1981;20:11–4. https://doi.org/10.1111/j.2044-8260.1981.tb00490.x.
Haatainen K, Tanskanen A, Kylmä J, Honkalampi K, Koivumaa-Honkanen H, Hintikka J, Viinamäki H. Factors associated with hopelessness: a population study. Int J Soc Psychiatry. 2004;50:142–52. https://doi.org/10.1177/0020764004040961.
Tanaka E, Sakamoto S, Ono Y, Fujihara S, Kitamura T. Hopelessness in a community population in Japan. J Clin Psychol. 1996;52:609–15.
Iliceto P, Fino E. Beck hopelessness scale (BHS): a second-order confirmatory factor analysis. Eur J Psychol Assess. 2014; https://doi.org/10.1027/1015-5759/a000201.
Szabó M, Mészáros V, Sallay J, Ajtay G, Boross V, Udvardy-Mészáros À, et al. The Beck hopelessness scale: specific factors of method effects? Eur J Psychol Assess. 2016;32:111–8. https://doi.org/10.1027/1015-5759/a000240.
Boduszek D, Dhingra K. Construct validity of the Beck hopelessness scale (BHS) among university students: a multitrait-multimethod approach. Psychol Assess. 2016;28:1325–30. https://doi.org/10.1037/pas0000245.
Kish L. A procedure for objective respondent selection within the household. J Am Stat Assoc. 1949;44:380–7. https://doi.org/10.2307/2280236.
Beck AT, Weissman A, Lester D, Trexler L. The measurement of pessimism: the hopelessness scale. J Consult Clin Psychol. 1974;42:861–5.
Gunzelmann T, Beutel M, Kliem S, Brähler E. Suizidgedanken, Hoffnungslosigkeit und Einsamkeit bei Älteren. Z Psychosom Med Psychother. 2016;62:366–76. https://doi.org/10.13109/zptm.2016.62.4.366.
Beck AT, Steer RA, Ranieri WF. Scale for suicide ideation: psychometric properties of a self-report version. J Clin Psychol. 1988;44:499–505. https://doi.org/10.1002/1097-4679(198807)44:4<499::aid-jclp2270440404>3.0.co;2-6.
Kliem S, Lohmann A, Mößle T, Brähler E. German Beck scale for suicide ideation (BSS): psychometric properties from a representative population survey. BMC Psychiatry. 2017;17:389. https://doi.org/10.1186/s12888-017-1559-9.
Kroenke K, Spitzer RL, Williams JBW. The PHQ-9. J Gen Intern Med. 2001;16:606–13. https://doi.org/10.1046/j.1525-1497.2001.016009606.x.
Löwe B, Kroenke K, Gräfe K. Detecting and monitoring depression with a two-item questionnaire (PHQ-2). J Psychosom Res. 2005;58:163–71. https://doi.org/10.1016/j.jpsychores.2004.09.006.
Brähler E, Fahrenberg J, Myrtek M, Schumacher J. Fragebogen zur Lebenszufriedenheit (FLZ). Göttingen: Hogrefe; 1999.
Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112–8. https://doi.org/10.1093/bioinformatics/btr597.
Rosseel Y. Lavaan: an R package for structural equation modeling. J Stat Softw. 2012;48:1–36. https://doi.org/10.18637/jss.v048.i02.
Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods. 2004;9:466–91. https://doi.org/10.1037/1082-989X.9.4.466.
Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55. https://doi.org/10.1080/10705519909540118.
Bovaird JA, Koziol NA. Measurement models for ordered-categorical indicators. New York: The Guilford Press; 2012.
Millsap RE, Kwok O-M. Evaluating the impact of partial factorial invariance on selection in two populations. Psychol Methods. 2004;9:93. https://doi.org/10.1037/1082-989x.9.1.93.
Millsap RE, Yun-Tein J. Assessing factorial invariance in ordered-categorical measures. Multivariate Behav Res. 2004;39:479–515. https://doi.org/10.1207/s15327906mbr3903_4.
Chen FF. Sensitivity of goodness of fit indexes to lack of measurement invariance. Struct Equ Modeling. 2007;14:464–504. https://doi.org/10.1080/10705510701301834.
Young MA, Halper IS, Clark DC, Scheftner WA. An item-response theory evaluation of the Beck hopelessness scale. Cogn Ther Res. 1992;16:579–87. https://doi.org/10.1007/bf01175143.
Kao YC, Liu YP, Lu CW. Beck hopelessness scale: exploring its dimensionality in patients with schizophrenia. Psychiatr Q. 2012;83:241–55. https://doi.org/10.1007/s11126-011-9196-9.
Aloba O, Ajao O, Alimi T, Esan O. Psychometric properties and correlates of the Beck hopelessness scale in family caregivers of Nigerian patients with psychiatric disorders in southwestern Nigeria. J Neurosci Rural Pract. 2016;7:S18–25. https://doi.org/10.4103/0976-3147.196434.
Fisher LB, Overholser JC. Refining the assessment of hopelessness: an improved way to look to the future. Death Stud. 2012;37:212–27. https://doi.org/10.1080/07481187.2011.628437.
Forintos DP, Sallai J, Rózsa S. Adaptation of the Beck hopelessness scale in Hungary. Psihologijske Teme. 2010;19:307–21.
Pompili M, Tatarelli R, Rogers JR, Lester D. The hopelessness scale: a factor analysis. Psychol Rep. 2007;100:375–8. https://doi.org/10.2466/pr0.100.2.375-378.
Steed L. Further validity and reliability evidence for Beck hopelessness scale scores in a nonclinical sample. Educ Psychol Meas. 2001;61:303–16. https://doi.org/10.1177/00131640121971121.
Aloba O, Akinsulore A, Mapayi B, Oloniniyi I, Mosaku K, Alimi T, Esan O. The Yoruba version of the Beck hopelessness scale: psychometric characteristics and correlates of hopelessness in a sample of Nigerian psychiatric outpatients. Compr Psychiatry. https://doi.org/10.1016/j.comppsych.2014.09.024.
Woods C. Factor analysis of scales composed of binary items: illustration with the Maudsley obsessional compulsive inventory. J Psychopathol Behav Assess. 2002;24:215–23. https://doi.org/10.1023/A:1020770831134.
Gençöz F, Vatan S, Walker RL, Lester D. Hopelessness and suicidality in Turkish and American respondents. Omega. 2007;55:311–9. https://doi.org/10.2190/OM.55.4.e.
Iliceto P, Fino E, Sabatello U, Candilera G. Personality and suicidal ideation in the elderly: factorial invariance and latent means structures across age. Aging Ment Health. 2014;18:792–800. https://doi.org/10.1080/13607863.2014.880404.
Bowen NK, Masa RD. Conducting measurement invariance tests with ordinal data: a guide for social work researchers. J Soc Soc Work Res. 2015;6:229–49. https://doi.org/10.1086/681607.
Data collection was funded by Pearson Assessment. Pearson Assessment, however, did not commission the preparation of this manuscript, nor did it interfere with data analysis and interpretation of results.
Availability of data and materials
The data that support the findings of this study are available from Pearson Assessment, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Pearson Assessment.
Ethics approval and consent to participate
The study was approved by the ethics committee of the medical faculty of the University of Leipzig [Ethik-Kommission an der Medizinischen Fakultät der Universität Leipzig]. Reference number 063-14-10032014. Written informed consent was obtained from each participant.
Consent for publication
SK and EB authored the manual for the German version of the BHS. All other authors declare that they have no conflicts of interest. This paper and its contents were not commissioned, nor was any author paid for its preparation.
Additional file 1:
Table A. “Model fit of factorial solutions suggested in previous studies including the BHS”. (DOCX 26 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Cite this article
Kliem, S., Lohmann, A., Mößle, T. et al. Psychometric properties and measurement invariance of the Beck hopelessness scale (BHS): results from a German representative population sample. BMC Psychiatry 18, 110 (2018). https://doi.org/10.1186/s12888-018-1646-6
- Beck hopelessness scale (BHS)
- Population sample
- Measurement invariance
- Psychometric analysis
- Confirmatory factor analysis (CFA)
- Bi-factor model
- Factor structure