German Beck Scale for Suicide Ideation (BSS): psychometric properties from a representative population survey

Background Suicidal ideation has been identified as one of the major predictors of attempted or actual suicide. Routinely screening individuals for suicidal thoughts could save lives and protect many from severe psychological consequences following the suicide of loved ones. The aim of this study was to validate the German version of the Beck Scale for Suicide Ideation (BSS) in a sample representative of the Federal Republic of Germany. Methods All 2450 participants completed the first part of the scale, the BSS-Screen. A risk group of n = 112 individuals (4.6%) with active or passive suicidal ideation was identified and subsequently completed the entire BSS. Results Satisfactory internal reliability (α = .97 for the BSS-Screen; α = .94 for the entire BSS) and excellent model fit indices for the one-dimensional factorial structure of the BSS-Screen (CFI = .998; TLI = .995; RMSEA = .045 [95%-CI: .030-.061]) were confirmed. Measurement invariance analyses supported strict invariance across gender, age, and depression status. We found correlations with related self-report measures in expected directions comparable to previous studies, indicating satisfactory construct validity. Limitations Our study involved cross-sectional data, hence neither predictive validity nor retest reliability was examined. As only the risk group of n = 112 individuals completed the entire measure, confirmatory factor analyses could not be conducted for the full BSS. Conclusion The German translation of the BSS is a reliable and valid instrument for assessing suicidal ideation in the general population. Using it as a screening device in general and specialized medical care could substantially advance suicide prevention.


Background
The World Health Organization [1] estimated that in 2012 approximately 800,000 people died of suicide worldwide. In Germany, around 10,000 suicides are recorded each year, which is 2.5 times the number of motor vehicle deaths within the same period of time [2]. Each of these suicides itself constitutes a tragedy. On top of that, they also strongly affect society: numerous other people are affected by every suicide and often need psychosocial support [3]. An increased risk of committing suicide has been found in individuals who have lost their partner [4] or their child to suicide [5]. Moreover, in the United States, loss of productivity resulting from suicides and suicide attempts amounts to 11.8 billion dollars each year [6], underlining that prevention and early detection of suicidal behavior are of utmost importance.
Several risk factors for suicidal behavior have been identified, for example low socio-economic status, experienced child abuse, and mental disorders (e.g. [7][8][9]). Protective factors like religious affiliation, social support, life satisfaction, and having children [10,11] are related to lower suicide rates. One of the major predictors for committing or attempting suicide is the occurrence of suicidal ideation (e.g. [10][11][12][13][14]). Within the first year after the onset of suicidal thoughts, the risk for attempted suicide increases by approximately 170 times [9], as the transition from suicidal thoughts to behavior often occurs during this period of time.
Thus, a routine assessment of suicidal thinking as part of general and specialized medical care could substantially advance suicide prevention, especially because many practitioners are treating suicidal individuals without realizing it: half of the individuals committing suicide contact primary or specialized health care facilities within 4 weeks prior to their death [15]. However, extensive exploration of suicidality with every patient cannot be provided by primary health care, as time and financial resources are limited. Instead of clinical interviews (for example, the Scale for Suicide Ideation (SSI) [16]), less time-consuming inventories seem more practical.
The Beck Scale for Suicide Ideation (BSS) [17] is the self-report version of the interviewer-administered SSI [16] and is one of the most widely used self-report instruments for the assessment of suicidal thinking. It helps to identify suicidal individuals provided that they are willing to acknowledge and share their thoughts. The BSS serves as a routine screening for existent suicidal thinking (BSS-Screen) and can also aid in a more extensive exploration of the severity of such thoughts (total BSS score). It can be administered in various settings (e.g., psychiatric-psychotherapeutic care, general medical services, and forensic psychiatry) and the routine screening, consisting only of five items, can be regarded as very time-efficient.
The BSS has proven to be a reliable measure across many different settings and samples, showing good internal consistencies, e.g. α = .87 in an outpatient sample [18], α = .89 in a risk sample [19], and α = .88 in a nonclinical student sample [20]. One-week retest reliabilities of r_tt = .54 [17] and r_tt = .88 [21] have been found. Suicidal ideation as measured by the BSS has been shown to be strongly associated with hopelessness (e.g. [22][23][24]) and depression (e.g. [22,25,26]). High correlations between the BSS and other instruments for the measurement of suicidality have also been found, for example with the Suicide Probability Scale [25], the Adult Suicidal Ideation Questionnaire [25], and the Ratings of Suicidal Thoughts [24], providing support for convergent validity.
Although it has been translated into several languages, so far, there has been no official German version. Thus, the aim of this study is to investigate the reliability, validity, factorial structure, and factorial invariance of the German BSS in a large German population sample.

Study design and participants
Commissioned by the University of Leipzig, an independent institute for opinion and social research (USUMA, Berlin) collected the data in 2014. Sampling was conducted using a threefold random selection procedure covering the entire inhabited territory of the Federal Republic of Germany. First, 258 non-overlapping regional areas in Germany were defined by use of Cox-allocations (an algorithm providing a random rounding procedure, thus allowing for unbiased stratification). Second, target households were randomly selected within these areas through random route procedures. Third, the specific target person in the respective household was randomly determined among all household members aged 14 years or older who were able to understand written German sufficiently. The selection of the individual to be interviewed was carried out with the help of Kish selection, which is a pre-assigned table of random numbers that helps the interviewer to determine the household member to interview [35]. Written consent was provided by all participants. Each target person was individually interviewed at home by a trained interviewer and was asked to complete several self-report questionnaires. In accordance with the American manual, however, the BSS was completed exclusively by participants aged at least 18 years. Proper conduct of the interviews was assessed by sending prestamped postcards to 38.7% of the participants. Approximately 53% of these postcards were returned, all of them affirmative.
Altogether, 2527 individuals were interviewed by 206 interviewers, which constitutes a response rate of 54.8%. Of these, 2450 individuals were aged 18 years or older, thus completed the BSS, and were included in the following psychometric analysis. The participants' mean age was M = 50.51 years (SD = 17.0) with a range of 18-95 years; n = 88 (3.6%) had nationalities other than German and 53.9% of the participants were female. Further sample details can be found in Table 1. The BSS-Screen identified n = 112 individuals (4.6%) with active or passive suicidal ideation, whose mean age was M = 49.7 years (SD = 17.83). Of these, 53% were male. In the following, they will be referred to as the "risk group". All procedures were authorized by the Ethics Committee of the Medical Faculty of the University of Leipzig (Az.: 063-14-10,032,014).

Beck Scale for Suicide Ideation (BSS)
The BSS contains 21 statement groups, each assessing various aspects of suicidal ideation (see Table 3). Each statement group consists of three sentences that describe different intensities of suicidal ideation, representing a three-point scale (0 to 2). Participants are instructed to choose the particular statement of each group that is most applicable to them. The total BSS score can range from 0 to 38, with higher values indicating a greater risk of suicide. Beck and Steer [17] do not distinguish different degrees of suicidal risk, nor do they report a cutoff criterion, as even very low total scores can be associated with elevated risks of suicide [36]. The first five items of the BSS serve as a screening device for suicidal ideation during the last week (including the day of assessment) and are summed up to the BSS-Screen score. Two filter questions (statement groups four and five) assess the presence of active or passive suicidal thoughts. If participants endorse one of them (i.e., choose a sentence rated 1 or 2), they are to complete the subsequent 14 statement groups, which allow for an assessment of the severity of existing suicidal ideation. If participants choose the response option rated "0" for both item 4 and item 5, they skip items 6 to 19 and proceed to the last two statement groups. These last two items address frequency and intensity of former suicide attempts and are again to be answered by all participants. They are not part of the total BSS score. The translation of the German version (Beck-Suizidgedanken-Skala, BSS) was based on the WHO guidelines on translation and adaptation of psychometric instruments [37].
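The skip logic and scoring rules described above can be sketched in Python as follows. This is an illustrative sketch only, not the official scoring procedure: the names `BSSScore` and `score_bss` and the dictionary input format are assumptions made for the example, and respondents who skip items 6 to 19 are given a total equal to their BSS-Screen score, which the text does not state explicitly.

```python
from typing import Mapping, NamedTuple

SCREEN_ITEMS = range(1, 6)     # items 1-5 form the BSS-Screen
FILTER_ITEMS = (4, 5)          # filter questions on active/passive ideation
SEVERITY_ITEMS = range(6, 20)  # items 6-19, completed only after endorsement


class BSSScore(NamedTuple):
    """Hypothetical result container for this sketch."""
    screen: int    # BSS-Screen score (0-10)
    at_risk: bool  # True if item 4 or item 5 was rated 1 or 2
    total: int     # total BSS score (0-38); items 20 and 21 are excluded


def score_bss(responses: Mapping[int, int]) -> BSSScore:
    """Score one respondent; `responses` maps item number to a rating of 0, 1 or 2."""
    for item, value in responses.items():
        if not 1 <= item <= 21:
            raise ValueError(f"unknown item number: {item}")
        if value not in (0, 1, 2):
            raise ValueError(f"item {item}: rating must be 0, 1 or 2")

    missing = [i for i in SCREEN_ITEMS if i not in responses]
    if missing:
        raise ValueError(f"missing screening items: {missing}")

    screen = sum(responses[i] for i in SCREEN_ITEMS)
    at_risk = any(responses[i] > 0 for i in FILTER_ITEMS)

    if at_risk:
        missing = [i for i in SEVERITY_ITEMS if i not in responses]
        if missing:
            raise ValueError(f"risk-group respondent is missing items: {missing}")
        total = screen + sum(responses[i] for i in SEVERITY_ITEMS)
    else:
        # Assumption of this sketch: skipped items 6-19 contribute 0.
        total = screen

    return BSSScore(screen=screen, at_risk=at_risk, total=total)


# Hypothetical respondent who endorses neither filter item and skips items 6-19.
print(score_bss({1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 20: 0, 21: 0}))
```

Keeping the filter decision separate from the sum makes it easy to report the BSS-Screen score for everyone while computing the total only where the full scale was completed.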

Patient Health Questionnaire 2 (PHQ-2)
The PHQ-2 [38] is a two-item self-administered depression module, which includes the two main criteria for major depression from the Diagnostic and Statistical Manual of Mental Disorders (5th ed., DSM-5) [39] rated on a scale from 0 = not at all to 3 = nearly every day. PHQ-2 sum scores range from 0 to 6, with higher values indicating more depressive symptomatology. A total score of ≥ 3 proved to be most suitable regarding sensitivity and specificity for the tentative diagnosis of major depressive disorder (sensitivity: 87%, specificity: 78%) and any other depressive disorder (sensitivity: 79%, specificity: 86%) [40]. The PHQ-2 showed high internal consistency in a recent population-based study (α = .75) [41].
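As a brief illustration of the PHQ-2 scoring described above, the following Python sketch sums the two item ratings and applies the cutoff of ≥ 3. The function name `score_phq2` and its signature are assumptions made for this example; the flag marks only a tentative screening result, not a diagnosis.

```python
from typing import Tuple


def score_phq2(item1: int, item2: int, cutoff: int = 3) -> Tuple[int, bool]:
    """Return the PHQ-2 sum score (0-6) and whether it reaches the cutoff."""
    for value in (item1, item2):
        if value not in (0, 1, 2, 3):
            raise ValueError("PHQ-2 items are rated from 0 to 3")
    total = item1 + item2
    return total, total >= cutoff


# Hypothetical ratings for two respondents.
print(score_phq2(1, 2))  # (3, True)
print(score_phq2(0, 2))  # (2, False)
```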

Life satisfaction questionnaire (FLZ-8)
Participants are asked to rate their satisfaction in each area on a five-point scale ranging from 1 = dissatisfied to 5 = very satisfied, with higher values indicating higher life satisfaction. The FLZ showed good internal consistency in similar population-based surveys (α = .82) [42].

Beck Hopelessness Scale (BHS)
The BHS [43] is a 20-item scale measuring negative attitudes about the future. For each of nine optimistic and 11 pessimistic statements participants are asked to report whether it describes their attitude during the last week (true) or not (false). Scores range between 0 and 20, with higher values indicating greater hopelessness. In the present study, the BHS showed high internal consistency, α = .87.

Statistical analyses
Internal consistency of the BSS is reported as coefficient α. Item-total correlations were determined by correlating the respective item with the sum of all other items. Item difficulty (P_i) coefficients were calculated as the quotient of the sum of the item values that were obtained and the sum of the maximum achievable item values, multiplied by 100. To examine construct validity of the BSS, correlations with the PHQ-2, the FLZ-8, the BHS, and with the last two items of the BSS were calculated. To estimate missing data (proportion of missing values of analyzed items: 0.1-0.4%), we applied chained equation modeling [44] using the following variables: gender, age, monthly net income, educational status, and partnership status (see, for example, [45]). To avoid implausible item values, the estimated values (ŷ) were corrected by predictive mean matching (i.e., the observable values closest to the predicted value were chosen). We used the R package mice [46] for imputation. In order to verify whether the five items of the BSS-Screen may be summed up to one overall score (the BSS-Screen score), a one-factor model was tested using confirmatory factor analysis (CFA). Because of the three-point response format, maximum likelihood estimation was not considered appropriate [47]. Instead, we calculated a polychoric correlation matrix and used the mean- and variance-adjusted weighted least squares estimator (WLSMV) [48], which has been found to be robust to violations of normality (e.g. [49]). Subsequently, goodness of fit was evaluated considering three different criteria and their respective cutoff values for a good model fit: the Comparative Fit Index (CFI > .950), the Tucker Lewis Index (TLI > .950), and the root mean square error of approximation (RMSEA < .080). Because of the small size of the risk group within our non-clinical sample, no factor analysis concerning the entire BSS scale was performed.
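The two item statistics defined at the start of this section can be sketched in Python as follows. The function names and the small data matrix are hypothetical and serve only to illustrate the definitions; they are not the study data, and the analyses reported here were run in R.

```python
from math import sqrt
from typing import Sequence


def item_difficulty(scores: Sequence[int], max_score: int = 2) -> float:
    """P_i = 100 * (sum of obtained values) / (sum of maximum achievable values)."""
    return 100.0 * sum(scores) / (max_score * len(scores))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(data: Sequence[Sequence[int]], item: int) -> float:
    """Correlate one item (column index) with the sum of all other items."""
    item_scores = [row[item] for row in data]
    rest_scores = [sum(row) - row[item] for row in data]
    return pearson(item_scores, rest_scores)


# Hypothetical ratings: 5 respondents (rows) x 3 items (columns), each rated 0-2.
data = [
    [0, 0, 0],
    [1, 0, 1],
    [2, 1, 2],
    [0, 0, 0],
    [2, 2, 1],
]
first_item = [row[0] for row in data]
print(item_difficulty(first_item))              # 50.0
print(round(corrected_item_total(data, 0), 3))
```

Removing the item from the total before correlating avoids inflating the coefficient through the item's correlation with itself, which matters most for short scales such as the five-item BSS-Screen.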
Furthermore, we conducted several measurement invariance tests using multi-group factor analyses across gender (group 1: men; group 2: women), age (group 1: 18-34 years; group 2: 35-64 years; group 3: ≥ 65 years), and depression status (group 1: < 3 sum score in PHQ-2; group 2: ≥ 3 sum score in PHQ-2). The groups were of the following sizes: gender: female n = 1230, male n = 1130; age: 18-34 years n = 519, 35-64 years n = 1362, ≥ 65 years n = 569; depression status: non-depressed n = 2227, possibly depressed n = 223. The same estimator as in the CFA (WLSMV) was used. These measurement invariance tests were performed using the sequential strategy discussed by Millsap and Yun-Tein [50]. As recommended by Chen [51], CFI differences with a cutoff value of ΔCFI > .01 were used to test the different stages of measurement invariance. Data analysis was carried out with the R package lavaan [52].

Item characteristics
Table 2 displays means, standard deviations, item difficulties, the frequency of item endorsement, and the corrected item-total correlation values for the five items of the BSS-Screen in the general population and in the risk group. Furthermore, the item characteristics of the entire BSS, completed only by the risk group, are listed in Table 3. The item-total correlation values in the general population, which ranged from r_it = .70 (active suicide attempt) to r_it = .78 (wish to live), can be regarded as very satisfactory. In the risk group, item-total correlation values were mostly satisfactory (except for items 11 and 19).
In the general population, n = 49 individuals (2.0%) reported one and n = 8 individuals (0.3%) reported several attempted suicides in their life. Their actual death wish during the suicide attempt was estimated as low by n = 20 individuals (35.1%), as moderate by n = 19 individuals (33.3%), and as strong by n = 18 individuals (31.6%).

Internal consistency
Internal consistency based on the polychoric covariance matrix was computed as coefficient alpha. Considering that coefficient alpha could be affected by problems stemming from its assumptions not being met [53], we additionally computed McDonald's omega. The BSS-Screen yielded α = .97 and ω = .97 in the general population, and the entire BSS (risk group only) yielded α = .94 and ω = .94.
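For readers unfamiliar with computing coefficient alpha from a covariance matrix rather than from raw scores, a minimal Python sketch follows. It uses the standard formula α = k/(k-1) · (1 - tr(C)/ΣC) for k items with covariance matrix C. The function name `cronbach_alpha` and the example matrix are hypothetical, and estimating the polychoric matrix itself is not shown.

```python
from typing import Sequence


def cronbach_alpha(cov: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha from a k x k item covariance (or correlation) matrix."""
    k = len(cov)
    if k < 2 or any(len(row) != k for row in cov):
        raise ValueError("expected a square matrix with at least two items")
    total_variance = sum(sum(row) for row in cov)       # variance of the sum score
    item_variance = sum(cov[i][i] for i in range(k))    # sum of item variances
    return k / (k - 1) * (1 - item_variance / total_variance)


# Hypothetical correlation matrix: three items, all inter-item correlations .50.
example = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(round(cronbach_alpha(example), 2))  # 0.75
```

Passing a polychoric matrix instead of a Pearson matrix changes only the input, not the formula, which is why the function takes the matrix directly.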

Factorial validity
CFA indicated very good fit of the one-factor model of the BSS-Screen in the total sample on all assessed indices (CFI = .998; TLI = .995; RMSEA = .045 [95%-CI: .030-.061]). Thus, calculating a BSS-Screen score can be regarded as appropriate.

Factorial invariance
The fit measures obtained in the measurement invariance analysis are presented in Table 4. Robust fit statistics are reported. Regarding the CFI differences, strict invariance can be assumed for gender, age, and depression status, because the cutoff value proposed by Chen [51] was not exceeded at any step for any of the grouping variables.
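The sequential decision rule based on CFI differences can be sketched in Python as follows. The function name, the step labels, and the CFI values are hypothetical and chosen for illustration; they are not the values from Table 4.

```python
from typing import List, Tuple


def highest_invariance_level(
    steps: List[Tuple[str, float]], cutoff: float = 0.01
) -> str:
    """Return the most constrained model that is still supported.

    `steps` lists (label, CFI) pairs ordered from the least to the most
    constrained model. A step is rejected when the CFI drops by more than
    `cutoff` relative to the preceding model.
    """
    if not steps:
        raise ValueError("at least one model is required")
    supported = steps[0][0]
    for (_, previous_cfi), (label, cfi) in zip(steps, steps[1:]):
        if previous_cfi - cfi > cutoff:
            break  # this and all more constrained models are not supported
        supported = label
    return supported


# Hypothetical CFI values for one grouping variable.
models = [
    ("configural", 0.998),
    ("weak", 0.997),
    ("strong", 0.995),
    ("strict", 0.993),
]
print(highest_invariance_level(models))  # strict
```

Each model is compared with its immediate predecessor, so a single large drop stops the sequence even if later models happen to fit similarly.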

Construct validity
Correlation coefficients with related self-report instruments were calculated in order to determine evidence for validity of the German BSS. As can be seen in Table 5, there were substantial correlations between the BSS-Screen score and validity measures for the general population and the risk group in the expected directions: more severe suicidal thoughts were associated with higher depression (PHQ-2), lower life satisfaction (FLZ-8), higher degrees of hopelessness (BHS), higher numbers of suicide attempts, and higher seriousness of the intent to die in these attempts. In the risk group, the total BSS score showed distinctly high associations (r = .53) with the last two items.

Discussion
In this study, we investigated the psychometric quality of the German BSS using a German representative population sample. The reliability of the measure was found to be satisfactory, meeting the requirement of α ≥ .80 for clinical measures. Our results are in line with coefficients α reported in former studies ranging from α = .75 to .98 (e.g. [21,34,54]). CFA confirmed the unidimensionality of the five BSS-Screen items, allowing a meaningful calculation of the BSS-Screen sum score. Measurement invariance analyses supported strict invariance across gender, age, and depression status, which allows for unbiased comparison of means, correlation coefficients, and path coefficients within structural equation models between the analyzed groups. As construct validity indices, all correlation coefficients between the BSS and related self-report instruments were in the expected directions and of substantial size. The association between the BSS and the BHS lies within the range of previous studies using non-clinical samples (r = .21-.59) (e.g. [23,33,34]). Krajniak et al. [55] found a relation between the BSS and the PHQ-9 (the longer version of the PHQ-2) similar to ours (r = .37). In our sample, suicidal thoughts were detected in 4.6% (95% CI [3.8%, 5.5%]) of the general population. This number is below the prevalence rates of suicidal ideation found in previous studies: 8.0% of participants in a representative sample of the German general population reported suicidal ideation during the last 2 weeks [56], the WMH Survey [9] showed a lifetime prevalence of 9.7% for suicidal ideation in Germany, and the European Study on the Epidemiology of Mental Disorders (ESEMED) [57] found a lifetime prevalence of suicidal ideation of 9.8% in Germany.
This could be explained by people higher in suicidality having shown a lower return rate and therefore being under-represented in the final pool of participants. However, in the three studies mentioned above, representative samples of adults were likewise interviewed at home and filled out short questionnaires about their suicidality. The number of suicidal individuals consenting to participate should therefore not differ between these studies and ours. It might be more likely that the German BSS is a more conservative measure than the questionnaires employed in the aforementioned studies.
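As a worked check of the confidence interval reported above for the prevalence of suicidal ideation (112 of 2450 participants), the following Python sketch computes a Wilson score interval. The source does not state which interval method was used; the Wilson method is an assumption of this sketch, and it reproduces the reported bounds when rounded to one decimal place.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 for roughly 95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denominator = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return center - half_width, center + half_width


low, high = wilson_interval(112, 2450)
print(f"{112 / 2450:.1%} (95% CI {low:.1%} to {high:.1%})")
# prints: 4.6% (95% CI 3.8% to 5.5%)
```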
Unlike the rest of the BSS items, the statement groups 11 (Reason for attempt) and 19 (Deception and concealment) showed low item-total correlations (r_it = .19 and .20, respectively). This also occurred in the studies conducted by Beck and Steer [17] for the validation of the original BSS scale. As a clinical instrument, the BSS aims at assessing all relevant aspects of suicidal ideation and, considering the sensitive nature of the gathered information, its clinical utility should be top priority, even if this may be at the expense of psychometric quality. Therefore, these two items should always be included when assessing individuals who endorse the existence of suicidal ideation, regardless of their item characteristics. The reasons and behavior of individuals identified as being at risk of suicide can be of high interest to a therapist. When calculating sum scores, however, one might consider excluding these two items.

Limitations
Despite the many strengths of this study, such as its large sample size and representativeness, certain limitations should be taken into consideration. First, the response rate was only 54.8%. However, in general population studies lower response rates than in clinical studies are quite common, and the response rate of this study was comparable to similar general population surveys (e.g. [58][59][60]). Second, the diagnostic efficiency of the German BSS could not be studied because no additional clinical interviews were conducted whose results could have been compared to those of the BSS. Yet, previous studies have already found high correlations between the BSS and clinical interviews such as the SSI (e.g., r = .90) [54]. Third, as we exclusively relied on self-report, it would be interesting for future studies to include behavioral or archival data as well. Fourth, given the fact that only n = 112 individuals endorsed the screening items and therefore completed the entire BSS, CFA could not be conducted for the complete scale. In future research, investigators should therefore use the German BSS in larger clinical samples more prone to suicidal ideation, such as patients with depression. Fifth, while this study incorporated several measures for the assessment of convergent validity, divergent validity was not sufficiently covered. Sixth, as this study solely involved cross-sectional data, it addressed neither predictive validity nor test-retest reliability. These should be examined by future research as well.

Conclusion
In summary, the German BSS has proved to be a reliable and valid screening instrument that could certainly help to identify more suicidal individuals in primary or specialized health care facilities. Our study in a very large German population sample showed that the BSS can be administered in this country, contributing evidence that it is an instrument appropriate for many different cultures. An interesting question for future research could be the assessment of cross-cultural invariance of the BSS.

Funding
Data collection was funded by Pearson Assessment. Pearson Assessment, however, did not commission the preparation of this manuscript, nor did it interfere with data analysis and interpretation of results. Pearson Assessment furthermore published the German BSS manual based on the same data as the present study.

Availability of data and materials
The data that support the findings of this study are available from Pearson Assessment, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Pearson Assessment.

Authors' contributions
SK and AL were responsible for data analysis and interpretation as well as preparation of the manuscript. TM revised the manuscript. EB conceptualized and designed the study, interpreted the data, and reviewed and revised the manuscript. All authors had full access to the data, and read and approved the final manuscript.

Ethics approval and consent to participate
The study was approved by the ethics committee of the medical faculty of the University of Leipzig (reference number 063-14-10,032,014). All participants provided written consent after having been informed about the nature of the study.

Consent for publication
Not applicable.