Psychometric properties of responses by clinicians and older adults to a 6-item Hebrew version of the Hamilton Depression Rating Scale (HAM-D6)

Background: The Hamilton Depression Rating Scale (HAM-D) is commonly used as a screening instrument, as a continuous measure of change in depressive symptoms over time, and as a means of comparing the relative efficacy of treatments. Among several abridged versions, the 6-item HAM-D6 is the most widely used, in large part because of its good psychometric properties. The current study compares self-report and clinician-rated responses to the Hebrew version of this scale.
Methods: A total of 153 Israelis (mean age 75 years) participated in this study. The HAM-D6 was examined using confirmatory factor analytic (CFA) models, separately for patient and clinician responses.
Results: Responses to the HAM-D6 suggest that this instrument measures a unidimensional construct, with each of the scale's six items contributing significantly to the measurement. Comparisons between self-report and clinician versions indicate that responses do not significantly differ for 4 of the 6 items. Moreover, patient HAM-D6 responses showed 100% sensitivity and 91% specificity relative to clinician diagnoses of depression.
Conclusion: These results indicate that the Hebrew HAM-D6 can be used to measure and screen for depressive symptoms among elderly patients.


Background
Depression is a common, debilitating psychiatric condition that ranks high in prevalence among all mental health conditions [1]. Lifetime prevalence may be as high as 20% [2] and, at any one time, 5-10% of the world's population meets diagnostic criteria for a major depressive episode [3]. Depression is projected to be the second leading cause of disability worldwide by 2020 [4].
Clinical depression is common in primary care with rates of prevalence among older adults ranging between 4-24% [5,6]. Untreated elderly patients are at higher risk of morbidity and mortality [7] and experience slower rates of recovery [6,8]. Moreover, chronic depression is a significant risk factor for dementia [9].
Given that depression is amenable to treatment, valid and reliable screening tools are necessary to identify this patient population. Among existing instruments, the clinician-administered Hamilton Depression Rating Scale (HAM-D) was first developed to assess the efficacy of the first generation of antidepressant medications [10]; the HAM-D has since become the gold standard for measuring symptom severity and change in randomized clinical trials. Among various formats (17, 21, 24, and 28 items) [10,11], the 17-item version (HAM-D17) has been used most frequently. Scale items measure mood, insomnia, anhedonia, agitation, gastro-intestinal and other somatic symptoms, weight change, suicidal ideation, hypochondriasis, anosognosia, and psychomotor and cognitive retardation.
Despite widespread usage, various researchers have questioned whether the HAM-D17 is a unidimensional or multidimensional instrument [12][13][14][15]. This is problematic as multi-factorial measurement may impede the detection of symptom change over time, treatment response characteristics [16] and the ability to distinguish the relative efficacy of treatments [13]. This assertion is supported by meta-analytic study findings indicating that certain scale items are less sensitive to measurement of symptom severity. In addition, some items have comparatively poor inter-rater and retest reliability, and the response-option format may not be optimal [17]. In light of these findings, some have suggested that the 17-item HAM-D may be less than ideal for clinical research applications [14,15,17,18].
These limitations have led researchers to propose abridged versions of the HAM-D that are quick to administer yet sensitive to measurement of symptom levels, change over time and relative differences in treatment efficacy. For instance, Maier and Philipp [19] proposed a 6-item version of the HAM-D. More recently, an 8-item version was devised by Gibbons and colleagues [20] by applying item response theory. Research to date suggests that both versions are sensitive to change over time and can identify patients in remission [21,22]. Recently, a scale consisting of 7 items was also suggested [23]. The items were empirically identified on the basis of response frequency and sensitivity to change of the individual HAM-D items with depressed samples [24].
Among the abridged versions of the Hamilton scale, the most frequently used (HAM-D6) was developed by Bech et al. [25]. Using item analysis, these researchers [25] proposed a 6-item HAM-D as a unidimensional measure of depressive symptomatology [14]. The HAM-D6 is composed of items measuring core symptoms of depression (i.e., depressed mood, self-esteem and feelings of guilt, social interaction and interests, psychomotor retardation, anxiety, and somatic symptoms). Compared to the HAM-D17, this assessment appears to measure a unidimensional construct [13][14][15][17,25,26], and it is as sensitive [14] or more sensitive in detecting drug-placebo or drug-drug differences [27,28]. The authors of a recent study with older adults that compared six depression scales concluded that the HAM-D6 was the only one to demonstrate total scalability, and that it had the greatest external validity [18].
This scale may be especially appropriate for use by both older persons and clinicians; its relative brevity makes it comparatively easy for older persons to complete and for clinicians to administer. However, to the best of our knowledge, the psychometric properties of responses to the Hebrew HAM-D6 have yet to be examined. Thus, the current study examined and compared self-report and clinician responses to the Hebrew HAM-D6 for elderly patients.

Scale translation
The HAM-D6 was first translated from English to Hebrew by a bilingual psychologist, in keeping with accepted procedures [29]. The translated version was back-translated and modified until it was comparable to the original version.

Training procedures
Two graduate research assistants completed a three-day training course in the administration of study measures. After watching a training tape and receiving instructions, they administered study measures in mock interviews until acceptable inter-rater reliability was established vis-à-vis semi-structured clinical assessments. Research assistants' HAM-D6 scores did not significantly differ from corresponding patient HAM-D responses, suggesting no discernible between-rater differences, χ²(df = 1) = 1.31, p = .25.

Recruitment
Participants were recruited in the waiting rooms of two primary care clinics operated by Clalit Health Services (Israel's largest health insurance provider, serving 53% of the population). One clinic is located in the center and the other in the north of Israel (Tel Aviv and Haifa, respectively). Inclusion criteria were: 60+ years of age, fluent in Hebrew, and no pronounced cognitive loss (determined using a 6-item screening measure [30]). Participant recruitment took place between May 2008 and February 2009.
Research assistants approached patients to request their participation in this study. Participation was voluntary and no remuneration was provided. Those who took part provided written consent. This study was approved by the Helsinki Committee of the Clalit Health Care Services.

The Structured Clinical Interview for DSM-IV (SCID-I)
The SCID-I is a semi-structured interview to assist clinicians in making a DSM-IV Axis I diagnosis [31]. Only those modules pertaining to depression and dysthymia were administered in the present study. The Hebrew version of the SCID-I was translated and validated by Shalev et al. [32]. All study participants were interviewed using this instrument.

The 6-item Hamilton (HAM-D6)
The self- and clinician-administered versions of the HAM-D6 measure depressed mood, self-esteem and guilt, social interaction and interests, psychomotor retardation, anxiety, and somatic symptoms. Items are rated on 5-point scales, with the exception of the somatic symptoms item, which is rated on a 3-point scale. As a screening measure, scores of 7+ suggest clinically significant depressive symptomatology [33]. Whereas the self-report HAM-D6 is based solely on patient responses, the clinician-administered version integrates patients' responses and clinical observation.
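The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring procedure: the item keys, the helper names, and the 0-based coding (0-4 for the five 5-point items, 0-2 for the 3-point somatic symptoms item) are assumptions introduced here; only the six item domains, the scale lengths, and the 7+ cutoff come from the text.

```python
# Hypothetical item keys; the 0-based coding of each item is an assumption.
ITEM_MAX = {
    "depressed_mood": 4,
    "self_esteem_guilt": 4,
    "social_interaction_interests": 4,
    "psychomotor_retardation": 4,
    "anxiety": 4,
    "somatic_symptoms": 2,  # the only item rated on a 3-point scale
}

SCREENING_CUTOFF = 7  # scores of 7+ suggest clinically significant symptoms [33]


def ham_d6_total(responses):
    """Sum the six item ratings after checking that each is in range."""
    missing = set(ITEM_MAX) - set(responses)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    total = 0
    for item, max_score in ITEM_MAX.items():
        value = responses[item]
        if not isinstance(value, int) or not 0 <= value <= max_score:
            raise ValueError(f"{item} must be an integer from 0 to {max_score}")
        total += value
    return total


def screens_positive(responses):
    """Return True when the total reaches the screening cutoff."""
    return ham_d6_total(responses) >= SCREENING_CUTOFF


# Made-up example ratings, for illustration only.
example = {
    "depressed_mood": 2,
    "self_esteem_guilt": 1,
    "social_interaction_interests": 2,
    "psychomotor_retardation": 1,
    "anxiety": 1,
    "somatic_symptoms": 1,
}
print(ham_d6_total(example), screens_positive(example))
```

The same function would apply to self-report and clinician ratings alike, since both versions share the six items and the cutoff.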

Analytic strategy
We set out to ascertain whether the HAM-D6 measures a unidimensional construct, as proposed by Bech et al. [25]. This hypothesis was tested using confirmatory factor analyses. The self- and clinician-administered versions of the HAM-D6 were then compared to assess the relative contribution of items to measurement (invariance or equivalence analyses). Subsequent analyses compared responses for each patient (self and corresponding clinician HAM-D6 responses). SCID diagnoses of a major depressive episode were compared with patient HAM-D6 responses to estimate the sensitivity and specificity of the scale. Lastly, item-level analyses (intraclass correlation coefficients) were computed to determine whether patients and their clinicians agreed on each item.

HAM-D6 as a screening measure
As previously mentioned, Bech et al. [33] suggest that a HAM-D6 score of 7+ is suggestive of clinically significant depressive symptoms (i.e., warranting thorough clinical assessment). Comparing patient and clinician ratings, agreement as calculated using the kappa coefficient was in the fair range (κ = .26) [34]. Where there was a discrepancy between the two, 13 patients provided responses in the clinical range, whereas physicians' responses indicated that these patients were euthymic. A similar finding emerged when patient HAM-D6 responses were compared with SCID diagnoses of a current major depressive episode (κ = .20; linear weighted). Where there was a discrepancy, 14 patients provided HAM-D6 responses in the clinical range, while the SCID diagnoses indicated no major depressive episode. Nevertheless, these results indicate 100% sensitivity (true positives) and 91% specificity (true negatives) for the patient version of the HAM-D6.
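The screening statistics reported above all derive from a 2×2 cross-classification of HAM-D6 screening status against the reference diagnosis. The sketch below shows one way to compute sensitivity, specificity, and an unweighted Cohen's kappa from such a table. The function name and the cell counts are hypothetical and are not the study data; for a 2×2 table, linear weighted and unweighted kappa coincide.

```python
def screening_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from 2x2 cell counts.

    tp: screen positive, diagnosis positive
    fp: screen positive, diagnosis negative
    fn: screen negative, diagnosis positive
    tn: screen negative, diagnosis negative
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement: proportion of cases on the diagonal.
    observed = (tp + tn) / n
    # Chance agreement: product of the marginals, summed over both categories.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts chosen for illustration only (not the study data).
stats = screening_stats(tp=5, fp=10, fn=0, tn=90)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With few diagnosed cases, a screen can capture every true case while kappa stays modest, because the false positives are large relative to the small number of true positives.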
Moreover, each of the six items contributed significantly to measurement of a single higher-order construct (i.e., all item t values > 1.96); see Figures 1 and 2. For both patient and clinician versions, the HAM-D 6 appears to measure a unidimensional depression construct.
Next, invariance analyses were undertaken to compare solutions between CFA models. These analyses indicated that responses did not significantly differ for 4 of 6 items. However, responses for the social interaction and interests and psychomotor retardation items did differ. Both contributed to measurement of depression as reported by patients to a greater degree than that reported by the clinicians. See Table 1.

Intra-class correlation coefficients
Intra-class correlation coefficients (ICC) were next computed to directly compare HAM-D6 ratings for patient-clinician pairings (i.e., patient self-report vs. corresponding clinician ratings for that patient). ICC values were within adequate parameters for items 1-3 (depressed mood, self-esteem and guilt, social interaction and interests), low for items 5-6 (anxiety, somatic symptoms), and very low for item 4 (psychomotor retardation). This is consistent with the invariance analyses reported above; see Table 2.
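For readers unfamiliar with the statistic, an item-level ICC for patient-clinician pairs can be computed from the mean squares of a two-way layout (patients by raters). The text does not state which ICC form was used; the sketch below assumes the two-way random-effects, absolute-agreement, single-rating form, often written ICC(2,1). The function name and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per patient, one column per rater.

    Assumes a complete table: every patient has a rating from every rater.
    """
    n = len(ratings)      # number of patients
    k = len(ratings[0])   # number of raters (2 for patient vs. clinician)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical (patient rating, clinician rating) pairs for a single item.
item_pairs = [[0, 0], [1, 1], [2, 1], [3, 2], [1, 0], [4, 3]]
print(round(icc_2_1(item_pairs), 2))
```

Because this form penalizes systematic differences between raters as well as random disagreement, an item on which patients consistently rate themselves higher than clinicians do would yield a lower value than a consistency-only form.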

Discussion
The goal of this study was to assess the psychometric properties of the self-report vs. clinician versions of the Hebrew HAM-D6. Results indicated that each of the six scale items contributed significantly to the measurement (both for patients and clinicians) and that HAM-D6 responses indeed measure a single depression construct. These findings are in accord with previously reported findings [13][14][15][25,26,33].
Comparison of clinician and patient HAM-D6 responses indicates satisfactory correspondence between the two. Moreover, when patient HAM-D6 responses were compared to SCID diagnoses of major depressive episodes, sensitivity and specificity were 100% and 91%, respectively. These findings suggest that a HAM-D6 score of 7+ is an effective threshold value. Most notably, responses by older adults themselves enable effective screening that distinguishes euthymic patients from those reporting pronounced depressive symptomatology.
In addition, findings indicate that responses do not differ significantly for 4 of the 6 items, suggesting that patients and clinicians interpret and respond to these HAM-D6 items in a consistent manner. Furthermore, the intra-class correlations for 5 of the 6 items were found to be above 0.60. This congruence between patients and clinicians for most scale items implies that patients' responses can be trusted and accepted as a valid evaluation of depression.
Responses do differ, however, for the social interaction and interests and psychomotor retardation items. For both items, patients' responses contributed more to the measurement of depression than clinicians' responses. Furthermore, the intra-class correlation coefficient for the psychomotor retardation item was found to be very low, whereas an adequate correlation emerged for the social interaction and interests item.
In light of these intriguing results, we re-examined the Hebrew translations in order to ascertain where refinements are warranted. In English, the second response option for the social interaction and interests item reads: "I have felt that I have had difficulty performing my daily activities, but I was still able to perform them with great effort." The current Hebrew wording translates to: "I had difficulty performing my daily activities, but I was still able to perform routine activities".
The fourth response of this item in English reads: "I have not been able to do any of the simplest day-to-day activities without help," and the current Hebrew wording translates to: "I have not been able to do any of the simple day-to-day activities without help." Although the difference appears minimal, it might have had an effect on the results.
In English, the third and fourth response options for the psychomotor retardation item read: "I have felt clearly slowed down or subdued or have been talking much less than usual," and "I have hardly been talking at all or feel extremely slowed down at the time." The corresponding Hebrew wording translates to: "I have felt clearly slowed down or passive and have been talking much less than usual," and "I have hardly been talking at all and feel extremely slowed down all the time." We recommend that corrections in translation be made for future studies using the self-report Hebrew HAM-D6.
Several limitations of the study need to be acknowledged: a) we do not have data on non-participants and cannot compare this group to our sample, b) we do not have medication data for this sample, c) the sample size is relatively small, and d) the research assistants who administered the SCID to participants were aware of their HAM-D6 scores. Therefore, future studies need to examine the Hebrew HAM-D6 with larger samples of participants from different age groups derived by random recruitment.

Conclusion
Nonetheless, in light of our results, the Hebrew HAM-D6 can be used to measure and screen for depressive symptoms among elderly persons. Future psychometric research is required to ascertain whether the revisions suggested above will further improve the psychometric properties of responses to this Hebrew version of the HAM-D6.