Measuring depression with CES-D in Chinese patients with type 2 diabetes: the validity and its comparison to PHQ-9

Background: The validity of the 20-item Center for Epidemiological Studies Depression (CES-D) scale for depression screening in Hong Kong Chinese patients with type 2 diabetes remains unknown. We aimed to validate CES-D, compare its psychometric properties with the 9-item Patient Health Questionnaire (PHQ-9), and explore whether one of the two is more suitable for depression screening in Chinese patients with type 2 diabetes.

Methods: Between June 2010 and July 2011, 545 consecutive Chinese patients with type 2 diabetes who underwent structured comprehensive assessments completed the CES-D and PHQ-9. Forty patients were retested within 2–4 weeks by telephone interview and 97 patients were randomly selected to undergo the Mini International Neuropsychiatric Interview (MINI) by psychiatrists for clinical diagnosis of depression.

Results: The internal consistency (Cronbach’s α) of CES-D was 0.85, with a test-retest correlation coefficient of 0.64. The area under the curve for CES-D compared to the clinical diagnosis of major depression was 0.85. A cut-off score of ≥21 for CES-D provided the optimal balance between sensitivity (78.3 %) and specificity (74.3 %) and identified 17.8 % (n = 97) of patients with depression. CES-D and PHQ-9 showed moderate agreement in depression screening (Cohen’s Kappa: 0.45). Compared to non-depressed patients, those who screened positive by PHQ-9 had a higher HbA1c whereas the glycemic differences were not significant when using CES-D.

Conclusion: The CES-D is a valid screening tool for depression in Chinese type 2 diabetic patients although the PHQ-9 was more discriminative in identifying those with suboptimal glycemic control.


Background
Depression and type 2 diabetes are complex diseases with rising prevalence [1,2]. These two chronic conditions frequently coexist, resulting in increased risk of morbidity and mortality with major negative implications for individuals, families and society [3,4]. International diabetes guidelines now recommend screening for psychosocial problems including depression, especially when self-management is poor [5,6]. The 20-item Center for Epidemiological Studies Depression (CES-D) scale and the 9-item Patient Health Questionnaire (PHQ-9) are two of the most widely used self-administered instruments for depression screening [7,8]. Originally developed for a general population in a Western setting, both instruments have been validated in other populations including American and Hong Kong Chinese community-dwelling individuals [9][10][11][12][13].
There is emerging evidence suggesting that ethnicity, culture, and acculturation may lead to response bias in these instruments [13][14][15]. We previously reported the validity of PHQ-9 for depression screening in Hong Kong Chinese patients with type 2 diabetes and identified a lower cutoff point (≥7) for significant depressive symptoms than the conventional one (≥10), which was first validated in primary care settings and obstetrics-gynecology clinics in the U.S. [13,16]. However, there is a paucity of data on the performance of CES-D in Chinese patients with type 2 diabetes. In this study, we aimed to validate CES-D and compare its psychometric properties with PHQ-9 in community-dwelling Chinese patients with type 2 diabetes in Hong Kong.

Subjects and setting
The study design and patient recruitment have been described previously [16]. In brief, 601 Chinese outpatients with type 2 diabetes aged 25-75 years were recruited consecutively from a hospital-based (Prince of Wales Hospital) and a community-based (Yao Chung Kit Diabetes Assessment Centre) diabetes center between June 2010 and July 2011. All patients underwent a 4-hour diabetes complication assessment using a structured protocol provided by the Joint Asia Diabetes Evaluation Program [17][18][19][20]. They were also invited to complete a set of questionnaires to assess their psychological wellbeing. Significant medical and psychiatric history, social history, family history of diabetes, and medication records were documented. Urine and blood samples were collected after an overnight fast for plasma glucose, glycated hemoglobin (HbA1c), total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides, renal function, and urinary albumin-to-creatinine ratio (ACR). This study was approved by the ethics committee of The Chinese University of Hong Kong, and all patients gave informed consent.

Psychological assessment
Symptoms of depression were assessed by the CES-D and PHQ-9 questionnaires. The CES-D scale is a 20-item self-reported instrument developed by Radloff in 1977 [7]. It measures the frequency of common depressive symptoms over the past week. Each item is scored from 0 (rarely or none of the time, less than one day) to 3 (all of the time, 5-7 days). The four positively stated items (item 4, I felt that I was just as good as other people; item 8, I felt hopeful about the future; item 12, I was happy; item 16, I enjoyed life) are reverse-coded for calculating the total score, which ranges from 0 to 60. The cut-off value of ≥16 has been widely used to define clinically meaningful depressive symptoms [7,21]. It was reported to have 96.8 % sensitivity and 67.6 % specificity for clinical depression in Chinese type 2 diabetic patients attending a diabetes centre in Singapore [22]. The PHQ-9 focuses on the frequency of occurrence of 9 depressive symptoms derived from DSM-IV diagnostic criteria over the past two weeks [8]. Each item is scored from 0 (not at all) to 3 (nearly every day), with a total score ranging from 0 to 27. A cutoff value of ≥10 has been widely used to define probable depression, with 88 % sensitivity and 88 % specificity in the original validation study, in which the majority of participants were Caucasian [8]. In our previous criterion validation in the same group of 99 patients, we identified the optimal value of 7 with 82.6 % sensitivity and 73.7 % specificity [16].
The process of this study has been described previously [16]. Briefly, 40 patients were randomly selected for CES-D and PHQ-9 retest within 2-4 weeks by telephone survey. Another randomly selected subset of patients was referred for assessment by psychiatrists (Dr Marco Lam and Dr Siu-ping Lam) using the Chinese version of the Mini International Neuropsychiatric Interview (MINI version 6.0), a short structured diagnostic interview that has been validated and is widely accepted for diagnosing major depression in a research setting [23,24]. Owing to manpower constraints, each patient was assessed by only one psychiatrist, so we were unable to calculate inter-rater reliability. However, the two psychiatrists were trained in the MINI together, and a pilot interview conducted in three patients showed 100 % agreement in the MINI diagnosis of depression.
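As an illustration of the scoring rules described above, the following minimal Python sketch computes the two total scores. The function names (`cesd_total`, `phq9_total`) and the input format (a list of item responses coded 0 to 3, in item order) are assumptions made for this example and are not part of either instrument's documentation.

```python
# Hypothetical helpers for illustration only; item numbers are 1-based.
CESD_REVERSE_ITEMS = {4, 8, 12, 16}  # the four positively stated CES-D items


def _check(responses, n_items):
    """Validate that there are n_items responses, each coded 0, 1, 2 or 3."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    for value in responses:
        if value not in (0, 1, 2, 3):
            raise ValueError(f"invalid response code: {value!r}")


def cesd_total(responses):
    """CES-D total (0-60): items 4, 8, 12 and 16 are reverse-coded as 3 - value."""
    _check(responses, 20)
    total = 0
    for item_no, value in enumerate(responses, start=1):
        total += (3 - value) if item_no in CESD_REVERSE_ITEMS else value
    return total


def phq9_total(responses):
    """PHQ-9 total (0-27): plain sum of the nine item scores."""
    _check(responses, 9)
    return sum(responses)
```

Under this scoring, a respondent who answers 0 to every CES-D item obtains a total of 12 rather than 0, because each of the four positively stated items contributes 3 after reverse-coding.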

Statistical analyses
All analyses were performed using the Statistical Package for Social Sciences (SPSS version 20.0, IBM). Data were expressed as mean ± SD, median (interquartile range) or number (%), as appropriate. The Student's t-test, Mann-Whitney U test and Chi-square test were used for group comparisons. Cronbach's α was calculated to evaluate the internal consistency. Pearson correlation coefficients were used to measure test-retest correlation and concurrent validity of the PHQ-9 and CES-D as appropriate. Item discrimination was tested by corrected item-total correlation using the Pearson product-moment correlation formula, which is incorporated into the SPSS Reliability analysis. Exploratory factor analysis (EFA) with the eigenvalue >1 criterion was performed to establish the construct validity. An oblique (Promax) rotation was used in the EFA based on the assumption that the CES-D would have correlated factors. The response agreement between PHQ-9 and CES-D was evaluated with Cohen's kappa. Receiver operating characteristic (ROC) analysis was used to determine the diagnostic performance and optimal cutoff score for screening major depression against MINI-based clinical diagnosis. A P value < 0.05 (2-tailed) was considered significant.
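The two reliability statistics named above can be sketched in a few lines of Python. This is an illustrative re-implementation of the standard formulas, not the SPSS procedure used in the study; the function names and the data layout (one list of item scores per respondent) are assumptions made for this example.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(rows):
    """Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(total))."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the total of the remaining items (item excluded)."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson_r(item, rest))
    return result
```

The "corrected" correlation excludes the item itself from the total so that an item is not correlated with its own contribution to the score.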

Psychometric properties of the CES-D
Internal reliability and item discrimination
The internal consistency (Cronbach's α) of CES-D was 0.85, with a test-retest correlation coefficient (r) of 0.64 (P < 0.001), similar to PHQ-9 (α = 0.87, r = 0.70, P < 0.001) as shown in our previous study [16]. After removal of the four positive affect items, the internal consistency of the CES-D questionnaire increased to 0.91. The corrected item-total correlations for individual CES-D items ranged from 0.17 (item 4: feeling good) to 0.66 (item 6: depressed) and were lower for the four positive items than for the other items (Table 2). The corrected item-total correlations for individual PHQ-9 items ranged from 0.48 to 0.68 [16].

Construct validity and item scores of CES-D
The Kaiser-Meyer-Olkin test of sampling adequacy was 0.91 and Bartlett's test of sphericity was significant (χ² = 5042.6, P < 0.001). EFA using the Promax rotation procedure yielded a four-factor structure for the CES-D according to the "eigenvalue >1" rule: 1) depressed affect, 2) somatic symptoms, 3) positive affect, and 4) interpersonal problems. The scree plot is shown in Fig. 1. This four-factor model accounted for 61.1 % of the scale variance, with factor loadings ranging from 0.62 to 0.88 (Table 2).
The mean CES-D total score was 13.0 ± 8.6 (median 12.0, IQR 7.0-17.0), with individual item scores ranging from 0.14 to 1.65. The four positive affect items scored much higher than the other items, with a mean factor score of 5.9 ± 3.9, accounting for 50 % of the total CES-D score. The positive affect factor did not correlate with the somatic symptoms (r = −0.05, P = 0.275), depressed affect (r = 0.04, P = 0.414), or interpersonal problems (r = −0.02, P = 0.703); the latter three factors were significantly inter-correlated, with correlation coefficients ranging from 0.49 to 0.75 (all P < 0.001) (Table 2).

Diagnostic validity
Among the 97 patients who were interviewed by one of the two psychiatrists, 23 patients had a clinical diagnosis of current major depressive episode as measured by the MINI. The area under the curve (AUC) upon ROC analysis was 0.85 (95 % CI: 0.77-0.92) (Fig. 2). The standard cut-off score of ≥16 for CES-D had an excellent sensitivity (91.3 %) but low specificity (60.8 %) compared to the MINI diagnosis of major depressive episode. As shown in Table 3, a cut-off score of ≥21 on CES-D yielded an optimal balance between sensitivity (78.3 %) and specificity (74.3 %).
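The reported sensitivity and specificity values can be reproduced with simple arithmetic. In the sketch below, the 2×2 counts are not taken from the study tables: they are back-calculated from the reported percentages, given 23 MINI-positive and 74 MINI-negative patients, and should be read as an assumption. Youden's J (sensitivity + specificity − 1) is included as one common way of summarising the balance between the two; the text does not state which criterion was used to select the cut-off.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 counts against the reference diagnosis."""
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts, inferred from the reported percentages (23 cases, 74 non-cases).
COUNTS_BY_CUTOFF = {
    16: {"tp": 21, "fn": 2, "tn": 45, "fp": 29},
    21: {"tp": 18, "fn": 5, "tn": 55, "fp": 19},
}


def summarise(counts_by_cutoff):
    """Return (cutoff, sensitivity %, specificity %, Youden's J) for each cut-off."""
    rows = []
    for cutoff, counts in sorted(counts_by_cutoff.items()):
        sens, spec = sensitivity_specificity(**counts)
        rows.append((cutoff, round(100 * sens, 1), round(100 * spec, 1),
                     round(sens + spec - 1, 3)))
    return rows


for cutoff, sens, spec, youden_j in summarise(COUNTS_BY_CUTOFF):
    print(f"CES-D >= {cutoff}: sensitivity {sens} %, "
          f"specificity {spec} %, Youden's J {youden_j}")
```

With these assumed counts, raising the cut-off from 16 to 21 converts three true positives into false negatives and ten false positives into true negatives.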

Comparison of CES-D and PHQ-9
If the conventional cut-off points (score of ≥16 for CES-D and ≥10 for PHQ-9) were adopted, 31.0 % and 9.0 % of patients were identified as having depression by CES-D and PHQ-9, respectively, with fair chance-corrected agreement between the two instruments (Cohen's kappa: 0.32). However, when a score of ≥21 was used as the cut-off for CES-D, 17.8 % of patients were identified as having possible depression, similar to the depression prevalence reported in the same group of patients using the PHQ-9 ≥ 7 [16], with moderate chance-corrected agreement between the two instruments (Cohen's kappa: 0.45). The overlap of depressive symptom screen positivity is illustrated in a Venn diagram (Fig. 3), where just over a third of the 545 patients were captured by both PHQ-9 ≥ 7 and CES-D ≥21.
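For readers unfamiliar with chance-corrected agreement, the following minimal sketch shows how Cohen's kappa is obtained from the 2×2 cross-classification of two screening results. The function name and the example counts are hypothetical and are not the study data.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary screens, from the four cells of the 2x2 table.

    both_pos: positive on both instruments
    a_only:   positive on instrument A only
    b_only:   positive on instrument B only
    both_neg: negative on both instruments
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n           # raw agreement
    a_pos = (both_pos + a_only) / n                # positive rate, instrument A
    b_pos = (both_pos + b_only) / n                # positive rate, instrument B
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # agreement expected by chance
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not the study data).
example = cohens_kappa(both_pos=20, a_only=10, b_only=5, both_neg=65)
```

In this hypothetical example the raw agreement is 0.85 and the chance-expected agreement is 0.60, giving a kappa of 0.625; kappa equals 1 for perfect agreement and 0 when agreement is no better than chance.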

Discussion
To our best knowledge, this is the first study to validate CES-D and systematically compare it with the PHQ-9 for depression screening in Chinese patients with type 2 diabetes. The internal consistency, test-retest reliability, and diagnostic performance of CES-D were comparable with those of the PHQ-9, which had been validated in the same population [16].

Abbreviations: PHQ-9, Patient Health Questionnaire-9; CES-D, 20-item Center for Epidemiological Studies Depression scale; SMBG, self-monitoring of blood glucose; BP, blood pressure; eGFR, estimated glomerular filtration rate; ACR, albumin-to-creatinine ratio; OAD, oral antidiabetic drugs. Data are shown as mean ± SD, number (%) or median (interquartile range). The definitions of risk factors and complications were as follows: hypertension = known high blood pressure with or without treatment and/or blood pressure ≥ 130/80 mmHg; dyslipidaemia = LDL-C ≥ 2.6 mmol/L, HDL-C < 1.0 mmol/L, triglycerides ≥ 2.3 mmol/L, or on any lipid-lowering agents; albuminuria = spot urine albumin/creatinine ratio ≥ 2.5 mg/mmol in men or ≥ 3.5 mg/mmol in women; chronic kidney disease = eGFR < 60 ml/min/1.73 m²; cardiovascular disease = coronary heart disease, stroke, and/or peripheral vascular disease.

Factor structure of CES-D
The four-factor structure of CES-D was similar to the original one proposed by Radloff [7], except that item 20 ("get going") loaded on the depressed affect factor rather than the somatic symptom factor in our Chinese patients with type 2 diabetes. These subtle variations in factor structure might be due to racial/ethnic or cultural differences [25,26]. According to a meta-analysis on CES-D factor structure, people from different cultures might conceptualize depressive symptoms in different ways. Besides, the analytic methods used to derive the factors, e.g. confirmatory factor analysis (CFA) versus EFA, could also influence the results of CES-D factor structure [25]. In another study involving 138 Hong Kong Chinese married couples (aged 22-70 years) which used CFA to validate the CES-D, the authors reported a 2-factor model (depression and interpersonal problems) [11] compared to the 4-factor model in our study. In this study, the positive affect factor did not correlate with the other three factors (inter-factor correlations ranged from −0.05 to 0.04), in contrast to many other studies in the United States (range 0.31 to 0.85) [27][28][29]. Traditional Chinese and Oriental culture emphasizes modesty, silence, stoicism, and emotional restraint. These beliefs might lead our patients not to endorse positively stated items in the CES-D (e.g. "I was happy", which is reverse-scored) despite having other negative symptoms, leading to an elevated score on positive affect problems. Our results are consistent with other studies in Chinese subjects [14,30]. In a study of 168 community-dwelling American Chinese women, native Chinese speakers or Chinese immigrants were 50 % less likely to endorse the four positive items than English speakers or subjects born in the United States, despite having similar mean scores for the other 16 items [14]. This discrepancy has also been reported in other studies, probably due to cultural influences [30,31].
American Koreans who were less acculturated to American views were less likely to endorse positive CES-D items than the more acculturated ones [32]. Compared to Americans, Japanese had spuriously lower ratings of positive items whereas the scores for the negative items were comparable between the two groups [30]. Taken together, these cultural or ethnic factors seemed to affect responses to the four positive affect items, which might compromise the validity of CES-D. In support of this, the performance of CES-D improved upon exclusion of these four positive affect items, suggesting that the 16-item CES-D is a better screening tool for depression than the 20-item CES-D, at least in Chinese subjects.

Higher cut-off point of CES-D in Chinese
Our results suggested an optimal cutoff value of 21, which is higher than the widely used cutoff value of 16 (sensitivity 91.3 %, specificity 60.8 %). Different study designs, settings and populations may contribute to differences in the performance of different cutoff values [33]. Consistent with this, studies involving Chinese subjects have tended to report higher CES-D cut-off points for depression screening than those in Caucasian populations [12,22,34]. For example, a CES-D validation study in Hong Kong involving 398 elderly individuals reported an optimal cutoff value of 22 (75 % sensitivity and 51 % specificity), while the conventional cutoff value of 16 had high sensitivity (92 %) but poor specificity (30 %) in detecting depression [12]. In another CES-D validation study in Singaporean Chinese adults using the Schedule for Clinical Assessment in Neuropsychiatry (SCAN) as criterion, the cutoff value of 16 had a specificity of only 67.6 %, although the sensitivity was high (96.8 %) [22].

Comparison of CES-D to PHQ-9
In line with other studies [35,36], our findings indicated that PHQ-9 and CES-D had similar psychometric performance with respect to internal consistency, test-retest reliability, and diagnostic validity against the MINI-based diagnostic interview. Using the standard cutoff point (≥16), CES-D identified more than 30 % of patients with possible depression, much higher than the proportion identified using the standard PHQ-9 cutoff point (≥10). Compared to the conventional cutoffs, the optimal cutoff value in our population was higher for CES-D and lower for PHQ-9. This disparity might be partly explained by the content differences of the two tools. The PHQ-9 is constructed on the basis of the DSM-IV diagnostic criteria for clinical depression, while the CES-D measures depressive symptomatology with emphasis on the affective component and depressed mood [7]. Besides, the PHQ-9 asks about the frequency of depressive symptoms in the past two weeks, while the CES-D asks about the frequency of symptoms in the past week. The shorter duration covered by the CES-D may have captured short-term symptoms including acute hassles and stressors which might not reflect true depression. In addition, since Chinese people tend to give negative responses to positive affect items, this may also lead to a higher CES-D score in our study population.
While the use of PHQ-9 (≥7 or ≥10) identified patients with significant depressive symptoms which were associated with poor glycemic control and increased use of lipid-lowering drugs, such an association was not found in patients with depressive symptoms detected by CES-D, suggesting that CES-D might identify patients with a slightly different profile than those identified by PHQ-9. Findings from other studies that have examined the association between depression and glycemic control have been mixed: some found an association between depressive symptoms and poor glycemic control whereas others did not [37,38]. Our results raise the possibility that this inconsistency might be in part due to the different tools used in capturing depressive symptoms. Negative emotions can be heterogeneous and complex, with different combinations of symptoms which may have common but also distinct biological pathways. This complexity may also explain the inconsistency regarding the associations between depression and glycemic control.
Although, theoretically, the CES-D and the PHQ-9 may complement one another to detect depression and negative emotions, this may not be feasible in real-world practice given the increased testing time and redundancy of the items. However, since the two tools cover different time frames (1 week versus 2 weeks) with slightly different attributes (e.g. interpersonal problems in CES-D and suicidal tendency in PHQ-9) and complementary aspects as revealed by the Venn diagram, it might be useful to explore the possibility of selecting items from both tools to generate a better depression screening algorithm in future studies.
The optimal cutoff values of both PHQ-9 and CES-D differed from the conventional ones, and PHQ-9 was better than CES-D at identifying Chinese type 2 diabetic patients with both depression and poor glycemic control. These results highlight the importance of validating screening tools in local settings. The differences in associations of depression with glycemic control between the two instruments also support the syndromic nature of depression due to possible subphenotypes and aetiologies with variable responses to different screening tools. In this study, we also observed a high suicide risk in these patients: 5 % reported suicidal ideation on the PHQ-9. Consistently, a study in Italy found more severe suicidal ideation in patients with diabetes, and its severity was closely associated with older age, polytherapy and lower self-efficacy [39]. Therefore, these results warrant depression screening and subsequent emotional support for patients with diabetes.

Limitations
Although the study population comprised self-referred patients and those referred by family clinics, the majority of them were attending hospital-based specialist out-patient clinics and might have had more severe disease, longer disease duration, multiple co-morbidities, and more complicated drug regimens than typical community-based outpatients. Thus, our cohort might not fairly represent the general Hong Kong Chinese population with type 2 diabetes. Furthermore, this study was performed in Hong Kong, which has a different health care system and different cultural nuances from those of Mainland China, so caution must be taken when making generalizations to the wider Chinese population. Finally, only a subset of our cohort had a diagnostic interview as the criterion validation measure, so further studies with larger sample sizes are required to confirm these findings.

Conclusions
In summary, the CES-D is a validated tool for detecting major depression in Chinese patients with type 2 diabetes. The improvement in performance after excluding items on positive affect might reflect cultural differences. Between CES-D and PHQ-9, the latter is the preferred screening tool due to its longer coverage period, fewer items with less administration time, and ability to identify depressed patients with poor metabolic control. The different cutoff values in our population also emphasize the importance of calibrating these tools in different patient groups and settings.