Reliability, Validity and Psychometric Properties of the Greek Translation of the Center for Epidemiological Studies-Depression (CES-D) Scale



The aim of the current study was to assess the reliability, validity and psychometric properties of the Greek translation of the Center for Epidemiological Studies-Depression Scale (CES-D).


Forty depressed patients (29.65 ± 9.38 years old) and 120 normal controls (27.23 ± 10.62 years old) entered the study. In 20 of them (12 patients and 8 controls) the instrument was re-applied 1-2 days later. Translation and back translation were performed. Clinical diagnosis was reached by consensus of two examiners with the use of the SCAN v.2.0 and the IPDE. Statistical analysis included ANOVA, the Pearson Product Moment Correlation Coefficient, Principal Components Analysis, Discriminant Function Analysis and the calculation of Cronbach's alpha (α).


Both sensitivity and specificity were at least 90% at the 23/24 cut-off. Cronbach's alpha for the total scale was equal to 0.95. Factor analysis revealed three factors (positive affect; irritability and interpersonal relationships; depressed affect and somatic complaints). The test-retest reliability was satisfactory (Pearson's R between 0.45 and 0.95 for individual items and 0.71 for the total score).


The Greek translation of the CES-D scale is both reliable and valid, and is suitable for clinical and research use. Its properties are similar to those reported in the international literature. However, one should always bear in mind the limitations inherent in the use of self-report scales.

The Center for Epidemiological Studies-Depression Scale (CES-D) [1] is a well-known and widely used self-rating scale for the measurement of depression. Together with the Beck Depression Inventory [2] and the Zung Depression Rating Scale [3], it is among the most popular self-administered instruments for the assessment of depression. These scales are meant to be used as screening tools rather than as substitutes for an in-depth interview [4]. They can be an efficient tool for screening patients for depression [5] and have been used successfully for many years in the primary care setting. Higher scores on the CES-D are indicative of more severe depression [6].

The CES-D is a self-report instrument and was originally developed in order to assess depressive symptoms without the bias of an administrator affecting the results. The items in the CES-D scale may also help patients begin to discuss previously nebulous symptoms, especially those patients who present with physical symptoms of depression such as headache or insomnia. The CES-D consists of 20 items that cover affective, psychological, and somatic symptoms. The patient specifies the frequency with which each symptom is experienced (that is: a little, some, a good part of the time, or most of the time) [7].

The aim of the current study was to assess the reliability, validity and psychometric properties of the Greek translation of the Center for Epidemiological Studies-Depression Scale (CES-D).

Material and Methods


Forty patients (25 males and 15 females) aged 29.65 ± 9.38 years (range 18-55) suffering from Major Depressive Disorder according to DSM-IV [8] and depression according to ICD-10 criteria [9], and 120 normal controls (71 males and 49 females) aged 27.23 ± 10.62 years (range 18-51) entered the study. In 20 of them (12 patients and 8 controls) the instrument was re-applied 1-2 days later.

Patients and controls had been free of any medication for at least two weeks and were physically healthy with normal clinical and laboratory findings (electroencephalogram, blood and biochemical testing, thyroid function, pregnancy test, vitamin B12 and folic acid).

Patients came from the inpatient and outpatient unit of the 3rd Department of Psychiatry, Aristotle University of Thessaloniki, General Hospital AHEPA, Thessaloniki, Greece. They were consecutive cases and were chosen because they fulfilled the above criteria.

The normal control group was composed of members of the hospital staff and relatives of patients. A clinical interview confirmed that they did not suffer from any mental disorder and their prior history was free from mental and thyroid disorder.

All patients and controls provided written informed consent before participating in the study.


Translation and back translation were performed by two of the authors, one of whom did not know the original English text. The final translation was fixed by consensus.

Clinical diagnosis was reached by consensus of two examiners. The Schedules for Clinical Assessment in Neuropsychiatry (SCAN) version 2.0 [10,11] and the International Personality Disorders Examination (IPDE) [12,13,14] were used. Both were applied by one of the authors (KNF), who has official training in a World Health Organization Training and Reference Center. The IPDE did not contribute to the clinical diagnosis of depression, but was used in the frame of a global and comprehensive assessment of the patients. The second examiner performed an unstructured interview.

Statistical Analysis

Analysis of Variance (ANOVA) [15] was used to search for differences between groups. The Pearson Product Moment Correlation Coefficient R was calculated to assess the test-retest reliability. Principal Components Analysis (Varimax Normalized Rotation) was performed, and factor coefficients and scores were calculated. Finally, Discriminant Function Analysis was performed.

Item Analysis [16] was performed, and the value of Cronbach's alpha (α) for the CES-D and its factor subscales was calculated. Receiver Operating Characteristic (ROC) curves and a histogram of frequencies were created as well.


The calculation of sensitivity (Sn) and specificity (Sp) at various cut-off levels showed that both measures were at least 90% at the 23/24 cut-off, with 109 of 120 controls and 36 of 40 patients correctly classified. Eleven controls and 4 patients were classified into the wrong diagnostic group (table 1). Receiver Operating Characteristic (ROC) curve analysis (figure 1) confirmed these results.
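As an illustration of how these two figures follow from the classification counts, a minimal Python sketch is given here. The function name and argument names are illustrative only and are not part of the original analysis; the counts are those reported in the text for the 23/24 cut-off.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as percentages."""
    # Sensitivity: share of patients scoring above the cut-off
    sensitivity = 100.0 * true_pos / (true_pos + false_neg)
    # Specificity: share of controls scoring at or below the cut-off
    specificity = 100.0 * true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts reported in the text: 36 of 40 patients and 109 of 120 controls
# were correctly classified at the 23/24 cut-off.
sn, sp = sensitivity_specificity(true_pos=36, false_neg=4,
                                 true_neg=109, false_pos=11)
print(f"Sensitivity = {sn:.2f}%, specificity = {sp:.2f}%")
# prints: Sensitivity = 90.00%, specificity = 90.83%
```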

Table 1 Sensitivity and Specificity of CES-D at various cut-off levels.

Cronbach's alpha for the total scale was equal to 0.95. This is a very high value and suggests that the CES-D scale reflects a single structure.
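For readers unfamiliar with the coefficient, the following sketch shows the standard formula for Cronbach's alpha, α = k/(k-1) · (1 - Σ item variances / variance of the total score), where k is the number of items. The data are hypothetical (five respondents, four items) and serve only to demonstrate the calculation; they are not taken from the study, and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (rows = respondents, columns = items), NOT study data
demo = [
    [0, 1, 0, 1],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [3, 3, 2, 3],
    [3, 2, 3, 2],
]
alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 when the items vary together, which is why a high value is usually read as evidence that the items measure a common construct.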

The histogram of CES-D scores in control subjects reveals that they do not follow the normal distribution in this population, but manifest a skew towards lower values (figure 2).


The factor analysis of cases (varimax normalized rotation) revealed three factors (table 2). The first one includes items No 3, 4, 8, 12, 14 and 16, largely reflects a factor of positive affect, and explains 22% of variability. The second one includes items No 1, 11, 15 and 19, largely reflects a factor of irritability and problems with interpersonal relationships, and explains 13% of variability. The third factor includes items No 1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18 and 20, reflects depressed affect and somatic complaints, and explains 31% of variability. Together, the three factors explain 66% of total CES-D variance. Factor loadings and coefficients are shown in table 2.

Table 2 Factor loadings, Factor scores, coefficients and Sum of Factor items after Factor Analysis (Varimax normalized rotation), of controls and patient data.

Cronbach's alpha for the individual factors (subscales that include the items that load on each one) was good to excellent. The factor 1 items had alpha equal to 0.91, those of factor 2 equal to 0.76 and those of factor 3 equal to 0.94.

Depressed patients did not differ from controls in age. In contrast, they differed in every individual CES-D item score and in the total score (p < 0.001; table 3). Interestingly, the two groups did not differ in the scores of any of the factors that emerged; only factor 3 showed a tendency towards significance (table 3). However, the two groups differed in all scores derived from the sum of the items that group under each factor (p < 0.001).

Table 3 Greek translation of the CES-D and comparison between controls and patients.

The test-retest reliability proved to be satisfactory. Pearson correlation coefficients for individual items ranged from R = 0.45 (item No 1, the lowest) to R = 0.95 (item No 4, the highest). The coefficient for the total CES-D score was equal to 0.71.
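The test-retest coefficient is the Pearson product-moment correlation between scores at the first and second administration. A minimal sketch of the computation follows; the two score lists are hypothetical and are not the study data, and the function name is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scores at first and second administration (NOT study data)
first_administration = [10, 25, 31, 8, 40, 17]
second_administration = [12, 22, 35, 9, 36, 20]
r = pearson_r(first_administration, second_administration)
print(f"test-retest R = {r:.2f}")
```

The same function can be applied item by item to obtain a coefficient for each of the 20 items.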

Discriminant function analysis results are shown in table 4. Two separate analyses were performed, with the forward stepwise method, one with individual CES-D items and a second with factor scores. The first one performed excellently while the second one was very poor. The results of the first one suggest that when the D-C equation, that is:

1.43*(It2)+1.01*(It5)+1.13*(It6)+0.63*(It7)+0.94*(It9)+0.65*(It10) +0.96*(It11)+1.07*(It13)-0.61*(It14)+1.32*(It17)-1.20*(It19)-0.83*(It20)

takes values above 9.03, then the subject is a depressed patient. This method correctly classified 98.33% of controls and 87.5% of patients.
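To make the classification rule concrete, the equation above can be written as a short Python sketch. The coefficients and the 9.03 threshold are transcribed from the text; the function names, the dictionary layout and the example responses are illustrative assumptions. In particular, the sketch assumes that each item is coded 0-3, which is the usual CES-D scoring but is not stated explicitly here.

```python
# Item number -> coefficient, transcribed from the discriminant equation above
COEFFICIENTS = {
    2: 1.43, 5: 1.01, 6: 1.13, 7: 0.63, 9: 0.94, 10: 0.65,
    11: 0.96, 13: 1.07, 14: -0.61, 17: 1.32, 19: -1.20, 20: -0.83,
}
THRESHOLD = 9.03  # values above this classify the subject as a patient


def discriminant_score(item_scores):
    """Weighted sum of the CES-D items that enter the equation.

    item_scores maps item number (1-20) to the item score
    (assumed here to be coded 0-3).
    """
    return sum(weight * item_scores[item] for item, weight in COEFFICIENTS.items())


def is_classified_depressed(item_scores):
    """Apply the cut-off reported in the text to the discriminant score."""
    return discriminant_score(item_scores) > THRESHOLD


# Hypothetical respondents (NOT study data)
all_zero = {item: 0 for item in range(1, 21)}
all_three = {item: 3 for item in range(1, 21)}
print(is_classified_depressed(all_zero))   # score 0.0, below the threshold
print(is_classified_depressed(all_three))  # score 19.5, above the threshold
```

Only 12 of the 20 items enter the equation; the remaining items are ignored by the rule, so they are simply not looked up in the sketch.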

Table 4 Discriminant Function Analysis Results.


Self-administered scales depend heavily on the co-operation and reading ability of the patient. On the other hand, they save time for the clinician. The reliability and validity of the CES-D have been examined in only a limited number of studies, and not many translations of this scale have been published. Moreover, translations are difficult to access because they are published in various languages and in local journals. The same is true for other scales, like the Zung Depression Rating Scale [17,18,19,20].

Although the Center for Epidemiologic Studies Depression Scale (CES-D) is an internationally popular self-rating scale for depression in both community and clinical settings, the literature concerning its transcultural reliability and validity is limited. The current study reports observations on the reliability, validity and psychometric properties of the Greek translation of the CES-D. The results suggest that this translation is well suited for use in the Greek population, with high sensitivity and specificity at the 23/24 cut-off level, high test-retest reliability and high internal consistency. Its factor structure is similar to structures reported in the literature.

Apart from the full version, 10-, 8- and 4-item versions also exist [21,22,23], with accuracy comparable to that of the original CES-D in classifying cases with depressive symptoms [24].

Because the overlap with symptoms of physical diseases is very limited, the CES-D can be used in physically ill populations [25], so it has been used widely in general medical populations [26,27] and pain patients [28]. Acculturation constitutes a more complex problem [29]. Data indicate that youths who spoke only or mostly English reported lower rates of depression and suicidal ideation, suggesting that acculturation may play a role as well [30]. Also, irrespective of the scale used, a gender difference is found across the ethnic groups, in which girls expressed depressive feelings more than boys [31]. Various papers report on the study of the effect of race and sex [32,33,34,35,36,37,38,39,40,41], but results are difficult to interpret.

The CES-D has been widely used in studies of late-life depression, but geriatric data are considered insufficient [42,43,44,45,46,47,48,49,50]. The psychometric properties reported are generally favourable [51], but data on the criterion validity of the CES-D in elderly community-based samples are not sufficient.

The Dutch translation manifested satisfactory properties for use in the elderly with Cronbach's alpha 0.80-0.90 [52], which is comparable to the results of the current study, and the Japanese translation proved to be suitable for the detection of major depressive episodes among first-visit psychiatric patients [53]. Generally, the CES-D has moderate convergent and discriminant validity to detect major depressive episodes among first-visit psychiatric patients, and complex methods may be essential [54].

The CES-D was confirmed as essentially unidimensional and robust to minor changes; therefore, it is recommended for use in cross-cultural studies of depression in elderly persons. The original four-factor solution proposed by Radloff was successfully replicated for Australians, showing similar underlying structures as for Americans, Canadians, and Japanese [55].

A moderate correlation has been reported between the CES-D and both self-esteem and state anxiety. However, a high correlation was obtained between the CES-D and trait anxiety, which suggests that the CES-D measures in large part the related conceptual psychological domain of predisposition for anxiousness [56].

The Spanish study reported an alpha of 0.9, and factor analysis showed four factors that explain 58.8% of the variance: "Depressed Affect/Somatic", "Positive Affect", "Irritability/Hopelessness" and "Interpersonal/Social". The scale shows a sensitivity of 0.95 and a specificity of 0.91 for the detection of depressive symptomatology (defined as a score of 9 or more on the HRSD) when a CES-D score of 16 or more is taken as the cut-off [57]. The publication of the Spanish version boosted research in Mexican Americans [58,59,60,61,62,63].

In Chinese geriatric patients the correlation with the Geriatric Depression Scale was 0.96 [64]. Chen et al [65] studied whether an instrument developed in the U.S. may identify lower rates of major depression among young Chinese, because its content may not cover culture-specific symptoms of depression. The authors concluded that the lower prevalence of depression was not due to the ethnocentric character of the instrument in the Chinese sample. Similarly, data add to growing evidence that Mexican American youths are at increased risk of depression, and this is not an artificial product of the CES-D [66].

The Italian validation study [67] was carried out in northern Italy with 40 depressives and 40 matched normals and showed that the CES-D is a valid measure in that it sensitively discriminates between depressed patients and normals and presents satisfactory correlations with the observer rating scale (HRSD) in both groups.

Large-scale studies revealed that neither age, gender, cognitive impairment, functional impairment, physical disease, nor social desirability had a significant negative effect on the psychometric properties or screening efficacy of the CES-D [68].

The factor analysis of the Japanese version [69] of the CES-D, using data obtained from 2,016 adult employees aged 19-63 years, extracted 4 factors for each age group. Depressive affect items did not group into one factor; some were combined with somatic or interpersonal items, and the remainder constituted the smallest factor. The three main factors, 'somatic + depressed', 'interpersonal + negative' and 'positive affect', were comparable across age groups except for those aged 50-63 years. For those aged 50-63 years, the first two factors were combined into a large 'general dysphoria' factor, suggesting a more unified conceptualization of depressive mood. Although 'positive affect' was stable cross-culturally, it was not related to depressive symptomatology as measured by the other items in the Japanese sample. The 'interpersonal + negative' factor appears unique to the Japanese, indicating an association of interpersonal relations with depressive mood in this population. These results are strikingly close to the results of the current study.

Comparison of the response patterns on the CES-D items of Japanese adolescents with those of their U.S. counterparts (1,500 junior high school students, aged 12-15 years) showed that Japanese responses to positively worded items differed markedly from those of American adolescents, whereas responses to negatively worded items were comparable in the two groups. This resulted in poor psychometric properties for the CES-D and spuriously higher positive subscale and whole scale scores in the Japanese sample. It is possible that Japanese respondents tend to suppress the expression of positive affect and, thus, the positively worded items of the CES-D are presumably inappropriate for Japanese samples [70].

There were differences in patterns of CES-D item endorsement between diverse ethnocultural groups, as indicated by principal component factor analysis of the results of 2,200 persons 12-17 years of age. Anglo- and African Americans exhibited a similar factor structure, represented by negative affect, positive affect, and psychosomatic symptoms. Two Hispanic groups also exhibited a three-dimensional pattern, but there was a tendency among Hispanic adolescents for somatic symptoms and negative affect symptoms to cluster together. This pattern may indicate a more prominent role of somatic complaints in the presentation of depression among Mexican Americans and other Hispanics [71], and this is similar to the findings of the current study.

Review studies on various self-administered instruments suggest that there is no significant difference between them in terms of performance; overall sensitivity is around 84% and specificity around 72% [72]. These instruments are of particular value in primary care settings because it is clear that primary care providers fail to diagnose and treat as many as 35% to 50% of patients with depressive disorders [73,74]. Depression is one of the most common psychiatric diagnoses in primary care populations [75]; major depressive disorders can be diagnosed in 6% to 9% of such patients. Obstacles to the appropriate recognition of depression include inadequate provider knowledge of diagnostic criteria; competing comorbid conditions and priorities among primary care patients; time limitations in busy office settings; concern about the implications of labeling; poor reimbursement mechanisms; and uncertainty about the value, accuracy, and efficiency of screening mechanisms for identifying patients with depression. Given that 50% to 60% of persons seeking help for depression are treated exclusively in the primary care setting, accurate detection in this setting is important [76], and self-administered instruments may help to ameliorate some of these obstacles. Many studies have assessed the effect of feedback of scale scores on physician practice patterns [77,78,79,80,81,82,83,84,85,86] and have shown improved recognition of depression with such feedback.

On the other hand, it should be noted that the diagnosis of depression is itself based on symptoms. A patient cannot truly be asymptomatic and have major depressive disorder. Thus, these screening questionnaires are actually being evaluated for their ability to detect unrecognized, rather than strictly asymptomatic, depressive symptoms and disease.

The Canadian Task Force on the Periodic Health Examination found fair evidence to exclude the use of depression detection tests from the periodic health examination of asymptomatic people [87]. The American Academy of Family Physicians advises physicians to remain alert for depressive symptoms in adolescents and adults [88]; this policy is under review. The American Medical Association recommends that all adolescents be asked annually about behaviors or emotions that indicate recurrent or severe depression [89].

The agreement between the CES-D scale and the DIS diagnoses of major depressive disorder (MDD) and generalized anxiety disorder (GAD) was poor, especially among Mexican-origin patients interviewed in Spanish. Multiple regression analysis revealed that the CES-D scale was positively associated with MDD in all groups. In addition, GAD also was associated with the CES-D scale in Anglos and English-speaking Mexican-Americans but not in Spanish-speaking Mexican-Americans [90].

The results indicate no systematic variation in either reliability (test-retest, internal consistency), dimensionality, or ability of the CES-D Scale to detect clinical depression among Anglos or persons of Mexican origin classified according to language use as Spanish dominant, English dominant, or bilingual. The available evidence suggests that the ability of the CES-D Scale to detect major depression is so limited that further use of the instrument as a screening scale would seem unwarranted, at least in treatment settings [91].


The Greek translation of the CES-D scale is both reliable and valid, and is suitable for clinical and research use. However, one should always bear in mind the limitations inherent in the use of self-report scales.


