Factor structure and measurement invariance of the Chinese version of the Center for Epidemiological Studies Depression (CES-D) scale among undergraduates and clinical patients

Background The Center for Epidemiologic Studies Depression scale (CES-D) is widely used for screening depressive symptoms. The purpose of the current study was to investigate the factor structure of the CES-D and its measurement invariance across genders in samples of Chinese undergraduates and clinical patients. Methods Participants included 3093 undergraduates from Hunan Province and 336 patients from psychological clinics. The structure of the CES-D was analyzed by confirmatory factor analysis (CFA). Multiple sets of CFAs were used to test measurement invariance across genders among undergraduates and clinical patients. Internal consistency reliability was also evaluated. Results The five-factor model achieved satisfactory fit (in the undergraduate sample: WLSMV χ² = 1662.385, df = 160, CFI = 0.973, TLI = 0.968, RMSEA = 0.055; in the clinical sample: WLSMV χ² = 502.089, df = 160, CFI = 0.962, TLI = 0.955, RMSEA = 0.072). Measurement invariance of the five-factor model across genders was fully supported at each level of invariance tested. The CES-D also showed acceptable internal consistency. Conclusion Owing to its sound structure and measurement invariance, the five-factor model of the CES-D is the most suitable for use with mainland Chinese college students and clinical patients.

The Center for Epidemiologic Studies Depression scale (CES-D), developed in 1977 by Radloff [8], is one of the most widely used self-report scales for assessing depressive symptoms. The scale items cover the major components of depressive symptoms, and the scale was designed to measure the current level of depression [9]. Accordingly, the CES-D has been widely used in research on children, adolescents and elderly people, as well as on physically ill and mentally ill populations [9][10][11][12][13]. The CES-D has shown good reliability (Cronbach's α = 0.70-0.95, test-retest r = 0.71-0.85) and good validity in different countries [13][14][15]. The Chinese version of the CES-D has been reported to be useful for assessing depression in large samples of adolescents and adults [9].
The original version of the CES-D consisted of 20 items, categorized into four factors: depressed affect (DA; seven items); somatic complaints (SC; seven items); interpersonal problems (IP; two items); and positive affect (PA; four items) [8]. However, others have proposed two- [16], three- [9,17] and five-factor [15,18] models. In 2006, Shafer conducted four separate meta-analyses of factor analytic studies of the CES-D, including a total of 22,000 participants, and found that the original four-factor structure was the most suitable [19]. However, another meta-analysis by Kim et al. (2011) found that the four-factor structure of the CES-D was not appropriate among Asian participants. In contrast, the four-factor model has been shown to be the best-fitting model in several Chinese factor analytic studies [10,11]. Wang et al., however, found that a three-factor model (depressed affect, somatic complaints and positive affect) was the best-fitting model among Chinese adolescents [9]. A confirmatory factor analysis indicated that another three-factor structure (positive affect, interpersonal problems, and depressive mood and somatic symptoms combined) had good fit in rural Chinese participants [20]. Thus, the best factor structure of the CES-D among Chinese participants has not yet been determined, and it is important to identify the best-fitting model of the CES-D in different Chinese samples.
Once the best-fitting model is identified, another essential issue that requires further study is whether the CES-D has the same structure in different groups and whether its items have the same meaning across those groups. Previous studies have found differences in CES-D scores between male and female college students [12]. A longitudinal study found that a higher percentage of male than female students experienced varying degrees of depression [21]. Such comparative studies presume that the measurement of the construct is comparable between males and females. However, since the meaning of items may differ for males and females, it is necessary to establish the measurement invariance of the CES-D between the two. Measurement invariance is defined as "a given factorial defined construct has the same measurement parameters across two or more samples (i.e. the loading, intercepts and residual matrix are equal among different groups)" [22,23]. Without evidence of measurement invariance, it cannot be concluded that group differences in depression reflect true differences between groups, as the differences may be due to item bias in the scale [23]. A previous study demonstrated that the measurement invariance of the CES-D across gender was acceptable in a non-clinical sample [9], but this result has not been generalized to clinical populations.
Thus, the aims of the present study were to test the factor structure and internal consistency reliability of the CESD in undergraduates and clinical patients and to explore measurement invariance of the CESD across genders among the two samples.

Participants
The undergraduate participants came from Central South University in Changsha. We recruited participants through posters and advertisements. Students who had a history of a mental disorder, a neurological disorder or intellectual disability were excluded. A total of 3158 university students were surveyed, 10 of whom were excluded because of mental disorders and 55 of whom were excluded because of missing data. The final sample included 3093 students (57% males, 43% females), aged 18 to 22 years [mean = 19.5, standard deviation (SD) = 1.04].
The clinical sample comprised 353 outpatients who had been referred for assessment and treatment to a psychological clinic of the Second Xiangya Hospital. Patients who could not understand the questions well were excluded. A total of 336 patients completed the questions, including 139 (42%) males and 197 (58%) females, aged 16 to 33 years (mean = 24; SD = 5.7). The diagnoses in the clinical sample were major depressive disorder (38.5%), schizophrenia (10%), obsessive-compulsive disorder (11.8%), a personality disorder (7.4%), an anxiety disorder (14.7%) and other mental disorders (16.9%) as a whole and the frequency distribution of the psychiatric disorders were 31.1%. The data were collected by trained psychology postgraduate researchers. All participants provided informed consent, and the Ethics Committee of the Second Xiangya Hospital of Central South University approved the study. In both groups, there were no significant demographic differences between participants who completed the CES-D and those who did not.

CES-D
The CES-D consists of 20 items, including 16 negative items (e.g., "I felt depressed", "I felt lonely") and 4 positive items (e.g., "I was happy", "I enjoyed life"). The four positive affect items were reverse-scored when calculating the total score. Items are rated on a 4-point scale from 0 (rarely; less than 1 day) to 3 (most or all of the time; 5-7 days). Higher scores on the CES-D indicate more depressive symptoms. The Chinese version of the CES-D has been widely used in China and has been validated in previous Chinese studies [11,20,24].
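The scoring rule described above can be illustrated with a minimal Python sketch. It assumes the standard Radloff item numbering, in which the positive affect items are items 4, 8, 12 and 16; the item numbers and the function name `score_cesd` are illustrative assumptions and are not taken from the present study.

```python
# Assumption: positive affect items follow the standard Radloff numbering.
POSITIVE_ITEMS = (4, 8, 12, 16)


def score_cesd(responses):
    """Return the CES-D total score for one respondent.

    responses: sequence of 20 integers, each 0-3, in item order (item 1 first).
    Positive affect items are reverse-scored (0 -> 3, 1 -> 2, 2 -> 1, 3 -> 0).
    """
    if len(responses) != 20:
        raise ValueError("CES-D requires exactly 20 item responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item_number}: response must be 0, 1, 2 or 3")
        # Reverse-score the positive items; keep negative items as rated.
        total += (3 - value) if item_number in POSITIVE_ITEMS else value
    return total
```

Under this scheme the total ranges from 0 to 60, and a respondent who answers 0 on every item still scores 12 because the four positive items are reversed.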

Data analysis
Step 1: confirmatory factor analysis (CFA). CFAs were conducted in Mplus 7.11 to identify the best-fitting factor model of the CES-D. Given that the items have only four response categories, the robust weighted least squares estimator with mean and variance adjustment (WLSMV) was used [23,25,26]. Several model fit indices were used to evaluate goodness of fit: the Tucker-Lewis Index (TLI), the comparative fit index (CFI), and the root mean square error of approximation (RMSEA) [9,27]. According to conventional guidelines, CFI and TLI values ≥ .90 indicate acceptable model fit and values ≥ .95 indicate good model fit, while RMSEA values ≤ .08 indicate acceptable model fit and values ≤ .05 indicate good model fit [28,29].
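The cut-offs above can be written as a small helper for labelling a fitted model. This is only a sketch: the function name `classify_fit` is hypothetical, and requiring all three indices to meet a band jointly is an assumption made here for illustration, since the guidelines are stated per index.

```python
def classify_fit(cfi, tli, rmsea):
    """Label model fit using the conventional cut-offs.

    Assumption: a band applies only if CFI, TLI and RMSEA all meet it.
    """
    # Stricter band: CFI/TLI >= .95 and RMSEA <= .05
    if cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.05:
        return "good"
    # Acceptable band: CFI/TLI >= .90 and RMSEA <= .08
    if cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08:
        return "acceptable"
    return "poor"


# Five-factor model values reported in the abstract
undergraduate = classify_fit(cfi=0.973, tli=0.968, rmsea=0.055)
clinical = classify_fit(cfi=0.962, tli=0.955, rmsea=0.072)
```

With this conservative joint rule, both samples fall in the acceptable band: CFI and TLI exceed .95, but RMSEA is above .05 in each case.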
Six alternative models of the CES-D that fitted well in previous studies were chosen for comparison. Model A is the original four-factor model proposed by Radloff [8]. In this model, items load on four factors: depressed, somatic, interpersonal, and positive. The four-factor model has been shown to be the best-fitting model in several Chinese factor analytic studies. Model B is a two-factor model comprising depressed affect and positive affect [12]. All negative items are combined into the depressed affect factor, and the remaining positive items form the positive affect factor. Recently, this model has also been shown to fit well in a Chinese population. Model C is a three-factor model (depressed affect, positive affect and interpersonal problems) that was put forward, following Kuo's study, in an analysis of the factor structure among Chinese Americans; in that analysis, the results for Chinese Americans were superior to those for the three-factor model in Kuo's study [13]. Model D is another three-factor model, proposed by Wang et al., which includes depressed affect, positive affect and somatic complaints factors [9]. It has been shown to be the best-fitting model in Chinese adolescents. Model E and Model F are five-factor models proposed by Kim for Asian populations, derived in a meta-analysis by exploratory factor analysis (EFA) and by CFA, respectively [15]. EFA is a data-driven approach, whereas CFA is a model-driven approach [15]. We therefore included two five-factor models (Model E and Model F) derived from different analytical methods. Model E contains one additional factor (alienation, AI) compared with the original four-factor structure. In Model F, one additional factor representing sorrow/grief appears that is distinct from the original depression factor. Both alienation and sorrow/grief were factors unique to the Asian population in the meta-analysis (see Table 1).
Step 2: internal consistency reliability. In the current study, Cronbach's alpha (α), the mean inter-item correlation (MIC) and McDonald's omega coefficient were used to evaluate internal consistency reliability. A Cronbach's α coefficient above 0.70 (> 0.60 in some cases) was considered acceptable. An optimal range of 0.10-0.40 was set for the MIC.
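For readers unfamiliar with these statistics, the following sketch shows how Cronbach's α and the MIC are conventionally computed from item-level data, using only the Python standard library. The function names and the toy data are illustrative assumptions; McDonald's omega is omitted because it requires factor loadings from a fitted model.

```python
from itertools import combinations
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of item columns, each a list of respondent scores."""
    k = len(items)
    # Total score per respondent (sum across items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def mean_inter_item_correlation(items):
    """Average Pearson correlation over all distinct item pairs."""
    return mean(pearson(a, b) for a, b in combinations(items, 2))


# Toy data: two items answered by four respondents (not study data)
toy_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(toy_items)
mic = mean_inter_item_correlation(toy_items)
```

Because α rises with the number of items while the MIC does not, reporting both, as done here, helps distinguish a long scale from a homogeneous one.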
Step 3: measurement invariance. After the most appropriate factor model was identified, Mplus 7.11 was used to analyze the model's measurement invariance across genders. Multi-group CFA (MGCFA) was used to test invariance with nested models. The MGCFA method typically considers four levels of measurement invariance: configural, weak (metric), strong (scalar) and strict. Configural invariance tests whether the latent variables have the same constituents or patterns across groups (Model 1). Weak invariance builds on the configural model and tests whether the factor loadings relating the indicators to the factors are equal across groups (Model 2). Strong invariance builds on the metric model and tests whether the item intercepts are equal across groups (Model 3). Strict invariance builds on the scalar model and tests whether the error variances are equal across groups (Model 4) [22]. Because tests of the change in CFI are reported to be superior to chi-square difference tests of nested models, as they are not affected by sample size [29,30], the current study compared nested models on the basis of changes in fit indices. Measurement invariance was considered established when at least two of the following criteria were satisfied: change in TLI < 0.01, change in CFI < 0.01, and change in RMSEA < 0.015 [25].
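The decision rule for comparing nested models can be sketched as follows. The function name `invariance_supported` is hypothetical, and two details are assumptions made for illustration: the changes are compared by absolute value, and "two of the following" is read as "at least two". The example values are the changes reported in the Results for the strong invariance step in the undergraduate sample.

```python
def invariance_supported(delta_cfi, delta_tli, delta_rmsea, required=2):
    """Return True if enough change-in-fit criteria are met.

    Assumptions: absolute changes are used, and at least `required`
    of the three criteria must hold.
    """
    criteria = [
        abs(delta_cfi) < 0.01,     # change in CFI
        abs(delta_tli) < 0.01,     # change in TLI
        abs(delta_rmsea) < 0.015,  # change in RMSEA
    ]
    return sum(criteria) >= required


# Strong (scalar) vs. weak (metric) model, undergraduate sample
strong_ok = invariance_supported(delta_cfi=-0.006, delta_tli=-0.003, delta_rmsea=0.001)
```

Each level is tested only after the previous one has been supported, so in practice the check is applied in sequence to the weak, strong and strict models.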
Step 4: difference test. T-tests were used to explore differences between males and females and between the clinical and non-clinical samples on the total CES-D score and each factor score. P-values < 0.05 were considered significant.

CFA of the CES-D scale based on the hypothesized model
As illustrated in Table 2, Model B, Model D and Model E fitted the data well (CFIs > 0.90, TLIs > 0.90, RMSEAs < 0.08) in the clinical sample. Model E (the five-factor model) provided the best fit to the data in the clinical sample (WLSMV χ² = 502.089, df = 160, CFI = 0.962, TLI = 0.955, RMSEA = 0.072). As can be seen in Table 3, Model E also fitted the data well in the undergraduate sample (WLSMV χ² = 1662.385, df = 160, CFI = 0.973, TLI = 0.968, RMSEA = 0.055). All items had factor loadings ≥ 0.40 and loaded significantly on the proposed latent factors (p < 0.01; Table 4).

Internal consistency reliability
In both samples, the Cronbach's α values were > 0.8 for the whole scale and > 0.6 for each dimension (Table 5).
All MICs were between 0.10 and 0.40 except for the PA subscale and the IP subscale in the undergraduate sample and the DA subscale in the clinical sample. The McDonald's omega coefficients were > 0.9 for the whole scale and > 0.6 for each dimension (Table 5).

Measurement invariance across genders among undergraduates and clinical patients
As the five-factor model (Model E) fitted the data best in both the undergraduate and clinical samples, we chose the five-factor model to test measurement invariance across gender.
In the undergraduate sample, the following goodness-of-fit indices were obtained from the configural invariance test: TLI = 0.934, CFI = 0.944, RMSEA (90% CI) = 0.043 (0.040, 0.045) (see Table 6). All indices met the requirements for configural invariance. Thus, configural invariance was established and the model was used as the baseline model for the subsequent analyses. To verify whether the factor loadings were equal across gender, the weak invariance model was specified on the basis of the baseline model. All indices met the requirements for weak invariance (see Table 6). In addition, the absolute values of ΔCFI, ΔTLI, and ΔRMSEA (0.000, 0.002, and −0.001, respectively) were all less than 0.01. On the basis of the previous steps, the strong invariance model was specified. All goodness-of-fit requirements for the strong invariance test were met (see Table 6). In addition, the absolute values of ΔCFI, ΔTLI, and ΔRMSEA (−0.006, −0.003, and 0.001, respectively) were all less than 0.01. The strict invariance model was specified on the basis of the third step. All changes in the fit indices for the strict invariance test were less than 0.01 in absolute value (ΔCFI = 0.000, ΔTLI = 0.003, and ΔRMSEA = −0.001); therefore, strict invariance was established in the undergraduate sample (see Table 6).
In the clinical sample, the parameters were allowed to be freely estimated in the configural invariance test, and the following fit indices were obtained: TLI = 0.948, CFI = 0.956, RMSEA (90% CI) = 0.079 (0.071, 0.086). The fit indices met the requirements, and the baseline model was established. Relative to the baseline model, the changes in CFI, TLI and RMSEA (ΔCFI < 0.010, ΔTLI < 0.010, ΔRMSEA < 0.015) supported weak, strong and strict invariance (see Table 6). Thus, the measurement invariance of the CES-D across gender was established in the clinical sample.

Difference test
In the clinical patients, females scored significantly higher than males on the score of AI (t = − 2.956, p <   (Table 8).

Discussion
The current study aimed to explore the best factor structure and the measurement invariance of the Chinese version of the CES-D among undergraduates and clinical patients. The CFA results suggested that the five-factor model was the best suited in the two samples. Moreover, gender invariance was well established among undergraduates and clinical patients. To our knowledge, this is the first study to explore the measurement invariance of the Chinese version of the CES-D across gender in clinical patients. In addition, the CES-D showed acceptable internal consistency in the two samples.
The two-factor, three-factor, four-factor and five-factor models of the CES-D proposed in previous studies were all tested by CFA in the present study. In the original psychometric testing of the CES-D scale, Radloff proposed a four-factor structure comprising DA (depressed affect), PA (positive affect), SC (somatic/vegetative complaints), and IP (interpersonal problems) [8]. The current results indicated that the five-factor model (Model E: DA, PA, IP, SC, and AI) showed the best fit. This five-factor model differs from the original four-factor model in that it modifies the previous factor structure and proposes a new factor, alienation (items 10, 14, and 17). Alienation is a condition in social relationships reflected by a low degree of integration or common values and a high degree of distance or isolation between individuals, or between an individual and a group of people in a community or work environment. This particular factor may be associated with impairments in interpersonal relationships. A previous study found that the higher participants scored on thinking that others were out to harm or exploit them (alienation), the more likely they were to experience a co-occurring mood disorder. School maladjustment in relations with teachers and peers and in learning activities had indirect effects, through alienation and depression, on students' suicidal ideation [31].
Prior studies have shown that the original four-factor structure of the CES-D was not suitable for Asian populations [15]. In addition, a recent study suggested that ethnic and cultural factors can lead to different CES-D factor structures [32]. Differences in the understanding of words or in culture may play an important role in the different model structures. The population tested may also play an important role. The current study included college students, and the results may reflect the unique psychological characteristics, high level of education, and sensitivity of college students. This population is more likely to experience feelings of loneliness and alienation [32,33].
The reliability and validity of the CES-D scale have been studied previously, and it was proposed that the three-factor structure was the most suitable model. However, the five-factor model was not included in that study. In the current study, which included both the three- and five-factor models, the CFA of the five-factor structure showed the best fit. Accordingly, we concluded that the five-factor CES-D scale is an effective and reliable screening tool for depression.
Based on the five-factor model, we examined gender invariance among undergraduates and clinical patients. Our MGCFA confirmed good configural, weak, strong and strict invariance of the Chinese CES-D across gender in the undergraduate sample. Configural equivalence is the precondition for testing the other levels of equivalence. It serves as the baseline model: each further equivalence test uses a nested model produced by constraining the corresponding parameters on top of the configural model, and the test of the next, higher level can proceed only if equivalence at the previous level has been established. In this study, the configural invariance of the CES-D was supported, so the model could be used for the subsequent equivalence tests. The establishment of weak equivalence shows that the relationships between the CES-D observed indicators and the latent traits have the same meaning for men and women; that is, each item has the same unit of measurement in both groups. Moreover, the establishment of strong equivalence shows that the intercepts of the CES-D items are invariant between men and women, which means that all CES-D items have the same reference point in the two groups. Finally, strict equivalence was tested on the basis of strong equivalence, and its establishment indicates that the measurement error variances are equivalent across genders. Therefore, measurement invariance between males and females was achieved among undergraduates. In the clinical sample, configural, weak, strong and strict invariance were also supported. Thus, the results of this study confirm that the Chinese CES-D has strict equivalence, indicating that the scale is effective and interpretable across gender groups among undergraduates and clinical patients.
Since the CES-D achieved measurement invariance across gender among undergraduates and clinical patients, this study further compared gender differences in the CES-D and its subscale scores. Females scored significantly higher than males on the AI subscale among clinical patients, whereas males scored significantly higher than females on the SC and IP subscales among college students. According to previous research, there are some sex differences in interpersonal problems [34]. For instance, girls have been reported to have better speech skills than boys [35]. Regarding somatic complaints, Hyde found differences between boys and girls, with boys being more physically and verbally aggressive than girls [36]. Boys also show more domineering, controlling and independent behaviors and are more vindictive [35,37]. In 2000, a publication from the American Psychiatric Association indicated that boys exhibit higher rates of antisocial, narcissistic, obsessive-compulsive, paranoid, and aggression-related disorders than girls. Taken together, these findings may explain why males had higher scores on SC and IP. College males experience more pressure than college females with respect to life pressure, personal ambition, studying, love, job hunting, earning money and interpersonal relationships [32,35]. In addition, male college students consume more alcohol than female college students, leading to greater depression [38]. Moreover, studies have shown that in traditional Chinese culture, men are the economic backbone of the family and thus experience greater economic pressure [7].
While the current study provides valuable data on the factor structure and measurement invariance of the CES-D, it is not without limitations. First, all undergraduate participants were from Central South University in Changsha and, as such, the results may not fully reflect depression in college students across China. Second, a cross-sectional design was employed with no long-term follow-up. Third, only Han students were included in the sample, and therefore the results may not apply to minority students. Lastly, we only considered measurement invariance across genders, and thus measurement invariance across other factors, such as age and religion, remains unknown.

Conclusions
The CES-D has good psychometric characteristics and measurement invariance across genders among undergraduates and clinical patients. The present study suggests that the CES-D may provide reliable and valid self-reported assessments of depression among Chinese undergraduates and clinical patients.