The validity and reliability of the PHQ-9 on screening of depression in neurology: a cross sectional study

Background This study aimed to explore the validity and reliability of the Patient Health Questionnaire-9 (PHQ-9) for screening depression among patients with neurological disorders, and to explore factors influencing depression in such patients. Methods In this study, 277 subjects who were admitted to the department of neurology of our hospital due to different neurological disorders completed the PHQ-9 questionnaire. The Mini-International Neuropsychiatric Interview (MINI) and Hamilton Rating Scale for Depression (HAMD) were employed to evaluate the depressive symptoms of patients who completed the PHQ-9 questionnaire. The internal consistency, criterion validity, structural validity, and optimal cut-off values of the PHQ-9 were evaluated, and the consistency between the depression severity assessed by the PHQ-9, HAMD, and MINI was examined. Logistic regression analysis was used to identify risk factors for depression. Results The Cronbach’s α coefficient of the PHQ-9 was 0.839. The Pearson’s correlation coefficients among the 9 items of the PHQ-9 scale were 0.160 ~ 0.578 (P < 0.01), and the Pearson’s correlation coefficients between each item and the total score were in the range of 0.608 ~ 0.773. Taking the results of the MINI as the gold standard, the area under the receiver operating characteristic (ROC) curve of the PHQ-9 results for all the subjects (n = 277) was 0.898 (95% confidence interval (CI): 0.859 ~ 0.937, P < 0.01). When the cut-off score was equal to 5, the sensitivity, specificity, and Youden’s index were 91.2%, 76.6%, and 0.678, respectively. Multivariate logistic regression analysis showed that the influence of unemployment on the occurrence of depression was statistically significant (P = 0.027, OR = 3.080, 95%CI: 1.133 ~ 8.374). Conclusions The application of the PHQ-9 for screening depression among Chinese patients with neurological disorders showed good reliability and validity.


Introduction
Mental disorder, also called mental illness or psychiatric disorder, is a behavioral or mental pattern that causes significant distress or impairment of personal functioning [1]. Depression, as an important mental disorder, was ranked as the third cause of burden of disease worldwide in 2008 and may rank first by 2030 [2]. Depression, which often accompanies multiple diseases, imposes serious health and economic burdens to society [3,4]. It is highly prevalent among patients suffering from various chronic conditions [5]. There are multiple ways in which depression can be identified. As for mild forms of depression, it may recover without much clinical assistance or only need primary care. However, major depression, especially severe depression, requires advanced care and early identification [6]. Identifying cases with depression that require advanced care is not only a main challenge to primary care, but also for clinicians, especially for nonpsychiatric physicians.
Neurology and psychiatry are often closely related. Several factors influence the incidence of depression in patients with neurological disorders, and controversial results have been reported. A previous study showed that epilepsy was an independent risk factor for depression [7]. Scholars found that severe motor dysfunction, dyskinesia, poor sleep quality, and cognitive impairment were independent predictors of depression in Parkinson's disease (PD) patients who were admitted to a department of neurology [8,9]. However, depression in patients with neurological disorders and its associated risk factors need further attention from clinicians and scholars. Training non-psychiatric doctors to identify patients with severe depression through mental examination may resolve the challenge mentioned above, but it is costly and time-consuming [10,11]. A large number of health care systems have employed screening tools, such as the Self-rating Depression Scale (SDS) [12], the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) [13], the Composite International Diagnostic Interview (CIDI) [14], the Mini-International Neuropsychiatric Interview (MINI) [15], the Cornell Scale for Depression in Dementia (CSDD) [16], and the Hamilton Rating Scale for Depression (HAMD) [17,18], to evaluate the severity of depressive symptoms. However, such tools are not optimal as they (1) tie up significant resources, such as trained professionals [14,15,17], (2) cannot be used for diagnosis yet require many items to be evaluated [12], or (3) can only be used for diagnosis of specific patients [16].
Patient Health Questionnaire-9 (PHQ-9) was derived from the depression part in the Patient Health Questionnaire (PHQ) compiled by Spitzer et al. in 1999 [19]. PHQ-9 was recommended by the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). Response options on the items range from 'not at all' (0-point) to 'nearly every day' (3-point). The scale can not only screen for depression, but also show the severity of depression [20]. Because of its convenient use and good reliability and validity, it has been widely used for depression screening in the internal medicine department of primary hospitals. The depression screening of the elderly, patients with epilepsy, and stroke patients also had good reliability and validity [21][22][23].
However, some uncertainties remain. Different studies have shown that the optimal cut-off value of the PHQ-9 varies across populations. The developers of the PHQ-9 used a cut-off value of 10, with a sensitivity of 88% and a specificity of 88% [20]. In 2012, a meta-analysis showed that the optimal cut-off value of the PHQ-9 was between 8 and 11 [24]. The best cut-off value of the PHQ-9 for diagnosing depression therefore still needs further discussion. The original researchers of the PHQ-9 used 5, 10, 15, and 20 as the demarcation values for mild, moderate, severe, and very severe depression [25]. If the screening cut-off value changes when the PHQ-9 is used in a different population, the corresponding grading of depression severity may also change, which has implications for treatment. Moreover, the PHQ-9 may omit some symptoms that are meaningful to depressed patients, and its description of symptoms is not always clear [26]. For example, patients with depression regard abnormal perception, depersonalization, isolation, loneliness, and physical sensations (such as tremor, fatigue, restlessness, nausea, and inability to relax) as meaningful or strong features of their depression, yet these symptoms are not reflected in the PHQ-9. Although it is a self-rated scale, the PHQ-9 may still need to be completed with the assistance of a doctor, to reduce confusion and help patients express their inner feelings accurately. In short, several problems remain regarding the use of the PHQ-9: the cut-off value, the inconsistency of reliability and validity when it is used in neurology, and wording that may need to be adjusted. The PHQ-9 therefore needs to be further explored in patients with neurological disorders in order to improve its diagnostic value.
In the present study, general data of patients who were admitted to the Department of Neurology of an affiliated hospital of Peking University due to different neurological disorders were collected, and the PHQ-9 questionnaire was distributed among those patients. One trained psychiatrist used the Mini-International Neuropsychiatric Interview (MINI) to evaluate the depressive symptoms of patients who completed the PHQ-9 questionnaire. Two senior psychiatrists used the HAMD to assess the severity of depression. The internal consistency, criterion validity, structural validity, and optimal cut-off values of the PHQ-9 were evaluated, and the consistency between the depression severity assessed by the PHQ-9 and by the HAMD was examined. We also explored factors (e.g., age, gender, medical insurance, course of disease, and work conditions) influencing depression in such patients and discussed their influences comprehensively.

The aim, design and setting of the study
We aimed to explore the validity and reliability of the Patient Health Questionnaire-9 (PHQ-9) when screening for depression in the neurology ward. This is a cross-sectional study. We hoped to screen all inpatients in neurology for depression and its severity using the PHQ-9. This study was approved by the Ethics Committee of Peking University Six Hospital (No.2009025).

Study subjects
From January 2016 to June 2016, patients with depression who suffered from neurological disorders were admitted to the Neurology Department of an affiliated hospital of Peking University (Beijing, China). Inclusion criteria were as follows: i) patients (age ≥ 18 years old) from the Neurology Department of an affiliated hospital of Peking University, ii) absence of a significant cognitive impairment (Mini-Mental Status Examination > 21) [27,28], and iii) patients who signed the written informed consent form prior to commencing the study. Exclusion criteria were as follows: i) patients with speech dysfunction and hearing impairment, who could not complete the questionnaire, or ii) patients who aged < 18 years old. A total of 300 questionnaires were distributed among eligible patients, and a total of 290 questionnaires were returned, accounting for 96.7%. Those patients received MINI and Hamilton Rating Scale for Depression (HAMD), and the total number of patients who completed all the survey was 277. A self-edited questionnaire was designed to collect patients' general data, including patients' name, gender, age, ethnicity, marriage status, work experience, treatment costs, course of disease, diagnostic method, etc. The flowchart of patients' selection is shown in Fig. 1.

Research tools

PHQ-9
The PHQ-9 is a 9-question instrument given to patients in a primary care setting to screen for the presence and severity of depression. It is a self-rating scale. The results of the PHQ-9 are used to make a depression diagnosis according to the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition) criteria. Here, the PHQ-9 was formulated based on DSM-IV to understand how often patients have been bothered by symptoms of depression over a period of two weeks (0 points = never, 1 point = a few days, 2 points = more than half of the days, 3 points = almost every day). Each item was scored on a scale of 0-3, with a total score ranging from 0 to 27. Based on these scores, depressive symptoms could be divided into "none or minimum" (0-4), "mild" (5-9), "moderate" (10-14), "severe" (15-19), and "extremely severe" (20-27).

MINI (Chinese version)
As a semi-fixed diagnostic tool developed by a number of Chinese scholars, the MINI is a short structured interview used to diagnose 16 axis I DSM-IV and ICD-10 (International Classifications of Diseases and Related Health Problems, Tenth Revision) disorders [29]. A previous research showed that the Chinese version of MINI had good reliability and validity, as well as high sensitivity and specificity for depressive disorders. The current study used the evaluation results of depression in the MINI as the "gold standard" to assess the validity of PHQ-9. We defined "1" = have depression and "0" = have no depression. This scale was completed through interviews. The depression diagnosis was made by 1 psychiatrist, deputy chief physician.

HAMD
The HAMD [18] is a 17-item instrument that was designed to measure the frequency and intensity of depressive symptoms in individuals with major depressive disorders. The HAMD possesses good reliability and validity. Its 17 items were previously grouped into 5 structural factors (i.e., anxiety/somatization, mental disorders, retardation symptoms, sleep disturbances, and weight loss) by Cleary and Guy [30]. The higher the score, the more severe the symptoms. The following ranges for the HAMD were recommended: no depression (0-7); mild depression (8-16); moderate depression (17-23); and severe depression (≥24). In our research, two physicians assessed the severity of the patients' depression using the HAMD.

Statistical analysis
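Sample size: the number of subjects required in each group was calculated as

$$ n = \frac{Z_{\alpha/2}^{2}\, P\,(1 - P)}{\delta^{2}} $$

Explanation: $Z_{\alpha/2}$ is the Z value of cumulative probability in the normal distribution ($Z_{0.05/2} = 1.960$); $\delta$ is the allowable error; $\alpha$ is the inspection level; $P$ is the sensitivity or specificity.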
Taking the score of 10 recommended by the developers of the PHQ-9 as the screening cut-off value, a study covering 6000 subjects reported a sensitivity of 88% and a specificity of 88% [20]. Therefore, both the sensitivity and the specificity of this test were expected to be 88%. We took 0.05 as the significance level α and 0.08 as the allowable error. According to the formula, 63 subjects were needed in the case group and 63 in the control group. The reported prevalence of depression among patients hospitalized in neurology departments is 25.0-50% [9,31,32], and we assumed a prevalence of 25%. The number of PHQ-9 questionnaires to be issued was therefore estimated to be at least 63 / 0.25 = 252. Allowing for a 10% loss rate, we set the sample size at 300 cases. SPSS 22.0 statistical software (IBM, Armonk, NY, USA) was used to perform statistical analysis, and descriptive statistics were used to summarize general data. The receiver operating characteristic (ROC) curve was employed to analyze the validity, sensitivity, specificity, positive predictive value, negative predictive value, and Youden's index of the PHQ-9, so as to find the best diagnostic cut-off score. Based on the cut-off points, the consistency between the severity of depression obtained by the PHQ-9 and by the HAMD was expressed as a Kappa coefficient. Linear regression analysis of the PHQ-9 and HAMD total scores was performed to obtain the PHQ-9 cut-off scores for depressive symptoms of different severities. The intraclass correlation coefficient (ICC) and Cronbach's alpha coefficient were used to assess internal consistency. Confirmatory factor analysis was employed to analyze the structural validity of the scale. We used logistic regression analysis to explore risk factors for depression.
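The sample-size arithmetic above can be checked with a short sketch. It assumes the usual single-proportion formula, n = Z² · P(1 − P) / δ², which reproduces the 63 subjects per group reported here; the function names are illustrative and are not part of the study.

```python
from statistics import NormalDist


def cases_needed(p, delta, alpha=0.05):
    """Subjects needed to estimate a proportion p (sensitivity or
    specificity) within +/- delta at two-sided significance level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.960 for alpha = 0.05
    return z ** 2 * p * (1 - p) / delta ** 2


def questionnaires_needed(n_cases, prevalence):
    """Total subjects to screen so that about n_cases have the condition."""
    return n_cases / prevalence


# Expected sensitivity 88%, allowable error 0.08, assumed prevalence 25%.
n_cases = round(cases_needed(p=0.88, delta=0.08))
n_total = round(questionnaires_needed(n_cases, prevalence=0.25))
print(n_cases, n_total)
```

The unrounded result is slightly above 63; the sketch rounds to the nearest integer to match the figure in the text, whereas a more conservative design would round up.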

Reliability
In order to investigate the reproducibility and consistency of PHQ-9, reliability coefficients as measured by Cronbach's alpha were calculated. The Cronbach's α coefficient for PHQ-9 was 0.839. When one of the items of PHQ-9 was deleted, the α coefficient was still between 0.806 ~ 0.839. The Pearson's correlation coefficient among the 9 items of the PHQ-9 scale was at the range of 0.160 ~ 0.578 (P < 0.01), and the Pearson's correlation coefficient between each item and the total score was within 0.608 ~ 0.773. The above-mentioned coefficients were statistically significant (P < 0.01) ( Table 2).
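For readers who want to see how the coefficient is obtained, the following is a minimal sketch of Cronbach's α computed from an item-score matrix. The responses are hypothetical and are not the study data; the study itself used SPSS, and the function name is illustrative.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    items = list(zip(*rows))                        # one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])   # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical PHQ-9 responses (0-3 per item) from four respondents.
responses = [
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 2, 1, 1, 1, 1, 0],
    [2, 2, 3, 2, 2, 1, 2, 2, 1],
    [3, 3, 3, 3, 2, 3, 3, 2, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The "α if item deleted" values can be obtained in the same way by dropping one column at a time before calling the function.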

Construct validity
In this research, the eigenvalues of factor-1, factor-2, and factor-3 were 3.385, 1.248, and 1.050, with corresponding explained variances of 37.615%, 13.868%, and 11.661%, respectively. The cumulative explained variance of the three factors was 63.114%. In the rotated component matrix of the factor analysis, the loadings of interest decline, fatigue, psychomotor retardation, difficulty in paying attention, and depressed mood on factor-1 were 0.736, 0.717, 0.701, 0.694, and 0.563, respectively. The loadings of suicide and self-injury and of inferiority on factor-2 were 0.806 and 0.758, respectively. The loadings of sleep disorder and eating disorder on factor-3 were 0.828 and 0.781, respectively (Table 3).

Criterion validity
Criterion validity was assessed by ROC curve analysis, which was used to find the PHQ-9 score that simultaneously showed the highest sensitivity and specificity. The accuracy of the PHQ-9 was estimated by the area under the ROC curve (AUC). As shown in Fig. 2, the AUC for the PHQ-9 was 0.898 (95% confidence interval (CI): 0.859 ~ 0.937), which indicated that the PHQ-9 possessed a good ability to identify depressive symptoms. Sensitivity and specificity were calculated for cut-off scores of 3, 4, 5, 6, 7, 8, 9, and 10; the sensitivity was 95.6% at the lowest cut-off score, and the values for each cut-off score are listed in Table 4. When the cut-off score was 5, the sensitivity, specificity, and Youden's index were 91.2%, 76.6%, and 0.678, respectively.
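The cut-off selection rests on Youden's index, J = sensitivity + specificity − 1. The sketch below shows the calculation and a helper that picks the cut-off with the largest J from a table of sensitivity and specificity pairs; the helper names are illustrative, and only the values at a cut-off score of 5 (91.2% and 76.6%) come from this study.

```python
def sensitivity(tp, fn):
    """True positives divided by all subjects with depression."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives divided by all subjects without depression."""
    return tn / (tn + fp)


def youden_index(sens, spec):
    """Youden's J statistic for one cut-off."""
    return sens + spec - 1


def best_cutoff(table):
    """table maps cut-off -> (sensitivity, specificity);
    returns the cut-off with the largest Youden's index."""
    return max(table, key=lambda c: youden_index(*table[c]))


# Values reported for a cut-off score of 5.
j_at_5 = youden_index(0.912, 0.766)
print(round(j_at_5, 3))  # 0.678
```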

Cut-off scores of PHQ-9 for depression with standard of HAMD-17
The consistency analysis between the PHQ-9 and HAMD showed a Kappa coefficient of 0.423. Linear regression analysis was performed with the total HAMD score as the independent variable X and the total PHQ-9 score as the dependent variable Y (Fig. 3). The regression equation was Y = 0.719X − 0.299. A t-test on the regression coefficient of 0.719 was significant (P < 0.01), indicating a linear relationship between the total HAMD score and the total PHQ-9 score. The coefficient of determination R² was 0.701, and the regression model showed a good fit. Cut-off points of 7, 17, and 24 on the HAMD scale represented mild, moderate, and severe symptom levels; the corresponding cut-off points on the PHQ-9 scale were 5, 12, and 17, respectively.
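The mapping from HAMD cut-off points to PHQ-9 cut-off points follows directly from the fitted line. As a minimal sketch, assuming the predicted PHQ-9 score is rounded to the nearest integer (the rounding rule is not stated in the text), the reported values can be reproduced as follows; the function name is illustrative.

```python
def phq9_from_hamd(hamd_total, slope=0.719, intercept=-0.299):
    """Predicted PHQ-9 total from a HAMD total using the reported line
    Y = 0.719X - 0.299."""
    return slope * hamd_total + intercept


hamd_cutoffs = {"mild": 7, "moderate": 17, "severe": 24}
phq9_cutoffs = {
    level: round(phq9_from_hamd(x)) for level, x in hamd_cutoffs.items()
}
print(phq9_cutoffs)  # {'mild': 5, 'moderate': 12, 'severe': 17}
```

The unrounded predictions are 4.734, 11.924, and 16.957, so the rounded cut-off points match the values of 5, 12, and 17 given above.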

Consistency analysis

Consistency analysis of PHQ-9 and MINI
It was previously reported that, in the Chinese version of the PHQ-9, a threshold of 10 or more is an accurate, reliable, and valid measure for screening depressive symptoms. Thus, taking 10 as the cut-off score of the PHQ-9, a consistency analysis of the results of the PHQ-9 and the MINI was conducted, and the Kappa value was 0.529 (P < 0.01). When 5 was taken as the cut-off score of the PHQ-9, the consistency analysis with the MINI showed a Kappa value of 0.558 (P < 0.01).
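Cohen's Kappa corrects the observed agreement between two ratings for the agreement expected by chance. The following is a minimal sketch of how the comparison between a PHQ-9 screening result and the MINI diagnosis could be computed; the scores and diagnoses are hypothetical, the function names are illustrative, and a score at or above the cut-off is assumed to count as screen-positive.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of category labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (observed - expected) / (1 - expected)


def screen(phq9_totals, cutoff):
    """1 = screen-positive (total >= cutoff), 0 = screen-negative."""
    return [1 if s >= cutoff else 0 for s in phq9_totals]


# Hypothetical data: PHQ-9 totals and MINI results (1 = depression).
phq9_totals = [2, 6, 11, 4, 15, 7, 3, 12]
mini_result = [0, 1, 1, 0, 1, 0, 0, 1]

kappa_at_5 = cohen_kappa(screen(phq9_totals, 5), mini_result)
kappa_at_10 = cohen_kappa(screen(phq9_totals, 10), mini_result)
print(kappa_at_5, kappa_at_10)
```

The function is undefined when chance agreement equals 1 (both raters assign a single identical category to every subject); a production implementation would need to handle that case.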

Consistency analysis of PHQ-9 and HAMD assessment
Severe and extremely severe cases of depression, as rated by the PHQ-9, were merged into a single severe category. Using the cut-off values of 5, 10, and 15 for mild, moderate, and severe depression, the depression severity ratings derived from the PHQ-9 and from the HAMD were evaluated for consistency, and the Kappa coefficient was 0.423 (P < 0.01). When the cut-off scores of 5, 12, and 17 derived in this study were used for mild, moderate, and severe depression on the PHQ-9, the Kappa coefficient was 0.465 (P < 0.01).
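The two sets of cut-off values grade the same PHQ-9 total differently in the intermediate range. A minimal sketch of the grading, assuming each cut-off is the lowest score of its category and using illustrative names, is:

```python
from bisect import bisect_right

LABELS = ["none", "mild", "moderate", "severe"]

ORIGINAL_CUTOFFS = [5, 10, 15]  # developers' values, severe categories merged
STUDY_CUTOFFS = [5, 12, 17]     # values derived from the HAMD regression


def severity(total, cutoffs):
    """Grade a PHQ-9 total; cutoffs are the ascending lower bounds of
    the mild, moderate, and severe categories."""
    return LABELS[bisect_right(cutoffs, total)]


for total in (4, 11, 16, 20):
    print(total, severity(total, ORIGINAL_CUTOFFS), severity(total, STUDY_CUTOFFS))
```

A total of 11, for example, is graded moderate under the first scheme and mild under the second, which is the kind of reclassification that changes the agreement with the HAMD grading.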

Analysis of depression-associated factors
Univariate analysis of depressive patients with neurological disorders who were hospitalized in department of neurology was carried out by using the Chi-square test, and the results are summarized in Table 5. The effects of gender, age, marital status, ethnicity, work, expenses of hospitalization, course of disease, and major diseases in the depression and non-depression groups were not statistically significant. Multivariate logistic regression analysis showed that the influence of unemployment on the occurrence of depression was statistically significant (P = 0.027, odds ratio (OR) = 3.080, 95%CI: 1.133 ~ 8.374). As shown in Table 6, unemployed patients were at a high risk of depression compared with employed patients.
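The reported odds ratio and its confidence interval can be related to the underlying regression coefficient. The sketch below assumes a Wald-type interval, which is symmetric on the log-odds scale; this is an assumption about how the interval was produced, not something stated in the text, and the function names are illustrative.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and its standard error from a
    reported odds ratio and 95% Wald confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a coefficient and SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported values for unemployment: OR = 3.080, 95% CI 1.133 ~ 8.374.
beta, se = wald_from_or(3.080, 1.133, 8.374)
odds_ratio, low, high = or_with_ci(beta, se)
print(round(beta, 3), round(se, 3))
```

Under this assumption the odds ratio should lie close to the geometric mean of the two confidence limits, which provides a quick consistency check on the reported figures.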

Discussion
Depression is a widespread mental disorder that can threaten thoughts, mood, and physical health [33]. Depression severity is classified into three levels: mild, moderate, and severe. Individuals with depression often experience not only sadness, but also a lack of interest or enjoyment in activities, decreased energy, insomnia, weight changes, feelings of loss and worthlessness, and recurrent thoughts of death or suicide. The prevalence of depressive disorders is higher in neurology inpatients [34,35]. In our study, the Chinese version of the MINI was used to assess inpatients with neurological disorders admitted to the Department of Neurology of Peking University Third Hospital, and the prevalence of depression was 24.5%, which was similar to that of outpatients in different clinical specialties, but significantly higher than that of healthy controls [31]. This indicates that further attention should be paid to depression in non-psychiatric departments (e.g., department of neurology) of general hospitals.
PHQ-9, a universal community screening tool for depression, was herein used, and it was revealed that it had a good reliability and validity when it was applied to depressed patients with neurological disorders who were hospitalized at the department of neurology. It is noteworthy that the DSM-5 also recommends use of PHQ-9 as a tool for evaluating the severity of depression.
Studies conducted in China as well as overseas have shown that the PHQ-9 has a one-factor structure, i.e., an affective factor; in other words, all items in the PHQ-9 measure the same concept [36,37]. Other studies have reported that the PHQ-9 has a two-factor structure: a cognitive-affective factor and a somatic factor [38]. In the current research, the structural validity of the PHQ-9 was analyzed by principal component analysis, which extracted three main factors contributing to a cumulative explained variance of 63.114%. The three main factors were mainly related to low mood, lack of motivation, and somatic symptoms. When the PHQ-9 was compared with the MINI, it identified cases of depression with reasonable accuracy. The AUC was 0.898, suggesting a promising diagnostic ability of the PHQ-9. In a systematic review of the PHQ-9, Kroenke et al. showed that the sensitivity was 77-88% and the specificity was 88-94% with 10 points as the cut-off value [39]. Importantly, the sensitivity obtained in this study was not as high as that reported by Kroenke et al. [39]. There are several possible reasons. (1) It may be related to the different source of subjects. (2) When the PHQ-9 is used to screen for depression in neurology patients, its sensitivity may be suboptimal and still needs further evaluation by related professionals. (3) It might be related to the use of the MINI as the gold standard. A recent meta-analysis showed that the sensitivity of the PHQ-9 was lower (0.77 versus 0.88) when using the MINI as the gold standard compared to semi-structured interviews [40].
In the present study, there was a strong correlation between the total scores of the HAMD-17 and the PHQ-9, which was consistent with previous findings [41,42]. These findings support the validity and feasibility of using the PHQ-9 for assessing depression severity. In the current study, we used PHQ-9 scores of 5, 12, and 17 as cut-off scores to designate mild, moderate, and severe symptoms of depression, respectively. This is slightly different from the cut-off scores used by the original developers of the scale, who recommended cut-off scores of 5, 10, 15, and 20 to designate mild, moderate, moderately severe, and severe depression; their values are also more easily remembered by clinicians.
Consistent with previous studies, the results of the present study revealed that the PHQ-9 has high reliability, as evidenced by a Cronbach's α coefficient of 0.839. The correlation coefficients between the nine items of the scale were 0.160 ~ 0.578 (P < 0.01), and the correlation coefficients between each item and the total score of the scale were 0.608 ~ 0.773, all of which were statistically significant (P < 0.01). This indicates that the PHQ-9 has acceptable internal consistency. Our findings were similar to those of validation studies in which the Cronbach's alpha values were 0.8 in Mexico [43], 0.74 in Australia [44], and 0.78 in Thailand [45].
The present study analyzed factors influencing depression in patients with neurological disorders who were hospitalized in the department of neurology. We compared gender, age, marital status, ethnicity, work experience, hospitalization expenses, course of disease, number of patients, and major disorders between the depression and non-depression groups. The univariate analysis did not indicate any statistical significance. Previous studies reported that age and gender are significantly correlated with the occurrence of depression [46,47]. It was previously found that the scores of depressive symptoms in stroke patients in the 25-54 and 55-64 year age groups were significantly higher than those in other age groups [48]. The current study, however, did not reveal any significant correlation between depression and age or sex. The influences of age and gender on depression in patients with neurological disorders who were admitted to the department of neurology need further discussion. The current study did not make a detailed classification and comparison of various domestic reimbursement methods.
The multivariate logistic regression analysis showed that unemployed patients were at a higher risk of depression. Previous studies showed that depression is closely correlated with unemployment. One study [49] pointed out that nearly one fifth of long-term unemployed men were diagnosed with major depressive disorders. Unemployment may be a potential predictor of depression, and depression in turn weakens work productivity, thereby increasing the risk of long-term unemployment [50][51][52].

Limitations
The application of PHQ-9 scale on such patients showed a good reliability and validity. However, the current study contains a number of limitations. First, the samples were only patients with neurological disorders from one general hospital. Second, due to the short period of hospitalization, no retesting of reliability was undertaken. Last but not least, this study did not analyze the effects of various neurological diseases on depression. Thus, further studies need to be carried out to confirm our findings and eliminate the above-mentioned deficiencies.

Conclusions
In summary, depressive disorders are more common among patients with neurological disorders. Since depression can lead to many adverse outcomes for patients, and even to suicide, early identification of depression needs the attention of non-psychiatrists. Our study demonstrated good reliability and validity of the PHQ-9 when this questionnaire was applied to screen for depression among patients in the neurology department of a general hospital. The PHQ-9 is worth promoting and applying in the neurology departments of general hospitals.