The validity and reliability of the PHQ-9 in screening for post-stroke depression

Background Depression affects about 30% of stroke survivors within 5 years. Timely diagnosis and management of post-stroke depression facilitate motor recovery and improve independence. The original version of the Patient Health Questionnaire-9 (PHQ-9) is recognized as a good screening tool for post-stroke depression. However, no validation studies have been undertaken for the use of the Thai PHQ-9 in screening for depression among Thai stroke patients. Methods The objectives were to determine the criterion validity and reliability of the Thai PHQ-9 in screening for post-stroke depression by comparing its results with those of a psychiatric interview as the gold standard. First-ever stroke patients aged ≥45 years with a stroke duration of 2 weeks–2 years were administered the Thai PHQ-9. The gold standard was a psychiatric interview leading to a DSM-5 diagnosis of depressive disorder or adjustment disorder with a depressed mood. The summed-scored-based diagnosis of depression with the PHQ-9 was obtained. Validity and reliability analyses, and a receiver operating characteristic curve analysis, were performed. Results In all, 115 stroke patients with a mean age of 64 years (SD: 10 years) were enrolled. The mean PHQ-9 score was 5.2 (SD: 4.8). Using the DSM-5 criteria, 11 patients (9.6%) were diagnosed with depressive disorder and 12 patients (10.5%) were diagnosed with adjustment disorder with a depressed mood. The two disorders were combined into a single post-stroke depression group. The Thai PHQ-9 had satisfactory internal consistency (Cronbach’s alpha: 0.78). The algorithm-based diagnosis of the Thai PHQ-9 had low sensitivity (0.52) but very high specificity (0.94) and positive likelihood ratio (9.6). Used as a summed-scored-based diagnosis, an optimal cut-off score of six revealed a sensitivity of 0.87, specificity of 0.75, positive predictive value of 0.46, negative predictive value of 0.95, and positive likelihood ratio of 3.5. The area under the curve was 0.87 (95% CI: 0.78–0.96). Conclusions The Thai PHQ-9 has acceptable psychometric properties for detecting a mixture of major depression and adjustment disorder in post-stroke patients, with a recommended cut-off score of ≥6 for a Thai population.

programs, resulting in less functional improvement [4,5]. After stroke patients are discharged, they tend to become physically inactive and socially isolated [6]. Depressed patients have fewer daily activities and a lower quality of life [7]. This may lead to more cognitive impairment [8] and increased mortality during the 2-5 years following the stroke [9].
It is difficult to make a diagnosis of depression after a stroke because the symptoms of depression can be confused with certain symptoms that are typical of stroke patients [10]. Screening for mood disorders after a stroke is recommended by many stroke and stroke-rehabilitation guidelines [11,12]. Given that the availability of psychiatrists is limited in Thailand, there is a need for a screening tool to assist primary care physicians and other specialists in assessing for depression. Extensively studied in non-Thai populations and in post-stroke patients, the Patient Health Questionnaire-9 (PHQ-9) has been reported to be a good screening tool for post-stroke depression (PSD) and to have the highest sensitivity [13,14]. The PHQ-9 has also been translated into Thai (Thai PHQ-9) and validated in primary care patients [15]. The cut-off score of the Thai PHQ-9 for major depression in primary care patients is 9, which differs from that of the original version of the PHQ-9 [16]. As to PSD, Williams et al. [17] reported a cut-off score of 10 for the original version for the diagnosis of major depression, with a sensitivity of 91% and a specificity of 89%. However, the PHQ-9 has not yet been validated for PSD among Thais. Because Thailand and western countries have different health care systems, cultures, attitudes, mindsets, and family support systems, this study investigated the validity and reliability of the Thai PHQ-9 in screening for depression after stroke among Thais.

Subjects and procedures
Ethics approval was obtained from the Medical Ethics Committee of the Human Research Protection Unit, Faculty of Medicine Siriraj Hospital. The patients were recruited from November 2017 to December 2018 at the Department of Rehabilitation Medicine, Faculty of Medicine Siriraj Hospital, a tertiary hospital in Thailand. All patients gave written consent to participate. They were informed that their emotional status would be assessed via a questionnaire and a psychiatric interview. The patient inclusion criteria were an age of ≥45 years; a first-time stroke, as per WHO criteria [18], with a stroke duration of 2 weeks to 2 years; stable vital signs, neurological signs, and stroke symptoms, as confirmed by a neurologist; and the ability to communicate in Thai. Excluded were patients with a score of < 24 on the Thai Mental State Examination [19], indicating cognitive impairment, or a previous diagnosis of dementia, a psychiatric disorder, or another neurological disease.
Demographic characteristics were gathered from interviews with the enrolled patients, and information related to their stroke (such as any comorbid illnesses and the type of stroke diagnosed from imaging studies) was obtained from medical records. Modified Rankin Scale scores were also obtained to determine the level of disability of the participants. The Thai PHQ-9 [15] was administered by one of the researchers (PD) at either the inpatient rehabilitation ward or the outpatient rehabilitation clinic, depending on the patient's visit. On the same day, a psychiatrist interviewed each patient in a private area and made a diagnosis according to the criteria detailed in the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). The researcher and the psychiatrist were blinded to each other's assessment.

Measures
Thai Mental State Examination [19]
The Thai Mental State Examination (TMSE) is the first neuropsychiatric test that was used to provide a standard mental status examination of Thais. The maximum TMSE score is 30 points. For the diagnosis of a normal, healthy, older Thai person, a TMSE cut-off score of 24 points is used.

Modified Rankin Scale
The Modified Rankin Scale (MRS), a clinician-reported measure of global disability, has been widely applied to evaluate stroke recovery [20,21]. It is an ordinal scale, with 7 categories ranging from zero (no symptoms) to six (death). The MRS assesses an individual's ability to ambulate and complete the activities of daily living. MRS scores > 3 are defined as severe disability [22].

Thai PHQ-9 [15]
The PHQ-9 consists of 9 questions that are based on the 9 DSM-IV criteria for a major depressive disorder. The questionnaire explores the symptoms experienced by patients during the preceding 2 weeks. Each PHQ-9 item is scored 0 (not at all), 1 (several days), 2 (more than half of the days), or 3 (nearly every day). The PHQ-9 provides a preliminary diagnosis of major depressive disorder through an algorithm-based diagnosis (≥ 5 items, including items 1 and/or 2, are rated ≥2), which implies a total score for the questionnaire of 10 or higher. The PHQ-9 can also be used as a screening tool for depression through a summed-scored-based diagnosis. The summed scores range from 0 to 27. Various cut-off scores allow for the determination of different degrees of depression. A study on the Thai PHQ-9 in the general Thai population reported that a summed score of 9 or greater signified a major depressive disorder, with a sensitivity of 0.84 and specificity of 0.77.
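The two scoring approaches described above can be illustrated with a minimal Python sketch. It follows the rule exactly as worded in this section and adds no item-specific exceptions; the function names and the example ratings are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def phq9_summed_score(items: Sequence[int]) -> int:
    """Return the summed PHQ-9 score (0-27) from nine item ratings of 0-3."""
    if len(items) != 9 or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("expected nine item ratings, each 0, 1, 2 or 3")
    return sum(items)


def phq9_summed_positive(items: Sequence[int], cutoff: int = 9) -> bool:
    """Summed-score-based screen: positive when the total reaches the cut-off.

    The default of 9 is the cut-off reported for the Thai PHQ-9 in [15];
    other cut-offs can be passed explicitly.
    """
    return phq9_summed_score(items) >= cutoff


def phq9_algorithm_positive(items: Sequence[int]) -> bool:
    """Algorithm-based screen as worded in the text: at least five items
    rated >= 2, including item 1 and/or item 2."""
    phq9_summed_score(items)  # reuse the input validation
    endorsed = [r >= 2 for r in items]
    return sum(endorsed) >= 5 and (endorsed[0] or endorsed[1])


# Hypothetical ratings for two illustrative respondents (not study data).
respondent_a = [2, 2, 2, 2, 2, 0, 0, 0, 0]  # five items rated 2, total 10
respondent_b = [1, 1, 1, 1, 1, 1, 0, 0, 0]  # six items rated 1, total 6
```

Respondent A meets the algorithm-based rule with the minimum possible total of 10, whereas respondent B is missed by the algorithm but is flagged by a summed-score cut-off of 6.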

DSM-5
The DSM-5 criteria for depressive disorders and adjustment disorder were used as the reference standard [23]. A psychiatric interview was conducted for each patient. To standardize the interviews, the three participating psychiatrists discussed and agreed on their content before the interviews were conducted. Depressive disorders could be classified as a major depressive disorder, a persistent depressive disorder (dysthymia), a depressive disorder due to another medical condition, another specified depressive disorder, or an unspecified depressive disorder. Among the adjustment disorders, only adjustment disorder with a depressed mood was included, because its symptoms are similar to those of major depressive disorder [24].

Data analysis
PASW Statistics for Windows, version 18.0 (SPSS Inc., Chicago, Ill., USA) [25] and MedCalc for Windows, version 15.0 (MedCalc Software, Ostend, Belgium) [26] were used for the statistical analyses. The demographic data, MRS, and PHQ-9 scores were analyzed by descriptive statistics. Age was analyzed by an independent-sample t-test, while the stroke durations and Thai PHQ-9 scores were analyzed with the Mann-Whitney U test. Gender, education levels, risk factors, stroke pathology, side of weakness, and MRS category were analyzed by Chi-square tests.
The stroke patients were divided into normal and depression groups, based on their psychiatric diagnoses. The psychiatrist determined the types of depressive disorders and adjustment disorder by using the relevant DSM-5 criteria. The depression scores of the normal and depression groups were analyzed by the independent-sample t-test. Statistical significance was set at a p-value of < 0.05. Internal consistency was analyzed by Cronbach's alpha. The psychiatric diagnosis of depression, treated as a binary outcome, was used as the reference standard to calculate the sensitivities and specificities of all possible PHQ-9 cut-off scores. The positive and negative predictive values as well as the positive and negative likelihood ratios were calculated for each PHQ-9 cut-off score. Receiver operating characteristic (ROC) analyses subsequently combined the instrument sensitivity and specificity across all possible cut-off scores into one measure, the area under the curve (AUC).
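The accuracy measures listed above can be sketched in Python as follows. This is an illustration of the standard definitions, not the PASW or MedCalc procedures used in the study; the function names are hypothetical and the toy scores and diagnoses are invented for the example. The AUC is computed here by its rank-based interpretation: the proportion of depressed/non-depressed pairs in which the depressed patient has the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy_at_cutoff(scores: Sequence[int], depressed: Sequence[bool],
                       cutoff: int) -> Dict[str, float]:
    """Diagnostic accuracy of 'score >= cutoff' against the reference diagnosis."""
    tp = fp = fn = tn = 0
    for score, is_case in zip(scores, depressed):
        screen_positive = score >= cutoff
        if screen_positive and is_case:
            tp += 1
        elif screen_positive:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_positive": _ratio(sensitivity, 1 - specificity),
        "lr_negative": _ratio(1 - sensitivity, specificity),
    }


def roc_auc(scores: Sequence[int], depressed: Sequence[bool]) -> float:
    """AUC as the share of case/non-case pairs ranked correctly (ties = 0.5)."""
    cases = [s for s, d in zip(scores, depressed) if d]
    non_cases = [s for s, d in zip(scores, depressed) if not d]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in non_cases)
    return _ratio(wins, len(cases) * len(non_cases))


# Invented toy data (not study data): 3 depressed and 5 non-depressed patients.
toy_scores = [12, 8, 5, 7, 3, 2, 6, 0]
toy_depressed = [True, True, True, False, False, False, False, False]

# One row of accuracy measures per candidate cut-off, as in a cut-off table.
table = {c: accuracy_at_cutoff(toy_scores, toy_depressed, c) for c in range(1, 28)}
toy_auc = roc_auc(toy_scores, toy_depressed)
```

In this toy example a cut-off of 6 yields a sensitivity of 2/3 and a specificity of 3/5, and the AUC is 13/15. The predictive values, unlike sensitivity and specificity, depend on the proportion of depressed patients in the sample.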

Results
In all, 190 stroke patients were approached for participation. Seventy-five of those were excluded: 21 had recurrent stroke, 17 had cognitive impairment, 17 had aphasia, 10 were < 45 years, and 10 had a stroke duration > 2 years. After applying the exclusion criteria, 115 stroke patients were enrolled. They comprised 63 males (54.8%) and 52 females (45.2%), with a mean age of 64 years (SD: 10 years; range: 45 to 88). The majority had completed primary school, followed by lower-secondary school and upper-secondary school. The comorbid illnesses found were, in descending order of frequency, hypertension, dyslipidemia, diabetes mellitus, and heart disease. The median duration of stroke was 59 days. The large majority of patients (81.7%) had an ischemic stroke, and left-side weakness was predominant (61%). Most patients (65.2%) were recruited from inpatient rehabilitation.
All patients were administered the PHQ-9 as the index test. The reference standard was the psychiatric interview conducted on the same day, with the resultant diagnosis based on the DSM-5 criteria. The psychiatrist who administered the interview was blinded to the corresponding score for the index test, and all interviews were conducted regardless of the index test scores. The mean Thai PHQ-9 score was 5.2 ± 4.8. According to the DSM-5 criteria, 11 patients (9.6%) were diagnosed with depressive disorder, 12 patients (10.5%) were diagnosed with adjustment disorder with a depressed mood, and the remaining 92 patients (80%) were normal. In the depressive disorder group, eight (6.9%) were classified as having a major depressive disorder (MDD), two (1.7%) as having an unspecified depressive disorder, and one (0.9%) as having another specified depressive disorder. Although the symptoms of adjustment disorder with a depressed mood are fewer and milder than those of major depressive disorder [24], this study combined the depressive disorder group with the adjustment disorder group and treated them as a single depression group for the analysis.
The demographic characteristics of the normal and depression groups revealed no statistically significant differences (Table 1). However, the MRS and the median PHQ-9 scores of the groups differed. MRS scores of 0-3 were defined as non-severe disability, while MRS scores > 3 were defined as severe disability; severe disability was more frequent in the depression group (78%) than in the normal group (55.4%).

Reliability and item analysis
As presented in Table 2, the highest mean score of the nine PHQ-9 items was found for Item 3 ("trouble falling or staying asleep, or sleeping too much"). Item 9 ("thoughts that you would be better off dead or of hurting yourself") had the lowest mean score. As to the internal consistency of the PHQ-9, Cronbach's alpha was 0.78. Deleting any single item would decrease the total-scale alpha. The lowest item-total correlation was for Item 5 ("poor appetite or overeating").
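For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The following Python sketch shows this calculation together with the "alpha if item deleted" values; it is an illustration of the standard formula rather than the PASW output, the function names are hypothetical, and the three-item toy responses are invented.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows: Sequence[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


# Invented toy responses (not study data): 4 respondents, 3 items scored 0-3.
toy_rows = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
]
toy_alpha = cronbach_alpha(toy_rows)            # 0.875 for this toy data
toy_deleted = alpha_if_item_deleted(toy_rows)   # one value per omitted item
```

An item whose removal lowers alpha is contributing to the scale's internal consistency; an item whose removal raises alpha is a candidate for review.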

Validity analysis
The performance of the Thai PHQ-9 was compared against the psychiatric diagnosis, using the DSM-5 criteria for depressive disorders and for adjustment disorder with a depressed mood as the reference standard. According to the DSM-5 criteria, 11 patients (9.6%) met the diagnosis of depressive disorder and 12 patients (10.5%) met the diagnosis of adjustment disorder with a depressed mood. These two disorders were combined as a depression group. The median Thai PHQ-9 score for the depression group was 10 (interquartile range: 7 to 15), whereas the median score of the normal group was 4 (interquartile range: 0.5 to 5.75). The difference between the median PHQ-9 scores of the 2 groups was statistically significant.

Discussion
This study was the first in Thailand to determine the validity of a depression screening questionnaire with stroke patients. The questionnaire investigated was the PHQ-9, which is regarded as a good screening tool for PSD [14]. The reference standard was a psychiatric interview based on the DSM-5 criteria for depressive disorders and adjustment disorders. In this study, adjustment disorder with a depressed mood was also diagnosed in a number of patients. Casey et al. reported that it was difficult to identify features distinguishing adjustment disorder from a depressive episode [24]. After stroke, abrupt physical impairment leads to physical disability, which is an immense and ongoing stressor for a stroke survivor. Although physical rehabilitation can attenuate the impairment, patients need to participate actively in the rehabilitation program. Low mood, tearfulness, and feelings of hopelessness are the predominant depressive symptoms in adjustment disorder with a depressed mood [27]. These symptoms would lessen the motivation to achieve rehabilitation training goals. Therefore, stroke patients with adjustment disorder are at risk of poor progress in rehabilitation training. To make the questionnaire useful for depression screening after stroke, both disorders were combined into a single depression group. The validity of the PHQ-9 in screening for depression was good in terms of its discriminatory power (AUC: 0.87) relative to the gold standard, the DSM-5 criteria. In addition, its internal consistency was acceptable (Cronbach's alpha: 0.78). Depressive disorder was found in 11 patients (9.6%), which was less than the corresponding figures reported by other studies. A meta-analysis conducted by Hackett and Pickles [2] found that 31% of stroke patients developed depression or depressive symptoms in any setting and at any time up to 5 years following their stroke.
Robinson [28] undertook a pooled analysis and reported mean incidences for major and minor depression of 19.3 and 18.5%, respectively, among hospitalized patients in acute care and rehabilitation hospitals. By comparison, the low incidence in the present study probably stemmed from the criterion that only stroke patients aged ≥45 years would be included. Previous research has found that younger stroke survivors are more likely to become depressed than older survivors [29,30]. Nevertheless, the incidence established by the current study is in line with that of research by Fuentes et al., which recruited stroke patients of the same age group and found a low depression incidence of 9.9% [31]. In this study, adjustment disorder with a depressed mood was diagnosed in 12 patients (10.5%). The pooled prevalence of adjustment disorder after stroke across all settings was 6.9% [32], which is lower than the figure found in this study. Most of the stroke patients (69.6%) recruited to the study had a stroke duration of less than 3 months, and this could have contributed to the higher prevalence of adjustment disorder.
Moving on to the demographic characteristics of stroke patients with and without PSD, our study revealed no significant differences in the demographic-related variables of the groups. In the case of the disability-related variable, the MRS was used to determine the level of disability after stroke. The patients with an MRS score > 3, who were classified as having a severe disability, appeared more frequently in the depression group. PSD has been found to be associated with more severe neurological deficits and physical disabilities in the acute and chronic phases [33,34].
The internal consistency of the Thai PHQ-9 administered to the stroke patients in this study was 0.78, which is considered acceptable. However, the level of internal consistency we found differed from that of the original version of the PHQ-9. The original studies, performed in primary care and in obstetrics and gynecology settings, showed an internal consistency of 0.89 and 0.86, respectively [16]. In addition, Turner et al., who utilized the PHQ-9 to screen for PSD, found an internal consistency of 0.82 [13]. In the case of the Thai version of the PHQ-9, a validity study on the Thai population reported an internal consistency of 0.79 [15]. Later, Lee and Dajpratham, who employed the Thai version on elderly Thais, reported an internal consistency of 0.76 [35]. The internal consistency of 0.78 found in the current research is therefore highly congruent with those two earlier studies using the Thai version of the PHQ-9. The Thai PHQ-9 can be used as a screening tool since the AUC showed a good level of discriminatory power (AUC: 0.87). The results of our study are in line with several other investigations that have reported a good discriminatory power for the PHQ-9, with an AUC of > 0.8 [13,17,36-38]. As to its validity, the PHQ-9 score can be used in 2 ways to diagnose depression. The first is an algorithm-based diagnosis for major depression, with a cut-off score of 10. In 2015, Manea et al. [39] conducted a diagnostic meta-analysis of the PHQ-9 algorithm-based scoring method as a screening tool for depression. They found that although the sensitivity was as low as 53% (95% CI: 42-65), the specificity was as high as 94% (95% CI: 91-96). Our study applied the algorithm-based diagnosis for PSD in a tertiary-hospital setting. Our evaluation of the diagnostic accuracy revealed low sensitivity and high specificity (Table 2), consistent with the results of the work by Manea et al. [39]. Low sensitivity is not a good property of a screening tool.
Therefore, all of the previous PHQ-9 validation studies for the detection of PSD have used the alternative diagnostic approach, summed-scored-based diagnosis, for their comparisons with various structured interviews as their reference standard [13,17,36,38,40]. Pettersson et al. [41] performed a systematic review to explore the diagnostic accuracy of the structured interviews as index tests. The only structured interviews which were found to have sufficient accuracy for the diagnosis of depressive disorders were the Structured Clinical Interview for DSM-IV (SCID) and the Mini International Neuropsychiatric Interview (MINI). The summed-scored-based PHQ-9 diagnoses in the current research were validated against the psychiatric interviews that were based on DSM-5 criteria. Our analysis revealed an optimum cut-off score of 6 for the diagnosis of depression. This finding differed from those of other studies [13,17]. Turner et al. [13] validated the PHQ-9 for the detection of PSD against the DSM-IV criteria; they reported a summed score greater than 8 as the cut-off score for diagnosis. Similarly, Williams et al. [17] reported a summed score of 10 or greater as the cut-off score for diagnosis.
Further studies should enlarge the sample so that the number of patients with depressive disorder after stroke is sufficient to evaluate the screening tool for that specific entity. There were some limitations to this study. Firstly, the high mean age of the participants, 64 years, meant that the findings cannot be generalized to younger stroke patients. However, the incidence of stroke at a younger age is lower, and younger patients represent only a small proportion of those seen in clinical practice. Secondly, only participants who could communicate were recruited. Stroke patients who are unable to communicate would probably be very depressed. Moreover, the mood assessment scale for patients who cannot communicate is different. Thirdly, this study did not include other psychiatric disorders after stroke. Finally, this study did not assess test-retest reliability; consequently, the temporal stability of the measure for Thai people with a stroke is presently unknown.

Conclusions
The Thai version of the PHQ-9 had acceptable properties for detecting a mixture of major depression and adjustment disorder in post-stroke patients. The summed-scored-based depression diagnosis should therefore be employed for screening, with a cut-off score of ≥6 signifying PSD.