The Validity and Reliability of the Patient Health Questionnaire-9 in Screening for Post-Stroke Depression

Background: Depression affects about 30% of stroke survivors within five years. Timely diagnosis and management of post-stroke depression facilitate motor recovery and improve independence. The original version of the Patient Health Questionnaire-9 (PHQ-9) is recognized as a good screening tool for post-stroke depression. However, no validation studies have been undertaken on the use of the Thai PHQ-9 to screen for depression among Thai stroke patients. Methods: The objectives were to determine the criterion validity and reliability of the Thai PHQ-9 in screening for post-stroke depression by comparing its results with those of a psychiatric interview as the gold standard. First-ever stroke patients aged ≥ 45 years with a stroke duration of 2 weeks to 2 years were administered the Thai PHQ-9. The gold standard was a psychiatric interview leading to a DSM-5 diagnosis of depressive disorder. Algorithm-based and summed-score-based diagnoses of depressive disorder were derived from the PHQ-9. Validity and reliability analyses, and a receiver operating characteristic curve analysis, were performed. Results: In all, 115 stroke patients with a mean age of 64 years (SD: 10 years) were enrolled. The mean PHQ-9 score was 5.2 (SD: 4.8). Using the DSM-5 criteria, 23 patients (20%) were diagnosed with depressive disorder. The Thai PHQ-9 had satisfactory internal consistency (Cronbach’s alpha: 0.78). The algorithm-based diagnosis of the Thai PHQ-9 had low sensitivity (0.52) but very high specificity (0.94) and a high positive likelihood ratio (9.6). With the summed-score-based diagnosis, the optimal cut-off score of 6 yielded a sensitivity of 0.87, specificity of 0.75, positive predictive value of 0.46, negative predictive value of 0.95, and positive likelihood ratio of 3.5. The area under the curve was 0.87 (95% CI: 0.78–0.96). Conclusions: The Thai PHQ-9 has acceptable psychometric properties for screening for post-stroke depression, with a recommended cut-off score of ≥ 6 for a Thai population.


Background
Depression is the most common psychological problem experienced by stroke survivors. 1 Its pooled frequency is 31% at any time up to five years after stroke. 2 However, a review of prospective longitudinal studies 3 showed a biphasic pattern in post-stroke depression rates: depressive symptoms gradually rise during the first 6 months, ease slightly at around 12 months, and worsen again during the second year after the stroke. Post-stroke depression (PSD) is associated with longer hospital stays and decreased participation in rehabilitation programs, resulting in less functional improvement. 4,5 After discharge, stroke patients tend to become physically inactive and socially isolated. 6 Depressed patients have fewer daily activities and a lower quality of life, 7 which may lead to greater cognitive impairment 8 and increased mortality during the 2–5 years following the stroke. 9 Depression is difficult to diagnose after a stroke because its symptoms can be confused with symptoms that are typical of stroke patients. 10 Many stroke and stroke-rehabilitation guidelines recommend screening for mood disorders after a stroke. 11,12 Given the limited availability of psychiatrists in Thailand, a screening tool is needed to help primary care physicians and other specialists assess patients for depression. The Patient Health Questionnaire-9 (PHQ-9), which has been extensively studied in non-Thai populations and in post-stroke patients, has been reported to be a good PSD screening tool with the highest sensitivity. 13,14 The PHQ-9 has also been translated into Thai (Thai PHQ-9) and validated in primary care patients. 15 The cut-off score of the Thai PHQ-9 for major depression in primary care patients is 9, which differs from that of the original version of the PHQ-9. 16 For PSD, Williams et al. 17 reported a cut-off score of 10 on the original version for the diagnosis of major depression, with a sensitivity of 91% and a specificity of 89%. However, the PHQ-9 has not yet been validated for PSD among Thais. Because Thailand and western countries differ in their health care systems, cultures, attitudes, mindsets, and family support systems, this study investigated the validity and reliability of the Thai PHQ-9 in screening for depressive disorder after stroke among Thais. Demographic characteristics were gathered by interviewing the enrolled patients, and stroke-related information (such as comorbid illnesses and the types of stroke diagnosed from imaging studies) was obtained from medical records. Modified Rankin Scale scores were also recorded to determine the participants' levels of disability.

Subjects and procedures
The Thai PHQ-9 15 was administered by one of the researchers (PD) in either the inpatient rehabilitation ward or the outpatient rehabilitation clinic, depending on where the patient was being seen.
On the same day, a psychiatrist interviewed each patient in a private area and made a diagnosis according to the criteria of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). The researcher and the psychiatrist were blinded to each other's assessments.

Measures
Thai Mental State Examination
The Thai Mental State Examination (TMSE) 19 was the first neuropsychiatric test developed to provide a standard mental status examination for Thais. The maximum TMSE score is 30 points, and a cut-off score of 24 points is used to identify normal cognition in healthy older Thais.

Modified Rankin Scale
The Modified Rankin Scale (MRS), a clinician-reported measure of global disability, has been widely applied to evaluate stroke recovery. 20,21 It is an ordinal scale with 7 categories, ranging from zero (no symptoms) to six (death), that assesses an individual's ability to ambulate and complete the activities of daily living. MRS scores > 3 indicate severe disability. 22

Thai PHQ-9
The Thai PHQ-9 15 consists of 9 questions based on the 9 DSM-IV criteria for major depressive disorder. The questionnaire asks about symptoms experienced during the preceding 2 weeks. Each item is scored from 0 (not at all) through 1 (several days) and 2 (more than half the days) to 3 (nearly every day).
The PHQ-9 also provides a preliminary diagnosis of major depressive disorder through an algorithm-based diagnosis: at least 5 items, including Item 1 and/or Item 2, must be rated ≥ 2, which necessarily implies a total score of 10 or higher. Alternatively, the PHQ-9 can be used as a screening tool for depression through a summed-score-based diagnosis.
The summed scores range from 0 to 27. Various cut-off scores allow for the determination of different degrees of depression. A study on the Thai PHQ-9 in the general Thai population reported that a summed score of 9 or greater signified a major depressive disorder, with a sensitivity of 0.84 and specificity of 0.77.
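The two scoring approaches described above can be sketched in Python as follows. This is a minimal illustration, not the authors' software: the function names are hypothetical, item scores are assumed to be entered in questionnaire order (Item 1 first), the algorithm-based rule follows the wording of this section, and the default cut-off of 6 is the value this study recommends for Thai stroke patients. The original PHQ-9 scoring instructions also count Item 9 if it is endorsed at all; this sketch follows the rule as stated here and omits that refinement.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Summed PHQ-9 score (0-27); items in questionnaire order, each scored 0-3."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def algorithm_positive(items: Sequence[int]) -> bool:
    """Algorithm-based rule as stated in the text: at least 5 items rated >= 2,
    including Item 1 and/or Item 2 (list positions 0 and 1)."""
    phq9_total(items)  # validate the input
    endorsed = [score >= 2 for score in items]
    return sum(endorsed) >= 5 and (endorsed[0] or endorsed[1])


def summed_score_positive(items: Sequence[int], cutoff: int = 6) -> bool:
    """Summed-score-based screen: positive when the total reaches the cut-off."""
    return phq9_total(items) >= cutoff


# Hypothetical response patterns, for illustration only.
mild = [1, 1, 1, 1, 1, 1, 0, 0, 0]          # total 6
no_core_item = [0, 0, 2, 2, 2, 2, 2, 0, 0]  # total 10, Items 1 and 2 not endorsed
print(summed_score_positive(mild), algorithm_positive(mild))                  # True False
print(summed_score_positive(no_core_item), algorithm_positive(no_core_item))  # True False
```

The second pattern shows that the implication runs one way only: a positive algorithm-based result always implies a total of at least 10, but a total of 10 or more does not satisfy the algorithm unless Item 1 (little interest or pleasure) or Item 2 (feeling down, depressed, or hopeless) is among the endorsed items.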

DSM-5
The DSM-5 criteria for depressive disorders were used as the reference standard. 23 A psychiatric interview was conducted for each patient. Before the interviews began, the three participating psychiatrists standardized the procedure by discussing and agreeing on the interview content. Depressive disorders could be classified as major depressive disorder, persistent depressive disorder (dysthymia), depressive disorder due to another medical condition, other specified depressive disorder, or unspecified depressive disorder.

Data analysis
PASW Statistics for Windows, version 18.0 (SPSS Inc., Chicago, Ill., USA) 24 and MedCalc for Windows, version 15.0 (MedCalc Software, Ostend, Belgium) 25 were used for the statistical analyses. The demographic data, MRS, and PHQ-9 scores were summarized with descriptive statistics. For comparisons between groups, age was analyzed with an independent-sample t-test, while stroke duration and Thai PHQ-9 scores were analyzed with the Mann-Whitney U test. Gender, education level, risk factors, stroke pathology, side of weakness, and MRS category were analyzed with chi-square tests.
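As an illustration of the categorical comparisons, the sketch below computes a Pearson chi-square test for a 2 × 2 table (for example, severe versus non-severe MRS disability in the depression and normal groups) using only the Python standard library. The function name and the counts in the usage example are hypothetical, and the text does not state whether a continuity correction was applied; this version uses none. When expected counts are small (conventionally below 5), Fisher's exact test is usually preferred.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2 x 2 table
        [[a, b],
         [c, d]]
    Returns the statistic and its p-value with 1 degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = depression / normal, columns = severe / non-severe.
stat, p = chi_square_2x2(15, 5, 40, 40)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```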
The stroke patients were divided into normal and depression groups based on their psychiatric diagnoses, and the psychiatrist determined the type of depressive disorder using the relevant DSM-5 criteria. The depression scores of the normal and depression groups were analyzed with the independent-sample t-test. Statistical significance was set at p < 0.05. Internal consistency was assessed with Cronbach's alpha. With the psychiatric diagnosis of depression (present or absent) as the reference standard, the sensitivity and specificity of every possible PHQ-9 cut-off score were calculated, together with the positive and negative predictive values and the positive and negative likelihood ratios. Receiver operating characteristic (ROC) analysis then summarized sensitivity and specificity across all possible cut-off scores in a single measure, the area under the curve (AUC).
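The per-cut-off indices and the AUC described above can be computed directly from paired PHQ-9 totals and reference-standard diagnoses. The sketch below is a hypothetical illustration, not the SPSS or MedCalc procedures used in the study: the function names and the eight-patient dataset are invented, a total at or above the cut-off is treated as screen-positive, and the AUC is the empirical (nonparametric) estimate, i.e., the Mann-Whitney probability that a randomly chosen depressed patient scores higher than a randomly chosen non-depressed patient, with ties counted as one half. Confidence intervals are not computed.

```python
import math
from typing import Sequence


def accuracy_at_cutoff(scores: Sequence[int], depressed: Sequence[bool], cutoff: int) -> dict:
    """Diagnostic indices when a PHQ-9 total >= cutoff counts as screen-positive.
    Assumes both depressed and non-depressed patients are present."""
    tp = fp = tn = fn = 0
    for score, has_depression in zip(scores, depressed):
        positive = score >= cutoff
        if positive and has_depression:
            tp += 1
        elif positive:
            fp += 1
        elif has_depression:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else math.nan,
        "npv": tn / (tn + fn) if tn + fn else math.nan,
        # LR+ is unbounded when specificity is 1, and undefined if sensitivity is also 0.
        "lr_positive": sens / (1 - spec) if spec < 1 else (math.inf if sens > 0 else math.nan),
        "lr_negative": (1 - sens) / spec if spec > 0 else math.inf,
    }


def empirical_auc(scores: Sequence[int], depressed: Sequence[bool]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve (ties count 0.5)."""
    cases = [s for s, d in zip(scores, depressed) if d]
    controls = [s for s, d in zip(scores, depressed) if not d]
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical data for eight patients (not the study's data).
scores = [0, 2, 4, 5, 6, 7, 9, 12]
depressed = [False, False, False, True, False, True, True, True]
for cutoff in (5, 6, 7):
    m = accuracy_at_cutoff(scores, depressed, cutoff)
    print(cutoff, round(m["sensitivity"], 2), round(m["specificity"], 2), m["lr_positive"])
# 5 1.0 0.75 4.0
# 6 0.75 0.75 3.0
# 7 0.75 1.0 inf
print("AUC", empirical_auc(scores, depressed))  # AUC 0.9375
```

Looping over every integer cut-off from 0 to 27 instead of the three shown reproduces the kind of threshold table reported for the summed-score-based diagnosis.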

Results
In all, 190 stroke patients were approached for participation. Seventy-five were excluded: 21 had recurrent stroke, 17 had cognitive impairment, 17 had aphasia, 10 were < 45 years old, and 10 had a stroke duration > 2 years (Fig. 1). After these exclusions, 115 stroke patients were enrolled: 63 males (54.8%) and 52 females (45.2%), with a mean age of 64 years (SD: 10 years; range: 45–88). The majority had completed primary school, followed by lower-secondary and upper-secondary school. The comorbid illnesses, in descending order of frequency, were hypertension, dyslipidemia, diabetes mellitus, and heart disease. The median stroke duration was 59 days. Most patients (81.7%) had an ischemic stroke, and left-sided weakness predominated (61%). Most patients (65.2%) were recruited from inpatient rehabilitation.
All patients were administered the Thai PHQ-9 as the index test. The reference standard was the psychiatric interview conducted on the same day, with the resulting diagnosis based on the DSM-5 criteria. The psychiatrist who conducted the interview was blinded to the index test score, and all interviews were conducted regardless of the index test results. The mean Thai PHQ-9 score was 5.2 ± 4.8. According to the DSM-5 criteria, 23 patients (20%) were diagnosed with PSD, whereas 92 patients (80%) were normal. In the PSD group, eight patients (6.9%) were classified as having major depressive disorder, two (1.7%) as having unspecified depressive disorder, and one (0.9%) as having other specified depressive disorder. The remaining 12 patients (10.5%) were diagnosed with adjustment disorder with depressed mood. The demographic characteristics of the normal and depression groups did not differ significantly (Table 1). However, the groups differed in their MRS and median PHQ-9 scores. MRS scores of 0–3 were defined as non-severe disability and MRS scores > 3 as severe disability; severe disability was more frequent in the depression group (78%) than in the normal group (55.4%).

Reliability and item analysis
As presented in Table 2, Item 3 ("trouble falling or staying asleep, or sleeping too much") had the highest mean score of the nine PHQ-9 items, and Item 9 ("thoughts that you would be better off dead or of hurting yourself") had the lowest. Cronbach's alpha for the PHQ-9 was 0.78, and deleting any single item would lower the overall alpha. The lowest item-total correlation was for Item 5 ("poor appetite or overeating").
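Cronbach's alpha, the alpha-if-item-deleted values, and the item-total correlations reported above can all be computed from a respondents-by-items score matrix, using alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores) for k items. The following is a minimal standard-library sketch with hypothetical function names and invented data. It uses sample variances throughout (alpha does not depend on that choice as long as it is consistent), and it assumes the "item-total correlation" is the corrected version, correlating each item with the sum of the remaining items; the text does not say which version was computed.

```python
import math
import statistics
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # rows = respondents, columns = items


def cronbach_alpha(rows: Matrix) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [statistics.variance(column) for column in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_deleted(rows: Matrix, item: int) -> float:
    """Alpha recomputed after dropping one item (0-based column index)."""
    return cronbach_alpha([[v for j, v in enumerate(row) if j != item] for row in rows])


def corrected_item_total(rows: Matrix, item: int) -> float:
    """Pearson correlation between one item and the sum of the other items."""
    x = [row[item] for row in rows]
    y = [sum(row) - row[item] for row in rows]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: 4 respondents x 3 items (not the study's data).
rows = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
print(round(cronbach_alpha(rows), 3))  # 0.875
for j in range(3):
    print(f"item {j + 1}: alpha if deleted {alpha_if_deleted(rows, j):.3f}, "
          f"corrected item-total r {corrected_item_total(rows, j):.3f}")
```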

Validity analysis
The performance of the Thai PHQ-9 was compared against the diagnosis of depressive disorder based on the DSM-5 criteria, which served as the reference standard. According to these criteria, 23 patients (20%) met the diagnosis of PSD.
The median Thai PHQ-9 score was 10 (IQR: 7–15) in the depression group and 4 (IQR: 0.5–5.75) in the normal group; this difference was statistically significant. With the algorithm-based diagnosis, the Thai PHQ-9 index test had a sensitivity of 34.8%, specificity of 97.8%, positive predictive value of 80%, negative predictive value of 85.7%, and positive likelihood ratio of 16.0 (Table 3).
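The algorithm-based indices above all follow from a single 2 × 2 table of index-test results against the DSM-5 diagnosis. The counts in the sketch below (8 true positives and 15 false negatives among the 23 depressed patients; 90 true negatives and 2 false positives among the 92 non-depressed patients) are not stated in the text; they are inferred as the counts consistent with the reported percentages and are used only to show how each index is derived.

```python
# 2 x 2 counts inferred from the reported percentages (an assumption, not stated in the text).
tp, fn = 8, 15   # 23 patients with a DSM-5 depressive disorder
tn, fp = 90, 2   # 92 patients without

sensitivity = tp / (tp + fn)                   # 8 / 23
specificity = tn / (tn + fp)                   # 90 / 92
ppv = tp / (tp + fp)                           # 8 / 10
npv = tn / (tn + fn)                           # 90 / 105
lr_positive = sensitivity / (1 - specificity)  # (8 / 23) / (2 / 92)

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}, "
      f"PPV {ppv:.0%}, NPV {npv:.1%}, LR+ {lr_positive:.1f}")
# sensitivity 34.8%, specificity 97.8%, PPV 80%, NPV 85.7%, LR+ 16.0
```

The very high positive likelihood ratio comes almost entirely from the small number of false positives, which is why the algorithm is specific but misses most cases.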
For the summed-score-based diagnosis, the diagnostic indices for the different PHQ-9 thresholds are detailed in Table 2 and Figure 2. The AUC was 0.87 (95% CI: 0.78–0.96), which represents good discrimination.
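The text does not state how the optimal cut-off was chosen. One common criterion is Youden's index, J = sensitivity + specificity - 1, maximized over the candidate cut-offs. The sketch below applies it to a hypothetical set of (sensitivity, specificity) pairs, which are illustrative values only and not the study's threshold table; the function name is also hypothetical. For reference, the reported cut-off of 6 (sensitivity 0.87, specificity 0.75) corresponds to J = 0.62.

```python
def youden_optimal(points: dict[int, tuple[float, float]]) -> tuple[int, float]:
    """Return the cut-off maximizing J = sensitivity + specificity - 1, and its J.
    Ties are broken in favor of the cut-off listed first."""
    best = max(points, key=lambda c: points[c][0] + points[c][1] - 1)
    sensitivity, specificity = points[best]
    return best, sensitivity + specificity - 1


# Hypothetical (sensitivity, specificity) pairs by cut-off, for illustration only.
candidates = {3: (0.96, 0.48), 4: (0.92, 0.60), 5: (0.85, 0.70), 6: (0.74, 0.80)}
cutoff, j = youden_optimal(candidates)
print(cutoff, round(j, 2))  # 5 0.55
```

Youden's index weights sensitivity and specificity equally; a screening program that prioritizes missing as few cases as possible might deliberately accept a lower cut-off with higher sensitivity.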

Discussion
This study was the first in Thailand to determine the validity of a depression screening questionnaire in stroke patients. The questionnaire investigated was the PHQ-9, which has been reported to be a good screening tool for PSD. 14 The reference standard was a psychiatric interview based on the DSM-5 criteria for depressive disorders. Against this gold standard, the PHQ-9 showed good discriminatory power in screening for PSD (AUC: 0.87). In addition, its internal consistency was acceptable (Cronbach's alpha: 0.78).
Twelve patients were diagnosed with adjustment disorder with depressed mood. In clinical practice, such stroke patients are usually given antidepressant medications to help them adjust to their physical disability. Although adjustment disorder is a diagnostic entity distinct from depressive disorders, this study included the cases of adjustment disorder with depressed mood in the PSD group. PSD was found in 23 patients (20%), which is lower than the figures reported by other studies. A meta-analysis conducted by Hackett and Pickles 26 found that 31% of stroke patients developed depression or depressive symptoms in any setting and at any time up to 5 years after their stroke. Robinson 27 undertook a pooled analysis and reported mean incidences of major and minor depression of 19.3% and 18.5%, respectively, among patients in acute care and rehabilitation hospitals. The lower incidence in the present study probably stemmed from the inclusion of only stroke patients aged ≥ 45 years, since previous research has found that younger stroke survivors are more likely to become depressed than older survivors. 28,29 The incidence in the current study is also in line with that reported by Fuentes et al., who recruited stroke patients of the same age group and found a low depression incidence of 9.9%. 30 Regarding the demographic characteristics of stroke patients with and without PSD, our study revealed no significant differences between the groups. For disability, the MRS was used to determine the level of disability after stroke; patients with an MRS score > 3, who were classified as having a severe disability, were more frequent in the depression group. PSD has been found to be associated with more severe neurological deficits and physical disabilities in both the acute and chronic phases. 31,32
The internal consistency of the Thai PHQ-9 in the stroke patients of this study was 0.78, which is considered acceptable. This value was lower than those reported for the original version of the PHQ-9: the original studies, performed in primary care and in obstetrics and gynecology settings, showed internal consistencies of 0.89 and 0.86, respectively. 16 In addition, Turner et al., who used the PHQ-9 to screen for PSD, found an internal consistency of 0.82. 13 For the Thai version of the PHQ-9, a validation study in the Thai population reported an internal consistency of 0.79, 15 and Lee and Dajpratham later reported 0.76 in elderly Thais. 33 The value of 0.78 found in the current research is therefore highly consistent with these two earlier studies of the Thai PHQ-9.
The Thai PHQ-9 can be used as a screening tool, since its AUC showed a good level of discriminatory power (0.87). This result is in line with several other investigations that have reported good discriminatory power for the PHQ-9, with AUCs > 0.8. 13,17,34-36 The PHQ-9 score can be used in two ways to diagnose depression. The first is the algorithm-based diagnosis of major depression, which implies a summed score of at least 10. In 2015, Manea et al. 37 conducted a diagnostic meta-analysis of the PHQ-9 algorithm-based scoring method as a screening tool for depression. They found that although sensitivity was as low as 53% (95% CI: 42–65), specificity was as high as 94% (95% CI: 91–96). Our study applied the algorithm-based diagnosis for PSD in a tertiary-hospital setting, and our evaluation of its diagnostic accuracy likewise revealed low sensitivity and high specificity (Table 2), consistent with the results of Manea et al. 37 Low sensitivity is an undesirable property in a screening tool. Accordingly, all previous PHQ-9 validation studies for the detection of PSD have used the alternative approach, the summed-score-based diagnosis, in their comparisons with various structured interviews as the reference standard. 13,17,34,36,38 Pettersson et al. 39 performed a systematic review of the diagnostic accuracy of structured interviews as index tests. The only structured interviews found to have sufficient accuracy for the diagnosis of depressive disorders were the Structured Clinical Interview for DSM-IV (SCID) and the Mini International Neuropsychiatric Interview (MINI).
The summed-score-based PHQ-9 diagnoses in the current research were validated against psychiatric interviews based on the DSM-5 criteria. Our analysis revealed an optimal cut-off score of 6 for the diagnosis of depression, which differs from the findings of other studies. 13,17 Turner et al. 13 validated the PHQ-9 for the detection of PSD against the DSM-IV criteria and reported a summed score greater than 8 as the cut-off for diagnosis. Similarly, Williams et al. 17 reported a summed score of 10 as the cut-off for major depression.
One limitation of this study is that stroke patients younger than 45 years were excluded. However, the incidence of stroke at a younger age is lower, and such patients represent only a small proportion of clinical practice. Another limitation is that only participants who could communicate were recruited. Stroke patients who are unable to communicate are likely to be severely depressed, and mood in such patients must be assessed with different scales. Finally, this study did not assess test-retest reliability; consequently, the temporal stability of the measure in Thai people with stroke is currently unknown.

Conclusion
The Thai version of the PHQ-9 had good validity and acceptable reliability for screening for PSD. The summed-score-based diagnosis should therefore be used for screening, with a score of 6 or higher indicating PSD.

Figure 1. Flow diagram of the study