The validity and reliability of the Patient Health Questionnaire-9 for screening depression in primary health care patients in Botswana

Background The lack of locally validated screening instruments contributes to poor detection of depression in primary care. The Patient Health Questionnaire-9 (PHQ-9) is a brief and freely available screening tool which was developed for primary care settings; however, its accuracy may be affected by the population in which it is administered. This study aimed to determine the validity and reliability of PHQ-9 for screening depression in a primary care population in Botswana. Methods Data was collected from a conveniently selected sample of 257 adult primary care attendants. The Mini International Neuropsychiatric Interview (MINI) depression module was used as a gold standard to assess criterion validity. Results Sensitivity and specificity of the PHQ-9 for screening for major depression were 72.4 and 76.3 respectively at a cut off score of nine or more. The area under the ROC curve was 0.808. The PHQ-9 demonstrated good internal consistency with a Cronbach alpha of 0.799. Criterion validity was demonstrated by significant correlation (r = 0.528, p < 0.001) between PHQ-9 and the MINI. Significant negative correlation between PHQ-9 scores and all four domains of the WHO quality of life questionnaire- brief version scores demonstrated good convergent validity. Conclusions The PHQ-9 is a reliable and valid instrument to screen for depression in primary care facilities in Botswana. Primary care clinicians in Botswana may use the PHQ-9 to screen for depression with a cut –off score of nine. Further studies should focus on integrating routine depression screening in primary care.


Background
Depression is an important public health problem. It is a disabling chronic mental health problem commonly encountered in primary care. According to the World Health Organization (WHO) estimates, unipolar depressive disorders will become the leading cause of the global burden of disease by 2030 (World Federation for Mental Health (WFMH), 2012).
The prevalence of depression is markedly higher in people with other medical conditions such as diabetes mellitus and cardiovascular diseases, for example: 17 to 28% of patients with cardiovascular conditions and diabetes mellitus have been found to have depression [1,2]. The co-morbidity of chronic conditions and depressions is associated with worse outcomes [3,4]. Because most people with chronic diseases are diagnosed and managed in primary care, detecting depression at this early stage may improve outcomes. WHO mental health GAP Intervention Guide has shown that depression can be reliably diagnosed and treated in primary care [5]. Despite treatment for depression in primary care being feasible, affordable and cost-effective [6]; the condition is often underdiagnosed and undertreated in primary care settings, especially in low to middle income countries (LMIC) [7]. Factors such as: inadequate numbers of specialised mental health workers, stigma associated with mental illness, and lack of locally validated screening instruments contribute to poor rates of diagnosis and treatment of depression in primary care.
Screening tools, if used appropriately may help health care workers to accurately identify patients with depressive disorders and initiate appropriate management. There are several screening and diagnostic tools available for depression [8,9], however, not all of the instruments are suitable for use in primary care settings. The use of some depression screening and or diagnostic tools is limited by cost, time needed to administer the tools and in the case of self-administered tools, the patient's ability to accurately complete the tool.
The Patient Health Questionnaire-9 (PHQ9) was specifically developed for use in primary care settings [10], is available at no cost and has been validated in several settings to screen for depression [9][10][11][12]. As with many diagnostic tools the accuracy of the PHQ-9 may be affected by the population in which it is administered [13,14]. Although a study of the PHQ-9 amongst 4 different ethnic groups found that the interpretation of the PHQ-9 did not differ amongst the ethnic groups, all four groups were in America and could have similar understanding of depression based on a shared environment [15]. It is therefore important to determine the optimal cut-off point when screening for depression using the PHQ-9 in a specific population.
The aim of the present study was to evaluate the validity and reliability of an interviewer administered PHQ-9 for screening depression among primary care patients in Botswana. The MINI major depressive episode module was used as a reference standard. This is the first study to investigate the reliability and validity of a depression screening tool in Botswana.

Study design and site description
The study utilised a cross sectional design. The study was conducted at two primary care facilities in Gaborone, the capital and largest city in Botswana (population~200,000) [16]. The two facilities were purposively selected to make the sample heterogeneous. One facility is located in a low-income area and the other in a predominantly middle to high income area. The two facilities also have a psychiatric nurse on site, who were crucial in providing further management to study participants who were found to have depression.

Study population
The study targeted adult patients attending the selected clinics during the months of May and June 2019. Convenience sampling was used to select potential participants, selecting patients in the order they came to the clinic. To be included in the study, patients had to be 18 years or older and be able to comprehend and complete study components in Setswana or English. Patients with moderate to severe intellectual disability, experiencing acute psychotic symptoms and having any unstable medical condition were not included in the study.

Measures
Researcher designed socio demographic clinical questionnaire The researchers designed an instrument to capture social and demographic data such as gender and income. The instrument also screened for history of chronic medical conditions including hypertension, cancer, Human Immunodeficiency Virus (HIV), and family history of any mental illness.

Mini International Neuropsychiatric Interview (MINI)
The Mini International Neuropsychiatric Interview (MINI) is a short diagnostic structured interview developed in Europe and the United States to explore 17 psychiatric disorders [18,19]. It is fully structured to allow administration in about 15 to 20 min even by nonspecialized interviewers [20,21]. The MINI-plus demonstrates good sensitivity, specificity, validity and reliability in the assessment of psychiatric disorders [18,20,21]. The major depressive episode module was used in this study as a gold standard.

World Health Organization Quality of Life scale Brief Version (WHOQOL-BREF)
The instrument is used to assess cross cultural quality of life across four domains: Physical health, psychological health, social relationships and environmental health. It assesses the respondent's perceptions in the context of their standards, concerns and culture [22]. Respondents are requested to endorse perceived quality of life on a 5point likert scale ranging from 1 (not at all) to 5 (completely) with higher scores indicating a higher quality of life [23]. The WHOQOL-BREF has demonstrated good to excellent psychometric properties and is reliable with a Cronbach α of more than 0.7 for all domains [23].

Data collection procedures
Research assistants were trained by the authors to administer the English and Setswana versions of socio-demographic questionnaire, the PHQ-9 and the WHOQOL-BREF and the MINI major depressive episode module.
Participants were recruited conveniently by selecting consecutive patients from the consultation waiting room. Those interested to participate in the study were directed to a research assistant in a private room. A written informed consent was obtained before commencing with data collection. Each participant was sequentially interviewed by two research assistants in either Setswana or English according to the participant's preference. The first research assistant administered the socio-demographic questionnaire and the PHQ-9. Upon completion of the first interview each participant was then directed to a different room where the second research assistant (a Bachelor of Psychology graduate) who was blinded to the results of the PHQ-9 screening, then administered the MINI major depressive episode module and the WHOQOL-BREF.
Participants who were found to be suicidal or who screened positive for depression were referred to a psychiatric nurse stationed at the facility for further management.

Data analysis
Items means, standard deviations, frequencies and percentages were calculated for the socio-demographic variables. Independent samples t-tests was carried out to compare the mean PHQ-9 scores in the groups of depressed and not depressed patients according to MINI-Plus diagnosis.

Reliability
In order to investigate the reproducibility and consistency of PHQ-9, reliability coefficients as measured by Cronbach's alpha were calculated.

Concurrent and convergent validity
Using spearman's correlation analysis, the relationship between scores of PHQ-9 and MINI-plus diagnosis was investigated to determine the magnitude of the relationship between the two measures. Convergent and concurrent validity require that PHQ-9 should correlate with MINI-Plus diagnosis whilst it inversely correlates with quality of life as determined by the WHOQoL-BREF.

Factor structure
Before performing factor analysis, the correlation matrix was inspected to check for the strength of correlation and then factorability was tested using explanatory factor analysis using Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity. We calculated KMO = 0.838 and Bartlett's test of sphericity (469.303, df = 36, p < 0.001) making factor analysis an appropriate method.

Sensitivity and specificity
For each PHQ-9 cutoff point, sensitivity or true positive rate (proportion of individuals with MDE according to MINI criteria that correctly identified by PHQ-9), specificity or true negative rate (proportion of individuals without MDE according to the gold standard correctly identified as such by PHQ-9), Positive predictive value +PPV (proportion of true positives among all positives identified by the PHQ-9) and Negative predictive value -NPV (proportion of true negatives among all those who will score negative by PHQ-9 was calculated. Positive likelihood ratio + LR (probability of an individual without the condition having a positive test) and Negative likelihood ratio -LR (probability of an individual without the condition having a negative test) 95% confidence intervals for each of these parameters was reported.
Youden's index was used as a criterion for choosing the "optimal" threshold value for the PHQ-9 test, the threshold value for which the value of [sensitivity + specificity − 1] is maximized.
Criterion validity was assessed by receiver operating characteristic (ROC) curve. The PHQ-9 point showing simultaneously the highest sensitivity and specificity was evaluated using the ROC curve. PHQ-9's accuracy was estimated by the area under the ROC curve. All analyses were performed using Stata® version 14.0 software.

Study sample characteristics
Majority of the participants in our study were female at 66.9% (n = 172) and the mean age of our participants was 34.3 years. The most common chronic diagnosis was HIV at 21.6% (n = 60) and alcohol was the most commonly used substance. Only 2.4% of the study participants had previously been diagnosed with a mental illness. Other characteristics are as depicted in Table 1.

Prevalence of depression
A total of 105 participants fulfilled the DSM-IV criteria for a major depressive episode, using the gold standard MINI depression module corresponding to a prevalence of 40.9% (95% C. I 35.0-47.1). The mean depression scores in the PHQ-9 were significantly higher among the depressed as compared to the non-depressed as shown in Fig. 1.
Reliability of PHQ-9 Table 2 presents the results of reliability. The PHQ-9 demonstrated good internal reliability with a Cronbach's alpha of 0.799. Table 3 shows sensitivity, specificity, positive and negative likelihood ratio, positive predictive value and negative predictive value for each of the PHQ-9 cutoff points compared to the gold standard interview (MINI). As expected,  The numbers do not add up to 257 because the participants answered more than one option sensitivity decreased progressively as the cutoff points increased, with a marked decrease between the ≥10 and ≥ 11 cutoff points (from 68.6 to 60.0%) while specificity between these two cutoff points increased from 79.6 to 81.6%. The standard cut off score of 10 or higher yielded a sensitivity of 68.6 and a specificity of 79.6. Both Youden's index and the cutoff point of maximum sensitivity and specificity according to the Receiver Operating Curve (ROC) (Fig. 2)  The PHQ-9 demonstrated very good accuracy with an area under the ROC (AUC) of 0.808 (95% C. I 0.755-0.854) as depicted in Fig. 2.

Concurrent and convergent validity
The PHQ-9 scores correlated with the results of MINI (r = 0.528, p < 0.001). As expected, higher PHQ-9 score correlated with lower quality of life in all domains of the WHOQoL-BREF. The correlation between PHQ-9 scores and WHOQoL-BREF scores on all the four domains were found to have a significant negative correlation in all the four domains of WHOQoL-BREF (Table 4).

Exploratory factor analysis
The Kaiser-Meyer-Olkin (KMO = 0.838). and Bartlett test of sphericity (469.308, df = 36, P < 0.001) justified a dimension reducing procedure such as the factor analysis. The measure of sampling adequacy was > 0.80, so the items could be considered suitable for factor analyses. The scree plot (Fig. 3) revealed one dominant dimension with a big decrease between first and second eigenvalues and small decreases afterward (eigenvalues: 3.4, 1.0, 0.8, 0.7, 0.66, 0.65, 0.6, 0.5 and 0.469). Factor loadings ranged from 0.54 to 0.82. The percentage of total variance explained by the first factor was 38.4% and the second factor explained 11.2% of the variance.
These two factors explained 49.6% of the variance as shown in the scree plot graph (Fig. 3). Factor 1 included questions 1, 2, 6, 7, 8, and 9 and factor two 3, 4, and 5 (Table 5). These factors did not reflect the multidimensional structure PHQ-9. However, component rotated space (Fig. 4) revealed only one factor and to judge the strength of the measurement dimension, the following cut off points were used for variance explained by the measure: > 40% is considered a strong measurement dimension, > 30% is considered a moderate measurement dimension, and > 20% is considered a minimal dimension. Factor 1 explained approximately 40% of the variance which is supportive of unidimensionality.

Discussion
Depression is one of the leading causes of disability [24] and the second leading cause of years of life lived with disability [25]. However, depression still remains underrecognized in primary care settings [26]. Identifying depression in primary care patients can be a challenge, especially in busy clinics. A short and valid screening tool can assist in the identification of patients with depression in different primary care settings. This study aimed to estimate the reliability and validity of PHQ-9 for the identification of depression in patients attending primary health care facilities in Botswana.
Using the gold standard MINI depression module, 40.9% (95% C. I 35.0-47.1) of the study population met the criteria for current Major Depressive Episode. This was higher than the prevalence of 38% in an earlier study of HIV-infected patients in Botswana [27] and the prevalence of 30.3% found among primary care patients in Malawi [28], but lower than 56% in a South African Primary care population [29]. The variance in the studies could be explained by the different depression diagnostic tools used in the studies. Additionally, factors such as  The internal consistency analysis of the PHQ-9 was assessed using the Cronbach's alpha coefficient, and it was found to be 0.799. This indicates that the PHQ-9 has acceptable predictive performance. Our findings were comparable or slightly lower to that seen in validation studies in other parts of sub-Saharan Africa whose Cronbach alpha values were found to be 0.76 in South Africa [11], 0.83 in Malawi [30] and 0.85 in Ethiopia [31].
When the PHQ-9 was compared with the goldstandard diagnosis of the MINI, it performed well showing reasonable accuracy in identifying cases of depression. The area under the ROC curve was found to be 0.808, suggesting good diagnostic ability of the PHQ-9 score. Kroenke at al., (2001) used a cut off, of 10, with a sensitivity of 88% and specificity of 88%. We found that, using the PHQ-9, depression was best identified by a cut off nine. At this cut off, the PHQ-9 had moderately high sensitivity (72.4%) and specificity (76.3%) in our study population. A validation study in a chronic disease outpatient clinic in Malawi also determined the optimal cut-of score to be nine, however the sensitivity was lower than in our study (64%) whilst the specificity was higher at 94% [30]. The performance of the PHQ-9 was similar to that from a range of other settings in sub-Saharan Africa [11,12] and indicate that the PHQ-9 has reasonable sensitivity and specificity to be used as a screening tool for depression in this setting. Although the sensitivity and specificity values found in this study are not as high as found by Kroenke et al., (2001) this could be because of the use of the MINI as a gold standard; a recent meta-analysis showed that the sensitivity of the PHQ-9 was lower (0.77 versus 0.88) when using the MINI as the gold standard as compared to semistructured interviews [32].
The construct validity of PHQ-9 was established by the factor analysis. All the items had factor loading in the range 0.54-0.82. Items related to low energy (0.82), feeling bad about self (0.72) and suicide (0.708) were most strongly related to the underlying construct meeting the diagnostic criteria for depression.
The PHQ-9 depression questionnaire appears to be a reliable and valid screening instrument for identifying depression in Botswana at a cut off score of nine or higher.
One of the strengths of this study is that it is the first to validate the PHQ-9 in Botswana. Another strength of

Limitations
A limitation of this study is that the participants were drawn from only two primary health care facilities in the capital of Gaborone which may not be representative of the wider population. The use of convenience sampling presents a sampling error which limits generazibility of the findings.

Conclusion
It is through primary health care services that mental health disorders can be observed and diagnosed. Thus, to improve health outcomes of patients in primary care, mental health disorders such as depression should be integrated into routine management. High burden of disease and negative impact on health outcomes, highlight this need for early diagnosis and management of depression. Despite being a middle-income country, poverty and high levels of income inequality persist and this has been found to predict depression in our population. Our study, similar to previous studies, demonstrates that the PHQ-9 is a reliable and valid instrument to screen for depression in primary care facilities in Botswana. The maximal specificity and sensitivity at a cut off score of 9 in our population compared to a cut off of 10 found in previous studies underscores the importance of validating the PHQ-9 in different socio-cultural populations. We recommend that, further studies should focus on integration of routine depression screening using the PHQ-9 in primary care.