Depression is one of the leading causes of disability worldwide and contributes to decreased functioning and diminished quality of life across a wide range of educational and socioeconomic levels [1]. Because the diagnosis of depression is based on the degree of depressive symptoms, there has been great interest in understanding the distribution of depressive symptoms in the general population [2]. To date, numerous population studies on depressive symptoms have been conducted using a variety of depression screening scales, such as the Center for Epidemiologic Studies Depression Scale (CES-D), the 6-item Kessler Screening Scale for Psychological Distress (K6), and the Patient Health Questionnaire-9 (PHQ-9).
The CES-D is a forerunner of self-reported questionnaires for depressive symptoms in the general population and now serves as a screening tool in primary care and research settings [3]. The 20 items of the CES-D fall into two groups: 16 depressive symptoms and four positive affects (good, hopeful, happy, and enjoyed). The K6 was developed using item response theory and is used to measure the severity of psychological distress [4]. Although the K6 is a broad measure of psychological distress (depression, nervousness, restlessness, fatigue, worthlessness, and hopelessness), it is widely used as a screening tool for major depression and anxiety disorders [5]. Finally, the PHQ-9 is one of the most widely used instruments for screening for clinical depression [6, 7]. The PHQ-9 reflects the nine criteria for major depression in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) [8]. The CES-D, K6, and PHQ-9 have overlapping items; all three include depressive symptom items such as “depressed,” “worthlessness,” and “fatigue (tired).”
Because the interpretation of a depression screening scale is clinically important, many researchers have investigated the optimal cut-off point for detecting major depressive disorder and the average scores in various populations [9, 10]. However, little attention has been paid to the distributional patterns of the item responses and total scores on depression screening scales. The mathematical patterns of item responses and total scores are important for several reasons. First, if these patterns are established, they can be used to predict the frequency of individuals with a certain score in a population. Second, the mathematical patterns determine which statistical procedures to apply. If the empirical distributions of item responses and total scores are non-normal, statistical models that assume normally distributed variables (e.g., parametric statistics) will require reconsideration [11]. Finally, if the distributions of item responses and total scores follow a specific mathematical pattern, this may hint at the mechanism underlying depressive symptoms.
Recently, in an analysis of CES-D data from nearly 32,000 respondents to a nationally representative Japanese survey, we reported for the first time that the responses to all 16 depressive symptom items followed a similar pattern (Fig. 1) [12, 13]. The CES-D asks individuals to rate how much of the time they experienced each depressive symptom during the past week, using four options: “rarely,” “a little of the time,” “occasionally,” and “all of the time” [3]. The lines for the 16 items cross at a single point between “rarely” and “a little of the time,” after which they decrease regularly (Fig. 1a). On a log-normal scale, the lines of the item responses show a parallel decreasing pattern from “a little of the time” to “all of the time” (Fig. 1b) [12]. These item response patterns have been confirmed in an analysis of the CES-D data from 8000 Japanese employees [14] and of the K6 data from the National Survey of Midlife Development in the United States (MIDUS) [15].
In general, the response options of depression screening scales start with a negative adverb option at the lower end (e.g., never, rarely, or none) and continue with degree adverb options for the remaining responses (e.g., a little, some, much, most, and all) [3, 16]. Mathematically, if the ratios between adjacent degree adverb options are the same for all items, the lines for item responses cross at a single point between the negative adverb option and the adjacent degree adverb option, and they show a parallel pattern across the degree adverb options on a log-normal scale. In fact, the ratios between adjacent degree adverb options were similar among all items in the previous studies [12, 15].
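The geometric argument in this paragraph can be checked numerically. The following is a minimal sketch under stated assumptions, not an analysis from the studies cited: the shared ratios and the per-item probabilities are hypothetical values chosen for illustration, the function names are ours, and the response options are placed at positions 0, 1, 2, 3 on the horizontal axis and joined by straight lines. Under these assumptions, every item's line between the negative adverb option (position 0) and the first degree adverb option (position 1) passes through the same point, and the log-scale steps between degree adverb options are identical across items.

```python
import math

def item_distribution(p_first, shared_ratios):
    """Probabilities for one item: [negative option, degree option 1, 2, ...].

    p_first is the item's probability of the first degree adverb option;
    shared_ratios are the adjacent-option ratios assumed common to all items.
    """
    degree = [p_first]
    for ratio in shared_ratios:
        degree.append(degree[-1] * ratio)
    # The negative adverb option takes whatever probability remains.
    return [1.0 - sum(degree)] + degree

def crossing_point(shared_ratios):
    """Point (x, y) where every item's line from option 0 to option 1 meets."""
    weight, total = 1.0, 1.0
    for ratio in shared_ratios:
        weight *= ratio
        total += weight
    # With S = 1 + r1 + r1*r2 + ..., the lines meet at x = S/(1+S), y = 1/(1+S).
    return total / (1.0 + total), 1.0 / (1.0 + total)

shared_ratios = (0.5, 0.4)               # hypothetical ratios, same for all items
first_option_probs = (0.05, 0.15, 0.30)  # hypothetical items of differing frequency
items = [item_distribution(p, shared_ratios) for p in first_option_probs]

x_star, y_star = crossing_point(shared_ratios)
# Height of each item's straight line (option 0 to option 1) at x = x_star.
heights = [p[0] + x_star * (p[1] - p[0]) for p in items]
# Log-scale step between adjacent degree adverb options for each item.
log_steps = [
    [math.log(p[k + 1]) - math.log(p[k]) for k in range(1, len(p) - 1)]
    for p in items
]

print("crossing point:", x_star, y_star)
print("line heights at crossing:", heights)
print("log-scale steps per item:", log_steps)
```

Because the crossing point depends only on the shared ratios and not on how common each symptom is, items with very different frequencies still meet at one point, which matches the pattern described for Fig. 1a; the identical log-scale steps correspond to the parallel lines in Fig. 1b.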
Furthermore, the total scores on depression screening scales in the general population have been reported to approximate an exponential pattern except at the lower end of the distribution. These findings have been replicated in analyses of Revised Clinical Interview Schedule (CIS-R) data from the British National Household Psychiatric Morbidity Survey [17], the CES-D data from the same nationally representative surveys [14, 18], and the K6 data from the MIDUS [15].
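One simple way to examine whether total scores approximate an exponential pattern is to regress the logarithm of the score frequencies on the score while excluding the lower end, since an exponential distribution appears as a straight line on a logarithmic scale. The sketch below illustrates this idea only; it is not the procedure used in the cited studies. The frequencies are synthetic and generated for illustration (they are not survey data), and the decay rate, the score range of 0 to 27, the cutoff, and the function name are all hypothetical choices.

```python
import math

def fit_log_linear(scores, freqs):
    """Ordinary least squares of log(frequency) on score.

    Returns (intercept, slope); an exponential pattern gives a good
    straight-line fit with a negative slope.
    """
    logs = [math.log(f) for f in freqs]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic relative frequencies (NOT survey data): exponential decay with a
# hypothetical rate, with the two lowest scores deliberately pushed off the curve
# to mimic a deviation at the lower end of the distribution.
true_rate = 0.25
scores = list(range(0, 28))
freqs = [math.exp(-true_rate * s) for s in scores]
freqs[0] *= 0.5
freqs[1] *= 0.8

cutoff = 2  # hypothetical: scores below this are excluded from the fit
intercept, slope = fit_log_linear(scores[cutoff:], freqs[cutoff:])

# Frequency the fitted line predicts for a given total score.
predicted_at_10 = math.exp(intercept + slope * 10)

print("fitted slope:", slope)
print("fitted intercept:", intercept)
print("predicted relative frequency at score 10:", predicted_at_10)
```

With real data, the choice of cutoff and the treatment of sparse high scores would need justification, and a fitted line like this could then be used to estimate the expected frequency of individuals at a given score.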
Taken together, these findings suggest that the item responses and total scores on depression screening scales follow the same characteristic pattern in the general population. The degree to which these findings can be generalized to other depression scales is unclear but warrants examination. To date, few studies have investigated the distributional patterns of the item responses and total scores on the PHQ-9 in the general population. Thus, we investigated whether the item responses and total scores on the PHQ-9 follow characteristic patterns consistent with those of other depression screening scales. If the empirical distributions of item responses and total scores on the PHQ-9 are shown to follow specific distributions, this will shed light on the mechanism of depressive symptoms. For example, previous factor analytic studies have yielded inconsistent results regarding the number of latent traits of depressive symptoms [19]. If the distributions of depressive symptoms are shown to exhibit a common mathematical pattern, this may provide further evidence regarding the number of latent traits of depressive symptoms.
Moreover, the analysis of the PHQ-9 data potentially enables a further understanding of the prevalence of suicidal ideation in the general population. Whereas the CES-D and K6 have no item about suicidal ideation, the PHQ-9 includes one: “thoughts of being better off dead and active ideas of self-harm” [8]. Although the prevalence of suicidal ideation is often expressed as a percentage, the severity of suicidal ideation (the intensity and duration of a patient’s thoughts about suicide) varies from individual to individual [20]. Furthermore, the risk of suicide death or suicide attempt increases with the severity of suicidal ideation [21, 22, 23]. Thus, to understand the prevalence of suicidal ideation in the general population, it is necessary to elucidate the severity distribution of suicidal ideation. In this study, we sought to elucidate the pattern of responses to the suicidal ideation item in the general population and determine whether the item response follows a characteristic pattern consistent with that of the other items.
This study used the PHQ-9 data from the National Health and Nutrition Examination Survey (NHANES). The NHANES is a national survey conducted to understand the health and nutritional status of people in the United States [24], and the PHQ-9 has been included as part of the NHANES since 2006 [10]. The sample for the NHANES was designed to represent the US population and minimize selection bias. Because of their sufficiently large sample sizes, the PHQ-9 data from the NHANES are adequate for confirming the reproducibility of the findings. The NHANES data are accessible to researchers around the world and have been utilized in numerous studies on public health.
The aim of this study was to elucidate the patterns of item responses and total scores on the PHQ-9 in the general population and determine whether they follow the characteristic patterns observed for the CES-D and K6. Furthermore, we investigated the pattern of responses to the suicidal ideation item and determined whether, in the general population, it follows the characteristic pattern observed for the depressive symptom items.