In a convenience sample of patients with cancer of mixed diagnoses who were receiving chemotherapy in an outpatient clinic, the Greek version of the DT, compared with the clinical psychiatric interview, demonstrated sufficient accuracy in classifying patients with depressive disorders.
The AUC was 0.79. In searching for the optimal cut-off point we faced a dilemma, since 4 and 5 had similar operating characteristics. At a cut-off point of ≥4 the sensitivity was 0.85, the specificity 0.60, the PPV 0.44, the NPV 0.92 and the DOR 8.88. At a cut-off point of ≥5 the sensitivity was 0.81, the specificity 0.66 and the DOR 7.92, whereas the Youden Index was slightly higher. We chose 4 as the proposed cut-off point because, from a clinical point of view, the test performs better at this cut-off (higher DOR, sensitivity reaching the 0.85 level).
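The trade-off between the two candidate cut-offs can be illustrated with a minimal sketch that recomputes the Youden Index and the DOR from the rounded sensitivity and specificity values given above. The function names are illustrative and not part of the study's analysis. Because the inputs are rounded to two decimals, the recomputed DORs differ somewhat from the reported values (8.88 and 7.92), which were presumably derived from the underlying patient counts; the ordering of the two cut-offs is nevertheless the same.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden Index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = positive likelihood ratio / negative likelihood ratio."""
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr / negative_lr


# Rounded operating characteristics reported in the text
# (cut-off -> (sensitivity, specificity)).
cutoffs = {4: (0.85, 0.60), 5: (0.81, 0.66)}

for cutoff, (sens, spec) in cutoffs.items():
    j = youden_index(sens, spec)
    dor = diagnostic_odds_ratio(sens, spec)
    print(f"cut-off >= {cutoff}: Youden = {j:.2f}, DOR = {dor:.2f}")
```

With these rounded inputs the Youden Index is marginally higher at ≥5, while the DOR and the sensitivity are higher at ≥4, which mirrors the dilemma described above.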
Our decision is not only clinically relevant but also in line with existing recommendations. Ma et al. [13] faced the same dilemma in their meta-analysis when comparing the DT with the DSM-IV. For their part, they favored a higher DOR and a higher sensitivity over a slightly better Youden Index. Thus, they recommended 4 as the optimal cut-off score “in order to rule in as many cases”. Furthermore, consistency on a global scale was an additional important criterion for adopting 4 as the cut-off score in the Greek version, as 4 is the preferred cut-off point worldwide [13].
The psychometric properties of the DT have been examined over the last 20 years against several different tools. Paradoxically, although a screening test should first of all detect the most severely distressed people, i.e. those with a psychiatric diagnosis, the gold standard, the clinical interview, has not been commonly utilized. Ma et al. [13], in their meta-analysis of the accuracy of the DT, included 42 eligible studies from 20 countries in which 10 different reference standards were used. Only 8 of the 42 studies used “the real standard (the clinical interview)”, while the others used questionnaires, mainly the HADS; the authors consider this a limitation of their meta-analysis. Similarly, Donovan et al. [20], in their search for translated versions of the DT, presented 23 publications describing the use of a non-validated foreign language version of the DT. In only four of them was a mental diagnosis, established by clinical interview, used as the criterion in the ROC analysis.
In our study, the DT showed a good sensitivity of 85% but a relatively low specificity of 60%. According to Ma et al. [13], when all the results were pooled together the DT, at the cut-off point of 4, demonstrated “a good balance between pooled sensitivity (0.81, 95% CI 0.79-0.82) and pooled specificity (0.72, 95% CI 0.71-0.72)”. When the DT was compared with the HADS-Total “the balance between pooled sensitivity (0.82, 95% CI 0.80-0.84) and pooled specificity (0.73, 95% CI 0.72-0.74) was maximized”. At the same cut-off point, in the comparison of the DT with the clinical interview/DSM-IV, the pooled sensitivity was 0.84 (95% CI 0.80–0.88) but the pooled specificity dropped to 0.63 (95% CI 0.61–0.66). Finally, in the comparison of the DT with the clinical interview/ICD-10 the pooled sensitivity was 0.79 (95% CI 0.60–0.87) and the pooled specificity 0.60 (95% CI 0.52–0.68). It is worth mentioning that in a previously published meta-analysis the psychometric properties were found to be even lower [21], while some studies failed to find a link between the DT and the clinical interview [22, 23].
Few studies have focused on the ability of the DT to identify depressive disorders compared with the clinical interview. Akizuki et al. [24] reported that the DT had a sensitivity of 84% and a specificity of 61% for the detection of adjustment disorders and major depression. Grassi et al. [25] found a sensitivity of 79.5% and a specificity of 75.4% against an ICD-10 diagnosis of affective syndrome. Rooney et al. [26] reported a sensitivity of 94 to 67% (at different time points) and a specificity of 69 to 75% for MDD; these researchers investigated the operating characteristics of the HADS, the PHQ-9 and the DT and concluded that “due to a modest positive predictive value of either instrument, patients scoring above these thresholds need a clinical assessment to diagnose or exclude depression”. On the other hand, in the study by Wagner et al. [27], which used the DT, the Hopkins Symptom Check List-25 (HSCL-25), the PHQ-9/PHQ-2 and the Structured Clinical Interview (SCID) for major depression, dysthymia and adjustment disorders, the DT showed a sensitivity of 80% and a specificity of 52%; the authors underlined that: “The NCCN®-DT (AUC=0.59) indicated poor accuracy in classifying patients with regard to the presence of mood disorders.”
Our results agree with those of most researchers who used the psychiatric interview as the gold standard to validate the DT’s accuracy; the Greek version of the NCCN®’s Distress Thermometer exhibited psychometric properties at least similar to those previously reported in international studies. Additionally, our results support a 2-step process: patients scoring ≥4 should undergo a more thorough mental evaluation.
The psychometric properties of the DT have raised a debate regarding its usefulness. Recklitis et al. [28], in their study of the DT compared with a psychiatric interview, reported a sensitivity of 68.18% and a specificity of 78.33%; they emphasized that “The DT … failed to identify 31.81% of survivors with a SCID diagnosis. No alternative DT cut-off score met criteria for acceptable sensitivity (≥.85) and specificity (≥.75).” Wagner et al. [27] take similar concerns to an extreme, questioning the usefulness of the DT altogether.
Given the necessity of detecting mental problems in patients with cancer, various instruments are offered to clinicians to help them identify patients in need of psychosocial support. Oncologists seem to face difficulties in recognizing psychiatric morbidity [22, 29]. Nurses are often the first point of contact with the patient and as such can benefit greatly from a brief screening measure of psychological distress [30]. The DT belongs to the category of ultra-short questionnaires; in the clinical setting these tests are quick, inexpensive and very easy to administer. Nevertheless, their feasibility is counterbalanced by modest accuracy and poor specificity [21]. It is worth noting that short tests perform better when used to rule out non-depressed patients [9, 31]. In a busy oncology department, it would be extremely useful for clinicians to know which patients are not suffering from depression. For those who are highly distressed, a more thorough assessment for a possible diagnosis of depression can then be carried out [9, 26, 31, 32].
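The rule-out property of a brief test can be illustrated with a minimal sketch that derives the predictive values from sensitivity, specificity and prevalence using Bayes' theorem. The sensitivity and specificity are the values reported for the ≥4 cut-off; the prevalence of 0.27 is an assumption made purely for illustration (it is not stated in this section and was chosen because it approximately reproduces the reported PPV of 0.44 and NPV of 0.92), and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disorder prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity at the >=4 cut-off as reported in the text;
# ASSUMED_PREVALENCE is an illustrative assumption, not a study result.
ASSUMED_PREVALENCE = 0.27
ppv, npv = predictive_values(0.85, 0.60, ASSUMED_PREVALENCE)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under this assumption a score below the cut-off makes a depressive disorder unlikely, whereas a score at or above it is wrong more often than right, which is why a positive screen calls for further assessment rather than a diagnosis.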
As expected, distressed/depressed patients reported more problems on the Problem List than non-distressed/non-depressed patients. The problem most frequently reported by the distressed/depressed patients was fatigue, followed by emotional problems, specifically worries and nervousness. Pain and sleep were reported at a high percentage, whereas spiritual/religious concerns, child care and sexual problems were, in contrast, reported at a low percentage. ‘Sexual problems’ was the only item on which significantly more non-depressed than depressed patients expressed concerns. However, because the sample was not randomized, the possibility of selection bias cannot be excluded.
In a previous Greek study, Antoniadis et al. [15] compared the DT with the HADS in elderly (mean age: 70, SD: 9.5) patients with colorectal cancer who were admitted for surgical treatment; the researchers excluded patients with major health problems as well as those with a psychiatric history during the past 5 years. “Compared to cancer patients from other countries the mean HADS score of [their] sample was significantly higher” [15]. The mean DT score was 5.7 (SD 2.7), the AUC was 0.805 and, for the cut-off point of 7, sensitivity was 0.73 and specificity 0.80. On the Problem List, worries (81.0%), nervousness (78.6%), fears (70.2%), treatment decisions (69.0%), sleep (67.9%), sadness (65.5%), child care (59.5%) and fatigue (52.4%) were the most frequently reported. According to the authors, cultural factors may have contributed to the differences, especially to the high cut-off score; they also speculated that the socioeconomic conditions in Greece and the economic crisis may have had an impact. Our results do not support these assumptions: neither Greek cultural factors nor socioeconomic conditions set our results apart from those reported in other countries [13]. Possibly, the sampling procedure and the treatment phase had a crucial influence on the differences reported by Antoniadis et al. Of note, DT scores may differ at different time points along the cancer trajectory [26, 33].
Several limitations of this study need to be acknowledged. This was a single-center study at a University Hospital with patients in active treatment. We used a non-random sample, and the numbers do not allow for comparisons between patients with different types of cancer or on different chemotherapy regimens. Finally, we did not search for possible subtypes within the construct of depression. A multi-center study with a large heterogeneous sample would allow for more detailed comparisons between subgroups of patients at different points along the illness trajectory.