The clinical global impression scale and the influence of patient or staff perspective on outcome
© Forkmann et al; licensee BioMed Central Ltd. 2011
Received: 11 February 2011
Accepted: 14 May 2011
Published: 14 May 2011
Since its first publication, the Clinical Global Impression Scale (CGI) has become one of the most widely used assessment instruments in psychiatry. Although some conflicting data has been presented, studies investigating the CGI's validity have only rarely been conducted so far. It is unclear whether the improvement index CGI-I or a difference score of the severity index CGI-Sdif is more valid in depicting clinical change. The current study examined the validity of these two measures and investigated whether therapists' CGI ratings correspond to the view the patients themselves have on their condition.
Thirty-one inpatients of a German psychotherapeutic hospital suffering from a major depressive disorder (age M = 45.3, SD = 17.2; 58.1% women) participated. Patients filled in the Beck Depression Inventory (BDI). CGI-S and CGI-I were rated from three perspectives: the treating therapist (THER), the team of therapists involved in the patient's treatment (TEAM), and the patient (PAT). BDI and CGI-S were filled in at admission and discharge, CGI-I at discharge only. Data was analysed using effect sizes, Spearman's ρ and intra-class correlations (ICC).
Effect sizes between CGI-I and CGI-Sdif ratings were large for all three perspectives, with substantially higher change scores on CGI-I than on CGI-Sdif. BDIdif correlated moderately with PAT ratings but did not correlate significantly with TEAM or THER ratings. Congruence between CGI ratings from the three perspectives was low for CGI-Sdif (ICC = .37; confidence interval [CI] .15 to .59; F(30,60) = 2.77, p < .001; mean ρ = 0.36) and moderate for CGI-I (ICC = .65; CI .47 to .80; F(30,60) = 6.61, p < .001; mean ρ = 0.59).
Results do not suggest a definite recommendation for whether CGI-I or CGI-Sdif should be used, since no strong evidence for the validity of either of them could be found. As congruence between CGI ratings from the patients' and the staff's perspectives was not convincing, it cannot be assumed that CGI THER or TEAM ratings fully represent the patient's view of the severity of his or her impairment. Thus, we advocate incorporating multiple self- and clinician-reported scales into the design of clinical trials in addition to the CGI in order to gain further insight into the CGI's relation to the patients' perspective.
The Clinical Global Impression Scale (CGI) is a brief clinician-rated instrument that consists of three different global measures: (1) Severity of illness: overall assessment of the current severity of the patient's symptoms (CGI-S); (2) Global improvement: overall comparison of the patient's baseline condition with his or her current state (CGI-I); (3) Efficacy index: overall comparison of the patient's baseline condition to a ratio of current therapeutic benefit and severity of side effects (CGI-E). Since its first publication the CGI has become one of the most widely used assessment tools in psychiatry. For example, the CGI, especially the CGI improvement scale (CGI-I), has been widely used as an efficacy measure in clinical drug trials in different mental disorders (e.g., depression, schizophrenia [2, 3]). Its popularity is mainly based on its conciseness and ease of administration.
The CGI is widely accepted as a valid assessment instrument, and some studies have presented evidence supporting this view. Moreover, the CGI has been used as an external criterion to test the validity of other outcome measures such as the Beck Depression Inventory [BDI; ], the Hamilton Depression Rating Scale [HAMD; ] or the Montgomery-Asberg Depression Rating Scale [MADRS; [6–9]].
Despite its general acceptance and extensive use as an outcome measure and as a criterion for the validation of other instruments, the CGI's psychometric characteristics have only rarely been examined so far. Some evidence has been presented arguing for its validity when used in clinical trials. Beyond that, in a recent meta-analysis, Hedges et al. calculated effect sizes for the CGI and other rating scales from 16 different studies on social phobia and found mostly comparable effect sizes for the CGI-I and several social anxiety scales. In line with that, Khan et al. found similar effect sizes for MADRS, HAMD and CGI in antidepressant clinical trials, which the authors interpreted as supporting the CGI's sensitivity.
However, from early on, the CGI has been criticized for being inconsistent, unreliable and too general to measure clinical conditions or treatment responses validly [13, 14]. Guy drew attention to the role of memory when using the CGI-I and claimed that the task of comparing a patient's general clinical condition at study end to that at the beginning of the study using the CGI is essentially a test of the rater's memory. Recently, more empirical evidence for this criticism has been presented. Busner et al. found that clinicians' CGI ratings are affected by indication-irrelevant adverse events reported by the patient. Participants were asked to rate the severity of a major depressive disorder or a generalized anxiety disorder, and nausea or dizziness served as indication-irrelevant medical events. The more such events the patient reported, the more likely the clinician was to rate the patient as more severely ill. The authors concluded that these reports can seriously threaten the validity of the CGI. Jiang and Ahmed found evidence for a relatively low correlation between CGI-S and CGI-I, which raised the question of whether it is more appropriate to use the CGI-I or the difference between CGI-S before and after the intervention to judge change across treatment.
Several efforts have been made to improve the psychometric characteristics of the CGI. Kadouri et al. tested the use of a semi-structured interview, a new response format and a Delphi process to improve the reliability of the CGI. The best results were found when the ratings of four different clinical raters were averaged. Targum et al. found significantly increased scoring variance due to treatment-emergent symptoms and developed targeted scoring criteria for the CGI to enhance inter-rater reliability. Another attempt to improve the CGI's psychometric quality was the development of alternative versions of the CGI for use in special patient groups [e.g., ].
To sum up, the results of studies on the psychometric performance of the CGI are mixed, and additional research appears necessary. More precisely, the question of whether the CGI provides a valid measure of the patient's condition and, if so, whether it is more appropriate to use CGI-I or a difference score of CGI-S as the outcome criterion has not been conclusively answered. The current study therefore addressed this issue. First, we aimed to clarify whether the CGI provides a valid measure of the patient's condition. For this purpose, we investigated whether CGI ratings correspond to the view the patient has of his or her current condition. If so, clinician-rated CGI scores should relate to patient-rated CGI scores and to scores on other patient-reported outcome measures. Furthermore, in correspondence with the findings of Kadouri et al., we expected this relation to improve if the rating is made not by a single clinician but by a whole team of therapists using a consensus process. Second, starting from the results of Jiang and Ahmed, this study assessed whether it is valid to rely on the CGI-I when rating clinical change or whether calculating difference scores for CGI-S between the beginning and the end of the intervention would enhance validity. Based on Guy's remarks on the role of memory when using the CGI-I, we expected that difference scores for CGI-S would be the more valid measure. Implications for clinical practice will be discussed.
At the hospital, patients are treated on an inpatient basis with high-density empirically-based psychotherapy that is personalized depending on the disorder of the patient. The program uses symptom-focused and highly individualized interventions. Each inpatient is treated by only one therapist for as long as eight hours per day.
High-density psychotherapy typically includes four phases: (1) Psychological assessment and a medical examination from which feedback is given to the patient, as is information about the therapy program. This phase includes 6-8 sessions, and it lasts one or two days. (2) Cognitive preparation for therapy is given to enhance the patient's motivation for specific treatment exercises. The patient's core assumptions about the aetiology of his or her disorder are taken into account when the treatment plan is devised. The therapist explains to the patient the details of the therapy and the subsequent steps to be taken. (3) During this phase, specific therapeutic exercises are carried out. These include standard elements of cognitive behavioural therapy for depression. (4) The self-management phase begins after several days of high-density psychotherapy. At the beginning of this phase, the therapist helps the patient to plan and organize the tasks to be undertaken; thereafter, the patient is asked to independently devise difficult tasks to do. Finally, the difficulties that the patient has in completing the tasks are evaluated. After discharge, therapists remain in telephone contact with their patients for at least six weeks.
Beck Depression Inventory (BDI)
The BDI contains 21 items. Each item consists of four self-referring statements (e.g. "I am sad"). Item scores range from 0 to 3, and participants are asked to choose the one or more statements per item that best represent their mental state during the last week. A total score >10 indicates mild to moderate depression and a total score >18 moderate to severe depression. The BDI was filled in at admission and discharge.
Clinical Global Impression Scale (CGI)
The CGI consists of three global measures. The CGI severity of illness measure (CGI-S) is rated from 1 (normal, not at all ill) to 7 (among the most extremely ill patients). A "0" is allocated if the patient was not assessed. The CGI-S was rated at admission (CGI-Sadm) and at discharge (CGI-Sdis). The CGI global improvement measure (CGI-I) is rated from 1 (very much improved) to 7 (very much worse). Again, "0" stands for "not assessed". The CGI-I was rated at discharge only. The third measure is called the efficacy index (CGI-E). It was not assessed in the current study.
The CGI measures were rated from three perspectives: the treating therapist (THER), the team of therapists concerned with the patient (TEAM), and the patient him- or herself (PAT). The team of therapists concerned with the patient performed a Delphi process to reach a consensus rating of the respective patient's condition.
CGI-I vs. CGI-S dif
Difference scores for CGI-S (CGI-Sdif = CGI-Sadm - CGI-Sdis) were determined and contrasted with CGI-I ratings for all three perspectives to determine the congruence of the two global ratings. Additionally, effect sizes d between CGI-Sdif and CGI-I and their 95% confidence intervals were calculated for all three perspectives. If the confidence interval for an effect size includes zero, the effect can be regarded as statistically nonsignificant. In order to reduce sampling error, effect sizes were corrected using a factor provided by Hedges and Olkin. Following Cohen, effect sizes .20 < d ≤ .50 were interpreted as small, .50 < d < .80 as medium, and d ≥ .80 as large. Before effect sizes were calculated, CGI-Sdif was rescaled for this step of the analysis into values from 1 to 7, with 4 meaning no change, in order to bring CGI-I and CGI-Sdif to a common metric. In addition, both CGI-I and CGI-Sdif were correlated (Spearman's ρ) with BDI difference scores (BDIdif = BDIadm - BDIdis).
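The rescaling and the interpretation bands described above can be illustrated with a short sketch. The text does not state how CGI-Sdif was mapped onto the 1 to 7 metric, so the mapping used here (4 minus the difference score, clipped to the range 1 to 7) is purely an assumption chosen so that 4 means no change and lower values mean improvement, as on the CGI-I. The function names are hypothetical.

```python
def rescale_cgi_s_dif(cgi_s_adm: int, cgi_s_dis: int) -> int:
    """Map CGI-Sdif (admission minus discharge) onto the 1-7 CGI-I metric.

    ASSUMPTION: the source does not give the mapping. Here we use
    4 - dif, clipped to 1..7, so that 4 = no change, values below 4 =
    improvement and values above 4 = worsening, mirroring the CGI-I.
    """
    dif = cgi_s_adm - cgi_s_dis  # positive = less severe at discharge
    return max(1, min(7, 4 - dif))


def cohen_label(d: float) -> str:
    """Label an effect size using the bands quoted in the text.

    The quoted bands leave d = .80 ambiguous; this sketch treats .80 as
    large, in line with Cohen's usual convention (an assumption).
    """
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size > 0.50:
        return "medium"
    if size > 0.20:
        return "small"
    return "below small"
```

For example, a patient rated 5 at admission and 3 at discharge has CGI-Sdif = 2, which this assumed mapping places at 2 on the common metric.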
Congruence between patients', therapists' and teams' perspectives on CGI-S and CGI-I
Means and standard deviations (SD) for CGI-Sadm, CGI-Sdis and CGI-I were calculated. Corrected effect sizes d were calculated between CGI-Sadm and CGI-Sdis for all three perspectives. Afterwards, measures of congruence between the three perspectives were calculated. Because interval scale level could not be taken for granted for data collected with the CGI, we decided to report measures for both interval and ordinal scale level data. As measures of congruence for interval scale level data, intraclass correlations (ICC) according to McGraw and Wong were calculated separately for CGI-Sadm, CGI-Sdis, and CGI-I to determine the congruence of the patients', therapists' and teams' ratings on these three global measures. In addition, Spearman's ρ was determined as a measure for ordinal scale level data. The significance level was set at α = .05.
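As an illustration of the ICC step, the sketch below computes a single-measure, consistency-type ICC from a patients-by-perspectives table. The text cites McGraw and Wong but does not say which of their ICC forms was used, so this choice is an assumption. It is suggested only by the observation that the reported values (ICC = .65 with F(30,60) = 6.61; ICC = .37 with F(30,60) = 2.77) are consistent with the identity ICC = (F - 1) / (F + k - 1) for k = 3 perspectives. Function names are hypothetical.

```python
def icc_consistency_single(ratings):
    """Single-measure consistency ICC from a two-way layout.

    ratings: list of rows, one per patient, each holding k ratings
    (e.g. PAT, THER, TEAM). ASSUMPTION: this ICC form is inferred,
    not stated in the source.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (patients = rows, perspectives = columns)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def icc_from_f(f_value, k):
    """Same ICC expressed through F = MS_rows / MS_error."""
    return (f_value - 1) / (f_value + k - 1)


print(round(icc_from_f(6.61, 3), 2))  # CGI-I F value reported in the text
print(round(icc_from_f(2.77, 3), 2))  # CGI-Sdif F value reported in the text
```

Because a consistency-type ICC ignores constant offsets between raters, it would not penalise one perspective for rating systematically higher or lower than another; an absolute-agreement form would.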
All analyses were conducted using SPSS 17 for Windows.
CGI-I vs. CGI-Sdif
Mean ratings on CGI-I and CGI-S at admission and discharge from all three perspectives
Congruence between patients', therapists' and teams' perspectives on CGI-S and CGI-I
Mean CGI-Sadm ratings were 4.0 (SD = 1.9) for the patient, 4.97 (SD = 0.71) for the therapist, and 5.0 (SD = 0.63) for the team perspective. At discharge all mean ratings dropped: patients' mean CGI-Sdis ratings were 3.45 (SD = 1.50), therapists' were 3.87 (SD = 1.09), and teams' were 3.94 (SD = 0.77). The resulting effect sizes d differed substantially (d_patient = .32; d_therapist = 1.18; d_team = 1.48). The effect size for the patient perspective was markedly smaller than for the other two perspectives, which coincided with a much larger standard deviation. The effect size for BDI sum scores was large (d_BDI = 1.15; M_adm = 20.2, SD_adm = 8.4; M_dis = 10.7, SD_dis = 7.9).
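As a rough plausibility check, the admission-discharge effect sizes can be approximated from the reported means and standard deviations. The sketch assumes a pooled-SD standardized mean difference with n = 31 at both time points and the common small-sample approximation 1 - 3 / (4 df - 1) as the correction factor. The text only says that a factor from Hedges and Olkin was used, so both the formula and the treatment of the two time points as independent groups are assumptions. The function name is hypothetical.

```python
import math


def hedges_g(m1, sd1, m2, sd2, n1, n2):
    """Bias-corrected standardized mean difference.

    ASSUMPTION: pooled SD of two groups and the approximate correction
    factor 1 - 3 / (4 * df - 1); the source does not spell out its formula.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    return d * (1 - 3 / (4 * df - 1))


n = 31  # sample size reported in the text
reported = {
    "patient":   (4.0, 1.9, 3.45, 1.50),
    "therapist": (4.97, 0.71, 3.87, 1.09),
    "team":      (5.0, 0.63, 3.94, 0.77),
    "BDI":       (20.2, 8.4, 10.7, 7.9),
}
for label, (m_adm, sd_adm, m_dis, sd_dis) in reported.items():
    print(label, round(hedges_g(m_adm, sd_adm, m_dis, sd_dis, n, n), 2))
```

Under these assumptions the sketch gives about .32, 1.18 and 1.15 for the patient, therapist and BDI values, and about 1.49 for the team perspective, close to the reported 1.48; the small gap may reflect rounding of the published means and SDs.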
Intercorrelations between the three perspectives for CGI-I, CGI-Sadm, CGI-Sdis, and CGI-Sdif
Mean CGI-I ratings were 2.03 (SD = 1.20) for the patient, 2.16 (SD = .82) for the therapist and 2.10 (SD = .91) for the team perspective. The intraclass correlation between the patients', therapists' and teams' ratings on CGI-I was ICC = .65 (CI .47 to .80; F(30,60) = 6.61, p < .001; mean ρ = 0.42), indicating moderate to high agreement between the ratings from the three perspectives.
The current study aimed to investigate the validity of the CGI-I and CGI-Sdif as outcome measures in clinical trials. More precisely, it examined whether the use of CGI-I or CGI-Sdif appears more appropriate. In addition, it investigated whether therapists' CGI ratings correspond to the view the patients themselves have of their condition.
The results of the present study showed that CGI-I provided relatively high change scores compared to the difference score CGI-Sdif in terms of effect sizes. To rate a patient's condition on the CGI-I, clinicians first have to remember the patient's condition at admission and then contrast it with the patient's condition at present. By contrast, CGI-S only requires a representation of the patient's current condition. Thus, the current results might be interpreted as suggesting that CGI-I is more prone to well-known effects of hindsight memory distortion [e.g., ]: when using CGI-I at discharge, therapists, teams and patients might have been inclined to retrospectively recall the patient's condition at admission as more impaired than it really was according to CGI-Sadm and thus rated the change in condition as more pronounced. If this was the case, it would, in our view, threaten the validity of CGI-I as an outcome measure in clinical trials. However, additional research directly addressing the role of memory effects in CGI-I results is needed before a definite conclusion on this issue is possible.
The congruence of ratings from the three perspectives on CGI-I was moderate to good and much better than the congruence of ratings on CGI-S. Moreover, while congruence between the single therapists and the teams was moderate to good, patients gave divergent ratings, especially on CGI-Sdif. Overall, patients provided the most conservative ratings of change on both CGI-I and CGI-Sdif. At the same time, patients' ratings correlated most strongly with BDIdif for both CGI-I and CGI-Sdif, while correlations with BDI for the other two perspectives were virtually zero. One might object that doubts about the validity of a self-reported CGI rating are warranted because the CGI was not originally designed as a self-rated scale, so that low correlations with the self-reported CGI could be seen as a weak criterion for validity. However, self-reported CGI ratings correlated significantly with BDI, and the validity of the BDI as an instrument for the assessment of depression severity has been shown in numerous studies [for some recent examples see e.g., [27, 28]]. These results suggest that CGI ratings made by the treating therapist or obtained through a consensus process in the team of therapists, regardless of whether CGI-I or CGI-Sdif is concerned, do not fully represent the patient's view of the severity of his or her impairment.
So which global measure of the CGI should be used as an outcome measure, CGI-I or CGI-Sdif? The results of the present study do not suggest a definite recommendation, since no strong evidence for the validity of either CGI-I or CGI-Sdif could be found. In our view, the overall picture of results could be interpreted as being slightly in favour of CGI-I, but additional research is clearly needed.
As already noted, there were no substantial differences between therapists' and teams' ratings. One potential explanation is that in our study the therapist who gave the single rating was also a member of the team of therapists and might have influenced the consensus rating in his or her favoured direction. Nevertheless, at least under the conditions described, our results suggest that, in contrast to Kadouri et al., a consensus rating following a Delphi process does not necessarily change the reliability or validity of the rating.
Several limitations of the current study have to be reported. The sample size was rather small, so the reported results should be interpreted with care. In addition, only patients suffering from MDD were assessed, which limits the generalizability of the reported results to other patient groups. Because the length of the current depressive episode could not be determined from the study data, it cannot be ruled out that the length of the depressive episode or chronicity influenced the results. Furthermore, since neither the CGI nor the BDI was applied to a random sample of the adult population, the rather low to moderate ICC found in the present study might simply be explained by the fact that only a very homogeneous sample consisting of patients who had been hospitalized for MDD was investigated. Replication studies, ideally with larger and more heterogeneous samples, are warranted.
The only criterion available for the validation of the CGI in this study was self-reported data (BDI and patients' ratings on the CGI). However, the most valid procedure for diagnosing a depressive disorder is a structured diagnostic interview based on DSM-IV or ICD-10 criteria that is conducted by a clinical expert. Thus, future studies should incorporate interview-based assessments at discharge to replicate the present findings.
The reported findings were not collected in a clinical trial, which is one of the main areas of application for the CGI. In clinical trials clinicians are usually blinded to the study condition to which the patient belongs, e.g., treatment vs. placebo. Thus, they do not know whether it would support the aim of the study to state that the patient improved much or not. In this study, however, clinicians treated and rated the patients themselves. It is therefore possible that clinicians were inclined to assign relatively high change scores. However, they also knew that the study did not aim at evaluating therapy effects, so we expect the effect of such demand characteristics in our data to be rather small. Nevertheless, future research should investigate whether our results can be replicated in a blinded setting.
In summary, in line with previous research [16, 17, 19], the results of the present study cast doubt on the validity of the CGI. To our knowledge, this is the first study that included the correspondence of clinician-rated CGI scores with the patients' own perspective on their clinical condition as one criterion of validity. Our results do not suggest a definite recommendation for whether CGI-I or CGI-Sdif should be used, since no strong evidence for the validity of either CGI-I or CGI-Sdif in terms of high correlations with ratings from the patients' perspective could be found. We conclude that relying upon the CGI alone as an outcome measure in clinical trials cannot be recommended. Rather, we advocate incorporating multiple self- and clinician-reported scales into the design of clinical trials in addition to the CGI in order to gain further insight into the CGI's relation to the patients' perspective.
- Guy W: ECDEU Assessment Manual for Psychopharmacology. 1976, Rockville, MD: U.S. Department of Health, Education, and Welfare
- Hale A, Corral RM, Mencacci C, Ruiz JS, Severo CA, Gentil V: Superior antidepressant efficacy results of agomelatine versus fluoxetine in severe MDD patients: a randomized, double-blind study. Int Clin Psychopharmacol. 2010, 25: 305-314. 10.1097/YIC.0b013e32833a86aa.
- Hsieh MH, Lin WW, Chen ST, Chen KC, Chen KP, Chiu NY, Huang C, Chang CJ, Lin CH, Lai TJ: A 64-week, multicenter, open-label study of aripiprazole effectiveness in the management of patients with schizophrenia or schizoaffective disorder in a general psychiatric outpatient setting. Ann Gen Psychiatry. 2010, 9: 35. 10.1186/1744-859X-9-35.
- Beck AT, Steer RA: Beck Depression Inventory. 1987, San Antonio: The Psychological Corporation Inc
- Hamilton M: Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology. 1967, 6: 278-296.
- Montgomery SA, Asberg M: A new depression scale designed to be sensitive to change. British Journal of Psychiatry. 1979, 134: 382-389. 10.1192/bjp.134.4.382.
- Riedel M, Möller HJ, Obermeier M, Schennach-Wolff R, Bauer M, Adli M, Kronmüller K, Nickel T, Brieger P, Laux G, Bender W, Heuser I, Zeiler J, Gaebel W, Seemüller F: Response and remission criteria in major depression - A validation of current practice. J Psychiatr Res. 2010, 44: 1063-1068. 10.1016/j.jpsychires.2010.03.006.
- Korner A, Lauritzen L, Abelskov K, Gulmann NC, Brodersen AM, Wedervang-Jensen T, Marie Kjeeldgaard K: Rating scales for depression in the elderly: external and internal validity. J Clin Psychiatry. 2007, 68: 384-389. 10.4088/JCP.v68n0305.
- Lindenmayer JP, Czobor P, Alphs L, Nathan AM, Anand R, Islam Z, Chou JC: The InterSePT scale for suicidal thinking reliability and validity. Schizophr Res. 2003, 63: 161-170. 10.1016/S0920-9964(02)00335-3.
- Leon AC, Shear MK, Klerman GL, Portera L, Rosenbaum JF, Goldenberg I: A comparison of symptom determinants of patient and clinician global ratings in patients with panic disorder and depression. J Clin Psychopharmacol. 1993, 13: 327-331.
- Hedges DW, Brown BL, Shwalb DA: A direct comparison of effect sizes from the clinical global impression-improvement scale to effect sizes from other rating scales in controlled trials of adult social anxiety disorder. Hum Psychopharmacol. 2009, 24: 35-40. 10.1002/hup.989.
- Khan A, Khan SR, Shankles EB, Polissar NL: Relative sensitivity of the Montgomery-Asberg Depression Rating Scale, the Hamilton Depression rating scale and the Clinical Global Impressions rating scale in antidepressant clinical trials. Int Clin Psychopharmacol. 2002, 17: 281-285. 10.1097/00004850-200211000-00003.
- Beneke M, Rasmus W: "Clinical Global Impressions" (ECDEU): some critical comments. Pharmacopsychiatry. 1992, 25: 171-176. 10.1055/s-2007-1014401.
- Dahlke F, Lohaus A, Gutzmann H: Reliability and clinical concepts underlying global judgments in dementia: implications for clinical research. Psychopharmacol Bull. 1992, 28: 425-432.
- Guy W: Clinical Global Impressions Scale (CGI). Handbook of Psychiatric Measures. Edited by: Rush AJ. 2000, Washington, DC: American Psychiatric Association, 100-102.
- Busner J, Targum SD, Miller DS: The Clinical Global Impressions scale: errors in understanding and use. Compr Psychiatry. 2009, 50: 257-262. 10.1016/j.comppsych.2008.08.005.
- Jiang Q, Ahmed S: An analysis of correlations among four outcome scales employed in clinical trials of patients with major depressive disorder. Ann Gen Psychiatry. 2009, 8: 4. 10.1186/1744-859X-8-4.
- Kadouri A, Corruble E, Falissard B: The improved Clinical Global Impression Scale (iCGI): development and validation in depression. BMC Psychiatry. 2007, 7: 7. 10.1186/1471-244X-7-7.
- Targum SD, Busner J, Young AH: Targeted scoring criteria reduce variance in global impressions. Hum Psychopharmacol. 2008, 23: 629-633. 10.1002/hup.966.
- Zaider TI, Heimberg RG, Fresco DM, Schneier FR, Liebowitz MR: Evaluation of the clinical global impression scale among individuals with social anxiety disorder. Psychol Med. 2003, 33: 611-622. 10.1017/S0033291703007414.
- Hiller W, Zaudig M, Mombour W: ICD International Diagnostic Checklists for ICD-10 and DSM-IV. 1999, Goettingen: Hogrefe & Huber
- Wittchen H-U, Zaudig M, Fydrich T: Strukturiertes Klinisches Interview für DSM-IV. 1997, Goettingen: Hogrefe
- Hedges LV, Olkin I: Statistical Methods for Meta-Analysis. 1985, Orlando: Academic Press
- Cohen J: Statistical power analysis for the behavioral sciences. 2nd edition. 1988, Hillsdale, NJ: Erlbaum
- McGraw KO, Wong SP: Forming inferences about some intraclass correlation coefficients. Psychological Methods. 1996, 1: 30-46.
- Bradfield A, Wells GL: Not the same old hindsight bias: outcome information distorts a broad range of retrospective judgments. Mem Cognit. 2005, 33: 120-130. 10.3758/BF03195302.
- Loosman WL, Siegert CE, Korzec A, Honig A: Validity of the Hospital Anxiety and Depression Scale and the Beck Depression Inventory for use in end-stage renal disease patients. Br J Clin Psychol. 2009, 49: 507-516.
- Moran PJ, Mohr DC: The validity of Beck Depression Inventory and Hamilton Rating Scale for Depression items in the assessment of depression among patients with multiple sclerosis. J Behav Med. 2005, 28: 35-41. 10.1007/s10865-005-2561-0.
- American Psychiatric Association: Diagnostic and statistical manual of mental disorders, fourth edition. DSM-IV. 1994, Washington DC
- World Health Organization: The ICD-10 classification of mental and behavioral disorders: clinical descriptions and diagnostic guidelines. 1992, Geneva
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-244X/11/83/prepub