- Research article
- Open Access
Psychometric evaluation of the Major Depression Inventory (MDI) as depression severity scale using the LEAD (Longitudinal Expert Assessment of All Data) as index of validity
© Bech et al. 2015
- Received: 15 December 2014
- Accepted: 14 June 2015
- Published: 5 August 2015
The Major Depression Inventory (MDI) was developed to cover the universe of depressive symptoms in DSM-IV major depression as well as in ICD-10 mild, moderate, and severe depression. The objective of this study was to evaluate the standardization of the MDI as a depression severity scale using the Visual Analogue Scale (VAS) as index of external validity in accordance with the LEAD approach (Longitudinal Expert Assessment of All Data).
We used data from two previously published studies in which the patients had a MINI Neuropsychiatric Interview verified diagnosis of DSM-IV major depression. The conventional VAS scores for no, mild, moderate, and severe depression were used for the standardization of the MDI.
The inter-correlation for the MDI with the clinician ratings (VAS, MES, HAM-D17 and HAM-D6) increased over the rating weeks in terms of Pearson coefficients. After nine weeks of therapy the coefficient ranged from 0.74 to 0.83.
Using the clinician-rated VAS depression severity scale, the conventional MDI cut-off scores for no or doubtful depression, and for mild, moderate and severe depression were confirmed.
Using the VAS as index of external, clinical validity, the standardization of the MDI as a measure of depression severity was accepted, with an MDI cut-off score of 21 for mild depression, 26 for moderate depression severity, and 31 for severe depression.
Martiny et al. Acta Psychiatr Scand 112:117-25, 2005: None – due to trial commencement date.
Straaso et al. Acta Neuropsychiatr 26:272-9; 2014: ClinicalTrials.gov ID NCT01353092.
- Major depression inventory
- Hamilton depression scale
- Melancholia scale
- Visual analogue scale
The Major Depression Inventory (MDI) was developed [1, 2] to cover the universe of depressive symptoms in DSM-IV major depression and in ICD-10 depression (mild, moderate, severe). Consequently the time frame (window) for the MDI is the past two weeks to accord with DSM-IV and ICD-10.
The MDI can be used as a diagnostic scale by following the algorithms in accordance with DSM-IV or ICD-10. Using the Schedules for Clinical Assessment in Neuropsychiatry (SCAN), administered by experienced psychiatrists, as index of diagnostic validity, we obtained a sensitivity of 90 % and a specificity of 82 % for DSM-IV major depression.
Via its summed total score the MDI can also serve as a measure of depression severity analogous to the Zung Self-rating Depression Scale (Zung-SDS) or the Beck Depression Inventory (BDI). However, we have previously shown that the MDI is superior to the Zung-SDS and to the BDI. Another widely used depression questionnaire, the Patient Health Questionnaire (PHQ-9), was developed with reference to DSM-IV. However, the PHQ-9 was constructed specifically to capture the diagnosis of major depression, not to be a measure of depression severity like the BDI. In contrast, the MDI fulfils both Mokken’s non-parametric item response theory model and Rasch’s one-parameter model, as shown by Olsen et al., and can thus be used as a unidimensional depression severity scale. However, we still need to confirm the conventional cut-off scores of the MDI, such as that of >25 for major depression.
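Sensitivity and specificity follow directly from a two-by-two table of scale result against reference diagnosis. The sketch below is only an illustration: the counts are hypothetical, chosen so that the rates match the 90 % and 82 % quoted above, and the function names are not from the original study.

```python
def sensitivity(true_pos, false_neg):
    """Share of reference-positive cases that the scale also flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of reference-negative cases that the scale also clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (NOT study data), picked to reproduce the quoted rates.
tp, fn = 45, 5   # reference diagnosis positive
tn, fp = 41, 9   # reference diagnosis negative

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```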
The clinical validity of a scale must be evaluated by the use of an independent global severity assessment performed by an experienced clinician. Spitzer called this procedure the LEAD (Longitudinal Expert Assessment of All Data) approach. By “expert” Spitzer was referring to a clinician who had demonstrated his or her competence to make this assessment based on a thorough clinical interview taking all available data into account. This LEAD approach was used in our validation study of the Hamilton Depression Scale and was also used by Maier when he validated the Hamilton Scale (HAM-D17), the Montgomery Åsberg Depression Rating Scale (MADRS) and the Bech-Rafaelsen Melancholia Scale (MES). In the analysis to be reported here we used a Visual Analogue Scale (VAS) from 0 to 100 mm for the LEAD assessment of depression severity [16, 17].
The objective of this study was to evaluate the MDI as a depression severity scale using a global VAS assessment, the Hamilton Depression Scale (HAM-D17) and the Bech-Rafaelsen Melancholia Scale (MES) as indices of external validity.
Study 1: Martiny et al. 
A randomised, double-blind trial with bright light therapy versus sham light therapy as adjunct treatment to sertraline in non-seasonal major depression. In total, 102 patients with DSM-IV major depression, as verified by the Mini International Neuropsychiatric Interview (MINI), were included. The planned duration of this trial was 9 weeks (with 5 weeks of the adjunct treatment and a follow-up four weeks later); in total, therefore, seven rating occasions were analysed.
Study 2: Straaso et al. 
A randomised, double-blind controlled dose-remission study with pulsating electromagnetic fields as augmentation in therapy-resistant depression. In total, 65 patients with DSM-IV major depression, as verified by the MINI, were included. The planned duration of this trial was 9 weeks (with 8 weeks of pulsating electromagnetic fields therapy as augmentation and a follow-up one week later). In order to balance with the Study 1 ratings we have focused on the first five weeks and the last week; in total, therefore, seven rating occasions were analysed.
The study was carried out in accordance with the Declaration of Helsinki and the European Union directive of Good Clinical Practice. The study was approved by the Danish Health and Medicines Authority (2013030959) and the Committee on Biomedical Research Ethics (H-1-2010-031) and was reported to the Danish Data Protection Agency (PSV-2010-2). The trial was registered at ClinicalTrials.gov (ID NCT01353092). Patients were given information as requested by the Committee on Biomedical Research Ethics, and all patients signed an informed consent.
In the present analysis we have focussed on the following clinician-administered rating scales:
The Hamilton Depression Scale (HAM-D17) in combination with the Melancholia Scale (MES) with a scoring sheet [16, 17] in which a Visual Analogue Scale for Depression Severity (VAS) is placed at the bottom as a horizontal line from 0 (no depression) to 100 mm (extreme depression). The interviewer is asked to score the VAS before completing the HAM-D17 and MES. The LEAD procedure (Longitudinal Expert Assessment of All Data) was thus used to make the global severity assessment of depressive states taking into account all available data over the past three days.
As discussed elsewhere, the horizontal version (yardstick line) with descriptive cues at each end and 100 mm in between is generally preferred.
The LEAD principle was used to clinically validate the HAM-D17. As a result, six of the Hamilton items (depressed mood, guilt feelings, work and interests, psychomotor retardation, psychic anxiety, and general somatics (fatigability)), together forming the HAM-D6, were found to be most valid when associated with experienced psychiatrists’ global assessment of depression severity. The Bech-Rafaelsen Melancholia Scale (MES) was developed to capture the six HAM-D6 core items with reference to the Cronholm-Ottosson Depression Scale. For a review of the MES, see Bech’s 20-year review of its use as an outcome measure in the reference list.
The three depression symptom rating scales (HAM-D17, HAM-D6, MES) were rated on a weekly basis by KM and ML, as was the VAS, using the time frame of the past three days for the VAS as well. The MDI was also completed each week by the patients. The clinicians (KM, ML) had no access to the MDI scorings. The inter-rater reliability of KM and ML as Danish University Antidepressant Group (DUAG) raters has been found acceptable with intraclass coefficients of 0.89 (HAM-D6), 0.93 (HAM-D17) and 0.91 (MES) [Martiny et al.: Relapse prevention in major depressive disorder: A four-arm randomised 6-month double-blind comparison of three fixed dosages of escitalopram and a fixed dose of nortriptyline in patients successfully treated with acute electroconvulsive treatment (DUAG-7) – Submitted 2015].
The Major Depression Inventory (MDI)
In the studies analysed in this report the time frame of the MDI was the past week and not the conventional two weeks, due to the fact that the MDI was used at weekly rating sessions in the two trials.
We used the SAS statistical package (version 9.0.0, 2002) for two analyses: a regression analysis of the proportion of variance of the dependent variable (VAS) that is accounted for by the independent variable (MDI), using R² > 0.50 as the criterion for goodness of fit, and the intercorrelations between the depression scales in terms of Pearson coefficients. The weighted Kappa was used when testing the corresponding cut-off points between VAS and MDI.
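For a regression with a single predictor, the proportion of explained variance equals the squared Pearson coefficient, so both statistics can be obtained from the same sums. The following is a minimal sketch in plain Python rather than the SAS procedures actually used; the paired scores are hypothetical illustration values, not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Variance in y explained by a simple linear regression on x (= r squared)."""
    return pearson_r(x, y) ** 2


# Hypothetical paired ratings (NOT study data): MDI total and VAS in mm.
mdi_scores = [8, 15, 22, 27, 31, 36, 40]
vas_scores = [10, 30, 42, 48, 55, 70, 85]

r = pearson_r(mdi_scores, vas_scores)
r2 = r_squared(mdi_scores, vas_scores)
acceptable_fit = r2 > 0.50  # goodness-of-fit criterion stated in the text
print(f"r = {r:.2f}, R^2 = {r2:.2f}, acceptable fit: {acceptable_fit}")
```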
Age, gender, and HAM-D17 baseline mean score in Study 1 and Study 2
Study 1 (Martiny et al. 2005): all included patients, N = 102; patients with complete ratings at the seven rating weeks, N = 70
Study 2 (Straaso et al. 2014): all included patients, N = 65; patients with complete ratings at the seven rating weeks, N = 48
Variables reported for each group: age in years, mean (sd); gender, % females; baseline HAM-D17, mean (sd)
In Study 1 a total of 70 patients had complete scorings on all the included weeks. In Study 2 a total of 48 patients had complete scorings. Thus 118 patients, or 70 % of the 167 patients included in the two studies (102 + 65), were analysed.
Pearson inter-correlation for the MDI at the various weeks of treatment (N = 118) and across all rating occasions (N = 826)
When using the MDI cut-off scores of 0–20, 21–25, and >25 versus VAS cut-off scores of 0–40, 41–50, and > 50, the distribution of the 826 observations was not random (weighted Kappa was 0.49, P < 0.001).
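A weighted Kappa for two ordered three-level classifications can be computed from the cross-tabulation of the bands. The sketch below assumes linear disagreement weights by default (the text does not say which weighting scheme was used) and runs on a hypothetical table, not on the 826 study observations.

```python
def weighted_kappa(table, weights="linear"):
    """Cohen's weighted kappa for a square table of ordered categories.

    table[i][j] counts observations in category i on the first rating
    and category j on the second. Weights are disagreement weights:
    |i - j| / (k - 1) for "linear", its square for "quadratic".
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j] for i, j in cells
    ) / (n * n)
    return 1.0 - observed / expected


# Hypothetical cross-tabulation (NOT study data).
# Rows: MDI bands 0-20, 21-25, >25. Columns: VAS bands 0-40, 41-50, >50.
example = [
    [40, 8, 2],
    [6, 12, 7],
    [3, 9, 33],
]
kappa = weighted_kappa(example)
print(f"weighted kappa = {kappa:.2f}")
```

Quadratic weights penalise two-band disagreements more heavily than linear weights, so the two schemes can give different values on the same table.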
When using the conventional HAM-D17 cut-off score of 18 for major depression and the MDI cut-off score of > 25, we found that within the 826 observations (Table 2) the MDI converged with the HAM-D17 in 156 out of 195 observations, or 80.0 %, i.e. an acceptable but moderate convergence.
Concerning the MDI algorithm for DSM-IV major depression or ICD-10 depression, we used the MINI diagnoses at baseline, excluding the observations with low HAM-D17 scores between 13 and 18 (N = 97). The MDI algorithm for DSM-IV depression identified 72 of the 97 patients, or 74.2 %. The MDI algorithm for ICD-10 depression identified 76 of the 97 patients, or 78.3 %.
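The percentages in the two preceding paragraphs can be recomputed from the counts given there. A minimal sketch; the helper name is hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


convergence = percent(156, 195)   # MDI > 25 versus HAM-D17 cut-off
dsm_iv_hits = percent(72, 97)     # MDI algorithm for DSM-IV depression
icd10_hits = percent(76, 97)      # MDI algorithm for ICD-10 depression

print(f"{convergence:.2f}")  # 80.00
print(f"{dsm_iv_hits:.2f}")  # 74.23
print(f"{icd10_hits:.2f}")   # 78.35
```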
In the data set analysed in this report the MDI was used as an outcome scale at the weekly ratings during a planned treatment period of nine weeks covering seven rating occasions. In this situation the MDI time frame was the past week and not the past two weeks as conventionally applied when the MDI is included as a diagnostic tool with reference to DSM-IV or ICD-10.
Using the clinician-rated VAS depression severity scale, the conventional cut-off standardization for no or doubtful depression, and for mild, moderate and severe depression was confirmed.
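The severity bands reported in this article (an MDI cut-off score of 21 for mild, 26 for moderate, and 31 for severe depression) can be written as a simple lookup. This is a sketch only: the function name is hypothetical, and the 0 to 50 range of the total score is an assumption that the text does not state.

```python
def mdi_severity(total_score):
    """Map an MDI total score to the severity band reported in the article.

    Assumes a total score range of 0 to 50 (an assumption, not stated
    in the text).
    """
    if not 0 <= total_score <= 50:
        raise ValueError("MDI total score is assumed to lie between 0 and 50")
    if total_score >= 31:
        return "severe depression"
    if total_score >= 26:
        return "moderate depression"
    if total_score >= 21:
        return "mild depression"
    return "no or doubtful depression"


for score in (12, 21, 26, 31):
    print(score, mdi_severity(score))
```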
However, when pooling all assessments (N = 826), we actually introduce a mixture of both inter-individual differences and intra-individual changes as the patients are included at the various rating occasions. On the other hand, this mixed effects model approach has had a very slight influence in our analysis.
The reason for the moderate Pearson coefficients at the baseline ratings is that the score range on the various scales at that point in time is rather limited because the patients had to be in a depressive state and in need of therapy at inclusion in the two studies [1, 2].
The MDI cut-off score of >25 for major depression had a percentage convergence of 80 % with the HAM-D17 score of >18. The MDI cut-off point of > 25 has been found acceptable both in a sample of psychiatric outpatients with affective disorders and in a general population sample when compared to patients with a first episode of psychotic depression followed up over 6 years.
A self-rating scale rather similar to the MDI is the Patient Health Questionnaire (PHQ-9), which was originally developed to screen for depression in primary care. The PHQ-9 is defined by the DSM-IV symptoms of depression and is thus not designed for ICD-10 depression. However, the quantifier of the individual items differs from that of the MDI. Zimmerman has evaluated the role of the PHQ-9 in connection with the need for a DSM-5 self-rating questionnaire to measure the dimensional approach to major depression. Zimmerman has in this respect shown that the standardization of the PHQ-9 is not based on empirical studies, and that the conventionally used cut-off score overestimates the prevalence of depression when using the Hamilton Depression Scales as index of validity. Moreover, in an analysis using the item response theory model formulated by Rasch, Forkmann et al. showed that the summed total score of the PHQ-9 is not a sufficient statistic as a measure of depression severity. Sufficiency of the total score is a conditio sine qua non for using it as cut-off index in the diagnosis of major depression. As recommended by Forkmann et al., the diagnostic algorithm for DSM-IV major depression should be used in connection with the PHQ-9. The DSM-5 major depression diagnosis has maintained the same symptom universe and the same diagnostic algorithm as the DSM-IV. In this respect the recommendation put forward by Forkmann et al. is still valid for the PHQ-9 in the DSM-5 context, as is the MDI for the DSM-5 major depression diagnosis. Furthermore, the MDI has been accepted by the Rasch model as a unidimensional scale for depression severity, which is the background for the standardization analysis performed in this report.
A limitation of this analysis is that we have used the time frame covering the past week and not the conventional frame of two weeks. On the other hand we have focused on the standardization of the MDI when used as a depression severity measure rather than when used for diagnostic properties. Another limitation is that completed data for all the ratings was not available for all the patients included in the two trials under examination. On the other hand, a coverage of 70 % as obtained in this analysis is acceptable in clinical trials of depression.
The clinical validity of the MDI as a unidimensional depression severity scale has been found acceptable using the global clinical VAS scale performed by experienced clinicians as index of validity. The conventional standardization of the MDI with cut-off scores for no, mild, moderate, and severe depression has been found adequate.
The Martiny et al. study was funded by The Danish Medical Research Council, Eastern Region Research Foundation, Merchant L.F. Foght’s Foundation, Johannes M. Klein and Wife’s Memorial Foundation, The Tvergaard Foundation, The Danish Psychiatric Association, The Olga Bryde Nielsen Foundation, The A.P. Møller and Chastine Mc-Kinney Møller Foundation, The Region 3 foundation and The Frederiksborg General Hospital Research Grant.
The Straaso et al. study was financially supported by a grant from the Lundbeck Foundation (Grant R54-A5567).
- Bech P, Wermuth L. Applicability and validity of the Major Depression Inventory in patients with Parkinson's Disease. Nord J Psychiatry. 1998;52:305–309.
- Bech P. Clinical psychometrics. Oxford: Wiley Blackwell; 2012.
- American Psychiatric Association. The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Washington D.C.: American Psychiatric Association; 1994.
- World Health Organization. International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10). Diagnostic Criteria for Research. Geneva: World Health Organization; 1993.
- Olsen LR, Jensen DV, Noerholm V, Martiny K, Bech P. The internal and external validity of the Major Depression Inventory in measuring severity of depressive states. Psychol Med. 2003;33:351–356.
- Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70.
- Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571.
- Konstantinidis A, Martiny K, Bech P, Kasper S. A comparison of the Major Depression Inventory (MDI) and the Beck Depression Inventory (BDI) in severely depressed patients. Int J Psychiatry Clin Pract. 2011;15:56–61.
- Kroenke K, Spitzer RL. The PHQ-9: a new depression diagnostic and severity measure. Psychiatric Annals. 2002;32:509–521.
- Mokken RJ. Theory and practice of scale analysis. Berlin: Mouton; 1971.
- Rasch G. Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research; 1960 (Reprinted Chicago University Press; 1980).
- Spitzer RL. Psychiatric diagnosis: are clinicians still necessary? Compr Psychiatry. 1983;24:399–411.
- Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG. Quantitative rating of depressive states. Acta Psychiatr Scand. 1975;51:161–170.
- Maier W. The Hamilton Depression Scale and its alternatives: A comparison of their reliability and validity. In: The Hamilton Scales. Edited by Bech P, Coppen A. Berlin: Springer Verlag; 1990:64–71.
- Montgomery SA, Asberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry. 1979;134:382–389.
- Bech P, Gram LF, Kragh-Sorensen P, Reisby N. DUAG: Standardized assessment scales and effectiveness of antidepressants. Nord J Psychiatry. 1988;42:511–515.
- Bech P. Rating scales for psychopathology, health status and quality of life. A compendium on documentation in accordance with the DSM-III-R and WHO systems. Berlin: Springer; 1993.
- Martiny K, Lunde M, Unden M, Dam H, Bech P. Adjunctive bright light in non-seasonal major depression: Results from clinician-rated depression scales. Acta Psychiatr Scand. 2005;112:117–125.
- Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59 Suppl 20:22–33. quiz 34–57.
- Straaso B, Lauritzen L, Lunde M, Vinberg M, Lindberg L, Larsen ER, et al. Dose-remission of pulsating electromagnetic fields as augmentation in therapy-resistant depression: a randomized, double-blind controlled study. Acta Neuropsychiatr. 2014;26:272–279.
- Bech P. The Cronholm-Ottosson Depression Scale: the first depression scale designed to rate changes during treatment. Acta Psychiatr Scand. 1991;84:439–445.
- Bech P. The Bech-Rafaelsen Melancholia Scale (MES) in clinical trials of therapies in depressive disorders: a 20-year review of its use as outcome measure. Acta Psychiatr Scand. 2002;106:252–264.
- Cumming G. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London: Routledge; 2012.
- Siegel S. Nonparametric statistics for the behavioural sciences. New York: McGraw Hill; 1956.
- Cohen J. Weighted Kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bulletin. 1968;70:213–220.
- Cuijpers P, Dekker J, Noteboom A, Smits N, Peen J. Sensitivity and specificity of the Major Depression Inventory in outpatients. BMC Psychiatry. 2007;7:Art 39–6.
- Forsell Y, Levander S, Cullberg J. Psychosocial correlates with depressive symptoms six years after a first episode of psychosis as compared with findings from a general population sample. BMC Psychiatry. 2004;4:Art 29–5.
- Zimmerman M. Symptom severity and guideline-based treatment recommendations for depressed patients: Implications of DSM-5’s potential recommendation of the PHQ-9 as the measure of choice for depression severity. Psychother Psychosom. 2012;81:329–32.
- Forkmann T, Gauggel S, Spangenberg L, Brahler E, Glaesmer H. Dimensional assessment of depressive severity in the elderly general population: psychometric evaluation of the PHQ-9 using Rasch Analysis. J Affect Disord. 2013;148:323–30.
- Angst J, Bech P, Boyer P, Bruinvels J. Consensus conference on the methodology of clinical trials of antidepressants, Zurich, March 1988: Report of the Consensus Committee. Pharmacopsychiatry. 1989;22:3–7.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.