- Research article
- Open Access
- Open Peer Review
Psychometric evaluation of the Major Depression Inventory (MDI) as depression severity scale using the LEAD (Longitudinal Expert Assessment of All Data) as index of validity
BMC Psychiatryvolume 15, Article number: 190 (2015)
The Major Depression Inventory (MDI) was developed to cover the universe of depressive symptoms in DSM-IV major depression as well as in ICD-10 mild, moderate, and severe depression. The objective of this study was to evaluate the standardization of the MDI as a depression severity scale using the Visual Analogue Scale (VAS) as index of external validity in accordance with the LEAD approach (Longitudinal Expert Assessment of All Data).
We used data from two previously published studies in which the patients had a MINI Neuropsychiatric Interview verified diagnosis of DSM-IV major depression. The conventional VAS scores for no, mild, moderate, and severe depression were used for the standardization of the MDI.
The inter-correlation for the MDI with the clinician ratings (VAS, MES, HAM-D17 and HAM-D6) increased over the rating weeks in terms of Pearson coefficients. After nine weeks of therapy the coefficient ranged from 0.74 to 0.83.
Using the clinician-rated VAS depression severity scale, the conventional MDI cut-off scores for no or doubtful depression, and for mild, moderate and severe depression were confirmed.
Using the VAS as index of external, clinical validity, the standardization of the MDI as a measure of depression severity was accepted, with an MDI cut-off score of 21 for mild depression, 26 for moderate depression severity, and 31 for severe depression.
Martiny et al. Acta Psychiatr Scand 112:117-25, 2005: None – due to trial commencement date.
Straaso et al. Acta Neuropsychiatr 26:272-9; 2014: ClinicalTrials.gov ID NCT01353092.
The Major Depression Inventory (MDI) was developed [1, 2] to cover the universe of depressive symptoms in DSM-IV major depression  and in ICD-10 depression  (mild, moderate, severe). Consequently the time frame (window) for the MDI is the past two weeks to accord with DSM-IV and ICD-10.
The MDI can be used as a diagnostic scale by following the algorithms in accordance with DSM-IV or ICD-10. Using as index of diagnostic validity the Schedules for Clinical Assessment in Neuropsychiatry (SCAN)  administered by experienced psychiatrists, we obtained a sensitivity of 90 % and a specificity of 82 % for DSM-IV major depression .
Via its summed total score the MDI can also be a measure of depression severity analogue to the Zung Self-rating Depression Scale (Zung-SDS, ) or the Beck Depression Inventory (BDI, ). However, we have previously shown that the MDI is superior to the Zung-SDS ( and to the BDI . Another widely used depression questionnaire, the Patient Health Questionnaire (PHQ-9) , was developed with reference to DSM-IV. However, the PHQ-9 was especially constructed to capture the diagnosis of major depression, not to be a measure of depression severity like the BDI. In contrast, the MDI actually fulfils both Mokken’s non-parametric item response theory model  and Rasch’s one-parametric model  as shown by Olsen et al.  and can thus be used as a unidimensional depression severity scale. However, we still need to confirm the conventional cut-off scores of MDI, such as that of >25 for major depression.
The clinical validity of a scale must be evaluated by the use of an independent global severity assessment performed by an experienced clinician. Spitzer  called this procedure the LEAD (Longitudinal Expert Assessment of All Data) approach. By “expert” Spitzer  was referring to a clinician who had demonstrated his or her competence to make this assessment based on a thorough clinical interview taking all available data into account. This LEAD approach was used in our validation study of the Hamilton Depression Scale  and was also used by Maier  when he validated the Hamilton Scale (HAM-D17), the Montgomery Åsberg Depression Rating Scale (MADRS)  and the Bech-Rafaelsen Melancholia Scale (MES) . In the analysis to be reported here we used a Visual Analogue Scale (VAS) from 0 to 100 mm for the LEAD assessment of depression severity [16, 17].
The objective of this study was to evaluate the MDI as a depression severity scale using both a global VAS assessment as well as the Hamilton Depression Scale (HAM-D17) and the Bech-Rafaelsen Melancholia Scale (MES) as indices of external validity.
Patients: we have used data from two previously published studies in which weekly ratings were performed:
Study 1: Martiny et al. 
A randomised, double-blind trial with bright light therapy versus sham light therapy as adjunct treatment to sertraline in non-seasonal major depression. In total, 102 patients with DSM-IV major depression, as verified by the Mini International Neuropsychiatric Interview (MINI) , were included. The planned duration of this trial was 9 weeks (with 5 weeks of the adjunct treatment and a follow-up four weeks later); in total, therefore, seven rating occasions to be analysed.
Ethics: The study was carried out according to the declaration of Helsinki and the ICH-GCP guidelines (International Conference on Harmonisation, Good Clinical Practice). The study was approved by the Ethical Committee for the Counties of Bornholm, Frederiksborg, Roskilde and Storstrøm and by the Danish Central Data Register. Patients were given information as requested by the Biomedical Research Ethics, and all patients signed an informed consent.
Study 2: Straaso et al. 
A randomised, double-blind controlled dose-remission study with pulsating electromagnetic fields as augmentation in therapy-resistant depression. In total, 65 patients with DSM-IV major depression, as verified by the MINI  were included. The planned duration of this trial was 9 weeks (with 8 weeks of pulsating electromagnetic fields therapy as augmentation and a follow-up one week later). In order to balance with the Study 1 ratings we have focused on the first five weeks and the last week, therefore in total seven rating occasions were analysed.
The study was carried out in accordance with the Declarations of Helsinki and the European Union directive of Good Clinical Practice. The study was approved by the Danish Health and Medicines Authority (2013030959) and the Committee on Biomedical Research Ethics (H-1-2010-031) and was reported to the Danish Data Protection Agency (PSV-2010-2). The trial was registered at ClinicalTrials.gov (ID NCT01353092). Patients were given information as requested by the Biomedical Research Ethics, and all patients signed an informed consent.
In the present analysis we have focussed on the following clinician-administrated rating scales:
The Hamilton Depression Scale (HAM-D17) in combination with the Melancholia Scale (MES) with a scoring sheet [16, 17] in which a Visual Analogue Scale for Depression Severity (VAS) is placed at the bottom as a horizontal line from 0 (no depression) to 100 mm (extreme depression). The interviewer is asked to score the VAS before completing the HAM-D17 and MES. The LEAD procedure (Longitudinal Expert Assessment of All Data) was thus used to make the global severity assessment of depressive states taking into account all available data over the past three days.
As discussed elsewhere  the horizontal version (yard stick-line) with descriptive cues at each end and 100 mm in between is generally preferred.
The LEAD principle was used to clinically validate the HAM-D17  which resulted in that six of the Hamilton items (depressed mood, guilt feelings, work and interests, psychomotor retardation, psychic anxiety, and general somatics (fatigability)), HAM-D6, were found to be most valid when associated with experienced psychiatrists’ global assessment of depression severity. The Bech-Rafaelsen Melancholia Scale (MES) was developed to capture the six HAM-D6 core items with reference to the Cronholm-Ottosson Depression Scale . For a review of the MES, see .
The three depression symptom rating scales (HAM-D17, HAM-D6, MES) were rated on a weekly basis by KM and ML, as was the VAS, using the time frame of the past three days for the VAS as well. The MDI was also completed each week by the patients. The clinicians (KM, ML) had no access to the MDI scorings. The inter-rater reliability of KM and ML as Danish University Antidepressant Group (DUAG) raters has been found acceptable with intraclass coefficients of 0.89 (HAM-D6), 0.93 (HAM-D17) and 0.91 (MES) [Martiny et al.: Relapse prevention in major depressive disorder: A four-arm randomised 6-month double-blind comparison of three fixed dosages of escitalopram and a fixed dose of nortriptyline in patients successfully treated with acute electroconvulsive treatment (DUAG-7) – Submitted 2015].
The Major Depression Inventory (MDI)
In the studies analysed in this report the time frame of the MDI was the past week and not the conventional two weeks, due to the fact that the MDI was used at weekly rating sessions in the two trials.
Figure 1a shows the Major Depression Inventory (MDI) with the time frame of one week while Fig. 1b shows how this questionnaire can be used both as a depression severity scale by its total scale score from 0 = no depression to 50 = extreme depression, and as a diagnostic scale following the algorithm of ICD-10 depression  or of DSM-IV major depression . When using the MDI as a depression severity scale by its total score we have, as shown in Fig. 1b, suggested the cut-off scores for no, doubtful, mild, moderate, and severe depression.
We used the SAS statistical package (version 9.0.0, 2002) both for the proportion of variance of the dependent variable (VAS) that is accounted for by the independent variable (MDI) within a regression analysis using R2 > 0.50 as goodness of fit  and for the intercorrelations between the depression scales in terms of Pearson coefficients . The weighted Kappa was used when testing the corresponding cut-off points between VAS and MDI .
Table 1 shows the age, gender, and HAM-D17 baseline mean score in study 1 and study 2.
In study 1 a total of 70 patients had complete scorings on all the included weeks. In study 2 a total of 48 patients had complete scorings. Thus 118 patients, or 70 % of the 150 patients included in the two studies, were analysed.
Table 2 shows the inter-correlation of the MDI total score with VAS, MES, HAM-D17, and HAM-D6 from baseline to week 8 for the 118 patients, i.e. seven rating weeks in total. At the bottom in Table 2 all seven rating weeks with 826 observations are also shown. The association between the MDI and VAS in terms of Pearson coefficients was generally lower than the association between the MDI and MES, HAM-D17 and HAM-D6. After two weeks of therapy the four clinician-administered scales obtained a Pearson coefficient of 0.60 or higher when correlated to MDI. Taking all weeks into consideration (N = 826), a Spearman coefficient of 0.70 or higher was obtained (Table 2).
Figure 2 shows the regression analysis using the VAS scores to arrive at the corresponding MDI scores by the formula MDI = 0.49 x VAS + 2.40 (N = 826). The R2 was 0.55, indicating an acceptable goodness of fit. As indicated at the abscissa in Fig. 2 a VAS score of 50 is the average cut-off in moderate or major depression. Using the regression formula, a VAS score of 50 corresponds to a MDI score of 26.9 which is rather similar to the conventional MDI cut-off score of ≥ 26 (Fig. 1b). Using the regression formula, a VAS score of 60 corresponds to a MDI score of 31.8, which is rather similar to the conventional MDI cut-off score of ≥ 31 for severe depression (Fig. 1b). Similarly, a VAS score of 40 corresponds to a MDI score of 22.0, this is quite close to the conventional MDI cut-off ≥ 21 for mild depression (Fig. 1b). Finally, a VAS score of 30 corresponds to a MDI score of 17.1, this is rather close to the conventional MDI cut-off score of < 16 for no or doubtful depression (Fig. 1b).
When using the MDI cut-off scores of 0–20, 21–25, and >25 versus VAS cut-off scores of 0–40, 41–50, and > 50, the distribution of the 826 observations was not random (weighted Kappa was 0.49, P < 0.001).
When using the conventional HAM-D17 cut-off score of 18 for major depression and the MDI cut-off score of > 25, we found that within the 826 observations (Table 2) the percentage convergence of MDI was 156 out of 195 observations with HAM-D17, or 80.0 %, i.e. an acceptable convergence, but of moderate degree.
Concerning the MDI algorithm for DSM-IV major depression or ICD-10 depression, we used the MINI diagnoses at baseline, excluding the observations with low HAM-D17 scores between 13 and 18 (N = 97). The MDI algorithm for DSM-IV depression identified 72 of the 97 patients, or 74.2 %. The MDI algorithm for ICD-10 depression identified 76 of the 97 patients, or 78.3 %.
In the data set analysed in this report the MDI was used as an outcome scale at the weekly ratings during a planned treatment period of nine weeks covering seven rating occasions. In this situation the MDI time frame was the past week and not the past two weeks as conventionally applied when the MDI is included as a diagnostic tool with reference to DSM-IV or ICD-10.
Using the clinician-rated VAS depression severity scale, the conventional cut-off standardization for no or doubtful depression, and for mild, moderate and severe depression was confirmed.
However, when pooling all assessments (N = 826), we actually introduce a mixture of both inter-individual differences and intra-individual changes as the patients are included at the various rating occasions. On the other hand, this mixed effects model approach has had a very slight influence in our analysis.
The reason for the moderate Pearson coefficients at the baseline ratings is that the score range on the various scales at that point in time is rather limited because the patients had to be in a depressive state and in need of therapy at inclusion in the two studies [1, 2].
The MDI cut-off score of >25 for major depression had a percentage convergence of 80 % with the HAM-D17 score of >18. The MDI cut-off point of > 25 has been found acceptable both in a sample of psychiatric outpatients with affective disorders  and in a general population sample when compared to patients with a first episode of psychotic depression followed up over 6 years .
A self-rating scale rather similar to the MDI is the Patient Health Questionnaire (PHQ-9) which was originally developed to screen for depression in primary care . The PHQ-9 is defined by the DSM-IV symptoms of depression and thus not designed for ICD-10 depression. However, the quantifier of the individual items differs from the MDI. Zimmerman  has evaluated the role of the PHQ-9 in connection with the need for a DSM-5 self-rating questionnaire to measure the dimensional approach to major depression. Zimmerman  has in this respect shown that the standardization of the PHQ-9 is not based on empirical studies, and that the conventionally used cut-off score overestimates the prevalence of depression when using the Hamilton Depression Scales as index of validity. Moreover, an analysis using the item response theory formulated by Rasch, Forkmann et al.  showed that the summed total score is not a sufficient statistic as a measure of depression severity. This is a conditio sine qua non for using the total score as cut-off index in the diagnosis of major depression. As recommended by Forkmann et al.  the diagnostic algorithm for DSM-IV major depression should be used in connection with the PHQ-9. The DSM-5 major depression diagnosis has maintained the same symptom universe and the same diagnostic algorithm as the DSM-IV. In this respect the recommendation put forward by Forkman et al.  is still valid for PHQ-9 in the DSM-5 context as is the MDI for the DSM-5 major depression diagnosis. Furthermore, the MDI has been accepted by the Rasch model  as a unidimensional scale for depression severity, which is the background for the standardization analysis performed in this report.
A limitation of this analysis is that we have used the time frame covering the past week and not the conventional frame of two weeks. On the other hand we have focused on the standardization of the MDI when used as a depression severity measure rather than when used for diagnostic properties. Another limitation is that completed data for all the ratings was not available for all the patients included in the two trials under examination. On the other hand, a coverage of 70 % as obtained in this analysis is acceptable in clinical trials of depression .
The clinical validity of the MDI as a unidimensional depression severity scale has been found acceptable using the global clinical VAS scale performed by experienced clinicians as index of validity. The conventional standardization of the MDI with cut-off scores for no, mild, moderate, and severe depression has been found adequate.
Bech P, Wermuth L. Applicability and validity of the Major Depression Inventory in patients with Parkinson's Disease. Nord J Psychiatry. 1998;52:305–309.
Bech P. Clinical psychometrics. Oxford: Wiley Blackwell; 2012.
American Psychiatric Association. The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Washington D.C.: American Psychiatric Association; 1994.
World Health Organization: International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10). Diagnostic Criteria for Research. Geneva: World Health Organization; 1993.
Olsen LR, Jensen DV, Noerholm V, Martiny K, Bech P. The internal and external validity of the Major Depression Inventory in measuring severity of depressive states. Psychol Med. 2003;33:351–356.
Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63-70.
Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571.
Konstantinidis A, Martiny K, Bech P, Kasper S. A comparison of the Major Depression Inventory (MDI) and the Beck Depression Inventory (BDI) in severely depressed patients. Int J Psychiatry Clin Pract Mar. 2011;15:56-61.
Kroenke K, Spitzer RL. <br />The PHQ-9: a new depression diagnostic and severity measure. Psychiatric Annals. 2002;32:509–521.
Mokken RJ. Theory and practice of scale analysis. Berlin: Mouton; 1971.
Rasch G. Probalistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research; 1960 (Reprinted Chicago University Press; 1980).
Spitzer RL. Psychiatric diagnosis: are clinicians still necessary? Compr Psychiatry. 1983;24:399–411.
Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG. Quantitative rating of depressive states. Acta Psychiatr Scand. 1975;51:161–170.
Maier W. The Hamilton Depression Scale and its alternatives: A comparison of their reliability and validity. In: The Hamilton Scales. Edited by Bech P, Coppen A. Berlin: Springer Verlag; 1990:64–71.
Montgomery SA, Asberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry. 1979;134:382–389.
Bech P, Gram LF, Kragh-Sorensen P, Reisby N. DUAG: Standardized assessment scales and effectiveness of antidepressants. Nord J Psychiatry. 1988;42:511–515.
Bech P. Rating scales for psychopathology, health status and quality of life. A compendium on documentation in accordance with the DSM-III-R and WHO systems. Berlin: Springer; 1993.
Martiny K, Lunde M, Unden M, Dam H, Bech P. Adjunctive bright light in non-seasonal major depression: Results from clinician-rated depression scales. Acta Psychiatr Scand. 2005;112:117–125.
Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59 Suppl 20:22–33. quiz 34–57.
Straaso B, Lauritzen L, Lunde M, Vinberg M, Lindberg L, Larsen ER, et al. Dose-remission of pulsating electromagnetic fields as augmentation in therapy-resistant depression: a randomized, double-blind controlled study. Acta Neuropsychiatr. 2014;26:272–279.
Bech P. The Cronholm-Ottosson Depression Scale: the first depression scale designed to rate changes during treatment. Acta Psychiatr Scand. 1991;84:439–445.
Bech P. The Bech-Rafaelsen Melancholia Scale (MES) in clinical trials of therapies in depressive disorders: a 20-year review of its use as outcome measure. Acta Psychiatr Scand. 2002;106:252–264.
Cumming G. Understanding the new statistics: Effect Sizes, confidance intervals, and metaanalysis. London: Routledge; 2012.
Siegel S. Nonparametric statistics for the behavioural sciences. New York: McGraw Hill; 1956.
Cohen J. Weighted Kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bulletin. 1968;70:213–220.
Cuijpers P, Dekker J, Noteboom A, Smits N, Peen J. Sensitivity and specificity of the Major Depression Inventory in outpatients. BMC Psychiatry. 2007;7:Art 39–6.
Forsell Y, Levander S, Cullberg J. Psychosocial correlates with depressive symptoms six years after a first episode of psychosis as compared with findings from a general population sample. BMC Psychiatry. 2004;4:Art 29–5.
Zimmerman M. Symptom severity and guideline-based treatment recommendations for depressed patients: Implications of DSM-5’s potential recommendation of the PHQ-9 as the measure of choice for depression severity. Psychother Psychosom. 2012;81:329–32.
Forkmann T, Gauggel S, Spangenberg L, Brahler E, Glaesmer H. Dimensional assessment of depressive severity in the elderly general population: psychometric evaluation of the PHQ-9 using Rasch Analysis. J Affect Disord. 2013;148:323–30.
Angst J, Bech P, Boyer P, Bruinvels J. Consensus conference on the methodology of clinical trials of antidepressants, Zurich, March 1988: Report of the Consensus Committee. Pharmacopsychiatry. 1989;22:3–7.
The Martiny et al.  study was funded by The Danish Medical Research Council, Eastern Region Research Foundation, Merchant L.F. Foght’s Foundation, Johannes M. Klein and Wife’s Memorial Foundation, The Tvergaard Foundation, The Danish Psychiatric Association, The Olga Bryde Nielsen Foundation,The A.P. Møller and Chastine Mc-Kinney Møller Foundation, The Region 3 foundation and The Frederiksborg General Hospital Research Grant.
The Straaso et al.  study was financially supported by a grant from the Lundbeck Foundation (Grant R54-A5567).
The authors declare that they have no competing interests.
PB: conception and design of present analysis of study data, draft of manuscript, and analysis and interpretation of data. NT: critical revision of manuscript for important intellectual content. KM: acquisition of data and critical revision of manuscript for important intellectual content. ML: substantial contribution to acquisition of data. SS: critical revision of manuscript for important intellectual content.