
Test-retest reliability of the computer-assisted DIA-X-5 interview for mental disorders

A Correction to this article was published on 09 July 2020

This article has been updated



There is a need for comprehensive standardized diagnostic assessment tools for psychopathology that match recent changes in diagnostic classification systems, such as the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Therefore, the computer-assisted DIA-X-5 was developed and its test-retest reliability was explored. The DIA-X-5 is based on the DIA-X/M-CIDI (Diagnostisches Expertensystem für psychische Störungen/Munich-Composite International Diagnostic Interview), which referred to the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV).


A convenience sample (N = 60, age: 15–67) was interviewed twice with the computer-assisted DIA-X-5 interview, on average nine days apart, by trained and blinded interviewers. The DIA-X-5 is a standardized instrument for research purposes covering symptoms, syndromes and diagnoses from eleven classes of mental disorders according to the DSM-5 with matching F codes of the 10th edition of the International Classification of Diseases (ICD-10).


Kappa values ranged from 0.90 for post-traumatic stress disorder to 0.30 for social anxiety disorder. For age of onset and age of recency, test-retest reliability as measured by the intra-class correlation was satisfactory, with values above 0.90 for most disorders.


Test-retest reliability of the DIA-X-5 syndromes and diagnoses was comparable to that of previous DSM-IV/DIA-X diagnoses for most disorders. Due to low case numbers for some diagnoses, further research in larger samples is required.



The development of structured and standardized diagnostic interviews has considerably improved the reliability and validity of the assessment of mental disorders, even when conducted by non-clinical (lay) interviewers [1, 2]. Within the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI; [2]) platform, the DIA-X/M-CIDI (Diagnostisches Expertensystem für psychische Störungen/Munich-Composite International Diagnostic Interview; [3]) was found to possess good test-retest reliability [4] and validity [5]. It has been employed in many countries in multiple epidemiologic studies assessing mental disorders according to the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; [6]) and the 10th edition of the International Classification of Diseases (ICD-10; [7]) [8,9,10,11].

With the publication of the fifth revision of the DSM [12], diagnostic symptom, duration, and severity criteria for several mental disorders have changed (e.g., posttraumatic stress disorder (PTSD), eating disorders). For some disorders, the classificatory position was changed (e.g. separation anxiety disorder being moved to the anxiety disorder category). For other disorders, the overall conception of the construct changed (e.g., somatoform disorders have been dismissed in favor of somatic symptom and related disorders). In addition, “new” mental disorders were defined (e.g., disruptive mood dysregulation disorder) and “specifiers” were introduced (e.g., panic attack specifier which can be added to most disorders; anxious distress specifier in bipolar and depressive disorders).

In response to these changes, a revised version of the DIA-X interview became necessary. The revision incorporates the changes while maintaining comparability with previous versions of the instrument and their implicit diagnostic algorithms. This provides methodological consistency with previous studies and allows investigations into the effects of criteria changes on prevalence, onset, course and comorbidity findings. This paper presents the modification and extension process as well as test-retest reliability data of the new DIA-X-5.


This section is structured into two major components. Firstly, the development of the DIA-X-5 interview is described, including its diagnostic coverage as well as its outer format. Secondly, the reliability is examined in a retest study for which results are presented in the current manuscript.

Development of the DIA-X-5

The DIA-X/M-CIDI is a well-established standardized clinical interview [3] which was originally developed for clinical epidemiological purposes. It served as the basis for the DIA-X-5, with the form, rules and conventions kept unchanged whenever possible. Changes and additions were proposed, implemented and tested by a panel of experienced clinicians knowledgeable in the revision work of DSM [12] and ICD [7]. In addition, a board of external experts was invited to advise on and review changes and additions to particular sections of the instrument.

Content of the DIA-X-5: diagnostic coverage

The DIA-X-5 assesses symptoms, syndromes and diagnoses of eleven major classes of mental disorders for the lifetime and the past 12-month time frame, except for adjustment disorders and premenstrual dysphoric disorder, which are only assessed for the past 12 months. Diagnostic algorithms were reprogrammed to match recent changes in DSM-5 [12] and linked with the corresponding ICD-10 code. A complete list of DIA-X-5 diagnoses and optional modules is provided in Table 1. DSM-IV [6] algorithms are still applicable for most disorders, with the exception of anorexia nervosa (amenorrhea item removed) and substance abuse (legal problem item removed). Furthermore, the DIA-X-5 includes screens for several disorders that have been added to DSM-5 ([12]; e.g. disruptive mood dysregulation disorder, premenstrual dysphoric disorder) or are related to the major diagnostic classes assessed by the DIA-X-5 (e.g. hoarding, skin picking, body dysmorphic disorder).

Table 1 The DIA-X-5 diagnoses (DSM-5, with corresponding ICD-10 F code) and optional modules

Structure of the DIA-X-5: layout and outer format

For reasons of consistency, the content, format, rules and style of the past DIA-X/M-CIDI were maintained. Previous data have shown that even minimal changes might influence the response behavior of the subjects [13]. Response lists were used in every section for symptoms and other items that are arranged in list format, to reduce the risk of misunderstandings and increase the efficiency of the interview [14]. These lists were programmed for presentation on a tablet computer. Thus, subjects’ responses can be directly recorded electronically and transferred wirelessly to a central database, which eliminates the risk of coding errors during data entry. Finally, similar to the DIA-X/M-CIDI, time-related questions (e.g., age of onset) were complemented by further probe questions to improve the accuracy of estimates.

More extended changes in the DIA-X-5 include the following:

  1) The structure of some disorder sections was modified to improve the interview flow given the changes in diagnostic criteria. First, in the substance use disorder sections (tobacco, alcohol, drugs), the separate assessment of abuse and dependence was omitted in favor of one comprehensive list covering all substance use disorder symptoms (without legal problems, which have been omitted from DSM-5). Second, given the considerable changes surrounding somatoform disorders, this section has been changed to include questions assessing symptoms of somatic symptom disorder and to omit the previous probe questions assessing whether each individual symptom is “medically unexplained”.

  2) In the section on anxiety disorders, the order of questions for panic symptoms and panic disorder was changed, given the prominent introduction of a “panic attack specifier” in DSM-5 [12], which can be coded with virtually all other diagnoses of mental disorders (except for panic disorder). Thus, the DIA-X-5 first asks for all symptoms of a panic attack before the criteria for panic disorder are enquired.

  3) In disorder sections where variation in disorder symptomatology and severity is expected (e.g. mood episodes), the DIA-X-5 lifetime questions first ask for the presence of lifetime symptoms, followed by questions regarding the worst lifetime episode and, finally, which of these symptoms have also been present in the past 12 months. In this way, past 12-month diagnoses ensure that the full diagnostic criteria are met during the last year. Further, this additional probing allows for the derivation of partial remissions and severity coding for past-year diagnoses.

Test-retest-study: sample and procedure

A convenience sample of adolescents and adults (targeting age 14–65 years) was recruited by advertisement and in collaboration with clinical institutions. Participants were invited to participate in two diagnostic face-to-face interview sessions approximately one week apart. Upon arrival at the interview location, participants were informed about the procedures of the study and asked whether they would provide their written informed consent; for participants younger than 18 years of age, written informed assent/consent was obtained from both the adolescent and all legal guardians (e.g., both mother and father). Both interviews were conducted at the research institution (Center for Clinical Epidemiology and Longitudinal Studies at Technische Universität Dresden) or at the university hospital by two independent interviewers, i.e. the second interview was conducted blinded to the results of the first interview. Individuals were instructed to answer the questions in the second interview disregarding their answers during the first interview.

Interviews were conducted by trained clinical interviewers (psychologists as well as psychology and medical students). The 25 interviewers had a mean age of 26 years and 23 were women. Each interviewer had sufficient experience with the computerized version of the DIA-X-5, having received a standardized multiday interviewer training and undergone a practice and certification process. Participants received payment (30€) for their participation in both interviews.

Data analysis

Diagnostic concordance for each disorder, i.e. agreement of diagnostic results of the test (T1) and the retest interview (T2), was calculated as the relative frequency of individuals with equal disorder classification at both assessment times. The agreement for disorder categories was only calculated if the diagnostic base rate was ≥5 (at least five full cases available for analysis). Diagnostic agreement was also computed for all core DIA-X-5 symptom questions (stem items). For diagnoses with multiple stem items, at least one of the multiple stem items had to be affirmed to compute agreement between both interviews. The coefficients calculated were the Jaccard index (JI [15]), a reversed variant of the Jaccard index (RJI) and Cohen’s kappa [16]. In the following, the theoretical foundations of the coefficients JI, RJI and Cohen’s kappa are explained.
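The concordance measure and the case threshold described above can be illustrated with a minimal Python sketch. The function names and the example data are hypothetical and not taken from the study; in particular, counting cases at either interview for the threshold is an assumption based on the caption of Table 4.

```python
from typing import Sequence

MIN_CASES = 5  # case threshold stated in the text


def percent_agreement(t1: Sequence[bool], t2: Sequence[bool]) -> float:
    """Relative frequency of individuals classified identically at T1 and T2."""
    if len(t1) != len(t2) or len(t1) == 0:
        raise ValueError("t1 and t2 must be non-empty and of equal length")
    return sum(a == b for a, b in zip(t1, t2)) / len(t1)


def has_enough_cases(t1: Sequence[bool], t2: Sequence[bool],
                     min_cases: int = MIN_CASES) -> bool:
    """Assumption: the threshold is met if either interview yields >= min_cases cases."""
    return max(sum(t1), sum(t2)) >= min_cases


# Hypothetical data for 60 participants: 6 cases at T1, 5 of them again at T2.
t1 = [True] * 6 + [False] * 54
t2 = [True] * 5 + [False] * 55

agreement = percent_agreement(t1, t2)  # 59 of 60 classifications match
eligible = has_enough_cases(t1, t2)
print(f"agreement = {agreement:.3f}, eligible = {eligible}")
```

Percent agreement is not adjusted for chance, which is why the chance-adjusted and case-focused coefficients are reported alongside it.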

The Jaccard Index (JI [15]) is the number of individuals with a positive disorder diagnosis on both DIA-X-5 assessments divided by the number of individuals with a positive disorder diagnosis on at least one assessment. JI is an estimated lower bound for the probability that a case at one interview will be evaluated to be a case at another interview. A higher JI indicates a higher probability of re-identification of a case at another time point. JI was calculated in the following manner:

$$ JI=\frac{\#\ \text{individuals with positive diagnosis on both assessments}}{\#\ \text{individuals with positive diagnosis on at least one assessment}} $$

For this study, a reversed Jaccard Index (RJI) was developed. The RJI is the number of individuals with a negative disorder diagnosis on both DIA-X-5 assessments divided by the number of individuals with a negative disorder diagnosis on at least one assessment. The RJI is an estimated lower bound for the probability that a non-case at one interview will be evaluated to be a non-case at another interview. A higher RJI indicates a higher probability of re-identification of a non-case at another time point. RJI was calculated in the following manner:

$$ RJI=\frac{\#\ \text{individuals with negative diagnosis on both assessments}}{\#\ \text{individuals with negative diagnosis on at least one assessment}} $$

JI and RJI are easy-to-interpret and symmetric indices, which means that no pre-defined order of assessments is assumed. Thus, non-cases at the first interview (T1) that became cases at the second interview (T2) are counted like non-cases at T2 that were cases at T1; both lower the indices JI and RJI equally.
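As a minimal sketch, both indices can be computed from the four cells of a 2×2 test-retest table. The cell names `a`, `b`, `c`, `d` and the example counts are hypothetical and introduced here only for illustration.

```python
from typing import Tuple


def jaccard_indices(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (JI, RJI) from a 2x2 test-retest table.

    a = positive at both interviews
    b = positive at T1 only
    c = positive at T2 only
    d = negative at both interviews
    """
    positive_any = a + b + c  # positive on at least one assessment
    negative_any = d + b + c  # negative on at least one assessment
    ji = a / positive_any if positive_any else float("nan")
    rji = d / negative_any if negative_any else float("nan")
    return ji, rji


# Hypothetical counts for 60 participants.
ji, rji = jaccard_indices(a=5, b=1, c=0, d=54)
print(f"JI = {ji:.2f}, RJI = {rji:.2f}")

# Symmetry: swapping the discordant cells leaves both indices unchanged.
ji_swapped, rji_swapped = jaccard_indices(a=5, b=0, c=1, d=54)
```

Because the discordant cells `b` and `c` only enter as a sum, the order of the two assessments does not matter, which is the symmetry property described above.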

Cohen’s kappa [16] is the most frequently used chance-adjusted measure of agreement. It expresses the difference between the observed frequency of agreement and the frequency of agreement expected purely by chance, relative to the maximum possible difference. Kappa ranges between −1 (perfect disagreement) and 1 (perfect agreement), where 0 indicates chance agreement. Kappa has two well-known paradoxes [17]. First, kappa values increase whenever the true frequency of cases in a sample comes closer to 50%, regardless of the actual agreement frequency. In the present study, the frequency of cases is far below 50% for most disorders and, as a consequence, the reported kappa values would increase with increasing case frequencies. Second, kappa values increase with an increasing ratio of cases to non-cases from T1 to T2, regardless of the actual agreement frequency.
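The first paradox can be made concrete with a short sketch that computes kappa from a 2×2 table using the standard formula, kappa = (p_o − p_e) / (1 − p_e). The two tables are hypothetical; they share the same observed agreement of 90% but differ in case frequency.

```python
def cohen_kappa(a: float, b: float, c: float, d: float) -> float:
    """Cohen's kappa for a 2x2 table.

    a = positive at both, b = positive at T1 only,
    c = positive at T2 only, d = negative at both.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the marginal frequencies of T1 and T2.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Same observed agreement (90 of 100), different case frequency.
kappa_balanced = cohen_kappa(a=45, b=5, c=5, d=45)  # about 50% cases
kappa_rare = cohen_kappa(a=5, b=5, c=5, d=85)       # about 10% cases
print(f"balanced: {kappa_balanced:.2f}, rare: {kappa_rare:.2f}")
```

With cases at about 50% the sketch yields a kappa of 0.80, whereas with cases at about 10% the same observed agreement yields roughly 0.44, which illustrates why kappa for rare disorders should be interpreted cautiously.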

In the present study, more cases emerged at T1 than at T2 for most disorders. Thus, the bias-adjusted kappa (BAK; [17]) was also computed. An investigation of, and arguments for, the use of kappa can be found, e.g., in Shrout et al. [18]. For the analyses we included all disorders with at least five cases. However, Cicchetti et al. [19] suggested a case number of at least ten for each diagnosis. In our results, all diagnoses with fewer than ten cases in both T1 and T2 are marked with footnotes in the tables.
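A minimal sketch of a bias-adjusted kappa follows. It assumes the common definition in which the two discordant cell counts are replaced by their mean before kappa is computed; whether reference [17] uses exactly this definition is an assumption, and the example counts are hypothetical.

```python
def cohen_kappa(a: float, b: float, c: float, d: float) -> float:
    """Cohen's kappa for a 2x2 table (a, d concordant; b, c discordant)."""
    n = a + b + c + d
    p_observed = (a + d) / n
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def bias_adjusted_kappa(a: float, b: float, c: float, d: float) -> float:
    """Kappa after removing bias, i.e. unequal case frequencies at T1 and T2.

    Assumed definition: b and c are both replaced by their mean (b + c) / 2,
    which equalizes the marginal frequencies of the two interviews.
    """
    m = (b + c) / 2
    return cohen_kappa(a, m, m, d)


# Hypothetical table with more cases at T1 (35) than at T2 (25).
kappa = cohen_kappa(a=20, b=15, c=5, d=60)
bak = bias_adjusted_kappa(a=20, b=15, c=5, d=60)
print(f"kappa = {kappa:.3f}, BAK = {bak:.3f}")
```

In this example the adjustment lowers the coefficient only slightly, which mirrors the situation in which kappa and BAK are close enough that reporting one of them suffices.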

Associations between time-related measures (age of onset, age of recency, persistence) at T1 and T2 were calculated as intraclass correlation coefficients (ICC). We used the one-way random-effects ICC(1) for absolute agreement ([20]; see also [21]). It corresponds to the correlation coefficient between the ratings assessed at T1 and T2 within a subject, and it is also equal to the ratio of the between-subject variance of the time-related measure to the total sample variance of that measure. Persistence of dysthymia was not enquired with an individual question (number of years affected by the symptoms). Given the generally persistent course of dysthymia, it was instead computed as the time span between age of onset and age of recency minus the longest period of time during which the participant reported feeling good/OK again. For major depressive disorder with recurrent episodes, persistence corresponds to the number of years in which at least one depressive episode occurred. ICCs were calculated for all subjects who answered questions regarding the age of onset, age of recency, or persistence in both interviews, regardless of whether they fulfilled the criteria for any specific diagnosis.
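The one-way random-effects ICC can be sketched with the standard analysis-of-variance estimator, ICC(1) = (MSB − MSW) / (MSB + (k − 1) · MSW), where MSB and MSW are the between-subject and within-subject mean squares and k = 2 is the number of interviews. The function name and the ages below are hypothetical.

```python
from typing import Sequence, Tuple


def icc1(pairs: Sequence[Tuple[float, float]]) -> float:
    """One-way random-effects ICC(1) for paired T1/T2 values."""
    n = len(pairs)
    k = 2  # two measurements (T1, T2) per subject
    if n < 2:
        raise ValueError("at least two subjects are required")
    subject_means = [(x + y) / k for x, y in pairs]
    grand_mean = sum(subject_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 + (y - m) ** 2 for (x, y), m in zip(pairs, subject_means)
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when there is no variance at all")
    return (ms_between - ms_within) / denominator


# Hypothetical ages of onset reported at T1 and T2.
onset = [(12, 12), (18, 19), (25, 24), (30, 30), (16, 17)]
icc_onset = icc1(onset)
print(f"ICC(1) = {icc_onset:.3f}")

# With few subjects, a single strongly discordant report can push the
# estimate below zero.
icc_discordant = icc1([(10, 30), (30, 10)])
```

The estimator is not bounded below by zero: when within-subject differences exceed between-subject differences, negative values result, which is how a negative ICC can arise for a diagnosis with few cases and one outlier.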

All analyses were conducted using Stata 14.2 [22] and confidence intervals for Cohen’s Kappa were calculated with the command ‘kapci’ [23, 24].
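For readers without access to Stata, the general idea of an interval estimate for kappa can be sketched with a percentile bootstrap in Python. This is an illustration under stated assumptions and not a reimplementation of ‘kapci’; the function names, the number of resamples and the example data are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[bool, bool]  # (diagnosis at T1, diagnosis at T2)


def kappa_from_pairs(pairs: Sequence[Pair]) -> Optional[float]:
    """Cohen's kappa from paired binary diagnoses; None if undefined."""
    n = len(pairs)
    p_observed = sum(x == y for x, y in pairs) / n
    p1 = sum(x for x, _ in pairs) / n  # case frequency at T1
    p2 = sum(y for _, y in pairs) / n  # case frequency at T2
    p_expected = p1 * p2 + (1 - p1) * (1 - p2)
    if p_expected == 1.0:  # e.g. a resample without any case
        return None
    return (p_observed - p_expected) / (1 - p_expected)


def bootstrap_kappa_ci(pairs: Sequence[Pair], n_boot: int = 2000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates: List[float] = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        k = kappa_from_pairs(sample)
        if k is not None:  # skip resamples where kappa is undefined
            estimates.append(k)
    if not estimates:
        raise ValueError("kappa was undefined in every resample")
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1,
                          int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


# Hypothetical data: 10 concordant cases, 3 discordant, 47 concordant non-cases.
data: List[Pair] = ([(True, True)] * 10 + [(True, False)] * 2
                    + [(False, True)] * 1 + [(False, False)] * 47)
point = kappa_from_pairs(data)
ci_lower, ci_upper = bootstrap_kappa_ci(data)
print(f"kappa = {point:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
```

With small case numbers the resulting intervals are wide, which is one reason to read point estimates for rare diagnoses together with their confidence intervals.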


Sample characteristics

The sample consisted of 60 participants, aged 15 to 67 years (M = 26.6 years) and included 44 women (73.3%). Most participants were attending university (58.3%) and were not married (93.3%). The mean time interval between both interviews was nine days, with a range of one to 36 days. For 46.7% of participants, the time interval was 7–9 days, for 81.7% of participants, the time interval was 4–13 days. For more details see Table 2.

Table 2 Sample characteristics (n = 60) and time interval between test and retest

Duration of the DIA-X-5

The average time to complete the DIA-X-5, excluding the sociodemographic section, was 125.9 min (range 30.8–405.0 min). The duration of all sections of the interview is listed in Table 3. Of note, for the purpose of an attached study the applied DIA-X-5 included additional embedded questionnaires, dimensional scales and screening modules, which increased the duration of most DIA-X-5 sections. With the supplementary questionnaires, dimensional scales and screening modules, the longest sections were those for anxiety disorders (M = 32.9 min) and for depressive disorders (M = 16.1 min). Other sections that enquire about only single or less prevalent disorders were considerably shorter, such as oppositional-defiant disorder (M = 1.8 min) and intermittent explosive disorder (M = 2.0 min).

Table 3 Duration of the DIA-X-5 interview by section

Diagnostic agreement for DSM-5 disorder categories

BAK values were comparable to kappa values (deviation ≤0.03) and are thus not reported in the following sections.

Test-retest reliability of DSM-5 diagnoses and diagnostic classes is displayed in Table 4.

Table 4 Diagnostic test-retest reliability of DSM-5 diagnostic categories with ≥5 cases for at least one interview (T1 diagnosis from test interview, T2 diagnosis from retest interview)

For most disorders the JI ranged between 0.65 and 0.85. The highest JI, 1.00, was found for anorexia nervosa, second highest JI (0.89) was found for ‘any’ DSM-5 disorder. PTSD and tobacco use disorder had a JI of 0.83. Social anxiety disorder displayed the lowest JI with 0.25, followed by any obsessive compulsive disorder with 0.37.

RJIs for most disorders were 0.80 and above. Anorexia nervosa displayed the highest RJI with 1.00, followed by PTSD and cannabis use disorder with 0.98. The category of any anxiety disorder (without panic attack) revealed the lowest RJI with 0.69, followed by ‘any’ DSM-5 disorder and any anxiety disorder (without social anxiety disorder) with an RJI of 0.76 each.

The highest agreement, 100.0%, was found for anorexia nervosa, followed by a 98.3% agreement rate for PTSD and cannabis use disorder. Any DSM-5 disorder had an agreement of 91.7%. The categories of any anxiety disorder (without panic attack) and social anxiety disorder revealed the lowest agreement with 78.3 and 80.0%, respectively. However, most diagnostic categories displayed an agreement of 90% and above. Most Cohen’s kappa values ranged between 0.70 and 0.85, with anorexia nervosa displaying the highest kappa (1.00), followed by PTSD (0.90), tobacco use disorder (0.89) and cannabis use disorder (0.88). The diagnosis of ‘any’ DSM-5 disorder had a kappa of 0.81. Cohen’s kappa was lowest for social anxiety disorder (0.29), followed by any obsessive-compulsive disorder (0.51) and any anxiety disorder without panic attack (0.55). For discordant cases, it was also tested whether, for those disorders with all criteria fulfilled in one interview, at least the respective stem question was endorsed in the other interview. Of the two single disorders with low kappa, the respective stem question was always endorsed for social anxiety disorder. However, for obsessive-compulsive disorder (OCD), two out of three cases did not endorse the stem question in one interview although they fulfilled all criteria for a diagnosis in the other interview.

Diagnostic agreement on core DIA-X-5 symptoms

Diagnostic agreement for the core DIA-X-5 stem items is shown in Table 5. Separation anxiety disorder displayed the lowest JI (0.33), whereas any drug use disorder - illicit drug consume (JI = 0.94) and major depressive disorder (JI = 0.91) displayed the highest JIs. For most disorders, JIs ranged between 0.65 and 0.85. RJI was highest for any drug use disorder - illicit drug consume (0.98), followed by tobacco use disorder – symptom item (0.95) and conduct disorder (0.94). Medication use – consume item had the lowest RJI with 0.64. For most disorders the RJI was 0.85 and above.

Table 5 Test-retest reliability of the DIA-X-5 core (stem) items

Agreement was highest for any drug use disorder – illicit drug consume (98.3%) and tobacco use disorder – symptom item (96.7%). Medication use – consume item had the lowest agreement rate of 73.3%. For most disorders, agreement was 80% and above.

Kappa values were highest for any drug use disorder – illicit drug consume (0.96) and were also high for tobacco use disorder – symptom item (0.92). Separation anxiety disorder had the lowest kappa value with 0.43, followed by medication use – consume item with 0.44. Stem items of most disorders had kappa values between 0.70 and 0.85.

Agreement on time-related questions

ICCs for age of onset, age of recency, and persistence are displayed in Table 6. ICCs could not be computed for manic episodes, attention deficit hyperactivity disorder (ADHD) with predominantly hyperactive/impulsive presentation and disruptive mood dysregulation disorder because of too few cases.

Table 6 Reliability of age of onset, age of recency, and persistence as measured by the intra-class correlation coefficient (ICC)

For most diagnoses, ICCs for age of onset were 0.90 and above. The ICC for age of onset was highest for PTSD, panic disorder, single major depressive disorder and antisocial personality disorder with 0.99. The ICC for onset was lowest for specific phobia (situational/other type) with −0.74 and also low for major depressive disorder with recurrent episodes (0.40).

ICCs of age of recency were above 0.90 for most disorders. An ICC of 1.00 was found for recency of obsessive-compulsive disorder (behavior). Also, ICCs were high with 0.99 for tobacco use disorder, anorexia nervosa, binge eating disorder, panic disorder, specific phobia (animal type), single major depressive episode, possible psychotic disorder and ADHD (inattentive presentation). The lowest ICC for recency was shown by specific phobia (situational/other type) with 0.61.

Most disorders revealed ICCs for persistence between 0.55 and 0.85. Persistence reliability was highest for specific phobia (situational/other type) with 0.96, followed by specific phobia (natural environment type) with 0.94. Lowest ICC for persistence was revealed by somatic symptom disorder with 0.16, followed by possible psychotic disorder with 0.20.


The standardized assessment of symptoms, syndromes and diagnoses of mental disorders is essential for estimating the prevalence, onset, and course of mental disorders and determining their risk factors in epidemiologic research. Case definition in clinical and experimental studies also relies on reliable diagnoses. The standardized and fully computerized DIA-X-5 reveals good test-retest reliability for most DSM-5 diagnoses, stem items and time-related information in adolescents and adults.

Test-retest reliability of diagnoses and stem items of the DIA-X-5

Although most of the DIA-X-5 diagnoses showed good to very good test-retest reliability, some diagnoses showed relatively low reliability. For these diagnoses we examined in more detail the response patterns on the level of diagnostic criteria and individual questions.

The summary category of any anxiety disorder (without panic attack) reveals relatively low reliability because of low reliability indices for a few specific anxiety diagnoses. For panic disorder, comparing each separate diagnostic criterion, there was no specific response pattern which changed from the first to the second interview. However, a change in the order of questions in this section may have affected subjects’ overall response behavior. In the DIA-X/M-CIDI, the panic attack stem question was followed by the panic disorder questions, and only afterwards were the panic attack symptoms assessed. This order changed in the DIA-X-5, which probes the panic attack symptoms before the panic disorder criteria, due to the prominent role of the panic attack specifier in DSM-5 [12]. Reliability for the panic stem item was good, as was the reliability for panic attack.

For social anxiety disorder, participants’ responses varied particularly for avoidance (5 out of 12 discordant cases) and duration of anxiety (6 out of 12 discordant cases). Social anxiety disorder consists of a long criterion list, and 9 out of 12 discordant cases were discordant in only one criterion. Agoraphobia and separation anxiety disorder had a low overall number of cases in this study, which hampers reliability estimates.

For obsessive-compulsive disorder, different criteria for thoughts and behavior revealed divergent response patterns between both interviews. For obsessive thoughts, mainly the response to the A criterion changed between both interviews. For compulsive behavior, the list of items/behaviors was expanded in the DIA-X-5 to also probe for OCD-related syndromes including nail biting, hair pulling, skin picking, and mirror checking. Although the mere presence of these symptoms was not counted toward the standard diagnosis of OCD, their inclusion may have affected the responses for OCD, given that symptoms such as nail biting were quite prevalent in the sample. Most discordant OCD cases were due to the B criterion, which refers to distress/impairment (mostly rated “some” in the second interview instead of “much”).

As already noted, the stem items of most diagnoses showed high reliability, even for the disorders for which relatively low reliability indices were found on the diagnostic level. It should be noted, though, that the stem question for the use of legal drugs, i.e. medication, reveals low reliability indices. This might depend on the type of listed substances and the broad open category of “other medications”. Unfortunately, no cases of medication use disorder were found in the current study, so it could not be tested whether retest reliability for medication would be higher on the diagnostic level.

When comparing the retest reliability of diagnoses and stem items of the DIA-X-5 with previous results of the DIA-X/M-CIDI [4], kappa values on the diagnostic level are similar for depressive disorders, alcohol and illicit drug use disorder as well as for any DSM disorder. For PTSD, tobacco use disorder and any eating disorder, the DIA-X-5 reveals higher kappa values (kappa deviating by at least 0.1), whereas the DIA-X/M-CIDI revealed higher kappa values for obsessive-compulsive disorder and any somatoform disorder (relative to somatic symptom disorder). Mixed results were found for anxiety disorders: the DIA-X-5 had higher kappa values for panic attack and generalized anxiety disorder, whereas the DIA-X/M-CIDI had higher kappa values for most anxiety disorder categories, such as any anxiety disorder, social anxiety disorder and panic disorder. However, most of these differences even out when the kappa values for the stem items are included, which are comparable between DIA-X-5 and DIA-X/M-CIDI for panic disorder, social anxiety disorder, agoraphobia, specific phobia, major depressive disorder, eating disorder and PTSD. Relevant differences in kappa still remain for the stem items of generalized anxiety disorder (GAD) and OCD, with higher kappa values in the DIA-X-5, and for dysthymia and manic/hypomanic episode, with higher kappa values in the DIA-X/M-CIDI. The differences between DIA-X/M-CIDI and DIA-X-5 likely depend on the available number of cases, which was higher in the DIA-X/M-CIDI retest study, resulting in higher kappa values (first kappa paradox). As previously mentioned, the change in the order of questions for panic disorder might have decreased the kappa values for panic disorder in the DIA-X-5 in comparison to the DIA-X/M-CIDI.

Test-retest reliability of time-related questions

High test-retest reliability was found for the age of onset and age of recency questions in the DIA-X-5; the persistence questions generally revealed slightly lower reliability. Compared to the previous DIA-X/M-CIDI, the ICCs for age of onset in the DIA-X-5 were either similar or somewhat higher for most disorders. For most specific phobia subtypes, however, substantially higher ICCs for age of onset were found in the DIA-X/M-CIDI, which could be due to the greater number of specific phobia cases in the DIA-X/M-CIDI retest study. In the current study, low ICCs for age of onset resulted from an overall small number of cases. For the specific phobia subtype, this was combined with an outlier who reported extremely different ages of onset in both interviews. Low ICCs for age of recency in disruptive mood dysregulation disorder may also be due to few overall cases.

Persistence revealed stronger variability than the other two time-related measures. For specific phobias (situational/natural environment/other type), participants reported diverging onset/recency, most likely because these disorders often manifest early in development and take a waxing and waning course. However, overall persistence, meaning the overall number of years affected, is remembered similarly in both interviews. For illness anxiety and somatic symptom disorder, a slow development and varying intensity levels can be assumed, leading to difficulties in estimating persistence in terms of number of years affected. For separation anxiety disorder, there were too few cases in this study to draw reliable conclusions.


This test-retest study has limitations. First, the number of subjects assessed is at the lower bound for a test-retest reliability study, which principally affects the reliability estimates for conditions less frequently diagnosed in the sample. However, previous studies on retest reliability of structured clinical interviews included samples of comparable and even smaller size, ranging from 43 to 60 participants, for the M-CIDI [4], the Structured Clinical Interview for DSM-5 disorders (DSM-5 SCID [25]) and the Spanish version of the Kiddie Schedule for Affective Disorders and Schizophrenia present and lifetime version DSM-5 (K-SADS-PL-5 [26]). Second, the test-retest interval varies between one and 36 days. Although an average retest interval of nine days is appropriate considering the variability of, for example, depressive symptoms, shorter intervals (below 7 days) could increase kappa estimates. Third, a convenience sample was recruited for this study. Thus, the sample is community based and therefore useful for an instrument which is designed for representative studies. Fourth, the DIA-X-5 version applied in this study included a range of additional nested questionnaires, lists and screening modules which increased the length of the DIA-X-5 sections. This may have affected the response behavior of the participants. The relatively high agreement on the stem questions, however, argues against a systematic response behavior bias. Finally, the reliability coefficients of some disorders are at a lower bound. Those include panic disorder, social anxiety disorder and obsessive-compulsive disorder. In addition, some disorders show a fair lower bound of the Cohen’s kappa CI (above 0.40) but do not meet the suggested criterion of ten cases [19]; these are PTSD, anorexia nervosa, intermittent explosive disorder, and cannabis use disorder. Others show a lower bound of the kappa CI below 0.40 or a kappa CI range above 0.50; these disorders include persistent depressive disorder, panic disorder, obsessive-compulsive disorder, any adjustment disorder, somatic symptom disorder, and alcohol use disorder. For the diagnoses of these disorders, the DIA-X-5 should be used with caution. Of note, the reliability of the stem items of these disorders is sufficient.
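The cautions named in this paragraph can be expressed as a small screening helper. The following Python sketch uses hypothetical names and example values; the cut-offs (0.40 for the lower CI bound, 0.50 for the CI range, ten cases) are those mentioned in the text, and reading the ten-case criterion as "fewer than ten cases at both interviews" is an assumption.

```python
from typing import List

CI_LOWER_CUTOFF = 0.40    # lower bound of the kappa CI
CI_RANGE_CUTOFF = 0.50    # width of the kappa CI
MIN_CASES_SUGGESTED = 10  # suggested minimum number of cases


def reliability_flags(ci_lower: float, ci_upper: float,
                      cases_t1: int, cases_t2: int) -> List[str]:
    """Return the reasons for interpreting a diagnosis with caution."""
    flags: List[str] = []
    if ci_lower < CI_LOWER_CUTOFF:
        flags.append("lower CI bound below 0.40")
    if ci_upper - ci_lower > CI_RANGE_CUTOFF:
        flags.append("CI range above 0.50")
    # Assumption: the criterion is missed only if both interviews fall short.
    if cases_t1 < MIN_CASES_SUGGESTED and cases_t2 < MIN_CASES_SUGGESTED:
        flags.append("fewer than ten cases at both interviews")
    return flags


# Hypothetical examples, not values from the study.
few_cases = reliability_flags(ci_lower=0.62, ci_upper=0.98, cases_t1=6, cases_t2=5)
low_bound = reliability_flags(ci_lower=0.05, ci_upper=0.53, cases_t1=12, cases_t2=14)
no_flags = reliability_flags(ci_lower=0.75, ci_upper=0.95, cases_t1=20, cases_t2=18)
print(few_cases, low_bound, no_flags)
```

An empty list means that none of the three cautions applies to the diagnosis in question.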


The DIA-X-5 is an extended, modified version of the DIA-X/M-CIDI. For the mental disorders with sufficient case numbers and thus analyzed in the present study, the DIA-X-5 reliably assesses symptoms, syndromes and diagnoses according to DSM-5 including their onset, recency, and persistence, when applied face-to-face by trained interviewers. These disorders are from DSM-5 sections containing the most prevalent mental disorders: depressive, anxiety, trauma-related, somatic symptom, eating, substance use disorders, and disruptive, impulse control or conduct disorders. The limited sample size calls for additional studies replicating the findings and allowing for more reliable conclusions with regard to less prevalent disorders. Finally, this study focused on the reliability of the DIA-X-5. However, its validity needs to be evaluated in a separate study.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Change history

  • 09 July 2020

    An amendment to this paper has been published and can be accessed via the original article.



Diagnostisches Expertensystem für Psychische Störungen
Munich-Composite International Diagnostic Interview
Diagnostic and Statistical Manual of Mental Disorders
International Classification of Diseases
World Health Organization
Post-traumatic stress disorder
Jaccard Index
Reversed Jaccard Index
Bias-adjusted kappa
Intraclass correlation coefficient
Obsessive-compulsive disorder
Attention deficit hyperactivity disorder
Generalized anxiety disorder
Structured Clinical Interview for DSM Disorders
Spanish version of the Kiddie Schedule for Affective Disorders and Schizophrenia present and lifetime version DSM-5


  1. Kessler RC, Ustun TB. The world mental health (WMH) survey initiative version of the World Health Organization (WHO) composite international diagnostic interview (CIDI). Int J Methods Psychiatr Res. 2004;13(2):93–121.

  2. Wittchen HU. Reliability and validity studies of the WHO composite international diagnostic interview (CIDI) - a critical review. J Psychiatr Res. 1994;28(1):57–84.

  3. Wittchen HU, Pfister H. DIA-X-Interviews: Manual für Screening-Verfahren und Interview; Interviewheft Längsschnittuntersuchung (DIA-X-Lifetime); Ergänzungsheft (DIA-X-Lifetime); Interviewheft Querschnittuntersuchung (DIA-X-Monate); Ergänzungsheft (DIA-X-Monate); PC-Programm zur Durchführung des Interviews (Längs- und Querschnittuntersuchung); Auswertungsprogramm. Frankfurt: Swets and Zeitlinger; 1997.

  4. Wittchen HU, Lachner G, Wunderlich U, Pfister H. Test-retest reliability of the computerized DSM-IV version of the Munich composite international diagnostic interview (M-CIDI). Soc Psychiatry Psychiatr Epidemiol. 1998;33(11):568–78.

  5. Reed V, Gander F, Pfister H, Steiger A, Sonntag H, Trenkwalder C, et al. To what degree does the composite international diagnostic interview (CIDI) correctly identify DSM-IV disorders? Testing validity issues in a clinical sample. Int J Methods Psychiatr Res. 1998;7(3):142–55.

  6. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. Washington, DC: American Psychiatric Association; 2000.

  7. World Health Organization. ICD-10: International statistical classification of diseases and related health problems - tenth revision. Geneva: World Health Organization; 1992.

  8. Andreas S, Schulz H, Volkert J, Dehoust M, Sehner S, Suling A, et al. Prevalence of mental disorders in elderly people: the European MentDis_ICF65+ study. Br J Psychiatry. 2017;210(2):125–31.

  9. Beesdo-Baum K, Knappe S, Asselmann E, Zimmermann P, Brückl T, Höfler M, et al. The ‘early developmental stages of psychopathology (EDSP) study’: a 20 year review of methods and findings. Soc Psychiatry Psychiatr Epidemiol. 2015;50(6):851–66.

  10. Jacobi F, Mack S, Gerschler A, Scholl L, Höfler M, Siegert J, et al. The design and methods of the mental health module in the German health interview and examination survey for adults (DEGS1-MH). Int J Methods Psychiatr Res. 2013;22(2):83–99.

  11. Jacobi F, Wittchen HU, Hölting C, Sommer S, Lieb R, Höfler M, et al. Estimating the prevalence of mental and somatic disorders in the community: aims and methods of the German National Health Interview and examination survey. Int J Methods Psychiatr Res. 2002;11(1):1–18.

  12. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 5th ed. Washington, DC: American Psychiatric Association; 2013.

  13. Kessler RC, Wittchen H-U, Abelson JM, McGonagle K, Schwarz N, Kendler KS, et al. Methodological studies of the composite international diagnostic interview (CIDI) in the US national comorbidity survey (NCS). Int J Methods Psychiatr Res. 1998;7(1):33–55.

  14. Wittchen HU, Kessler RC, Zhao SY, Abelson J. Reliability and clinical validity of UM-CIDI DSM-III-R generalized anxiety disorder. J Psychiatr Res. 1995;29(2):95–110.

  15. Jaccard P. The distribution of the flora in the alpine zone. New Phytologist. 1912;11(2):37–50.

  16. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.

  17. Byrt T, Bishop J, Carlin JB. Bias, prevalence and kappa. J Clin Epidemiol. 1993;46(5):423–9.

  18. Shrout PE, Spitzer RL, Fleiss JL. Quantification of agreement in psychiatric diagnosis revisited. Arch Gen Psychiatry. 1987;44:172–7.

  19. Cicchetti DV, Sparrow SS, Volkmar F, Cohen C, Rourke BP. Establishing the reliability and validity of neuropsychological disorders with low base rates: some recommended guidelines. J Clin Exp Neuropsychol. 1991;13:328–38.

  20. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8.

  21. Koo TK, Li MY. A guideline of selecting and reporting Intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63.

  22. StataCorp. Stata Statistical Software: Release 14. College Station: StataCorp LP; 2015.

  23. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: Wiley; 1981.

  24. Reichenheim ME. Confidence intervals for the kappa statistic. Stata J. 2004;4(4):421–8.

  25. Shankman SA, Funkhouser CJ, Klein DN, Davila J, Lerner D, Hee D. Reliability and validity of severity dimensions of psychopathology assessed using the Structured Clinical Interview for DSM-5 (SCID). Int J Methods Psychiatr Res. 2018;27(1):e1590.

  26. de la Pena FR, Villavicencio LR, Palacio JD, Felix FJ, Larraguibel M, Viola L, et al. Validity and reliability of the kiddie schedule for affective disorders and schizophrenia present and lifetime version DSM-5 (K-SADS-PL-5) Spanish version. BMC Psychiatry. 2018;18(1):193.


We thank Prof. Susanne Knappe, Prof. Jürgen Hoyer, Prof. Jules Angst, Prof. Winfried Rief, Prof. Roselind Lieb, Prof. Ron Kessler, Prof. Andreas Maerker and Dr. Axel Perkonigg for their expertise in developing the DIA-X-5. We thank the participants for their valuable time.


The study was supported by a starting grant of the Faculty of Psychology and additional funds of the Technische Universität Dresden. The positions of Dr. Jana Hoyer, Catharina Voss, John Venz, Dr. Lars Pieper and Prof. Dr. Katja Beesdo-Baum were funded while the study was conducted by the Federal Ministry of Education and Research (grant numbers: 01ER1303 and 01ER1703). All authors had complete freedom to direct the analysis and its reporting within the current manuscript without influence from the sponsors. There was no editorial direction or censorship from the sponsors.

Author information


JH designed the study, collected and interpreted the patient data and wrote the manuscript. CV and LP helped with the study design, data acquisition and manuscript preparation. JS and JV programmed the interview, analyzed the data and helped in preparing the manuscript. SE helped in data acquisition and manuscript preparation. HUW and KBB designed the interview and the study and helped in preparing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Katja Beesdo-Baum.

Ethics declarations

Ethics approval and consent to participate

The study protocol was reviewed by the local ethics committee of the Technische Universität Dresden (EK512112015).

Participants provided written informed consent. For participants younger than age 18 years, written informed assent/consent was obtained from both the adolescent and all legal guardians (e.g., both mother and father).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hoyer, J., Voss, C., Strehle, J. et al. Test-retest reliability of the computer-assisted DIA-X-5 interview for mental disorders. BMC Psychiatry 20, 280 (2020).
