To what extent does the anxiety scale of the Four-Dimensional Symptom Questionnaire (4DSQ) detect specific types of anxiety disorder in primary care? A psychometric study

Background: Anxiety scales may help primary care physicians to detect specific anxiety disorders among the many emotionally distressed patients presenting in primary care. The anxiety scale of the Four-Dimensional Symptom Questionnaire (4DSQ) consists of an admixture of symptoms of specific anxiety disorders. The research questions were: (1) Is the anxiety scale unidimensional or multidimensional? (2) To what extent does the anxiety scale detect specific DSM-IV anxiety disorders? (3) Which cut-off points are suitable to rule out or to rule in (which) anxiety disorders?
Methods: We analyzed 5 primary care datasets with standardized psychiatric diagnoses and 4DSQ scores. Unidimensionality was assessed through confirmatory factor analysis (CFA). We examined mean scores and anxiety score distributions per disorder. Receiver operating characteristic (ROC) analysis was used to determine optimal cut-off points.
Results: Total n was 969. CFA supported unidimensionality. The anxiety scale performed slightly better in detecting patients with panic disorder, agoraphobia, social phobia, obsessive compulsive disorder (OCD) and post traumatic stress disorder (PTSD) than patients with generalized anxiety disorder (GAD) and specific phobia. ROC-analysis suggested that ≥4 was the optimal cut-off point to rule out and ≥10 the cut-off point to rule in anxiety disorders.
Conclusions: The 4DSQ anxiety scale measures a common trait of pathological anxiety that is characteristic of anxiety disorders, in particular panic disorder, agoraphobia, social phobia, OCD and PTSD. The anxiety score detects the latter anxiety disorders to a slightly greater extent than GAD and specific phobia, without being able to distinguish between the different anxiety disorder types. The cut-off points ≥4 and ≥10 can be used to separate distressed patients into three groups with a relatively low, moderate and high probability of having one or more anxiety disorders.


Background
Several anxiety scales are being employed in research and clinical practice for various reasons. Some scales, often used in research, measure specific types of anxiety (e.g., test anxiety, trait anxiety) or specific aspects of individual anxiety disorders (e.g., worry, social anxiety, specific fears) whereas other scales aim to measure a common characteristic of most, if not all, anxiety states or disorders (i.e., general anxiety) [1]. For use in primary care practice general scales are more relevant because of their promise to detect all or most types of anxiety disorder (i.e., panic disorder, agoraphobia, social phobia, generalized anxiety disorder (GAD), posttraumatic stress disorder (PTSD), obsessive compulsive disorder (OCD) and specific phobia). Detection of anxiety disorders in primary care is important because of their prevalence and associated disability [2]. Research has shown that general practitioners (GPs) recognize a mental health problem in most of their patients with an anxiety disorder but they have difficulty recognizing a specific anxiety disorder [3]. A solution to this problem might be the use of a case finding instrument to distinguish between patients with high risk of having an anxiety disorder and patients with low risk. This tool must be robust to prevalence variations as GPs will use it in patient populations with various prevalence rates.
As relevant studies typically either lump different anxiety disorders together or focus on a limited number of specific anxiety disorders, there is currently a lack of evidence that available and popular anxiety scales are capable of detecting all or most types of anxiety disorder in primary care. Examples of popular anxiety scales are the Hospital Anxiety and Depression Scale (HADS) [4], the Beck Anxiety Inventory (BAI) [5], the anxiety scales of the Depression Anxiety Stress Scale (DASS) [6] and the Mood and Anxiety Symptom Questionnaire (MASQ) [7,8], and the recently developed Generalized Anxiety Disorder scale (GAD-7) [9,10]. The HADS is mainly used in medical settings and appears to perform quite satisfactorily [11,12], but it may not detect all relevant types of anxiety disorder (e.g., social phobia) [13-15]. The BAI seems to be biased towards panic disorder [16,17] and failed to detect any anxiety disorder in some studies [18,19]. The anxiety scale of the DASS also seems to favour panic disorder [20]. The anxiety scale of the MASQ was shown to be fairly good in detecting any anxiety disorder in a community sample [21], but in higher prevalence samples the scale discriminated poorly between anxiety disorders and other or no disorders [22,23]. The GAD-7 appears to be a good screener for GAD, panic disorder, social anxiety disorder and PTSD in primary care [9,10], but in higher prevalence samples the GAD-7 performed poorly in detecting GAD [24]. A few studies reported the failure of anxiety scales to discriminate between anxiety and depressive disorders [21,25,26], which may suggest that some anxiety scales actually measure negative affect or general distress [24].
The present study concerns the anxiety scale of the Four-Dimensional Symptom Questionnaire (4DSQ). The 4DSQ is a self-rating questionnaire comprising four scales measuring distress, depression, anxiety and somatization [27]. The anxiety scale is composed of a collection of symptoms that are more or less specific to the various distinct anxiety disorders (see Table 1 for its items). This raises questions about the dimensionality of the anxiety scale. Is the anxiety scale unidimensional, measuring a single trait of anxiety across different groups of patients (e.g., patients with different anxiety and depressive disorders or no disorder), or is the anxiety scale multidimensional, measuring different traits of anxiety in different patient groups (e.g., panic anxiety in panic disorder patients, social anxiety in social phobia patients and general anxiety in GAD patients)? If the 4DSQ anxiety scale is multidimensional, its scores could represent different anxiety problems depending on the specific anxiety disorder involved and anxiety scores could not be compared across diagnostic groups. For instance, an anxiety score of 15 could reflect a totally different problem in a panic disorder patient than in a social phobia patient. From a practical point of view the key question is whether the 4DSQ anxiety scale is able to detect the various specific anxiety disorders equally well (e.g., whether the scale will detect social phobia as well as panic disorder). For the primary care professional it is important to know whether the 4DSQ identifies all anxiety disorders to the same extent or whether it tends to detect some disorders preferentially and miss others. It should be noted that the 4DSQ is not intended to be used as a screening tool in unselected consecutive patients, but rather as an assessment and case finding instrument in emotionally distressed patients. 
As noted above, GPs usually recognize non-specific emotional problems in patients with an anxiety disorder without recognizing that these patients actually have an anxiety disorder that needs specific treatment [3]. The 4DSQ, as a case finding instrument, could assist GPs in separating patients with high risk of having an anxiety disorder from patients with low risk. The 4DSQ anxiety scale employs two cut-off points, based on clinical experience [28], a lower cut-off point with a relatively high sensitivity and a higher cut-off point with a relatively high specificity. The idea is that the lower cut-off point be used to identify a group of patients (below the cut-off) with a relatively low probability of having an anxiety disorder and that the higher cut-off point be used to identify a group of patients (above the cut-off) with a relatively high probability of having an anxiety disorder. The latter group should be given priority in a subsequent clinical diagnostic workup targeted at anxiety disorder. The current cut-off points (≥8 and ≥13) are probably set too high [29].
The present study evaluated the 4DSQ anxiety scale as a case finding tool to identify anxiety disorder and aimed to answer the following questions: (1) Is the 4DSQ anxiety scale unidimensional or multidimensional and what is the scale's reliability? (2) To what extent does the 4DSQ anxiety scale detect each of the specific anxiety disorder types? (3) Which cut-off points are suitable to rule out or to rule in (which) anxiety disorders?

Study populations
The design was a cross-sectional secondary analysis of 5 convenience samples collected in different primary care studies (total n = 969). Each of these samples consisted of patients selected for having mental health problems, defined in various ways. Each patient completed the 4DSQ and was subjected to a standardized psychiatric interview administered by trained research assistants. The range of disorders assessed differed across studies.
Dataset A contained the baseline data of general practice patients with emotional distress, who were assessed for eligibility to take part in a randomized clinical trial to investigate the effectiveness of a social work intervention [30]. The diagnostic interview used was the Composite International Diagnostic Interview (CIDI) [31], administered face-to-face. The study was carried out in compliance with the Helsinki Declaration and ethical approval was granted by the Ethical Committee of the Netherlands Institute of Mental Health and Addiction, Utrecht, the Netherlands. Anonymized data were made available by the Netherlands Institute for Health Services Research (NIVEL), Utrecht, the Netherlands.
Dataset B consisted of the baseline data of general practice patients with depressive symptoms, who were assessed for eligibility to participate in a randomized clinical trial to evaluate the effectiveness of antidepressant pharmacotherapy [32]. The CIDI was administered face-to-face. The study was carried out in compliance with the Helsinki Declaration and ethical approval was obtained from the Medical Ethical Committee of the VU University Medical Center, Amsterdam, the Netherlands. Anonymized data were made available by the EMGO Institute for Health and Care Research, Amsterdam, the Netherlands.
Dataset C comprised the baseline data of general practice patients with threshold and subthreshold mood and anxiety disorders, who were included in a randomized clinical trial to assess the effectiveness of a stepped care program [33]. The CIDI was administered by telephone. The study was carried out in compliance with the Helsinki Declaration and ethical approval was obtained from the Medical Ethical Committee of the VU University Medical Center, Amsterdam, the Netherlands (registration number 2006/248). Anonymized data were made available by the Department of Clinical Psychology, VU University, Amsterdam, the Netherlands.
Dataset D consisted of the baseline data of general practice patients who were included in a randomized clinical trial aimed to evaluate a stepped care program for mood, anxiety and stress-related disorders [34]. The diagnostic interview used was the Mini-International Neuropsychiatric Interview (MINI) [35], administered face-to-face. The study was carried out in compliance with the Helsinki Declaration and ethical approval was obtained from the Medical Ethics Committee of the Twenteborg Hospital, Almelo, the Netherlands. Anonymized data were made available by Desiree B. Oosterbaan.
Dataset E was derived from a cross-sectional survey among employees who had been unable to work for more than two years due to mental health problems and who applied for a work disability benefit according to Dutch regulations [36]. The diagnostic interview consisted of the CIDI, administered face-to-face. The study was carried out in compliance with the Helsinki Declaration and ethical approval was obtained from the Medical Ethical Committee of the VU University Medical Center, Amsterdam, the Netherlands. Anonymized data were made available by the Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands.
It should be noted that the selected patient samples were all more or less representative of the so-called "indicated" population [37], that is, the population in which the 4DSQ anxiety scale is indicated to help separate patients with and without anxiety disorder.

Measures
Four-Dimensional Symptom Questionnaire (4DSQ)
The 4DSQ was developed in primary care as a tool to detect mental health problems, assess overall severity, and select patients with a high risk of having a depressive or anxiety disorder. Importantly, the 4DSQ dimensions were empirically derived through factor and cluster analysis of a pool of 96 symptoms covering the whole range of non-psychotic psychological and psychosomatic symptoms, without prior assumptions about the number and nature of the dimensions [38]. The 4DSQ comprises four scales measuring distress, depression, anxiety and somatization [27]. It takes on average 5-10 minutes to complete. The anxiety scale consists of 12 items measuring irrational fears, panic, avoidance, and other features associated with anxiety disorders (see Table 1). The scale's reliability is good, with Cronbach's alpha values generally well over 0.80. Response categories are "no", "sometimes", "regularly", "often", "very often or constantly", which are scored as 0 for "no", 1 for "sometimes" and 2 for the other response categories. Item scores are summed to obtain scale scores, so the anxiety score ranges from 0 to 24. The rationale behind collapsing the highest response categories "regularly", "often", "very often or constantly" into a single score category is to avoid spurious correlations due to exaggerating response tendencies. This way of scoring ensures that the scale score reflects primarily the number of symptoms rather than their subjective severity [39]. The 4DSQ is freely available for non-commercial use, such as in health care and research [40].
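The scoring rule described above can be illustrated with a minimal sketch. The function and variable names are hypothetical and are not part of the 4DSQ materials; the sketch only assumes the 12-item anxiety scale and the 0/1/2 recoding stated in the text.

```python
# Recoding of the five response categories as described in the text:
# "no" = 0, "sometimes" = 1, all higher categories collapse to 2.
RESPONSE_SCORES = {
    "no": 0,
    "sometimes": 1,
    "regularly": 2,
    "often": 2,
    "very often or constantly": 2,
}

N_ANXIETY_ITEMS = 12  # number of items in the anxiety scale


def anxiety_score(responses):
    """Sum the recoded item scores of the 12 anxiety items (range 0-24)."""
    if len(responses) != N_ANXIETY_ITEMS:
        raise ValueError("expected exactly 12 item responses")
    return sum(RESPONSE_SCORES[r] for r in responses)


# Hypothetical respondent: six symptoms absent, two occasional, four frequent.
example_responses = (
    ["no"] * 6
    + ["sometimes"] * 2
    + ["regularly", "often", "very often or constantly", "often"]
)
example_score = anxiety_score(example_responses)  # 6*0 + 2*1 + 4*2 = 10
```

Because the three highest categories share one score, a respondent who answers "very often or constantly" on four items obtains the same score as one who answers "regularly" on those items.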

Standardized psychiatric interview
The studies employed two different diagnostic interviews, the Composite International Diagnostic Interview (CIDI) and the Mini-International Neuropsychiatric Interview (MINI). The CIDI is a structured interview suitable to be applied by trained lay interviewers [31]. It allows standardized diagnoses of mental disorders according to the definitions of the ICD-10 and DSM-IV (we used DSM-IV diagnoses only). Reliability and validity are generally good [41]. The MINI is also a structured interview, but it is shorter than the CIDI [35]. The MINI has good reliability and agreement with the CIDI [42]. As both interviews are known to produce reliable and valid DSM-IV diagnoses, we assumed that the CIDI and the MINI interviews produced equivalent results. That is, we assumed that, for instance, panic disorder diagnosed in one study using the CIDI was essentially the same disorder as panic disorder diagnosed in another study using the MINI, although differences in prevalence and severity across the studies might have existed. There was no way to test our assumption of invariant diagnoses across studies; we simply had to rely on it. However, it should be noted that major violation of this assumption (e.g., when panic disorder according to the CIDI was a different condition than panic disorder according to the MINI) would have resulted in significantly decreased psychometric parameter estimates after pooling the samples, as the 4DSQ anxiety score would have been compared to a hodgepodge of different conditions.

Analysis
To describe the study samples, we examined the composition of the samples regarding the prevalence of specific disorders, the occurrence of multiple anxiety disorders and comorbidity between anxiety and depressive disorders.
All analyses were performed in the five study samples separately and, where possible, in the pooled sample of five studies (total n = 969). Some anxiety disorders were only assessed in two studies; in these cases the pooled analyses were limited to the studies in which the specific anxiety disorder was assessed.
To assess the dimensionality of the 4DSQ anxiety scale we examined the fit indices of a one factor model using confirmatory factor analysis (CFA) in the five studies separately. Fit indices examined were the χ²/df statistic, the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI). RMSEA values less than 0.08, χ²/df statistics less than 3, and CFI and TLI values greater than 0.95 were accepted as indicating adequate fit [43]. Strict factorial invariance across all five studies was tested using a multi-group CFA. The fit of the strict factorial invariance model was compared to that of a partial factorial invariance model (in which the residual variances were allowed to differ between studies) using the χ² test. CFA and multi-group CFA analyses were performed in Mplus version 7 using theta parameterisation [43].
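As an illustration of how the stated fit criteria combine, the following sketch checks a set of fit indices against the four thresholds named above. The function name and the example values are hypothetical; they are not results from the study, and the sketch does not perform the CFA itself.

```python
def adequate_fit(chi2, df, rmsea, cfi, tli):
    """Check each fit index against the criteria stated in the text.

    Returns a dict mapping each criterion to True (met) or False (not met).
    """
    return {
        "chi2/df < 3": chi2 / df < 3,
        "RMSEA < 0.08": rmsea < 0.08,
        "CFI > 0.95": cfi > 0.95,
        "TLI > 0.95": tli > 0.95,
    }


# Hypothetical fit indices for a one factor model (illustrative values only).
example_checks = adequate_fit(chi2=120.0, df=54, rmsea=0.05, cfi=0.98, tli=0.97)
example_all_met = all(example_checks.values())
```

Reporting the criteria separately, rather than a single pass/fail flag, makes it visible which index is responsible when a model falls short.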
As a measure of internal consistency reliability we determined the anxiety scale's Cronbach's alpha. We calculated the anxiety scale's standard error of measurement (SEM) from the scale's standard deviation (SD) and the alpha coefficient, using the formula $\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \alpha}$. The SEM, being the standard deviation of the measurement error of the scale score, allows an estimation of the confidence interval around individual scores. This information is useful for choosing and interpreting practical cut-off points for the scale.
To assess the extent to which the 4DSQ anxiety scale was able to detect the various specific anxiety disorders, we explored the anxiety score distributions by drawing boxplots for the individual anxiety disorders, for patients with single and multiple anxiety disorders, and for patients with depressive disorder(s) only, anxiety disorder(s) only, and comorbid anxiety-depressive disorders. In addition, we calculated mean anxiety scores and standard deviations for the various diagnostic groups. Differences between groups were tested using the non-parametric Kruskal-Wallis test to account for the skewed score distribution in some of the groups. Pair-wise post hoc tests were performed using the software package "pgirmess" as implemented in the statistical program R version 3.0.1 [44].
To determine optimal cut-off points for the 4DSQ anxiety scale we performed receiver operating characteristic (ROC) analyses with the anxiety score as the test variable and anxiety disorders as the state variable, in the separate studies and in the pooled samples. As it turned out that the anxiety score seemed to be more consistently associated with panic disorder, agoraphobia, social phobia, OCD and PTSD than with GAD and specific phobia, we performed ROC analyses with the former five disorders as state variable. Because only panic disorder, agoraphobia and social phobia had been assessed in all five studies, we first performed a ROC analysis with these three disorders as outcome variable. We determined the best ROC thresholds, being the thresholds closest to the top-left corner of the ROC graph (i.e., sensitivity = 1, 1-specificity = 0). In addition, we determined the highest thresholds with an arbitrarily chosen sensitivity of ≥0.85, possibly suitable as the lower cut-off point of the scale to rule out anxiety disorder when the test is negative, and the thresholds with an arbitrarily chosen specificity of ≥0.85, possibly suitable as the higher cut-off point of the scale to rule in anxiety disorder when the test is positive. We used the package "pROC" as implemented in R to perform the ROC analyses and to estimate 95% confidence intervals (95% CI) of the thresholds and operational parameters using bootstrapping (2000 samples) [45]. Next, the analysis was repeated with panic disorder, agoraphobia, social phobia, OCD or PTSD as outcome variable in the samples in which the latter two disorders had been assessed.
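The threshold selection described above can be sketched in a few lines. This is not the pROC procedure used in the study (it omits the bootstrapped confidence intervals), the function names are hypothetical, and the data are invented toy values. Taking the lowest threshold that reaches a specificity of 0.85 is an assumption, as the text does not say which of the qualifying thresholds was chosen.

```python
from math import hypot


def roc_points(scores, has_disorder, max_score=24):
    """Sensitivity and specificity of the rule 'score >= t' for each t."""
    pos = [s for s, d in zip(scores, has_disorder) if d]
    neg = [s for s, d in zip(scores, has_disorder) if not d]
    points = []
    for t in range(1, max_score + 1):
        sens = sum(s >= t for s in pos) / len(pos)  # cases testing positive
        spec = sum(s < t for s in neg) / len(neg)   # non-cases testing negative
        points.append((t, sens, spec))
    return points


def closest_to_top_left(points):
    """Threshold minimizing the distance to sensitivity = 1, specificity = 1."""
    return min(points, key=lambda p: hypot(1 - p[1], 1 - p[2]))[0]


def highest_threshold_with_sensitivity(points, minimum=0.85):
    """Candidate lower cut-off (rule out): highest t with sens >= minimum."""
    ok = [t for t, sens, _ in points if sens >= minimum]
    return max(ok) if ok else None


def lowest_threshold_with_specificity(points, minimum=0.85):
    """Candidate higher cut-off (rule in): lowest t with spec >= minimum."""
    ok = [t for t, _, spec in points if spec >= minimum]
    return min(ok) if ok else None


# Invented toy data: 8 patients with and 10 without the disorder of interest.
toy_scores = [3, 6, 9, 12, 14, 16, 20, 22] + [0, 0, 1, 2, 2, 3, 4, 5, 8, 11]
toy_labels = [True] * 8 + [False] * 10
toy_points = roc_points(toy_scores, toy_labels)

best = closest_to_top_left(toy_points)
rule_out = highest_threshold_with_sensitivity(toy_points)
rule_in = lowest_threshold_with_specificity(toy_points)
```

Using two thresholds instead of the single best one trades a larger undecided middle group for more confident decisions at both ends of the scale.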
A set of thresholds was chosen using all available information. Finally, we calculated likelihood ratios to evaluate the performance of these thresholds with respect to the detection of panic disorder, agoraphobia and social phobia, as well as to the detection of any anxiety disorder. The likelihood ratio (LR) of a test result (e.g., a certain anxiety score or a range of scores) is the ratio between the probability of this result in a population with the diagnosis of interest (e.g., anxiety disorder) and the probability of this result in a population without the diagnosis of interest [46]. LRs are relatively independent of the prevalence of the diagnosis of interest in the study population. Once LRs are known, the probability of a diagnosis, given a certain test result and a certain prevalence, can be calculated relatively easily because the LR is also the ratio between the posterior odds of having a disorder and the prior odds of having the disorder, with the latter simply being the prevalence divided by 1 minus the prevalence. The posterior probability of having a disorder is the posterior odds divided by 1 plus the posterior odds [46]. We calculated LRs for the defined low, moderate and high anxiety scores based on the pooled sample. Subsequently, we used these LRs to calculate the predictive value of low, moderate and high anxiety scores with respect to ruling in or ruling out panic disorder, agoraphobia and social phobia, and any anxiety disorder respectively, in two hypothetical samples, one similar to our pooled sample, the other with half the prevalence of anxiety disorder. LRs and their confidence intervals were calculated using the website for statistical computation VassarStats (http://vassarstats.net/).
The analyses, other than the CFAs, the ROC-analyses and the post hoc analyses after the Kruskal-Wallis tests, were performed using SPSS 20.0.

Prevalence and comorbidity
Details of the study samples are presented in Table 2. The diagnostic composition of the samples varied to some extent. Studies that focused on the whole spectrum of depressive and anxiety disorders (studies C, D and E) selected relatively more patients with anxiety disorders. Study B that focused on patients with depressive complaints included relatively more patients with depressive disorders and fewer patients with anxiety disorder, except for GAD. We refrained from formal statistical testing of these between-study differences because generalization of these differences would serve no purpose. It suffices to note that there was some heterogeneity between the study samples, which likely resulted from the different settings and purposes for which the samples had been collected. Table 3 shows the prevalence of multiple anxiety disorders and the co-occurrence of anxiety and depressive disorders (anxiety-depression comorbidity). For instance, of all patients across the study samples diagnosed with panic disorder (n = 176) 86% had one or more other anxiety disorders too, and 59% of the panic disorder patients had a comorbid depressive disorder (i.e., major depressive disorder or dysthymia). For each of the specific anxiety disorders, the occurrence of multiple anxiety disorders (56-88%) and anxiety-depression comorbidity (55-74%) was more the rule than an exception. Of all patients with one or more anxiety disorders (n = 477), 228 (48%) had a single anxiety disorder, which was most frequently (in 99 cases) GAD. It should be noted that the already high prevalence of multiple anxiety disorders was probably underestimated to some extent because specific phobia, OCD and PTSD had not been assessed in three studies (see Table 2).

Unidimensionality and reliability
The results of the CFAs are displayed in Table 4. For all studies separately the one factor model showed adequate fit. Moreover, for the studies combined, the strict factorial invariance model showed adequate fit on all indices. Fit of the strict factorial invariance model was not worse than that of the partial factorial invariance model (χ² = 57.6, df = 48, p = 0.162).
Cronbach's alpha varied between 0.85 (study C) and 0.92 (study E) and was 0.90 in the pooled sample. The anxiety scale's standard deviation varied between 5.2 (study C) and 7.3 (study E) and was 6.2 in the pooled sample. The SEM varied between 1.9 (studies A and B) and 2.0 (studies C-E) and was 2.0 in the pooled sample. This value of SEM means that, due to measurement error, the 96% confidence interval of a given score X was X ± 4 points (about 2 SEM) and that the 87% confidence interval of a given score X was X ± 3 points (about 1.5 SEM).
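The pooled-sample SEM can be reproduced from the reported SD and alpha with a short sketch. It assumes the conventional formula SEM = SD × √(1 − α) and normally distributed measurement error; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, alpha):
    """Conventional SEM formula (assumed): SD times sqrt(1 - alpha)."""
    return sd * sqrt(1 - alpha)


def interval_coverage(half_width, sem):
    """Coverage of the interval X +/- half_width under normal error."""
    z = half_width / sem
    return 2 * NormalDist().cdf(z) - 1


# Pooled-sample values reported in the text: SD = 6.2, alpha = 0.90.
pooled_sem = standard_error_of_measurement(sd=6.2, alpha=0.90)

coverage_4_points = interval_coverage(4, pooled_sem)  # interval X +/- 4
coverage_3_points = interval_coverage(3, pooled_sem)  # interval X +/- 3
```

The same two functions can be applied to the study-specific SDs and alphas to see how little the SEM varies across samples despite the differences in SD.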

Anxiety score distributions
The boxplots depicting the disorder-specific anxiety score distributions (Figure 1) suggest a difference in overall level of anxiety, as measured by the 4DSQ anxiety scale, between GAD and specific phobia on the one hand and panic disorder, agoraphobia, social phobia, OCD and PTSD on the other hand. It appeared that, on average, panic disorder, agoraphobia, social phobia, OCD and PTSD were characterized by slightly higher anxiety scores than GAD and specific phobia. Figure 2 demonstrates the differences in anxiety score associated with the number of anxiety disorders per patient. The median anxiety score for patients with single anxiety disorders was 7 whereas the median score for patients with three or more anxiety disorders was 16. Clearly, the anxiety score was an indicator of the number of anxiety disorders per patient. Of the patients with three or more anxiety disorders over 50% scored very high (i.e., ≥16) and less than 10% scored low (i.e., <4). In contrast, no more than 10% of patients with single anxiety disorders scored very high (i.e., ≥16) and 29% scored low (i.e., <4). As noted above, GAD was the most frequent diagnosis in the single anxiety disorder group (43%). Also relevant for the ability of the anxiety score to detect anxiety disorders was the finding that only 11% of patients without a diagnosed anxiety disorder scored ≥10. An anxiety score ≥10 indicated a relatively high probability of having one or more anxiety disorders.
Note that Figure 2 does not account for comorbidity with depressive disorder.
Anxiety-depression comorbidity was also strongly related to the anxiety score distribution (Figure 3). Of the patients with non-comorbid anxiety disorders 27% scored low (i.e., <4) on the anxiety scale and 54% of them had a single anxiety disorder. The presence of depressive disorder was also associated with an increase in the anxiety score, although this increase was smaller than the increase associated with the presence of anxiety disorder.
Mean anxiety scores per disorder are displayed in Table 5. The highest mean scores occurred in patients with panic disorder, agoraphobia, OCD, PTSD and social phobia. The mean anxiety score appeared to be strongly associated with the number of anxiety disorders per patient (Kruskal-Wallis test p <0.001). To account for multiple pair-wise comparisons between four groups (6 comparisons) we adopted a critical p-value of 0.0083 (0.05/6) for the post hoc tests. All between group tests were significant (p <0.0083). By the same token, the comorbidity groups were significantly different (Kruskal-Wallis test p < 0.001). Post hoc tests (with the same adjustment for multiple testing) revealed that all between-group differences were significant (p <0.0083).
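The adjusted critical p-value follows from the number of pair-wise comparisons, as the following sketch shows. It assumes a Bonferroni-style division of the overall significance level, which matches the 0.05/6 stated in the text; the function name is hypothetical.

```python
from math import comb


def adjusted_critical_p(n_groups, overall_alpha=0.05):
    """Number of pair-wise comparisons and the per-comparison critical p."""
    n_comparisons = comb(n_groups, 2)  # every unordered pair of groups
    return n_comparisons, overall_alpha / n_comparisons


# Four groups, as in the post hoc tests described above.
n_comparisons, critical_p = adjusted_critical_p(4)
```

With four groups this yields 6 comparisons and a critical p-value of about 0.0083.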
In conclusion, the anxiety scale appeared to detect multiple anxiety disorders better than single anxiety disorders, comorbid anxiety-depressive disorders better than non-comorbid anxiety disorders, and panic disorder, agoraphobia, social phobia, OCD and PTSD better than GAD and specific phobia.

ROC analysis
ROC analysis with panic disorder, agoraphobia and social phobia as the outcome variable revealed area under the curve (AUC) values in the separate studies between 0.737 and 0.857 (Table 6). In the pooled sample the AUC was 0.793 (95% CI 0.763-0.822), indicating that the overall diagnostic accuracy was fair [47]. The best ROC threshold nearest to the top-left corner of the ROC-graph (i.e., sensitivity = 1 and 1-specificity = 0)

Likelihood ratios and predicted probabilities
Based on the ROC-analyses, we decided to choose ≥4 and ≥10 as the revised lower and higher cut-off points for the 4DSQ anxiety scale. The lower cut-off point (≥4) identified 85-90% of all patients with panic disorder, agoraphobia, social phobia, OCD or PTSD and 80% of all patients with GAD or specific phobia. The higher cut-off point (≥10) identified two thirds of all patients with panic disorder, agoraphobia, social phobia, OCD or PTSD and 50% of all patients with GAD or specific phobia. Anxiety disorder patients who scored low (<4) on the anxiety scale, consisted mainly of patients with single anxiety disorders (73%) and patients with non-comorbid disorders (60%) whereas 50% of them had GAD. The percentage of patients with anxiety scores 4-9 varied between 25% and 42% in the separate study samples, and was 31% in the pooled sample.
The LRs associated with low, moderate and high anxiety scores are presented in Table 7. The prevalence of panic disorder, agoraphobia and social phobia in the pooled sample was 317/969 = 32.7%. The likelihood of scoring 0-3 among patients with panic disorder, agoraphobia and social phobia was 42/317 = 0.132, whereas this likelihood among patients without panic disorder, agoraphobia and social phobia was 332/652 = 0.509. The ratio of these likelihoods was 0.132/0.509 = 0.259. Table 8 presents the predicted probabilities of having panic disorder, agoraphobia or social phobia, or any anxiety disorder respectively, based on the LR and the 4DSQ anxiety score. The probabilities were calculated using the following equations:
$$\text{prior odds} = \frac{\text{prevalence}}{1 - \text{prevalence}}$$
$$\text{posterior odds} = \text{prior odds} \times \text{LR}$$
$$\text{probability} = \frac{\text{posterior odds}}{1 + \text{posterior odds}}$$
As expected, low anxiety scores (0-3) predicted relatively low probabilities of having an anxiety disorder and high anxiety scores (10-24) predicted relatively high probabilities, depending on the prevalence of anxiety disorder. Note that low anxiety scores are relatively good in ruling out panic disorder, agoraphobia and social phobia but perform relatively poorly in ruling out any anxiety disorder, especially in high prevalence samples. The obvious reason is that, as we have seen before, about one fifth of patients with GAD or specific phobia have low anxiety scores. On the other hand, high anxiety scores do a relatively good job in ruling in (any) anxiety disorder. The LRs associated with moderate anxiety scores (4-9) were close to 1 and, consequently, the posterior probability was close to the prevalence of anxiety disorder. Moderate anxiety scores are therefore of little informative value.
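The calculation from LR to predicted probability can be traced with the counts given above. The sketch below uses hypothetical function names and the unrounded counts, so the LR comes out marginally higher (about 0.260) than the 0.259 obtained in the text from the rounded likelihoods.

```python
def likelihood_ratio(n_result_with, n_with, n_result_without, n_without):
    """LR of a test result: its probability among patients with the
    diagnosis divided by its probability among patients without it."""
    return (n_result_with / n_with) / (n_result_without / n_without)


def posterior_probability(prevalence, lr):
    """Convert prevalence and LR to the probability of the diagnosis."""
    prior_odds = prevalence / (1 - prevalence)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)


# Counts from the text: scores 0-3 in 42 of 317 patients with, and in
# 332 of 652 patients without, panic disorder, agoraphobia or social phobia.
lr_low = likelihood_ratio(42, 317, 332, 652)

# Probability of these disorders given a low score, at the pooled prevalence.
p_pooled = posterior_probability(317 / 969, lr_low)

# Same LR applied to a hypothetical sample with half the prevalence.
p_half = posterior_probability(0.5 * 317 / 969, lr_low)
```

As a consistency check, the posterior probability at the pooled prevalence equals the proportion of low scorers who had one of the three disorders, 42/(42 + 332), which is about 11%.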

Discussion
Our results suggest that, in primary care patients, the 4DSQ anxiety scale measures a unidimensional construct. In other words, the scale seems to measure a common trait of anxiety symptoms that is present to a lesser or greater extent in various patient groups. This common trait of pathological anxiety appears to be present to a greater extent in patients with panic disorder, agoraphobia, social phobia, OCD and PTSD, and to a slightly lesser extent in patients with GAD and specific phobia. It is absent, or present to a relatively small extent, in patients with non-comorbid depressive disorders and in emotionally distressed patients without any anxiety or depressive disorder. Notwithstanding the fact that the 4DSQ anxiety scale consists of an admixture of vague anxiety symptoms (e.g., vague feeling of fear, feeling frightened) and symptoms that are more or less specific to distinct anxiety disorder types (e.g., anxiety or panic attacks, irrational specific fears, fear of public embarrassment, repeating actions, avoiding places, fear of public transport) the anxiety scale symptoms appear to work together to measure a common dimension of pathological anxiety. Although the specific anxiety disorders are conceptualized as separate disorders in DSM-IV, in our samples, the specific anxiety disorders relatively rarely occurred stand-alone as single disorders. Multiple anxiety disorders were the rule, rather than an exception. This might, in part, explain why we found the anxiety scale to be unidimensional. Additional research is needed to clarify the dimensions of anxiety symptoms and disorders.
The kind of anxiety that is measured by the 4DSQ anxiety scale was present in most patients with anxiety disorders. This finding compares favourably to existing anxiety scales. However, this anxiety was present to a slightly greater extent in patients with panic disorder, agoraphobia, social phobia, OCD or PTSD than in patients with GAD or specific phobia, and undeniably it was present to a greater extent in patients with multiple anxiety disorders than in patients with single anxiety disorders, and in comorbid anxiety-depressive disorders than in non-comorbid anxiety disorders. Still, the majority of patients with GAD or specific phobia (79%), single anxiety disorders (71%) and non-comorbid anxiety disorders (73%) scored at or above the lowest cut-off point (≥4). Nevertheless, 20-30% of patients with these disorders scored low (<4). In contrast, 85-90% of patients with panic disorder, agoraphobia, social phobia, OCD or PTSD, multiple anxiety disorders or comorbid anxiety-depressive disorders scored ≥4. When it comes to detecting anxiety disorders in primary care patients, the 4DSQ performs better with respect to panic disorder, agoraphobia, social phobia, OCD or PTSD, multiple anxiety disorders and comorbid anxiety-depressive disorders. A sufficiently strong association between the 4DSQ anxiety score and the presence of anxiety disorder constitutes a prerequisite for the anxiety score to be useful as a tool to detect anxiety disorder. This association depends, first of all, on the concordance of whatever the anxiety scale is measuring and what characterizes anxiety disorder (a matter of validity).
Table 7 Likelihoods and likelihood ratios associated with low (0-3), moderate (4-9) and high (10-24) 4DSQ anxiety scores with respect to panic disorder, agoraphobia and social phobia, and to any anxiety disorder respectively
In the hypothetical situation that there is 100% concordance, all patients scoring above a certain threshold on the anxiety scale would have an anxiety disorder and all patients scoring below that threshold would not. In practice, of course, the concordance is rarely 100%. In the present study there was evidence that very high anxiety scores did not always imply a diagnosable anxiety disorder, and, conversely, that very low anxiety scores did not always imply the absence of an anxiety disorder diagnosis. A possible reason for high anxiety scores in the absence of an anxiety disorder diagnosis might be that the patient did not fulfil all necessary criteria for a diagnosis (regarding e.g., duration, distress or disability). A possible reason for low anxiety scores in the presence of anxiety disorder might be that in some anxiety disorder cases manifest anxiety (as measured by the 4DSQ) was not a prominent feature of the disorder or was not necessarily present all the time. This happened relatively more often in cases diagnosed as GAD or specific phobia.
The observed association between the anxiety score and the diagnosis of anxiety disorder is also determined by the amount of measurement error, both in the anxiety score and in the assessment of the anxiety disorder diagnosis. Measurement error in the anxiety disorder diagnosis translates into misclassification and reduced reliability of the diagnosis. In our studies diagnostic reliability was not assessed, but typically the interrater agreement (Cohen's kappa) of anxiety disorder diagnoses varies between 0.60 and 0.80 [41]. A kappa of 0.70 means 70% agreement after correction for chance agreement. Considering that there is a continuity between normality and anxiety disorder, it should be realized that the risk of misclassification is greatest near the threshold of disorder.
The mean reliability of the anxiety score across the study samples was 0.90, yielding a SEM of 2 points, suggesting that the 84% confidence interval of a given observed anxiety score X was X ± 3. In other words, when the observed anxiety score was X, we could be at least 92% confident that the true score was not > (X + 3) and we could also be at least 92% confident that the true score was not < (X-3).
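The relation between reliability, SEM and the confidence statements above can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the text: the classical formula SEM = SD × √(1 − reliability), normally distributed measurement error, and an SEM of exactly 2 points. The score standard deviation is back-calculated here and is not a reported value.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


reliability = 0.90  # mean reliability quoted in the text
sem = 2.0           # SEM quoted in the text (assumed exact here)

# Score SD implied by the two figures above (an inference, not a reported value).
implied_sd = sem / math.sqrt(1.0 - reliability)

# A margin of 3 points corresponds to 1.5 SEM.
z = 3.0 / sem
one_sided = normal_cdf(z)        # confidence that the true score is not > X + 3
two_sided = 2.0 * one_sided - 1  # coverage of the interval X ± 3

print(f"Implied SD: {implied_sd:.2f}")
print(f"One-sided confidence at 1.5 SEM: {one_sided:.3f}")
print(f"Two-sided coverage of X ± 3: {two_sided:.3f}")
```

Under these assumptions the one-sided confidence is about 93% and the two-sided coverage about 87%, so the figures given in the text (at least 92% and 84%) are slightly more conservative than the normal-theory values.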
When performing ROC analyses, we observed wide confidence intervals and significant variability of the thresholds across the studies. This variability must be attributed to differences in prevalence and severity spectrum of the samples, and also to distributional irregularities produced by chance in relatively small samples. Combining the samples by pooling was a logical action in order to obtain more stable estimates. This way we obtained 6.5 as the best single threshold to detect panic disorder, agoraphobia, social phobia, OCD and PTSD. Yet, using this single threshold would misclassify over a quarter of patients in either group. Therefore, we chose two thresholds, one (3.5) with a relatively high sensitivity to single out patients with a relatively low probability of having panic disorder, agoraphobia, social phobia, OCD or PTSD and one threshold (9.5) with a relatively high specificity to single out patients with a relatively high probability of having panic disorder, agoraphobia, social phobia, OCD or PTSD. Note that both thresholds are 1.5 SEM away from the threshold (6.5) of panic disorder, agoraphobia, social phobia, OCD and PTSD. This implies that we can be more than 92% confident that patients with anxiety scores 0-3 do not have a true anxiety score above the threshold of panic disorder, agoraphobia, social phobia, OCD and PTSD. Conversely, we can be more than 92% confident that patients with anxiety scores 10-24 do not have a true anxiety score below the threshold of panic disorder, agoraphobia, social phobia, OCD and PTSD. The uncertainty about whether or not a patient has passed the threshold of panic disorder, agoraphobia, social phobia, OCD and PTSD has now been restricted to one third (25-42%) of all patients, who score 4-9 on the anxiety scale.
The primary care professional can use the two cut-off points of the 4DSQ anxiety scale to separate patients with mental health problems into three groups: (1) a group with high anxiety scores (10-24), (2) a group with moderate scores (4-9), and (3) a group with low scores (0-3). Patients with high anxiety scores have a relatively high probability of having one or more anxiety disorders. Importantly, a high anxiety score does not represent a clinical diagnosis in itself. In addition, as noted earlier, the 4DSQ anxiety score does not indicate which specific anxiety disorder(s) is (are) present. A clinical diagnosis should be made in the short term using clinical judgment and available clinical guidelines [48,49]. Given the likelihood ratio, the chance of diagnosing one or more anxiety disorders is relatively high. Moreover, patients with high anxiety scores tend to have relatively clear-cut disorders as most borderline anxiety disorders are classified into the moderate group. On the other hand, patients with low anxiety scores have a low probability of anxiety disorder and when they do have an anxiety disorder, it will often be GAD or specific phobia, or a borderline anxiety disorder. These patients do not need a diagnostic interview targeted at anxiety disorder for the time being. Probably, in this low anxiety scores group, other problems (e.g., depression, stressful life situations) are more important to address. In the middle group with moderate anxiety scores (which constituted one third of our pooled sample), the possibility of anxiety disorder has not been ruled out as the probability is about the same as the prevalence. Anxiety disorder cases in this group are relatively often just beyond the diagnostic threshold and other problems (e.g., depression, stress) might be in more need of treatment. 
We argue that a diagnostic workup targeted at anxiety disorder can be postponed for a few weeks while monitoring the effect of non-specific interventions (e.g., reassurance, encouragement, advice) and the passage of time. When after 3-4 weeks symptoms decline, further diagnostic workup targeted at anxiety disorder does not seem to be necessary, but when symptoms do not abate a diagnostic interview is warranted after all. In our experience, this way GPs can efficiently target their diagnostic efforts to patients with a relatively high risk of having an anxiety disorder while keeping patients with moderate risk under surveillance. We acknowledge that there is currently no firm evidence to support this strategy, but it is our impression that it works fine in the primary care setting. More research is needed in this area.
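The three-group separation and the follow-up suggested for each group can be summarized in a short Python sketch. It only restates the cut-off points (≥4 and ≥10) and the recommendations from the text; the function name and the wording of the returned strings are illustrative assumptions, and the sketch is not a validated clinical tool.

```python
def anxiety_band(score):
    """Map a 4DSQ anxiety score (0-24) to the band used in the text.

    Cut-off points: >=4 separates low from moderate, >=10 moderate from high.
    """
    if not 0 <= score <= 24:
        raise ValueError("4DSQ anxiety scores range from 0 to 24")
    if score >= 10:
        return "high"
    if score >= 4:
        return "moderate"
    return "low"


# Paraphrase of the follow-up proposed in the text for each band.
SUGGESTED_FOLLOW_UP = {
    "high": "clinical diagnostic workup for anxiety disorder in the short term",
    "moderate": "non-specific interventions, reassess after 3-4 weeks",
    "low": "no anxiety-targeted interview for now; address other problems",
}

for example_score in (2, 7, 15):
    band = anxiety_band(example_score)
    print(example_score, band, "->", SUGGESTED_FOLLOW_UP[band])
```

Keeping the band assignment separate from the follow-up text makes it easy to adjust the recommendations without touching the cut-off logic.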
The main limitation of the present study relates to the representativeness of the datasets included. Each of the datasets had been collected for other purposes than to evaluate the measurement properties of the 4DSQ. We would have preferred a large representative sample of primary care patients with mental health problems, each extensively assessed using a standardized psychiatric interview. However, this is costly and logistically challenging. Therefore, we employed convenience datasets collected in other studies. We assumed that the psychiatric diagnoses were principally invariant across the study samples as the samples could all be considered draws from the same large pool of primary care patients with mental health problems. Due to sampling differences, a fair degree of heterogeneity across the studies was evident, but this probably represented a strength of our study instead of a weakness. Furthermore, as the 4DSQ anxiety scale demonstrated high reliability and identical measurement properties across the studies, we assumed that the operating characteristics of the scale (i.e., sensitivity and specificity) were principally the same across the studies, only varying due to sampling. Therefore, we assumed that pooling (i.e., effectively conducting a patient level meta-analysis) was the best way to obtain valid estimates for the operating characteristics of the anxiety scale.
A second limitation concerns the fact that some studies did not assess the whole range of anxiety disorders. Notably, specific phobia, OCD and PTSD were not included in three studies. We estimate that if these diagnoses had been established with a prevalence of 10-15%, and assuming that at least two thirds of these disorders would co-occur with another (already known) anxiety disorder, the total increase in anxiety disorder patients across the studies would have amounted to 5-10%. This would have led to a small decrease in the anxiety scores of patients without anxiety disorder. We assume that this would not have changed the results in any substantial way. However, replication in new samples would be desirable.
A third limitation is the lack of information about interrater reliability of the diagnostic interviews. We relied on the reported reliability of these standardized interviews when performed by carefully trained interviewers. However, it should be noted that low reliability (i.e., measurement error) would attenuate existing associations between the 4DSQ anxiety score and anxiety disorder diagnosis. Because measurement error usually does not correlate with anything, it is unlikely that low reliability would be responsible for false associations. In other words, the associations in this study, as expressed in areas under the ROC-curve, sensitivities, specificities and likelihood ratios, are real and provide some reassurance regarding the diagnostic reliability.
This study took place in the DSM-IV era. In the meantime, however, the DSM-5 (published in May 2013) no longer classifies OCD and PTSD as anxiety disorders [50]. Instead, OCD is included in a separate section with disorders characterized by compulsive behaviour, whereas PTSD is included in a section with disorders following traumatic or stressful events. Yet, our findings provide evidence of at least some degree of kinship between these disorders and typical anxiety disorders like panic disorder, agoraphobia and social phobia.

Conclusions
The 4DSQ anxiety scale measures a common trait of pathological anxiety that is characteristic of the anxiety disorders, in particular panic disorder, agoraphobia, social phobia, OCD and PTSD. This property enables the anxiety scale to distinguish between patients with high risk of having an anxiety disorder (especially panic disorder, agoraphobia, social phobia, OCD and PTSD) and patients with low risk. It should be noted that the 4DSQ anxiety score is not able to distinguish between the separate anxiety disorder types. We propose to use ≥4 and ≥10 as cut-off points. Scores ≥4 should serve as a prompt to consider the possible presence of an anxiety disorder (while the probability is still relatively low), whereas scores ≥10 serve best as a prompt to pursue a clinical diagnostic workup for anxiety disorder immediately (as the probability is relatively high).