Developing a decision tool to identify patients with personality disorders in need of highly specialized care

Background Current guidelines recommend referral to highly specialized care for patients with severe personality disorders. However, criteria for allocation to highly specialized care are not clearly defined. The aim of the present study was to develop a decision tool that can support clinicians to identify patients with a personality disorder in need of highly specialized care. Methods Steps taken to develop a decision tool were a literature search, concept mapping, a meeting with experts and a validation study. Results The concept mapping method resulted in six criteria for the decision tool. The model used in concept mapping provided a good fit (stress value = 0.30) and reasonable reliability (ρ = 0.49). The bridging values were low, indicating homogeneity. The decision tool was subsequently validated by enrolling 368 patients from seven centers. A multilevel model with a Receiver Operating Characteristic Curve (ROC) was applied. In this way, an easily implementable decision tool with relatively high sensitivity (0.74) and specificity (0.69) was developed. Conclusions A decision tool to identify patients with personality disorders for highly specialized care was developed using advanced methods to combine the input of experts with currently available scientific knowledge. The tool appeared to be able to accurately identify this group of patients. Clinicians can use this decision tool to identify patients who are in need of highly specialized treatment. Electronic supplementary material The online version of this article (doi:10.1186/s12888-017-1460-6) contains supplementary material, which is available to authorized users.


Background
The prevalence of personality disorders is high. Several studies have suggested that approximately 1 out of every 10 people in the general population has a personality disorder [1]. When compared to disorders like depression or generalized anxiety disorder, the economic burden is large -this is especially true for the economic costs of borderline and obsessive-compulsive personality disorders [2]. Patients with personality disorders are substantial users of primary care and mental health services [2][3][4][5][6], in particular those with borderline personality disorder. When compared to patients with depression or other personality disorders, they receive the highest amount of care [4]. As a subgroup within this group, patients with severe personality disorders often face additional problems with regards to violence, antisocial behaviour and interpersonal relationships and a greater recurrence of self-harm and a greater duration of administered care [7,8]. The quality of life of patients who experience severe and complex personality problems in combination with a personality disorder is comparable to adults with depression [9,10]. As people with personality disorders form a very heterogenous group, the personality disorder diagnosis alone is seldom sufficient for treatment planning [11]. Guidelines advise highly specialized care for patients with more severe personality disorders [12]. This is supported by evidence that indicates that patients with personality disorders are less responsive to usual treatment [9]. However, research concerning the early identification of patients in need of highly specialized treatment is scarce. Therefore in clinical practice, referral to highly specialized care is often only considered after multiple ineffective regular treatments [12]. Thus, patients may receive insufficient and inappropriate treatment [9] and are expected to generate high costs over time.
Referral to highly specialized care may be optimized by improving diagnostics. To date, validated tools for decision support are scarce in psychiatric practice. This is in contrast to other parts of health care, such as oncology or cardiovascular disease. In the absence of validated tools for the identification of patients who may benefit from highly specialized care, clinicians currently rely on overall clinical impressions or severity of symptoms [13].
To develop a validated tool, it is important to first define the characteristics of patients with severe personality disorders. Until now, only a few studies provided definitions of patients with severe personality disorders. Crawford et al. [14] showed that only a few of these studies provide such definitions. These definitions fit five main themes: 1) some categories of personality disorders are more severe than others; 2) severity depends on the number of features of a personality disorder; 3) severity depends on the number of categories of personality disorders; 4) severity depends on the level of impairment in social functioning, and 5) severity depends on the risk of harm towards others [14]. Tyrer [7] and Crawford, Koldobsky, Mulder, and Tyrer [14] developed a severity scale for personality disorders based on the number of clusters, the number of personality disorders, the level of impairment in social functioning and the risk of harm towards each other. However, their scale is not based upon a systematic approach to the evidence. Moreover the relationship between severity, as defined by the criteria of the scale, and treatment allocation to highly specialized treatments is unclear. Although the criteria on this already existing severity scale are expected to partly overlap with the criteria for the identification of patients who may benefit from highly specialized care, these criteria may not cover these patients sufficiently.
As there is no knowledge on how to identify these patients, the aim of the present study was to develop a decision tool that can aid clinicians in identifying patients with personality disorders in need of highly specialized care.

Study design
The Decision Tool Personality Disorder (DTPD) was developed by clinicians in collaboration with researchers. Its development progressed through three primary phases.
During the first phase, a systematic review of the literature was conducted to serve as a scientific foundation for the decision tool. In this phase, experts were asked to suggest search terms in addition to the search terms that the researchers had already generated. In this way, a large set of potential predictors relevant for treatment allocation was formed. In the second phase, a structured conceptualization methodology known as concept mapping was employed to complement the initial list of features. These criteria were provided by clinical experts and used to develop a consensus-based conceptual framework to guide tool development. Experts were asked to sort the potential predictors into distinguishable categories. In this way, a list of items based on the concept mapping results was generated. These items were used to create the DTPD. Experts were consulted at every step to ensure clinical usability. In the third phase, the instrument was studied for its psychometric properties. An overview of the three phases is presented in Fig. 1.
To effectively take key decisions in the concept mapping process, guidelines recommend to use a small group of participants and/or researchers [15]. Therefore, before the study started a small working group was formed. This group consisted of researchers and clinicians from two mental health care institutions (De Viersprong, a mental health centre specialized in personality pathology; PsyQ, a mental health centre for general psychopathology) and a university (Erasmus University, Institute for Medical Technology Assessment), all specialized in the treatment of personality disorders. This working group contacted experienced clinicians who were then invited to complete a digital survey to provide their contact details and the contact details of other experts for participation in the research.

Phase 1: Literature search
To develop the first set of criteria for the DTPD, a systematic literature search was conducted in PubMed and Psychinfo. In the absence of studies directly examining factors associated with a need for highly specialized care, proxy indicators had to be identified. The following proxy indicators were defined by experts using a webbased survey: comorbidity, severity, dropout, prognosis and patient characteristics. Search terms were then based on these terms. Studies were first screened by title for selection. Selection was based on eligibility criteria, numbered: (1a) English/Dutch/Human/Abstract or full text available, (1b) Randomized trial/(systematic) Review/ Clinical trial/Observational study and (2) published after 1992, (3a) Personality disorder, (3b) Proxy indicators in combination with patient characteristics/comorbidity, and (4) no overlap between studies. For criteria 3b we searched for possible characteristics of patients or certain psychological comorbidities that were associated with having such a proxy indicator -such as certain characteristics that are associated with dropout. Two researchers independently performed the selection process and data extraction of the studies (MG and DS) 1 . Differences in selection were resolved by discussion. Following this, the studies were screened for information on predictors/criteria associated with dropout, severity of the personality disorder, predictors for the course of treatment and other prognostic factors. In the review, we have adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [16]. The criteria defined in the literature search were subsequently used in the concept mapping phase.

Phase 2: Concept mapping
Concept mapping is a method that integrates qualitative research design with quantitative analytic techniques to conceptualize a phenomenon. The concept mapping in the present study consisted of three successive actions for the participants: a brainstorming session, sorting criteria and rating the relevance of criteria. Participants were given access to an online concept mapping system [17]. The web-based concept mapping procedure consisted of three successive steps: The brainstorming session: initial criteria from the review were presented to the panel and subsequently the experts were asked to formulate additional criteria which they thought could distinguish between patients who are in need of highly specialized care and patients who are not in need of highly specialized care. The criteria from the literature review and the additional criteria provided by the experts were merged together and subsequently edited for redundancies. Criteria were solely selected by the working group if they related to the focus question and demonstrated a similar abstraction level. Moreover, all criteria had to be clearly defined and overlapping criteria were taken together. Sorting the criteria: the experts were asked to sort these criteria into piles on the basis of shared content or theme. Rating the relevance of the criteria: experts were asked to rate the perceived importance of the The concept mapping phase resulted in a number of clusters (criteria that were sorted onto the same pile most frequently by clinicians).
A meeting with experts was organized to operationalize the clusters. The overall content of the clusters could not be changed. During the meeting, the participants were given four tasks concerning each of the clusters.
Examine the variables in the cluster. Which variables do you think should be discarded, or are there other variables that should be included? Each cluster should have a name that adequately describes the contents. Can you indicate an appropriate name for this cluster? To operationalize the cluster, it is necessary to ask the patient questions. What questions can be asked? Or what questionnaire(s) could be administered to assess how a patient scores on the cluster in question? What value should the cluster have for referral to a specific therapy?
On the basis of this process, criteria were added or omitted to the clusters. This meeting was followed by a conference call in which the clusters were operationalized and a first concept decision tool consisting of the clusters (the criteria on the tool) was presented.

Phase 3: Validation study
The concept decision tool was filled out during the intake phase by the clinician and was composed of the criteria that were acquired from the concept mapping phase. One extra question was added to indicate whether the patient needed highly specialized care or not (yes/no). The clinicians based their answer to this question upon clinical impression. During the validation phase, the cut-off point of the final set of criteria was not shown to the therapists. At the end of the validation procedure, a meeting with the expert group was held to determine whether criteria that were not significantly associated with the clinical decision should be included in the final decision tool. A second meeting was organized to determine the cut-off score of the instrument.

Participants
In total, 87 experts were approached to participate in the literature search and the concept mapping phase. Twenty-three of them provided search terms, 28 experts participated in the brainstorming session, 22 in the sorting task and 22 in the rating task. For concept mapping, our goal was to include a minimum of 15 experts to participate, since the average number of participants needed for reliable concept mapping is between 10 and 20 [18]. The data of five of the sorters could not be used due to incorrect execution of the sorting task. The pilot study included 20 therapists evaluating 44 patients, assessing the concept DTPD at the two mental health care institutions. Next, a larger validation study was performed in which seven centres participated, including 88 therapists evaluating 368 patients.

Statistical analysis
Concept mapping software (Concept mapping, 2003) was used for the digital data collection process. Demographic data on the experts regarding sex, age, number of years of experience, title and setting, and the results of the statistical analysis were also collected. SPSS (IBM SPSS statistics, version 19.0.0) was used for the statistical analysis during concept mapping and Excel (Excel, 2010) for building a database during the validation study. R was used for modelling after the validation study.
Statistical analysis during concept mapping took the form of an analysis where criteria were grouped into clusters by putting the criteria into clusters that are more similar to one another and by determining the importance of the clusters (by ratings). These clusters were then operationalized and used as the final set of criteria in the validation phase. Statistical analysis in the this phase consisted of modelling the final set of criteria and determining a threshold for the set of criteria.
In the concept mapping phase, the frequency by which participants put criteria on the same pile was assessed (see Additional file 1 for more details about the analysis). These criteria were then plotted in a two dimensional plane. Criteria that were more similar (based on the frequency by which participants put the same criteria on the same pile) were closer to each other. A "goodness of fit" test that calculated the stress index (a goodness of fit measure) was conducted. Using cluster analysis, the criteria that were more similar to each other were grouped. The working group decided on the maximum and minimum number of groups (clusters). Subsequently, a stepped procedure was followed: starting at the maximum number of clusters, at each step two clusters were combined into one (hierarchical cluster analysis) until a minimum limit was reached. The working group based their decision not only on their clinical expertise (do certain criteria belong together in a cluster?) but also on the average number of clusters the participants had created, and on the bridging values. The bridging values are a measure of coherence between the criteria in the clusters (low means high coherence) and are an indication of the probability of experts sorting those criteria together into a single cluster. The mean value of rating on the 1-6 scale was calculated for each cluster and tested on significant differences to assess if the clusters should be weighed evenly. Reliability was subsequently evaluated by means of the point-biserial correlation, through which the correlation between individual sorting and group sorting was determined. The clusters acquired from concept mapping were considered to be the criteria on the decision tool.
For the validation study of this decision tool, similarity of the criteria with clinical decision was examined in a pilot study by calculating the percentage where a specific criterion was checked as positive and where at the same time clinicians indicated that this patient should be referred to highly specialized care. In the validation study, criterion validity was assessed. For criterion validity, the sensitivity and the specificity of the decision tool were evaluated. Sensitivity is the ability of the instrument to identify patients that belong to highly specialized care. Specificity is the ability to identify those patients that do not belong to highly specialized care. To determine whether patients do (or do not) belong to highly specialized care, clinical judgement was used. A multilevel model was applied as we expected that the clinical decisions within each treatment centre would correlate more than the clinical decisions between the centres. A binomial family of functions was used with a logit link function. The correlation structure was "exchangeable". Using this model, sensitivity and 1-specificity were plotted in a Receiver Operating Characteristic Curve (ROC curve). Subsequently, the Area Under the Curve (AUC) was calculated. When sensitivity and specificity are both high, the AUC approaches 1. By using this model we could determine which of the criteria correlated significantly with clinical decision. Subsequently, easily implementable scoring systems were tested; the criteria were summed and sensitivity and specificity were determined at the specific cut-off points. For internal consistency, we calculated Cronbach's Alpha.

Phase 1: Literature search results
Respectively 8912 and 5025 studies were retrieved in PsycINFO and PubMed. These studies were selected according to the selection criteria (Fig. 2). The review yielded 11 studies, including four reviews and seven  Criteria found in the studies were mostly positively related to a specific treatment outcome or dropout, see Table 1. After removal of the duplicates, this resulted in 71 criteria, see Additional file 2. As none of these criteria were known to be directly related to referral of patients to highly specialized care, they were used in the brainstorm phase as input for the experts in formulating criteria for referral.

Phase 2: Concept mapping results
Twenty-seven experts completed questions about their demographics, see Table 2. The average age of the participants was 49 years, with on average 20 years of working experience. Most experts were psychiatrists working in an outpatient mental health care setting.

Results of the brainstorm process
Following the brainstorming session, another 35 criteria were added to the criteria of the literature search.  Selection of the criteria left a remaining total of 95 criteria, see Additional file 2 These criteria were included in the concept mapping system.

Results of the sorting process
The average number of clusters created by the participants during the sorting process was 10 (Range 5-23). The working group decided on a maximum number of clusters of 15 and a minimum of two with an optimal number of clusters of 6. In general, the bridging values (level of homogeneity) were low or acceptably low, which indicates a high homogeneity level for these six clusters. Cluster 4 exhibited the highest bridging value.
The bridging values and the criteria in the clusters are shown in Additional file 3. The clusters were presented in a cluster map with bridging values, see Fig. 3. This revealed that there were three clusters with very low bridging values (Cluster 1, Cluster 2, Cluster 3), indicating a high degree of cluster homogeneity. Goodness of fit was tested using the stress value (0 = very stable, 1 = distances between the criteria are completely at random). A stress value of 0.30 was found, which meant that the model fitted the data reasonably well.

Results of the rating process
The rating values per cluster are given in Additional file 3. Cluster 6 had a high rating score, despite its high bridging value. This means that, although the clinicians rated the criteria as important, the criteria often were not sorted into the same cluster (i.e. were not homogeneous). A t-test was performed to examine whether the scores from the rating task varied between the different clusters. As multiple t-tests were performed, a Bonferroni correction was applied (p < 0.003). Significant differences were found between cluster 1 and cluster 2. Also, a significant difference emerged between cluster 1 and cluster 3. This is caused by the higher rating of cluster 3 and cluster 2 as compared to cluster 1.

Results of the (final) expert meeting
Five experts attended the meeting, and three submitted input for the discussion in advance by email. During this meeting, the various clusters were defined that represented the content. They determined that six clusters that would be used. A relatively large number of variables were moved into cluster 6, which was in line with the high bridging values and corresponding low level of homogeneity in this cluster. Cluster 1, 2, 3, 4 and 6 were respectively operationalized as "Severe negative effect with disadaptive coping", "Severe destructive behaviour to oneself or others", "Multiple comorbid disorders on axis I and/or axis II due to severe psychiatric problems", "Severe chronic traumatisation in childhood", "Severe social and societal dysfunction: Global Assessment of Functioning (GAF) 2 <45" and "Difficulties in developing a therapeutic relationship". "Specialized treatment was not successful" was added. Also "Possibility and willingness to strictly follow minimal treatment conditions" was added as a starting point for assessing patients with the checklist. After the conference call, the set of criteria was finalized. All clusters were evenly weighted based on relevance. As there was not much difference in rating between the clusters, it was decided to weight them evenly. A preliminary cut off point was also chosen during the conference call (score ≥ 4).

Reliability
Reliability was estimated by correlating each individual sort matrix with the total matrix. The resulting correlations were all averaged. The reliability, if no Spearman-Brown correction was used, was 0.49.

Pilot study
The similarities of the outcome of the criteria with clinical judgement were as follows: severe negative effect with disadaptive coping (77%), severe destructive behaviour to oneself and others (67%), multiple comorbid disorders on axis I and/or axis II due to severe psychiatric problems (72%), severe social and societal dysfunction: GAF < 45 (62%), severe chronic traumatisation in childhood (74%), difficulties in developing a therapeutic relationship (72%) and specialized treatment was not successful (81%). All criteria were highly similar. Severe social and societal dysfunction had the lowest similarity. Subsequently, we changed the cut-off score to GAF ≤ 50 because this yielded a higher similarity (68%).

Validation study
Demographics: The characteristics of patients and therapists are shown in Table 3. There was no significant difference between the characteristics of the patients in the specialized and highly specialized care group. For therapists, only years of experience differed between the groups. In highly specialized care, therapists had more years of experience when compared to specialized care, see Table 3 (t(67.52) = 4.16, p-value = 9,2*10^-5).
Model: A multilevel model was applied. An overview of the outcomes in the model is showed in Table 4.
ROC curve: A ROC curve was plotted for the model, see Fig. 4. The area under the curve was high for the model, 0.865 (95% CI: 0.812-0.918) and subsequently the model discriminated well between low and high risk observations.
Meeting: During the meeting the experts agreed upon the criteria being part of the decision tool, as they cover expert opinion.
Scoring system: In Table 5, the criteria were summed and sensitivity and specificity were determined at the specific cut-off points. A cut-off score of 4 and a cut-off score of 5 were associated with relatively good sensitivity and specificity, see Table 5.
Meeting: In the second meeting, the experts agreed that it was more important for the tool to be sensitive rather than specific, and subsequently a cut-off score of 4 was chosen. The decision tool was then finalized, see Table 6.

Discussion
Based on evidence from literature, a consensus method and a validation study a decision tool was developed to identify patients who may benefit from highly specialized care. Experts were consulted at every step to ensure good clinical relevance. The meetings ensured that the experts played a decisive role in the realization of the final result, while at the same time taking into account the generated clusters and ratings derived from the systematic concept mapping approach.
The DTPD consisted of seven criteria, as shown in Table 6. The criteria "Multiple comorbid disorders on axis I and/or axis II due to severe psychiatric problems", "Severe social and societal dysfunction" and "Severe destructive behaviour to oneself or others" were similar to the criteria of "Comorbidity", "Social functioning" and "Harm towards others" were found in the studies of Tyrer [7] and Crawford, Koldobsky, Mulder, and Tyrer [14]. As in the validation study, these criteria were significantly associated with clinical judgement. However, our decision tool also contained additional criteria that were considered important for clinical judgement, two of which were also significantly associated with clinical judgement. This may indicate that by using a systematic method, we covered a wider range of criteria compared to other studies.

Limitations
Although a decision tool was developed that may cover a wide enough range of criteria to identify patients with personality disorders for highly specialized care, there are some limitations that need to be addressed. One limitation of the review was that an explicit statement containing information on the participants, interventions, comparisons, outcomes, and study design (PICOS) was not included. This approach was chosen to increase clarity, as the objective of this study was very broad (all interventions/comparisons were included and patients who were more severe and less severe were compared). Secondly, bias and quality of the studies was not assessed. All studies and subsequently all criteria on the decision tool that were found were included to minimize the risk of deleting important criteria. However, in the rating phase of concept mapping the importance of the criteria was assessed by the experts and criteria that were not relevant were excluded. A limitation of the concept mapping methodology is that no specific combinations of criteria can be created in the concept mapping system. For example, when the combination of comorbid disorders and low functioning is considered to be important for referral to highly specialized care but the separate criteria are not, it was not possible to address this issue in the digital system. However, when relevant, combinations were discussed during the final meeting. In future studies, it might be feasible to define these combinations in a more structured manner and at an earlier phase by arranging a separate focus meeting or by using an additional consensus method for defining combinations (such as the Delphi method). Although the goal of the decision tool is to prevent ineffective treatment for patients with a personality disorder, "Treatment in specialized care was not successful" was a criterion of the tool. The reason behind this is that in reality many patients still have ineffective treatments. Additionally, the criterion was frequently mentioned by clinicians and rated as important. The concept mapping model fitted the data reasonably well (stress value was 0.30). According to Kane & Trochim [15] a value of between 0.20 and 0.35 implies a reasonable fit. This finding is underscored by a meta-analysis of concept mapping studies, in which 95% of the stress values ranged between 0.205 and 0.365 [19]. The reliability of our study   was reasonably high, compared to the studies of Bedi [20] and Van Manen et al. [19] which found reliability estimates of respectively 0.45 and 0.56. A limitation of the pilot and validation study is that clinical judgement was used as a gold standard. However up to date, there are no other validated questionnaires that can be used to measure the same construct. An additional limitation was that only one therapist provided input on both clinical judgement and on the criteria of the decision tool. This may have contributed to bias in the validation study and in future studies this should be addressed. As for the psychometrics, the interrater reliability was not assessed -and thus the degree of agreement between therapists is not known. In addition to this, the construct validity was not measured as we did not have any instrument which would measure the same construct. In future studies the interrater reliability should be assessed and when possible the construct validity. The internal consistency assessed by Cronbach's alpha was relatively low. When criteria all measure one construct, Cronbach's alpha would be high. However, a psychological construct consists of several different related aspects. When the construct is broader, as in the current study, more aspects are measured and the Cronbach's alpha score will automatically be lower. In this way, a low alpha is not necessarily a disadvantage and may not prove a useful estimate. The selection of items on the instrument during the concept mapping phase ensured that only criteria that were thought to be clinically relevant by the experts were part of the the decision tool.
The results from the pilot study showed promise as the correlation between clinical judgement and judgement based on the set of criteria were high. The validation study confirmed the positive results for this study as the decision tool had high sensitivity and moderate specificity.
Although several forms of psychotherapy have proven to be effective in the treatment of personality disorders [21], not all patients profit from these treatments. Studies indicate that patients with more severe and complex personality disorders or specific characteristics may not profit from treatment [22] and are more prone to dropout [23]. Subsequently, they often have a long treatment history with negative results. There is, however, growing attention on early detection and early intervention to confine future damage caused by personality disorders [24]. The decision tool can be used in such a way as it may detect severe patients in an earlier stage of the disorder and improve their prognosis.