The first aim of the study was to develop an extended version of the Brøset-Violence-Checklist (BVC) that includes both the structured clinical assessment of observable patient behavior and the unaided subjective clinical assessment by psychiatric nurses of the patient's risk of perpetrating a violent attack. The second aim was to test the instrument's accuracy and its application in clinical practice. To this end, we conducted a prospective cohort study involving separate samples for instrument development (derivation sample) and clinical application (validation sample). The main findings were that the visual analogue scale (VAS) slightly improved the diagnostic accuracy in the derivation dataset (where no interpretation of the score was provided), but that this effect was not retained in the validation dataset (where an interpretation of the score was available). In the validation dataset the test accuracy of the VAS was significantly lower than in the derivation dataset. In contrast, the performance of the BVC was identical in both samples.
What are the clinical implications of these findings? The original BVC proved to be remarkably stable in the independent dataset. The BVC appears to combine the virtues of a structured clinical method, inquiring about specific patient behaviors, with a degree of clinical judgment: it is still left to the discretion of the rater to decide whether a specific behavior (e.g. being boisterous) is actually present. Such item-level decisions may be more reliable than the subjective overall assessment provided in a visual analogue scale. Moreover, we cannot rule out that providing the interpretation of the score affected the ratings. Of the two assessment methods, the BVC score more closely resembles the practice of actuarial scores. The replication in two independent samples of a test accuracy almost identical to that of the original Norwegian study underscores the possible generalizability of the instrument. Notwithstanding these encouraging findings, the limited positive predictive value remains a relevant issue in settings such as ours, where the prevalence of physical attacks is low. This underscores the need for cautious interpretation of positive results and for reporting multilevel likelihood ratios. The satisfactory test accuracy (AUCROC = 0.90) of the combined instrument when using the composite endpoint supports its applicability in daily routine. Our data do not support the presumption that including the subjective element of the visual analogue scale improves test accuracy to a relevant extent. We hesitate to recommend using the VAS alone, for three reasons. First, in the derivation dataset nurses were unaware of the interpretation of the VAS rating and its clinical implications. A significantly lower test accuracy of the VAS was observed in the validation dataset, where scoring mattered, suggesting possible assessment biases. Second, a checklist of observable behavior is not only helpful for less experienced staff, but also facilitates communication. 
Third, the VAS result has to be regarded as the product of a hidden process of clinical reasoning (a black box). However, the nurses' feedback on the user-friendliness of the combined instrument, compared with our previous experience using the BVC alone, suggested increased compliance with and acceptance of the instrument. Therefore, we have opted to use the combined instrument in the ongoing randomized controlled trial evaluating the efficacy of systematic prediction on occurrence rates of violent attacks and intense coercive measures.
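The dependence of the positive predictive value on prevalence, and the use of likelihood ratios to interpret a score, can be made concrete with a short calculation. The following is a minimal sketch in Python based on Bayes' theorem. The function names and all numerical inputs (sensitivity 0.5, specificity 0.95, and the prevalence values) are illustrative assumptions chosen for the example; they are not estimates from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of an attack given a positive test result (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical test characteristics, NOT values reported in the study.
    sensitivity, specificity = 0.5, 0.95
    for prevalence in (0.01, 0.05, 0.20):
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")

    # The positive likelihood ratio gives the same answer through the odds form.
    positive_lr = sensitivity / (1.0 - specificity)
    print(f"post-test probability at 1% prevalence: "
          f"{post_test_probability(0.01, positive_lr):.3f}")
```

With test characteristics held fixed, the positive predictive value falls steeply as prevalence decreases, which is why a positive result calls for cautious interpretation where attacks are rare. A multilevel likelihood ratio can be passed to `post_test_probability` in the same way, one ratio per score category.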
Several caveats of the study must be acknowledged. A purist approach to the validation study would have mandated employing exactly the same presentation and forms in the validation dataset as were used for the derivation dataset. Instead, we skipped this step and moved directly to the clinical application of a practicable and user-friendly form, along with recommendations on the consequences of the ratings to be considered. This design feature prevents a clear delineation of whether the observed differences in VAS performance were due to the different sample, differing professional experience among staff, the alteration of the design (scale versus ruler), the immediate feedback of the result on the score, or the recommendations provided. A related problem is the lack of information on the factors considered by the nurses when rating the VAS. A second limitation is the small number of events, which prevented the calculation of more elaborate statistical models accounting for other patient covariates such as diagnosis or demographic variables. We are currently addressing the lack of information on rating factors by means of a qualitative research project in which nursing staff are interviewed about the thoughts and considerations leading to a specific subjective risk assessment. This project is intended to reveal whether subjective risk assessment actually incorporates actuarial data such as knowledge about prior patient behavior. Finally, providing an interpretation and a suggestion for action together with the score result partially violates the condition of independence between outcome and prediction. If only the occurrence of attacks is considered as an outcome event, attacks prevented by interventions initiated as a consequence of the rating may inflate the false positive rate. In contrast, the composite outcome (attacks and interventions initiated following the rating) overestimates the true positive rate. 
It is reassuring that the area under the receiver operating characteristic curve differed only by a small margin between the two outcome definitions (0.90 versus 0.86). It should also be noted that the performance of the VAS in the validation dataset was similar to that of earlier reports from other investigations.
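The effect of the outcome definition on the area under the ROC curve can be illustrated with a toy calculation. The sketch below is written in Python and computes the AUC as the proportion of positive-negative pairs in which the positive case received the higher score (ties count one half). The six scores and outcome labels are invented solely for illustration and are not study data; the function name `auc` is likewise our own.

```python
def auc(scores, outcomes):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half a correct ranking
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data: one risk score per patient-shift.
scores       = [1, 2, 3, 4, 5, 6]
attack       = [0, 0, 0, 1, 0, 1]  # an attack actually occurred
intervention = [0, 0, 0, 0, 1, 0]  # preventive measure initiated after rating

# Composite outcome: an attack OR an intervention following the rating.
composite = [int(a or i) for a, i in zip(attack, intervention)]

print("AUC, attacks only:     ", auc(scores, attack))
print("AUC, composite outcome:", auc(scores, composite))
```

In this toy example the highly rated case that received an intervention counts as a false positive under the attacks-only definition and as a true positive under the composite definition, so the two estimates bracket the performance in the way described above.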
In summary, we ascertained satisfactory performance of the BVC in an independent dataset in which interpretations based on multilevel likelihood ratios and action plans were provided. Adding a visual analogue scale for subjective risk assessment appeared to improve the compliance of the staff with systematic risk prediction but did not result in improved test accuracy in the validation dataset. The considerable difference in test performance of the visual analogue scale between its application within a research framework (derivation dataset) and its use in daily practice warrants further scrutiny. The combined instrument is currently being tested in a multi-center randomized controlled trial to assess the efficacy of systematic risk assessment. Until these data are available, the recommendation for routine use cannot be extended from the BVC risk assessment to the combined BVC-VAS instrument. Finally, it should be borne in mind that attacks are rare events. Even with the use of the BVC-VAS, about half of the attacks may not be properly predicted, and only about 1 in 10 of all patients classified as moderate or high risk would indeed have proceeded to commit an attack.