Susceptibility (risk and protective) factors for in-patient violence and self-harm: prospective study of structured professional judgement instruments START and SAPROF, DUNDRUM-3 and DUNDRUM-4 in forensic mental health services

Background The START and SAPROF are newly developed fourth generation structured professional judgement instruments assessing strengths and protective factors. The DUNDRUM-3 and DUNDRUM-4 also measure positive factors, programme completion and recovery in forensic settings. Methods We compared these instruments with other validated risk instruments (HCR-20, S-RAMM), a measure of psychopathology (PANSS) and global function (GAF). We prospectively tested whether any of these instruments predict violence or self harm in a secure hospital setting (n = 98) and whether they had true protective effects, interacting with and off-setting risk measures. Results SAPROF and START-strengths had strong inverse (negative) correlations with the HCR-20 and S-RAMM. SAPROF correlated strongly with GAF (r = 0.745). In the prospective in-patient study, SAPROF predicted absence of violence, AUC = 0.847 and absence of self-harm AUC = 0.766. START-strengths predicted absence of violence AUC = 0.776, but did not predict absence of self-harm AUC = 0.644. The DUNDRUM-3 programme completion and DUNDRUM-4 recovery scales also predicted in-patient violence (AUC 0.832 and 0.728 respectively), and both predicted in-patient self-harm (AUC 0.750 and 0.713 respectively). When adjusted for the HCR-20 total score however, SAPROF, START-S, DUNDRUM-3 and DUNDRUM-4 scores were not significantly different for those who were violent or for those who self harmed. The SAPROF had a significant interactive effect with the HCR-dynamic score. Item to outcome studies often showed a range of strengths of association with outcomes, which may be specific to the in-patient setting and patient group studied. Conclusions The START and SAPROF, DUNDRUM-3 and DUNDRUM-4 can be used to assess both reduced and increased risk of violence and self-harm in mentally ill in-patients in a secure setting. They were not consistently better than the GAF, HCR-20, S-RAMM, or PANSS when predicting adverse events. 
Only the SAPROF had an interactive effect with the HCR-20 risk assessment indicating a true protective effect but as structured professional judgement instruments all have additional content (items) complementary to existing risk assessments, useful for planning treatment and risk management.


Background
The assessment of risk of violence [1][2][3][4] has developed into 'structured professional judgement' approaches to risk assessment [1,5,6]. Identifying risk factors is held to be an aid to treatment planning [7] and perhaps for this reason risk assessment has come to pervade forensic mental health practice.
Doyle and Dolan [8] reviewed what they called 'generational' developments or phases in risk assessment. The 'first generation', unstructured clinical or professional judgement [9], gave way to the second generation actuarial risk assessment tools [8]. However, the actuarial approach was criticised for focusing on a limited number of factors without taking into account potentially crucial case-specific idiosyncratic factors [8,10]. A combination of both the clinical and actuarial approaches was required. This led to the development of the third generation risk assessment [8] described as empirically validated structured decision making [11] or structured professional judgement (SPJ) [12]. The leading structured professional judgement instrument for the assessment of risk of violence has been the Historical-Clinical-Risk Management-20 (HCR-20) [13]. This added the distinction between fixed historical risk factors and dynamic factors that are subject to change over time and in response to treatment. Although rated according to a set of defined risk items, the final judgement of risk level allows for clinical judgement rather than a simple actuarial score.
Forensic hospital patients and others like them are also at a greatly increased risk of suicide both in hospital and on returning to the community [14]. There has also been a recent interest in the assessment of risk of suicide and self-harm using structured professional judgement instruments [15].
Gaps in the structured professional judgement approach to risk assessment could be identified. Protective or resilience factors that might reduce risk of violence were first used in the structured clinical risk assessment instruments devised for children and adolescents [16,17]. This reflects not only the importance of resilience as a developmental factor in young people, but also the reality that protective factors are taken into account by clinicians when making decisions about risk and treatment. The assessment system for adults should therefore allow for a broader assessment of susceptibility factors: negative risk or vulnerability factors that increase the probability of violence and self-harm, and positive, protective or resilience factors that reduce the risk of violence and self-harm. Several new SPJ risk assessment instruments have appeared that are designed to assess protective factors or progress in treatment and recovery as part of the assessment and management of susceptibility (risk and protective factors) for violence and self-harm.
The Short-Term Assessment of Risk and Treatability (START) is a clinical guide for the dynamic assessment of risks, strengths and treatability which is relevant to everyday psychiatric clinical practice [18]. According to the authors and others, the START is intended to "stimulate discussion about strengths, vulnerabilities and appropriate interventions and management" [19,20].
The SAPROF [21] is a recently-developed instrument for the assessment of factors protecting against violent acts. By specifically focusing on protective factors, the SAPROF aims to provide a more accurate and well-rounded assessment of risk for future violent behaviour [21].
The DUNDRUM-3 programme completion scale and DUNDRUM-4 recovery scale [22,23] are two structured professional judgement instruments designed for use as measures of progress along the recovery pathway for those detained in secure forensic psychiatric services. These have been shown to predict moves from more secure to less secure places, along with measures of risk [24] and they have been shown to predict conditional discharge from hospital to the community [25]. The DUNDRUM-1 triage security instrument is a measure of the need for therapeutic security and is designed to be a static measure of a quality that is complementary to and distinct from risk of violence [26]. It has been shown to influence moves between levels of therapeutic security [24] and it is used also as a benchmark to enable comparisons between studies [26]. The SPJ instruments of the DUNDRUM toolkit are all designed to be complementary to measures of risk of violence or self-harm. The DUNDRUM-3 and DUNDRUM-4 are included here as they can be conceptualised as positive or protective factors likely to reduce the risk of violence and self-harm.
Rutter [27] pointed out that a protective or resilience factor should do more than simply predict the absence of harm or adverse outcomes, since predicting the absence of harm is merely the absence of risk. Risk or vulnerability factors (or their reciprocals, measuring the absence of risk) and protective or resilience factors can be validated as predictive or not using the receiver operating characteristic, as a means of taking into account base rate variations between samples [28]. The strength of association in the specific population and setting studied can be assessed with unadjusted odds ratios. According to Rutter [27] a truly protective factor would interact with risk factors to reduce the probability of an adverse event or outcome, even when risk factors were present. This requires a form of analysis of interactive effects additional to that normally used to validate risk factors.

Objectives
In this prospective study we set out to assess psychometric properties, concurrent validity and criterion outcome measures of the validity of the START and SAPROF. We prospectively tested whether START and SAPROF, DUNDRUM-3 and DUNDRUM-4 would predict adverse events (or the absence of adverse events), violence or self-harm. We compared these to existing validated instruments for the assessment of risk of violence and self-harm (S-RAMM) and examined whether they accounted for any element of statistical prediction over and above an existing 'gold standard' instrument for the assessment of risk of violence, the HCR-20. We also examined the predictive properties of measures of symptoms (PANSS [29,30]) and global function (GAF [31]), treating these as another standard to be beaten: are specific risk assessment instruments and their constituent items better than assessing symptoms and function?

Study design
This is a naturalistic six month prospective cohort study of in-patients in a therapeutically secure forensic hospital. Data were gathered as part of the clinical audit of service delivery and the study was approved by the research ethics, audit and effectiveness committee of the National Forensic Mental Health Service. All patients gave informed consent to participate.

Setting
The Central Mental Hospital is a 94 bed forensic secure hospital providing high, medium and low security integrated on a single campus. The hospital is the only legally designated centre for forensic mental health treatments for a population of 4.6 million. At the time of the study the hospital was organised into a series of eight units from high secure admission and intensive care through medium secure and low secure to pre-discharge and community high support places so that the location at the start of the study period can be used as an index of the level of therapeutic security for the environment in which the patient is located [24].

Participants
All patients at the Central Mental Hospital during the period March to April 2010 (n = 100) with severe mental illness participated as part of routine assessments of risk and outcome measures.

Variables and data sources
The researchers who made ratings or collated them were each blind to the work of the others. One post membership psychiatric trainee (ZA) rated the START and SAPROF by interviewing patients, reviewing case notes and speaking to members of the multi-disciplinary team and ward-based nursing staff. The START takes a list of risk factors and treats each one as both a risk factor and a protective factor. The SAPROF includes items thought to be protective against violence such as intelligence, secure attachment in childhood and empathy that are not included in existing risk assessment instruments. Two post membership psychiatric trainees (LN and OG) carried out interviews using the PANSS and GAF.
The HCR-20 provides ratings for ten stable historical risk factors (HCR-H items) though we omitted item H7 'psychopathy' as this was not in routine use, five current 'clinical' (HCR-C) and five future 'risk management' (HCR-R) items. Each item is scored 0 to 2 and the total scale is scored 0 to 38. The 'C' and 'R' items added together constitute a 'dynamic' or change sensitive score (HCR-dynamic, scored 0 to 20). Similarly, the Suicide Risk Assessment and Management Manual S-RAMM [15] is made up of 23 items each scored 0 to 2 with an overall scale score from 0 to 46 and generates a nine item stable, background score (S-RAMM-B) and change sensitive dynamic scales for eight current (S-RAMM-C) and five future (S-RAMM-F) risk items for self-harm or suicide, the latter two of which combine as a thirteen item dynamic score (S-RAMM-dynamic) rated 0 to 26. The HCR-20 and S-RAMM scales were collated from team assessments by an advanced nurse practitioner (AN) who ensured quality and fidelity to the handbook definitions.
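The scoring arithmetic described above can be illustrated with a short Python sketch. It sums item ratings into the HCR-20 sub-scale, dynamic and total scores with item H7 omitted, as in this study. The function name and the example ratings are hypothetical and are not study data.

```python
def hcr20_scores(h_items, c_items, r_items):
    """Sum HCR-20 item ratings (each 0, 1 or 2) into sub-scale scores.

    h_items: nine historical items (H7 'psychopathy' omitted, as in this study)
    c_items: five current 'clinical' items
    r_items: five future 'risk management' items
    """
    if len(h_items) != 9 or len(c_items) != 5 or len(r_items) != 5:
        raise ValueError("expected 9 H items, 5 C items and 5 R items")
    for rating in (*h_items, *c_items, *r_items):
        if rating not in (0, 1, 2):
            raise ValueError("each item must be rated 0, 1 or 2")
    historical = sum(h_items)
    clinical = sum(c_items)
    risk_management = sum(r_items)
    dynamic = clinical + risk_management  # HCR-dynamic, range 0 to 20
    return {
        "H": historical,
        "C": clinical,
        "R": risk_management,
        "dynamic": dynamic,
        "total": historical + dynamic,  # range 0 to 38 with H7 omitted
    }


# Hypothetical ratings for one patient (illustration only, not study data).
example = hcr20_scores([2, 1, 2, 0, 1, 2, 1, 0, 2], [1, 2, 1, 0, 1], [1, 1, 0, 2, 1])
print(example)
```

The same pattern would apply to the S-RAMM background and dynamic scores, with the item counts given in the text.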
Measures of need for therapeutic security (the DUNDRUM-1), treatment programme completion (DUNDRUM-3) and recovery (DUNDRUM-4) [22] were assessed by a forensic psychiatry lecturer / higher trainee (MD). These SPJ scales are composed of items rated 0 to 4 where '0' indicates no need for therapeutic security, '1' indicates a need for admission to an open ward or equivalent, '2' for low security, '3' for medium security and '4' for high security. For the DUNDRUM-3 and DUNDRUM-4 '4' indicates no readiness for a move to a less secure place, '3' indicates a move from high to medium security, '2' a move from medium to low security, '1' a move from low security to open conditions and '0' indicates no need for therapeutic security. The DUNDRUM-1 (eleven items rated 0 to 44) was used to provide a benchmark for comparative purposes so that other researchers replicating this study or carrying out meta-analyses can compare groups of patients according to their assessed need for therapeutic security. The scale includes items for seriousness of violence and self harm, immediacy of risk of violence and self harm, specialist forensic need, absconding, preventing access to contraband, victim sensitivity and public confidence, complex risk of violence, institutional behaviour and legal process. The DUNDRUM-3 (seven items rated 0 to 28) is a measure of programme completion in domains relevant to risk and harm reduction such as physical and mental health, substance misuse, problem behaviours, self-care and activities of daily living, education, occupation and creativity and family and social networks.
The DUNDRUM-4 recovery items (six items rated 0 to 24) include stability, insight, therapeutic rapport, leave, dynamic risk and victim sensitivity.

Validity of measures
We first measured inter-rater reliability, although this was not necessary in the design of this study. Inter-rater reliability refers to the extent of convergence of judgements about individual items and overall scale scores of different assessors using the tool on the same patient.
We tested concurrent validity with the HCR-20 and S-RAMM because of the expected inverse relationship between the risk assessment scales HCR-20 and S-RAMM on the one hand and the protective scales START-strengths and SAPROF on the other. The S-RAMM had been validated for the prediction of self-harm in this population and was known to overlap with assessment of risk of violence [32,33]. We also expected positive correlations of HCR-20 and S-RAMM with the START-vulnerability score. We examined concurrent validity with the PANSS because of the known relationship between active symptoms and risk assessment measures [30]. We examined concurrent validity with the GAF because of the expected positive correlation with the protective scales START-strengths and SAPROF and because of the expected inverse relationship with the START-vulnerability scale. Finally we examined concurrent validity with the DUNDRUM-3 programme completion and DUNDRUM-4 recovery scales because they are measures of progress in treatments relevant to risk and increasing strength in domains related to recovery for forensic patients. Lower scores on these scales could be taken to represent 'negative predictors' or protective factors.

Outcome measures
The outcome measures were any adverse events. An adverse event was defined as in the START [19] handbook (page 9) where violence is defined as "any actual, attempted or threatened harm to self or others". However in this study we have distinguished between violence and self-harm. The START handbook goes on to define self harm as "behaviours involving intentional injury of one's own body without apparent suicide intent". We have supplemented this by including any self-harming act whether it was thought to be with suicidal intent or not.
Adverse events were collated by one researcher (ZA) from routine incident report forms. These were supplemented by nurse management daily logs and statutory forms for seclusion and restraint over a 6 month period from March to April 2010 until 31st November 2010. These alternative sources of information acted as a cross check on the completeness of the record of adverse events.

Study size
All patients in the hospital during the period of baseline data gathering were included. START, SAPROF, HCR-20, S-RAMM, PANSS and GAF were obtained for 98 of 100. The DUNDRUM-1, DUNDRUM-3 and DUNDRUM-4 could not be completed for 6 patients who were discharged before these measures could be completed; they had a significantly shorter length of stay when assessed: 0.28 (SD 0.46) years v 7.67 (SD 10.09) years (t = −7.0, df = 98, p < 0.001). The follow-up period was complete to the date patients left the hospital or to the end of the study period.
Based on earlier studies [33,34] we estimated that approximately 10 violent and 10 self-harming adverse events might be expected over a six month period and that these would be sufficient to yield an area under the curve (AUC) in the receiver operating characteristic (ROC) that was capable of being significantly different from the line of random information.

Quantitative variables
The patients were grouped according to their location in the hospital as this has been established as a proxy for risk levels [23,24,32,33]. Adverse events were further subdivided into violence and self-harm, as outcome measures for the prospective study.

Statistical methods
All data were analysed in SPSS-20 [34]. Correlations were calculated using the non-parametric Spearman correlation coefficient. Adverse events as outcomes of the prospective study were analysed using the receiver operating characteristic (ROC) area under the curve (AUC). An association was deemed significant if the 95% confidence interval of the AUC lay entirely above 0.5, the line of random information. The strength of association between measures and outcomes was measured using unadjusted odds ratios (OR). Because the odds ratio is a measure of the increase in odds for each increase of one point in the measurement scale, the magnitude of the odds ratio differs according to the properties of item and scale scores. The odds ratio for a scale such as the GAF which is rated 0 to 100 will be inherently smaller when comparing like for like with the HCR-20, a scale rated 0 to 40. Likewise, the odds ratio for items rated 0 to 2 as in the HCR-20, S-RAMM, SAPROF and START will appear larger when comparing like for like with items from the DUNDRUM-1, DUNDRUM-3 and DUNDRUM-4 where items are rated 0 to 4. Confidence intervals for epidemiological rates were calculated using Confidence Interval Analysis [35].
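The dependence of the per-point odds ratio on the length of the rating scale can be illustrated with a minimal Python sketch. It assumes the log-odds are linear in the score, as in a simple logistic model. The function names and the example values are hypothetical and are not results from this study.

```python
def odds_ratio_over_points(or_per_point, points):
    """Odds ratio for a change of `points` scale points, given the per-point OR.

    Assumes log-odds are linear in the score (an assumption of this sketch).
    """
    return or_per_point ** points


def per_point_odds_ratio(full_range_or, points_in_range):
    """Per-point OR implied by a given odds ratio across the whole item range."""
    return full_range_or ** (1.0 / points_in_range)


# Hypothetical example: the same four-fold change in odds across the full
# range of an item corresponds to different per-point odds ratios.
or_item_0_to_2 = per_point_odds_ratio(4.0, 2)  # item rated 0 to 2
or_item_0_to_4 = per_point_odds_ratio(4.0, 4)  # item rated 0 to 4
print(or_item_0_to_2, or_item_0_to_4)
```

On these assumptions the item with fewer scale points shows the larger per-point odds ratio even though the effect across the full range is identical, which is why odds ratios are compared like for like in the text.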
Cronbach's alpha statistic was used to measure the extent to which each item fits into the subscale or overall scale to which it is allocated. This is a measure of content coherence: whether all items in the overall scale or subscale measure the same thing. High internal consistency also indicates multiple co-linearity for items within a scale.
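For readers unfamiliar with the statistic, a minimal sketch of the standard Cronbach's alpha formula follows, using only the Python standard library. The analyses in this study were run in SPSS-20; the function name and the example ratings below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale.

    rows: one list of item ratings per patient (same items, same order).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two patients")
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings (0 to 2) on three items for five patients.
ratings = [[2, 2, 1], [1, 1, 1], [0, 1, 0], [2, 1, 2], [0, 0, 0]]
print(cronbach_alpha(ratings))
```

When every item ranks patients identically, alpha reaches 1, which is the sense in which high internal consistency implies co-linearity among items.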
To examine the extent to which the protective instruments SAPROF, START-S, and recovery instruments DUNDRUM-3 and DUNDRUM-4 were protective in the presence of risk factors, we first carried out an analysis of variance in SPSS-20 with SAPROF, START-S, DUNDRUM-3 and DUNDRUM-4 as dependent variables, violence to others or self-harm as fixed factors (in separate analyses) and the HCR-20-dynamic score as covariate. To examine for interactive effects, we then carried out univariate analysis of variance to examine for main effects and interactive effects.
For item to outcome analysis, the receiver operating characteristic (ROC) area under the curve (AUC) and 95% confidence interval was calculated for each item, with harm to others and self harm as outcome measures. The unadjusted odds ratio (OR) and 95% confidence interval was also calculated for each item and both outcomes, as a measure of the strength of association. Because the items of each scale were strongly inter-correlated, regression models for the items of each scale were not attempted.
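The AUC used here can be sketched as a pairwise comparison (the Mann-Whitney formulation). The confidence interval in this sketch uses the Hanley and McNeil standard error, which is an assumption: the source states only that SPSS-20 was used and does not specify the interval method. Function names and example scores are hypothetical.

```python
import math


def auc_mann_whitney(scores_event, scores_no_event):
    """Probability that a patient with the event scores higher than a patient
    without it, counting ties as one half."""
    wins = 0.0
    for with_event in scores_event:
        for without_event in scores_no_event:
            if with_event > without_event:
                wins += 1.0
            elif with_event == without_event:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


def hanley_mcneil_ci(auc, n_event, n_no_event, z=1.96):
    """Approximate 95% CI for an AUC (Hanley-McNeil standard error; assumed method)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_event - 1) * (q1 - auc ** 2)
           + (n_no_event - 1) * (q2 - auc ** 2)) / (n_event * n_no_event)
    se = math.sqrt(max(var, 0.0))
    return auc - z * se, auc + z * se


# Hypothetical item ratings (illustration only, not study data).
violent = [2, 2, 1, 2]
not_violent = [0, 1, 0, 1, 2, 0]
auc = auc_mann_whitney(violent, not_violent)
low, high = hanley_mcneil_ci(auc, len(violent), len(not_violent))
print(auc, low, high)
```

For a protective scale, the AUC for the absence of an event is one minus the AUC for the event, so the same function serves both directions.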
Mean follow-up time was 181.9 days (SD 70.3, range 0 to 265). The number of patient-days at risk was 18,190.

Inter-rater reliability of new measures START and SAPROF
For 21 patients rated at different times by SM and ZA, the SAPROF total score correlated Spearman's r = 0.829, p < 0.001. For the START-strength score, r = 0.694, p < 0.001 and for START-vulnerability score, r = 0.853, p < 0.001. The data subsequently analysed are the ratings made by one researcher ZA. These correlations are given only as an indication of the utility of the instruments.
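Spearman's coefficient, used for these and later correlations, is the Pearson correlation of the ranks of the two sets of scores, with tied values given their average rank. The following is a minimal standard-library sketch; the function names and the example scores for two raters are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rank_x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in rank_y))
    return covariance / (spread_x * spread_y)


# Hypothetical total scores from two raters for six patients.
rater_a = [12, 20, 7, 15, 25, 9]
rater_b = [14, 18, 8, 15, 27, 7]
print(spearman_rho(rater_a, rater_b))
```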

Construct validity
The SAPROF and START were compared with each other. If the 'strengths' scales are valid, they should correlate positively with each other. If the concept of 'strengths' is distinct from risks or vulnerabilities, the strengths scales should not correlate strongly with risk or vulnerability scales. The START-S and SAPROF correlated strongly with each other (r = +0.810, p < 0.001), indicating that they measure the same construct (Table 1). If the START strength and START vulnerability scales measure different constructs, they should not correlate. The correlation between the two was very strong and inverse (r = −0.947, p < 0.001), indicating that they measure the same thing, one as the inverse of the other.

Concurrent validity
The SAPROF and START are said to measure dynamic factors and so they are not expected to correlate with established scales or sub-scales made up of historical or static risk factors. Table 1 shows that the START-S correlated moderately and inversely with HCR-H and weakly and inversely with S-RAMM-B. The SAPROF correlated moderately and inversely with the HCR-H and weakly with the S-RAMM-B.
If the SAPROF and START-S measure something different from risk, they should not correlate with the HCR-20 dynamic or S-RAMM dynamic risk assessment scales. Actually they correlated strongly but inversely with the HCR-20 dynamic scale. There was a moderate inverse correlation between the S-RAMM dynamic scale and the START-S. The S-RAMM dynamic score also had a moderate inverse correlation with the SAPROF (Table 1).
START and SAPROF were also compared with measures of global function and mental state. There was a strong positive correlation between GAF and START-S and strong inverse correlation between GAF and START-V. SAPROF and GAF correlated best (Table 1). Table 1 shows that the PANSS-positive, PANSS-negative, PANSS-general, PANSS-total scores and the PANSS-SAR score all correlated strongly and inversely with START-S. PANSS-positive, PANSS-total and PANSS-SAR scores correlated positively with START-V but START-V correlated less well with PANSS-negative and PANSS-general scores (Table 1). The SAPROF correlated inversely with the PANSS scales.
The DUNDRUM-1 correlated weakly and inversely with the START-S, moderately with the START-V and had a weak inverse correlation with the SAPROF. The DUNDRUM-3 and DUNDRUM-4 both had strong inverse correlations with the START-S, strong positive correlations with the START-V and strong inverse correlations with the SAPROF.

Prospective study of violence and self harm
Thirteen individuals had adverse incidents concerning harm to others (broadly defined, as above) during the follow up period and 7 individuals had incidents involving self harm (broadly defined, as above). There was a significant overlap between self-harm and harm to others (χ² = 35.2, df = 1, p < 0.001, phi = 0.593, p < 0.001). The rate of events of harm to others (the base rate for violence) was 7.1 per 10,000 patient-days at risk (95% confidence interval 3.8 to 12.2/10,000) and the rate of self-harming events (the base rate for self-harm) was 3.8 per 10,000 patient-days at risk (95% CI 1.5 to 7.9/10,000).
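The base rates above can be illustrated with a short Python sketch that divides the event counts by the 18,190 patient-days at risk and attaches an exact Poisson confidence interval. The exact Poisson method is an assumption of this sketch: the study used Confidence Interval Analysis software [35] and the interval method is not stated. Function names are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect_root(f, low, high, tol=1e-10):
    """Root of f on [low, high], where f(low) and f(high) have opposite signs."""
    f_low = f(low)
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_low > 0):
            low, f_low = mid, f_mid
        else:
            high = mid
    return (low + high) / 2


def exact_poisson_ci(count, alpha=0.05):
    """Exact two-sided CI for the mean of a Poisson count (assumed method)."""
    bracket = count + 10 * math.sqrt(count + 1) + 10  # generous upper search limit
    if count == 0:
        lower = 0.0
    else:
        lower = _bisect_root(
            lambda mu: (1 - poisson_cdf(count - 1, mu)) - alpha / 2, 0.0, bracket)
    upper = _bisect_root(lambda mu: poisson_cdf(count, mu) - alpha / 2, 0.0, bracket)
    return lower, upper


def rate_per_10000(count, patient_days):
    """Event rate and exact Poisson CI, per 10,000 patient-days at risk."""
    lower, upper = exact_poisson_ci(count)
    scale = 10000 / patient_days
    return count * scale, lower * scale, upper * scale


PATIENT_DAYS = 18190  # patient-days at risk, from the text
print(rate_per_10000(13, PATIENT_DAYS))  # harm to others
print(rate_per_10000(7, PATIENT_DAYS))   # self-harm
```

Under this assumption the sketch should give limits close to those reported for both outcomes.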
The location at baseline (for eight locations from the most to least secure) predicted harm to others (AUC = 0.812, 95% confidence interval 0.677 to 0.948, p < 0.001) as expected, since we have previously shown that location is a proxy for measures of risk [32] and recovery [23,24]. Length of stay at the beginning of the observation period did not predict harm to others (AUC = 0.504, 95% CI 0.343-0.665, p = 0.963). Location at baseline also predicted self-harm (AUC = 0.838, 95% CI 0.689-0.987, p = 0.003) while length of stay did not predict self harm or the absence of it (AUC = 0.578, 95% CI = 0.383-0.722, p = 0.495). Table 2 shows that the SAPROF score predicted both the absence of violence and self harm (absence of violence AUC = 0.847 and absence of self-harm AUC = 0.766). The START Strengths and START Vulnerabilities predicted violence (START-S and absence of violence AUC = 0.776; START-V and presence of violence AUC = 0.823) but not self harm.
By contrast, the HCR-20 predicted both violence (AUC = 0.872) and self harm (AUC = 0.881) as did all of its sub-scales. The S-RAMM predicted violence (AUC = 0.838), though not quite as well as the HCR-20, and the S-RAMM predicted self-harm (AUC = 0.818) as did the S-RAMM sub-scales, except for the S-RAMM-background and future scales, which did not reach significance for self-harm in this study. It is interesting to note that the SAPROF did almost as well as the S-RAMM as a predictor of the absence of self harm.
The GAF score was a significant predictor of the absence of both violence and self harm, with high AUCs (absence of violence AUC = 0.813 and absence of self-harm AUC = 0.855).
PANSS positive, PANSS general and PANSS total scores each predicted violence and self harm though the PANSS negative symptom score was neither a positive nor a negative predictor for violence or self harm. The odds ratios for these sub-scale scores, though significant, were only modestly better than chance. The PANSS supplemental aggression risk (SAR) score (not included in the PANSS total score) was also a significant predictor of both harm to others and harm to self. Contrary to expectations, the DUNDRUM-1 triage security score predicted violence (AUC = 0.743) and though the AUC for the prediction of self harm did not reach significance, the odds ratio did (OR = 1.226). The DUNDRUM-3 programme completion score predicted violence (AUC = 0.832) and self-harm (AUC = 0.750), while the DUNDRUM-4 recovery scale also predicted violence (AUC = 0.728) and self-harm (AUC = 0.713).
Table 1 Cross validation using Spearman's rank correlation coefficient. All significant at p < 0.001 except DUNDRUM-1 vs SAPROF p < 0.005.

Interactive effects between risk factors and protective factors
Univariate analysis of variance was used to test for the presence of interactive effects between risk measures and protective measures. Tables 3 and 4 show that the SAPROF, START-S, DUNDRUM-3 and DUNDRUM-4 were all significantly different for the 13 patients who were violent when compared to the non-violent. The same was true for the 7 who self-harmed compared to those who did not self-harm. However when these results were adjusted for the HCR-20-dynamic score the differences were no longer significant.
For harm to others, the SAPROF and HCR-20-dynamic had significant main effects (HCR-20-dynamic F = 3.97, df = 17, p = 0.003, SAPROF F = 4.67, df = 25, p < 0.001) and a significant interaction effect (F = 2.973, df = 38, p = 0.008) indicating that the SAPROF had a 'true' protective effect. The S-RAMM-dynamic score also had a significant interaction with the HCR-20-dynamic score (HCR-20-dynamic F = 3.828, df = 18, p = 0.001, S-RAMM-dynamic F = 3.909, df = 20, p < 0.001, interaction F = 2.794, df = 33, p = 0.003) apparently indicating a synergistic effect. The START-S, DUNDRUM-1, DUNDRUM-3, DUNDRUM-4, PANSS positive, PANSS negative, PANSS

Item to outcome analysis
Table 5 shows the performance of each item of the SAPROF as predictors of violence and self-harm. Twelve of the 17 items predicted the absence of violence including factors such as empathy (OR = 0.231), coping ability (OR = 0.187), self-control (OR = 0.205), work and leisure activities (OR = 0.336), financial management (OR = 0.231), motivation for treatment (OR = 0.340) and attitudes towards authority (OR = 0.264). Five items in the SAPROF predicted the absence of self-harm including empathy (OR = 0.293), coping (OR = 0.192), self-control (OR = 0.260), leisure activities (OR = 0.203) and use of medication (OR = 0.314). Odds ratios could not be calculated for items 14 to 17 (intimate relationships, professional care, living circumstances, external control) because for in-patients in a secure hospital there was too little variation in these item scores. Table 6 shows that 16 of the 20 START-strengths items predicted the absence of violence (strongest odds ratios impulse control OR = 0.244, external trigger OR = 0.249), though only one (mental state OR = 0.180) appeared to predict absence of self-harm.
Table 7 shows that for START-vulnerabilities, 16 of 20 items predicted violence, not always the same items as for START-strengths (strongest associations 'relationships' OR = 5.5, 'external triggers' OR = 6.3, 'conduct' OR = 5.1), while 'mental state' was again the only item predicting self-harm (OR = 3.9). Table 8 shows that five of the ten historical items of the HCR-20 predicted violence in this in-patient forensic group (strongest associations 'early maladjustment' OR = 1.2 and 'prior supervision failure' OR = 1.2) while four of the five 'current' or 'C' items and three of the five 'risk management' or 'R' items predicted violence (e.g. C5 'unresponsiveness to treatment' OR = 3.9, R4 'non-compliance' OR = 4.8). H1 'past violence' was not a significant predictor in this population as all subjects scored positive.
For the prediction of self-harm in this group, two of the ten HCR-20 'H' items, four of the five 'C' items and one of the five 'R' items were better than chance. As before, odds ratios for individual items were an interesting guide to the relative importance of items, with highest odds ratios for items C5 'unresponsiveness to treatment' OR = 6.1, C1 'Lack of insight' OR = 5.8, H2 'young age at first violent incident' and H3 'relationship instability' both OR = 5.5. Table 9 shows that for the S-RAMM, one of the nine background or 'B' items, two of the eight current or 'C' items and one of the five future or 'F' items predicted violence, while two of the nine 'B' items predicted self-harm (B1 'history of deliberate self harm' OR = 2.5, B3 'previous hospitalisation' OR = 6.4), as did three of the eight 'C' items (C3 'psychological symptoms' OR = 3.8, C7 'psychosocial stress' OR = 3.4 and C8 'problem solving deficits' OR = 7.9). Table 10 shows that three of the eleven DUNDRUM-1 triage security items (scored 0 to 4) predicted violence (TS4 'immediacy of risk of suicide or self harm' OR = 1.5, TS9 'complex risk of violence' OR = 3.3, TS10 'institutional behaviour' OR = 2.7) and 3 items predicted self-harm (TS2 'seriousness of self harm' OR = 1.4, TS4 'immediacy of risk of suicide/self harm' OR = 1.5, TS10 'institutional behaviour' OR = 2.7). For the DUNDRUM-3 programme completion items, all seven predicted violence (odds ratios ranged from 1.9 to 4.9) and four predicted self-harm (odds ratios ranged from 1.9 to 4.9). Item PC2 'Mental health' had the strongest odds ratio for both harm to others (OR = 4.9) and to self (OR = 5.2). For the DUNDRUM-4 recovery scale, four of the six items predicted violence and 3 predicted self-harm. Odds ratios were similar for harm to others and harm to self, except for the item derived from the HCR-20 dynamic risk scale, which had a stronger odds ratio for harm to others.
For the PANSS, Table 11 shows that of the seven positive symptoms (scored 0 to 7), four were significantly associated with violence ('conceptual disorganisation' OR = 1.9, 'hyperactivity' OR = 3.6, 'suspiciousness' OR = 1.5 and 'hostility' OR = 2.2). Some of these were marginal (AUC > 0.5) and neither delusions nor hallucinations were associated with violence in this treated in-patient group of patients with severe mental illness, because of lack of variation in the population studied. Of the seven negative symptoms only one, 'poor rapport' (OR = 1.7) was associated with violence and of the 16 general symptoms, 'tension' (OR = 2.1), 'uncooperativeness' (OR = 1.7), 'poor attention' (OR = 2.3) and 'poor impulse control' (OR = 2.4) were associated with violence. Odds ratios could not be calculated for item G10 'disorientation' as all subjects scored negative.
The three items of the PANSS supplemental aggression risk scale (SAR) predicted both harm to others ('anger' OR = 2.6, 'difficulty in delaying gratification'

Main findings
This paper presents validation studies for 'fourth generation' risk assessment instruments. We have examined the utility of these instruments for assessing risk and protective factors for both violence and self-harm. We have identified both overlaps and differences in the risk factors that contribute to predictions of risk of violence and self-harm. We have included some methodological approaches intended to facilitate future researchers who might replicate this work or include it in meta-analyses. These include stating the base rates for violence and self-harm and giving the DUNDRUM-1 triage security ratings as a means of benchmarking the background need for therapeutic security. We believe the most important finding is confirmation that true protective effects can be identified. The SAPROF, a protective scale, does more than assess the absence of risk: the SAPROF also had an interactive effect with the HCR-20, offsetting risk. The SAPROF and START achieved satisfactory levels of inter-rater reliability and have good internal consistency. The START strengths and START vulnerabilities scores were strongly inversely correlated, suggesting that the START strengths score is simply the risk measure repeated. However, there is sufficient difference in content between the START strengths and SAPROF on the one hand and the HCR-20 sub-scales on the other to explain the interactive effect between the HCR-20 and the SAPROF, so that the 'strengths/protective' paradigm is not merely the same risk factors in new clothes.
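The distinction drawn above between a scale that merely measures the absence of risk and one that offsets risk can be written as a regression with an interaction term. The following is a sketch only: the logistic form and the symbols are assumptions made for illustration, and this section does not restate the exact model that was fitted.

```latex
\operatorname{logit}\,\Pr(\text{violence})
  = \beta_0
  + \beta_1\,\mathrm{HCR}_{\mathrm{dyn}}
  + \beta_2\,\mathrm{SAPROF}
  + \beta_3\,\bigl(\mathrm{HCR}_{\mathrm{dyn}} \times \mathrm{SAPROF}\bigr)
```

In this notation, a protective scale that only mirrors risk would show a main effect (\beta_2 < 0) with \beta_3 close to zero, whereas an offsetting effect corresponds to a non-zero \beta_3, meaning that the association between the dynamic risk score and the outcome changes with the level of the protective score.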
The DUNDRUM toolkit instruments were not designed as risk assessment instruments; they were designed to be complementary to risk assessments. The DUNDRUM-1 was included only as a benchmarking measure to enable future replication and meta-analysis. In spite of this, the DUNDRUM-1 triage security scale predicted violence and some of its items predicted violence and also self-harm. This may be because items such as suicidal behaviour, complex needs and institutional behaviour are indicators of the seriousness of the behaviour that follows and such acts are easier to detect. DUNDRUM-3 programme completion and DUNDRUM-4 recovery scales were good predictors of violence, comparable to the HCR-20 sub-scales and total score. The S-RAMM, an assessment of risk of suicide and self-harm, was a predictor of violence and the S-RAMM dynamic score had a synergistic interaction with the HCR-20 dynamic score. The GAF and PANSS scales (other than the PANSS negative score) also performed well as predictors of violence. Although designed as assessments of risk and protective factors for violence, most scales were also predictive of self-harm. The S-RAMM-C scale performed well but the S-RAMM-B (background or fixed historical risk factors for suicide), S-RAMM-F ('future' risk factors for suicide), START-S and START-V were notable for their lack of predictive capability for self-harm in this study. The overlap between risk factors for violence and self-harm, and the need to assess both, have been established [32,33,37,38]. An item analysis shows considerable overlap of the content of each of the scales examined here. Of these, the DUNDRUM-3 programme completion items appeared particularly strong predictors of self-harm or the absence of it, perhaps because of an underlying element of positive motivation that is inherent in the way each item is defined and rated.
Of greatest relevance is that most scale scores were predictors of both violence and self-harm, though this was often because of different items within each scale. Much of this appears to be contextual. In a group made up of forensic patients admitted to a forensic hospital because of severe mental illness and violence, it is not surprising that items such as the first item of the HCR-20 'past violence' should be poor discriminants for further violence in a group where all score positive. It is important to note that in this context items such as the first S-RAMM item 'past self-harm' are such good predictors of violence to others, though not self-harm, while items such as HCR-20 C5 'unresponsiveness to treatment' and S-RAMM item C3 'psychological symptoms' and C8 'problem solving deficits' predicted both harm to others and self-harm.
For violence, only some items in each scale were predictive. The highest AUC results were obtained for lack of progress in treatment programmes such as education, occupation and creativity, a low GAF score, conduct problems, lack of progress in mental health programmes, impulse control, adverse institutional behaviour, leisure activities, external triggers, negative attitudes, poor attention, financial problems, hyperactivity and self-control, relationship problems, stability, empathy and hostility.
For self-harm a different selection of items predicted adverse events with the highest AUC results for the GAF, poor attention, conceptual disorganisation, lack of progress in mental health programmes, disturbance of volition, unresponsiveness to treatment, adverse institutional behaviour, preoccupation, leisure activities, hyperactivity, tension, problem solving deficits, stability, self-control and negative attitudes. A notable feature emerges when unadjusted odds ratios are compared with AUC results. Scales and items with significant AUC statistics may have better than random sensitivity and specificity, but may still be weak predictors. While this reflects the reality of multicollinearity, any argument that a risk assessment scale made up of just a few of the strongest items would be sufficient is at odds with the clinical need to take notice of a much wider range of risk and protective factors when planning care and treatment [7,8] and when making recommendations or decisions regarding discharge [25]. However, it is also the case that using structured professional judgement instruments to assess treatment needs would be invalid if many of the items were poor predictors on their own. We believe the poor performance of many scale items should lead to two forms of revision of these scales. The first would be to specify that some items are useful only for certain contexts, such as in-patient settings, out-patient community placements or prisons. The second would be to drop some items or refine their handbook definitions.
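The comparison of unadjusted odds ratios with AUC results described above can be illustrated for a single item. The sketch below is not the analysis code used in this study: the item ratings are invented for illustration, the function names are hypothetical, and the cut-off used to dichotomise the item (a rating of 1 or more) is an assumption.

```python
def auc_mann_whitney(scores_event, scores_no_event):
    """AUC as the probability that a randomly chosen patient with the event
    has a higher item rating than one without it (ties count as one half)."""
    wins = 0.0
    for e in scores_event:
        for n in scores_no_event:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


def odds_ratio_2x2(a, b, c, d):
    """Unadjusted odds ratio from a 2x2 table.
    a: item present, event      b: item present, no event
    c: item absent, event       d: item absent, no event"""
    return (a * d) / (b * c)


# Invented ratings on a 0-2 item (NOT data from this study).
event = [2, 2, 1, 1, 0]                      # patients with an adverse event
no_event = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]    # patients without one

auc = auc_mann_whitney(event, no_event)

# Dichotomise at a rating of 1 or more (assumed cut-off) for the odds ratio.
a = sum(1 for s in event if s >= 1)
b = sum(1 for s in no_event if s >= 1)
c = len(event) - a
d = len(no_event) - b
odds_ratio = odds_ratio_2x2(a, b, c, d)

print(f"AUC = {auc:.2f}, unadjusted OR = {odds_ratio:.2f}")
```

The two statistics answer different questions: the AUC summarises discrimination across all cut-offs, while the odds ratio depends on where the item is dichotomised, so an item can rank differently on each.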

Study limitations
This paper describes the predictive validity of the SAPROF, START, DUNDRUM-3 and DUNDRUM-4, PANSS and GAF, a range of risk assessment, symptom measurement and outcome measurement instruments, for a group of forensic in-patients, including only 6% women. The outcomes found here may not generalise to other settings. Factors influencing in-patient violence may not generalise to violence in the community. Similarly, self-harm in hospital may not equate to self-harm in the community. Replication in other populations would be helpful. However, we found that length of stay did not predict violence or self-harm whereas location along the continuum of care did. This demonstrates that in this hospital milieu, location was determined by risk and need for therapeutic security, not by a simple chronological waiting list or tariff for movement from more secure to less secure locations. The location was therefore a proxy for risk and an indication that placement and milieu were appropriate to individual need, to manage and reduce risk [39,40,41]. Many items included in validated risk assessment instruments such as the HCR-20 appeared not to be predictive in this study. The item H7 'psychopathy' was omitted in keeping with modern practice and the latest revision of the HCR-20. The study size and length of follow-up may have been insufficient; a longer follow-up period would have generated a higher base rate for violence and self-harm. We have stated the exact base rates for these events to enable future research and meta-analysis to make valid comparisons. Future studies in more acute in-patient populations (acute psychiatric intensive care units) might be expected to observe higher base rates, more incidents of violence and self-harm amongst fewer patients. Such a study might show stronger effects for more risk factors. Similarly, studies of community based samples might yield fewer adverse events and very different results for individual items.
However, the recording of such events in the community might be less complete and less reliable.
There may be other predictive factors not included in the instruments studied here. The PANSS does not include specific items for sadness or hopelessness, though the item 'psychological symptoms' in the S-RAMM would include such symptoms.
According to Rutter [27] a proper analysis of protective effects or resilience would have to involve examining the effects of protective factors and the interactions they might have not just with risk factors but also with adverse life-events and difficulties. We found some evidence for an interactive effect between the SAPROF and the HCR-20 dynamic score. However, a further analysis of interactions between items is required. It may be that some 'strength' items are protective against some 'vulnerability' risk items but not others. Analysing the many possible combinations would require very large numbers to allow correction for multiple testing, an analysis beyond the scope of this study. Including the location in an analysis of variance may go some way towards this. In a forensic hospital with high staff-to-patient ratios and professional, 'low expressed emotion' interactions, provocations may be so limited that even high personal risk factors are less likely to lead to violence or self-harm than might occur in the community. In addition, high average levels of positive symptoms and some risk factors may overshadow other risk factors, making powerful risk factors appear to be poor discriminants in that setting [42]. Finally, this prospective study covered a six month observation period, though a shorter observation period may have been more meaningful, at least for symptom ratings. A longer observation period might have improved

Future validation studies
The definitive study would probably have to screen very large numbers of high-risk patients in order to demonstrate an effect, including interactive effects. Serial assessments to demonstrate change and assess the effect of change would be of interest. Mindful of the criteria suggested by the Risk Management Authority of Scotland for an evidence-based risk assessment tool, it may be necessary to replicate studies such as this in different populations and cultures. Future studies should also consider the synergistic interactions between individual items. True 'protective effects' may only emerge from such studies [43,44].
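The scale of the multiple-testing problem in item-level interaction studies can be shown with simple arithmetic. The sketch below is illustrative only. It assumes one test per pairing of a protective item with a risk item and uses a Bonferroni correction, which is one simple choice and is not a method specified in this paper. The count of 20 risk items matches the HCR-20 (ten 'H', five 'C' and five 'R' items); the count of 17 protective items is a hypothetical value, and the function names are invented.

```python
def pairwise_interaction_tests(n_protective_items, n_risk_items):
    """Number of tests if every protective item is paired with every risk item."""
    return n_protective_items * n_risk_items


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


N_RISK_ITEMS = 20        # HCR-20: 10 'H' + 5 'C' + 5 'R' items
N_PROTECTIVE_ITEMS = 17  # hypothetical count, for illustration only

n_tests = pairwise_interaction_tests(N_PROTECTIVE_ITEMS, N_RISK_ITEMS)
per_test_alpha = bonferroni_alpha(0.05, n_tests)

print(f"{n_tests} pairwise tests, per-test alpha = {per_test_alpha:.6f}")
```

With several hundred pairings the per-test threshold becomes very small, which is one way of seeing why a sample of the size studied here could not support an exhaustive item-by-item interaction analysis.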

Advantages of fourth generation (START and SAPROF) structured professional judgement instruments in clinical practice
Measures of strengths and protective factors such as START-S and SAPROF, as well as measures of progress in treatment programmes and recovery such as the DUNDRUM-3 and DUNDRUM-4 performed well in this study of in-patient violence and self-harm in a forensic setting. While identifying items as 'protective' rather than 'vulnerability' factors may to some extent be a matter of semantics (the START-S and START-V appear to be simple reciprocals of each other), new content can be found in the SAPROF and in the DUNDRUM-3 programme completion and DUNDRUM-4 recovery instruments as well as some START items. The S-RAMM item B1 "history of deliberate self harm" predicted harm to others as well as harm to self (AUC 95% CI greater than 0.5) in this population. In contrast, the HCR-20 H1 item "previous violence" was a poor discriminator in this forensic population, because almost all patients scored positive. This may be an unexpected benefit of using diverse assessments and may also be understandable in psychological terms. The S-RAMM is also valuable because so much of the item content is not replicated in violence risk assessment instruments.
The new content of instruments such as the SAPROF, START, DUNDRUM-3 and DUNDRUM-4 as well as the S-RAMM lends itself to the use of risk assessment as a form of needs assessment and a means for planning care and treatment. These positively connoted factors are more likely to be acceptable to patients or service users when working to engage them in recovery oriented programmes in which risk management is important.
The pairing of assessments of risk of violence to others with assessments of risk of self-harm and suicide is similarly a means of identifying the process of risk assessment and risk management with the patient or service user's best interests rather than a process intended exclusively to serve the purposes of criminal justice and public protection.
The START has been shown to have good psychometric properties and to be predictive of violence when used as part of the assessment for mental health review boards in a forensic hospital [45]. De Vries Robbé et al. found that the use of the SAPROF can be helpful in formulating treatment goals, progressing through stages of treatment, planning the phasing of treatment and facilitating risk communication [46,47]. The DUNDRUM-3 and DUNDRUM-4 are intended to serve the same processes of treatment planning and measurement of treatment outcome in domains relevant to risk reduction and risk management and to provide transparency when reporting to mental health tribunals and boards [22][23][24][25][26].
The GAF, a simple global assessment of function, performs as well as many of the more specific assessment instruments. It may be that global function is the most consistent underlying measure of risk and resilience. Alternatively this may be an example of the use of an 'intuition based' assessment. Carroll [48] has recently pointed out the advantages of such intuitive approaches while cautioning that a structured professional judgement instrument should always be used alongside intuitive assessments as a valid, transparent, deliberative and unbiased check on some of the problems that can arise with intuitive methods.

Conclusions
The START and SAPROF have good psychometric characteristics for use as clinical or research instruments in severe mental illness. The SAPROF predicted the absence of violence and self-harm. The START-Strengths predicted absence of violence but not self-harm. The DUNDRUM-3 programme completion and DUNDRUM-4 recovery instruments also predicted violence and self-harm. They were not consistently better than the HCR-20, S-RAMM, PANSS or GAF in predicting adverse events (violence or self-harm). The SAPROF, START, DUNDRUM-3 and DUNDRUM-4, however, have the advantage of covering a wider content than existing risk assessment instruments and have different purposes. The GAF performs as well as the scale scores of most specific risk assessment instruments. Many individual items in the SPJ instruments studied were strongly associated with adverse outcomes in this setting, meriting further study of context and interactive effects.
paper, designed the study and carried out the data analysis. All authors reviewed the drafts and agreed the final text. All authors read and approved the final manuscript.