Identifying long-term and imminent suicide predictors in a general population and a clinical sample with machine learning
BMC Psychiatry volume 22, Article number: 120 (2022)
Machine learning (ML) is increasingly used to predict suicide deaths but its value for suicide prevention has not been established. Our first objective was to identify risk and protective factors in a general population. Our second objective was to identify factors indicating imminent suicide risk.
We used survival and ML models to identify lifetime predictors using the Cohort of Norway (n=173,275) and hospital diagnoses in a Saskatoon clinical sample (n=12,614). The mean follow-up times were 17 years and 3 years for the Cohort of Norway and Saskatoon respectively. People in the clinical sample had a longitudinal record of hospital visits grouped in six-month intervals. We developed models in a training set and these models predicted survival probabilities in held-out test data.
In the general population, we found that a higher proportion of low-income residents in a county, mood symptoms, and daily smoking increased the risk of dying from suicide in both genders. In the clinical sample, the only predictors identified were male gender and older age.
Suicide prevention probably requires individual actions with governmental incentives. The prediction of imminent suicide remains highly challenging, but machine learning can identify early prevention targets.
In 2016, suicide was the second leading cause of death in the 15-29 age group and accounted for 793,000 deaths worldwide . Suicide has a huge economic cost. In Spain, this was 6 billion euros annually (5 billion for men, 1 billion for women at 2013 prices) . The cost for women is certainly underestimated because the value of housework and childcare is hard to estimate. Although men are increasingly involved in parenting, household duties are still largely shouldered by women .
Generally, suicide prevention programs focus on high-risk groups [4, 5] and high-risk periods . There are studies about primary prevention programs in educational  or primary care settings but the quality of the evidence is hard to evaluate . Suicide lags behind cardiovascular outcomes in primary prevention guidelines. Whereas healthy people know how to reduce their risk of cardiovascular disease overall, this is not true for suicide. For example, the American Heart Association (AHA) recommends 150 minutes of moderate physical activity (75 minutes of vigorous activity) per week for adults . If people heed this advice, small reductions in blood pressure would translate into a lower incidence of coronary heart disease . In effect, individuals become agents of prevention for cardiovascular events. By contrast, people receive no such guidance to reduce suicide risk. A UK study argued that broad population-based strategies result in greater suicide reductions than those focused on high-risk groups . An example of a broad population approach is one hour of physical activity per week—this may prevent 12 percent of future depression cases . Several modifiable suicide risks (discussed below) are already known and machine learning (ML) may identify additional ones.
Regarding secondary prevention, identifying high-risk patients is challenging. Clinicians cannot foresee which patients will act upon suicidal thoughts [13, 14]. Two reasons are that (1) suicidal thoughts do not progress linearly to suicide, and (2) suicide-related outcomes (i.e. thoughts, attempts, and completions) have common risk factors . As computers become cheaper and ubiquitous, ML is increasingly used for precision medicine, including the prediction of suicide [16–22]. ML can be defined as programs that learn from previous experience , in contrast to rule-based artificial intelligence that relies on programmer instructions .
There are known modifiable targets for suicide such as smoking [25–28], lipid and cholesterol profiles [29–31], dietary patterns , unemployment , and BMI, with overweight and obese individuals having lower suicide risk [34, 35]. Likewise, a meta-analysis reported that compared to those with normal weight, underweight people had higher suicide risk and overweight people had lower suicide risk . We are unsure whether these variables are causal factors or markers of suicidality. Nevertheless, each person has freedom of action subject to genetic, social, and environmental constraints . Regarding the prediction of imminent suicide, the literature suggests that transitions in care are high-risk periods . These include: initial diagnosis with a mental condition, initiation of psychotropic medication, discharge from the hospital, and having a recent life-changing event [38, 39].
Previous ML papers used administrative data for suicide prediction [40, 41]. For example, Simon and colleagues examined about 3 million people who visited mental health and primary care centers to identify precursor events for suicidality . An Australian group developed a risk score that accumulated information longitudinally and this score was shown to predict repeat episodes . Both papers recommended using electronic health records to identify high-risk people. Whether it is worthwhile to do so is being debated. Belsher and colleagues systematically reviewed 17 studies and reported that the accuracy for predicting a future event is near zero . However, this conclusion is disputed by Simon and colleagues, claiming that their model  has superior predictive value for imminent suicide compared to prediction models for breast cancer .
ML models could aid suicide prevention because these techniques combine the joint action of many risk factors without making typical statistical assumptions . However, ML is not immune to other challenges in predicting suicide. Suicidal people may inadvertently or deliberately terminate their life  without presenting to care services. Also, the class imbalance problem, in which an outcome of interest is exceedingly rare compared to the other class, is pervasive in suicide research . Classifying every instance as a non-suicide would be correct most of the time but would miss all the suicide cases whose deaths might otherwise have been prevented.
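To make the class imbalance point concrete, the following minimal sketch scores a trivial rule that labels every person as a non-suicide. It uses the Cohort of Norway counts reported in this paper (319 suicide deaths among 173,275 participants). The function name is illustrative and is not taken from the study's code.

```python
def all_negative_metrics(n_total, n_cases):
    """Accuracy and sensitivity of a rule that predicts 'no suicide' for everyone."""
    true_negatives = n_total - n_cases  # every non-case is labelled correctly
    true_positives = 0                  # no suicide case is ever flagged
    accuracy = (true_negatives + true_positives) / n_total
    sensitivity = true_positives / n_cases
    return accuracy, sensitivity


accuracy, sensitivity = all_negative_metrics(n_total=173_275, n_cases=319)
print(f"accuracy = {accuracy:.4f}, sensitivity = {sensitivity:.1f}")
# accuracy = 0.9982, sensitivity = 0.0
```

An accuracy above 99 percent with zero sensitivity shows why accuracy alone is uninformative for rare outcomes.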
We had two main objectives in this study. First, we sought to identify early risk or protective factors for the primary prevention of suicide, especially those within each individual’s sphere of influence. Second, we examined whether longitudinally collected records of mental health related hospital visits can predict suicides in a high-risk population.
Materials and methods
The demographic characteristics of the participants of the general population and the clinical sample are presented in Table 1.
Ascertainment of suicide
The outcome variable in both the general population and clinical sample was suicide established by official authorities. For the Cohort of Norway, cause of death for deceased participants was provided to the research team as suicide or other cause. This was based on death certificates completed by a physician and entered into a national Cause of Death registry. Suicide is indicated by the ICD-10 codes X60-X84 and Y87.0 . Of the 319 suicide deaths among cohort members, all were based on ICD-10 except for two deaths in 1995 that used ICD-9. From 2005 to 2014, three assessments of the quality of the data in the Norwegian Cause of Death Registry were made. The quality was classified in the second-best category (first two assessments) and in the best category (third assessment) .
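As an illustration of the ICD-10 definition above, a cause-of-death code could be checked as in the sketch below. The function name and the handling of dotted codes are assumptions made for the example; this is not the registry's coding procedure, and the two ICD-9 coded deaths are not covered.

```python
import re

# Matches the letter X followed by a two-digit block number, e.g. X60 or X849.
_X_CODE = re.compile(r"^X(\d{2})")


def is_suicide_code(code):
    """Return True for ICD-10 codes in X60-X84 or equal to Y87.0."""
    normalized = code.strip().upper().replace(".", "")
    if normalized == "Y870":
        return True
    match = _X_CODE.match(normalized)
    return bool(match) and 60 <= int(match.group(1)) <= 84


for example in ("X60", "x84.9", "Y87.0", "X85", "Y87.1", "F32"):
    print(example, is_suicide_code(example))
```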
For the Saskatoon data, the research team was provided with a list of suicide decedents (based on the same ICD-10 codes as Norway) by the provincial coroner. We are not aware of an external assessment of the mortality data from Saskatoon (and Canada in general) but the lack of a national standard and the lack of an accreditation system for coroner offices are notable weaknesses .
The research project was approved by the University of Saskatchewan ethics board (Saskatoon data) and the Regionale Komiteer for Medisinsk og Helsefaglig Forskningsetikk (Norway data). All Cohort of Norway participants provided written consent to link their responses with government registers. Consent to participate was waived for the Saskatoon data by the University of Saskatchewan (Approval number: Bio 17-11). Handling of both Norway and Saskatoon data adheres to the Declaration of Helsinki.
Cohort of Norway, population study
The Cohort of Norway (CONOR) study consisted of 11 health surveys carried out between 1994 and 2003 in various Norwegian regions . CONOR included demographic data, self-reported medication use, lifestyle (diet and physical activity), smoking, alcohol consumption, and blood test results from 173,275 people who were between ages 18 and 105 at enrollment. For 7,235 people who participated more than once, we used data from their initial participation only. Survey responses were linked with ICD-coded deaths up to December 31, 2016 by the Norwegian Institute of Public Health.
Candidate predictor variables
Our main candidate predictor for suicide was the sum of 7 questions regarding psychological health (mood symptoms). These were: felt nervous or worried; felt anxious; felt confident and calm; felt irritable; felt happy and optimistic; felt down, depressed; and felt lonely. These items are based on the Hopkins Symptom Checklist  which has been validated in various populations including Norway . We reverse-coded the positively worded items before summation. The variables representing a healthy lifestyle and dietary factors were: engaging in light / hard physical activity, alcohol use (never or seldom, about monthly, more than monthly to once a week, several times a week), daily smoking, exposure to smoke-filled rooms, and exposure to secondary smoke as a child. We had the following biological measurements: triglycerides, HDL-cholesterol, glucose, and total cholesterol (all in μmol). The details regarding the collection of biomarkers and other characteristics are described in the cohort profile .
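The mood symptom score described above can be sketched as follows. The item keys and the 1 to 4 response scale are assumptions made for the example (the text does not state the scale); only the reverse-coding of the two positively worded items before summation comes from the description.

```python
# Hypothetical item keys for the seven questions; the last two sets are assumptions.
ITEMS = (
    "nervous", "anxious", "confident_calm", "irritable",
    "happy_optimistic", "down_depressed", "lonely",
)
POSITIVE_ITEMS = {"confident_calm", "happy_optimistic"}


def mood_symptom_score(responses, scale_min=1, scale_max=4):
    """Sum the seven items after reverse-coding the positively worded ones."""
    total = 0
    for item in ITEMS:
        value = responses[item]
        if not scale_min <= value <= scale_max:
            raise ValueError(f"{item}: response {value} is outside the scale")
        if item in POSITIVE_ITEMS:
            # Reverse-code so that a higher value always means more symptoms.
            value = scale_min + scale_max - value
        total += value
    return total


fewest_symptoms = {item: (4 if item in POSITIVE_ITEMS else 1) for item in ITEMS}
print(mood_symptom_score(fewest_symptoms))  # 7, the minimum on an assumed 1-4 scale
```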
Other candidate predictors were: BMI, taking blood pressure medications, month of birth, having an injury requiring hospitalization, age, waist-hip-ratio, married status, and living with a spouse (partner). Although Norway is a welfare state, we included two measures of social status as predictors: years of education and relative social deprivation. A previous Norwegian study reported an association between higher psychological distress and low education . Relative social deprivation was defined as the proportion of residents in a county with an after-tax income that is 50 percent below the median income or greater .
We likewise considered a wider range of suicide predictors but these had missing rates higher than 20 percent, beyond which imputation is not recommended . These variables (missing rates) were: number of sleepless nights in a week (32%), having young children (79%), immigrant background (22%), number of good friends (24%), use of vitamins and supplements (66%), and taking antidepressants (70%).
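The 20 percent missingness rule can be expressed as a simple screening step. This is a sketch under assumptions: records are dictionaries with `None` marking a missing value, and the function name is illustrative.

```python
def split_by_missingness(rows, threshold=0.20):
    """Return (kept, dropped) variable names based on the share of missing values.

    rows is a list of dicts that share the same keys; None marks a missing value.
    A variable is dropped when its missing rate is higher than the threshold.
    """
    kept, dropped = [], []
    for name in rows[0].keys():
        missing = sum(1 for row in rows if row.get(name) is None)
        rate = missing / len(rows)
        (dropped if rate > threshold else kept).append(name)
    return kept, dropped


# Toy data: 'smoking' is missing in 1 of 5 rows (20%), 'sleepless' in 2 of 5 (40%).
rows = [
    {"smoking": 1, "sleepless": None},
    {"smoking": 0, "sleepless": 3},
    {"smoking": None, "sleepless": None},
    {"smoking": 1, "sleepless": 0},
    {"smoking": 0, "sleepless": 2},
]
print(split_by_missingness(rows))  # (['smoking'], ['sleepless'])
```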
Saskatoon, Canada clinical sample
We created a retrospective cohort of people (n=12,614) who ever visited a Saskatoon hospital for a mental health or substance-related reason between 2011 and 2016. Using the first such visit as an index date, we constructed a longitudinal record of hospital and community visits for 4 years up to 31 March 2016 or until death by suicide, whichever came first. We required that people had at least 6 months of follow-up time, but included people dying of suicide in the first 6 months (n=13). We transformed this person-level data (1 row: 1 person) into a person-period dataset grouped into 6-month intervals. Each interval had time-varying predictors or suicide death (if applicable). These are explained in the next section.
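The person-level to person-period transformation described above can be sketched as follows. The field names and the use of whole months are assumptions for illustration; in the study each interval row would also carry the time-varying predictors.

```python
import math


def to_person_period(person_id, follow_up_months, died_by_suicide, interval_months=6):
    """Expand one person-level record into one row per 6-month interval.

    The suicide indicator is 1 only in the final interval of a person who died
    by suicide; all earlier intervals, and all intervals of survivors, are 0.
    """
    n_intervals = max(1, math.ceil(follow_up_months / interval_months))
    rows = []
    for k in range(1, n_intervals + 1):
        is_last = k == n_intervals
        rows.append({
            "person_id": person_id,
            "interval": k,
            "suicide": int(died_by_suicide and is_last),
        })
    return rows


for row in to_person_period("A", follow_up_months=14, died_by_suicide=True):
    print(row)
```

A person followed for 14 months contributes three rows, with the event recorded in the third.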
Candidate predictor variables
Our main candidate predictor was the Repeated Episodes of Self-harm (RESH) score for each six-month interval . RESH ranges from 0 to 25, with people scoring in the 20-25 range having over 80 percent risk of a repeat self-harm episode . Although it was not developed for the purpose of suicide prediction, other studies have used the RESH components (psychiatric diagnoses [20, 21], hospitalizations , or self-harm episodes ) for suicide prediction. Aside from the RESH score, we had ICD diagnosis codes for each visit, intake and discharge dates, and whether the patient visited the emergency room only or was admitted as an inpatient. Each of 20 diagnosis fields was searched for the following ICD diagnoses: Substance misuse (F10-F19), Depression (F32-F39), Anxiety (F40-F49), Eating Disorder (F50), Personality disorders (F51-F59), Schizophrenia and related (F20-F29), Mania (F30-F31), and ADHD (F90). We also included self-harm episodes not resulting in death, an indicator of high suicide risk , as a candidate predictor.
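Searching the diagnosis fields for these ICD blocks could look like the sketch below. The ranges are copied from the list above; the category keys, function names, and input format (a list of code strings with empty fields as `None` or `""`) are assumptions for illustration.

```python
# ICD-10 F-chapter blocks as listed in the text (two-digit block numbers).
CATEGORY_RANGES = {
    "substance_misuse": (10, 19),
    "schizophrenia_related": (20, 29),
    "mania": (30, 31),
    "depression": (32, 39),
    "anxiety": (40, 49),
    "eating_disorder": (50, 50),
    "personality_disorders": (51, 59),
    "adhd": (90, 90),
}


def categorize(code):
    """Map one ICD-10 code such as 'F32.1' to a category name, or None."""
    code = code.strip().upper()
    if len(code) < 3 or code[0] != "F" or not code[1:3].isdigit():
        return None
    block = int(code[1:3])
    for name, (low, high) in CATEGORY_RANGES.items():
        if low <= block <= high:
            return name
    return None


def visit_flags(diagnosis_fields):
    """Return a 0/1 indicator per category for one visit's diagnosis fields."""
    found = {categorize(code) for code in diagnosis_fields if code}
    return {name: int(name in found) for name in CATEGORY_RANGES}


print(visit_flags(["F41.0", None, "F90.0", ""]))
```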
As with CONOR, there were variables of interest to us but high missing proportions precluded their use. These variables (missing rates) were: highest educational attainment (63%), aboriginal status (70%), and area-level deprivation (28%).
Our analytical strategy can be summarized in seven steps:
1. Partitioning the data (CONOR or Saskatoon) into training and testing subsets. The training set was dedicated to developing survival and ML models while the testing set was held out to be later predicted by the trained models.
2. Balancing the training data such that equal numbers of suicides and non-suicides were represented. This would allow the statistical and ML models to detect predictors of suicide.
3. Imputing missing values in the training and test sets separately. Suicide status and time to death (or censorship) were not included in imputation.
4. Fitting univariate and multivariable survival and ML models to the training data. The ML models were variations of random forests. These are described more fully in the Supplementary Material. For the Cohort of Norway, we developed separate models by gender because there were adequate numbers of suicide deaths, but not for the Saskatoon data.
5. Identifying the top predictor variables.
6. Using the survival and ML models to predict survival probabilities in the test data.
7. Comparing the sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and area under the receiver operating characteristic (ROC) curve of the survival and ML models.
Readers who are interested in the technical details are referred to the Supplementary Material. There we present diagrams, tables of intermediate results, and the detailed accuracy measures (Step 7). We have also provided the R and Stata code therein.
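The first two steps of the strategy (partitioning and balancing) can be sketched as below. This is not the study's R or Stata code. The 30 percent test fraction and the use of random undersampling of non-suicides are assumptions, since the text states only that the training data were balanced to equal numbers; the function names are illustrative.

```python
import random


def stratified_split(records, test_fraction=0.3, seed=0):
    """Split records into train and test sets, keeping the case share similar."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in records if r["suicide"] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def balance_by_undersampling(train, seed=0):
    """Keep all suicide cases and an equally sized random sample of non-cases.

    Assumes there are at least as many non-cases as cases.
    """
    rng = random.Random(seed)
    cases = [r for r in train if r["suicide"] == 1]
    controls = [r for r in train if r["suicide"] == 0]
    return cases + rng.sample(controls, len(cases))


# Toy data: 1,000 people, 20 of whom died by suicide.
records = [{"id": i, "suicide": int(i < 20)} for i in range(1000)]
train, test = stratified_split(records)
balanced = balance_by_undersampling(train)
print(len(train), len(test), len(balanced))  # 700 300 28
```

Only the training set is balanced; the held-out test set keeps the original case share so that accuracy measures reflect the real prevalence.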
Cohort of Norway
In the Cohort of Norway, gender-separate Cox models showed that in women, higher age, higher proportion of low income residents, daily smoking, number of hours spent in smoke-filled rooms, and mood symptoms were associated with higher suicide risk. Being married was associated with lower suicide risk for women (Table 2). Among men, the risk factors were: higher proportion of low income residents, higher triglycerides, daily smoking, and mood symptoms (Table 3). A higher waist-hip ratio was associated with lower risk. The Cox model for females had an area under the ROC curve of 0.38 and for males it was 0.57 (Table S4 in the Supplement). Figure 1 shows the relative survival probabilities in females at high/low values of mood symptoms, low-income proportion, and daily smoking. Figure 2 shows the same comparison for males.
The random survival forest for females identified the following significant predictors: higher proportion of low income residents, daily smoking, and mood symptoms (Table 4). The random survival forest model for males identified the same three variables and in addition, living with a spouse or a partner, being married, and taking blood pressure medications as protective factors (Table 4). The random survival forest model for females had an area under the ROC curve of 0.50 and for males it was 0.43 (Table S4 in the Supplement).
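To help interpret these values, the area under the ROC curve can be read as the probability that a randomly chosen suicide case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it by direct pairwise comparison on made-up scores; it is an illustration of the measure, not the procedure used to produce Table S4.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of predicted risks."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # case correctly ranked above the non-case
            elif case == control:
                wins += 0.5      # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks: cases are ranked no better than chance.
print(auc([0.2, 0.6], [0.1, 0.4, 0.7]))  # 0.5
```

Under this reading, a value of 0.50 corresponds to chance-level ranking, and values below 0.50 mean that non-cases were ranked above cases more often than the reverse.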
Saskatoon clinical sample
Of the univariate discrete survival models, only four models had interpretable odds ratios (ORs). These were the ones containing age, male sex, RESH score, and number of community mental health visits, each entered as a single predictor. Of these four models, higher age was the only factor that was associated with suicide death (Table 5). A one-year increase in age at index was associated with a 2 percent increase in suicide risk. The other variables had ORs that were infinitesimally small (e.g. for substance use, the OR was 1.97e-7).
Since only age had a p value below 0.20 in the univariate models, we did not create a multivariable model. We therefore used the univariate model to predict, in the held-out data, the probability of suicide at those intervals that contained at least 1 suicide death.
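A discrete-time survival model of this kind yields a hazard for each interval, and survival probabilities follow by multiplying the complements of the hazards. The sketch below assumes a logistic form and sets the age coefficient to ln(1.02), matching the reported 2 percent increase per year of age. The intercept of -9.0 is purely hypothetical and is not an estimate from the study.

```python
import math


def interval_hazard(age, intercept, log_or_age=math.log(1.02)):
    """Logistic hazard of suicide within one interval for a given age at index."""
    linear = intercept + log_or_age * age
    return 1.0 / (1.0 + math.exp(-linear))


def survival_curve(hazards):
    """Cumulative probability of surviving each successive interval."""
    surviving = 1.0
    curve = []
    for hazard in hazards:
        surviving *= 1.0 - hazard
        curve.append(surviving)
    return curve


# Five six-month intervals for a 40-year-old, with a hypothetical intercept.
hazards = [interval_hazard(age=40, intercept=-9.0)] * 5
print(survival_curve(hazards))
```

With hazards this small the survival curve stays very close to 1, which reflects how little separation such a model can offer between people when events are rare.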
The historical random forest model identified only age at index and male gender as important predictors of suicide death (Table 6). The historical random forest model had an area under the ROC curve that was higher than that of logistic regression in 4 out of 5 intervals. However, both models had close to zero PPV in all intervals (Table S6 in the Supplement).
We fitted statistical and ML models to individual-level data in the Cohort of Norway and a clinical sample in Saskatoon, Canada. In the general population, we found that mood symptoms, daily smoking, and living in a county with a higher proportion of low income residents predict suicide death. These variables were consistently identified between sexes and by Cox and random survival forest models. In the clinical sample, no variables other than age and male gender predicted suicide at various follow-up intervals despite a longitudinal record of hospital visits.
Long-term suicide prevention
The first implication of our general population result is that smoking abstinence or cessation is important for primary suicide prevention. It has been argued that the smoking-suicide association is spurious and that it can be explained by other causes such as substance abuse and mental disorders . This seems to imply that smoking is a coping mechanism that is not of itself harmful to mood and cognition. An alternative explanation is that smoking is a psychological toxin that is not entirely accounted for by other suicide risks . Several lines of evidence support this view. First is the dose-response relation between the quantity or intensity of smoking and suicide reported by large cohort studies [27, 60, 61]. Second is a Mendelian randomization study concluding that the associations of smoking, schizophrenia, and depression can partly be attributed to a causal effect of smoking . Third, abstinence from smoking is associated with fewer suicide-related outcomes, with a longer abstinence associated with lower suicide risk [63–65]. In a study that disentangled the genetic predisposition to smoke and smoking behaviour, a 35-year follow-up of twins in Finland reported that among twins, one of whom smoked and the other did not, death by suicide was more likely for the smoker . Even though smoking is an individual choice, it is a public concern that the warning labels of cigarette boxes tend to focus on cancer risk, while remaining silent about mental health [62, 66]. Using a quasi-experimental approach, Grucza and colleagues  evaluated the impact of cigarette excise taxes and smoke-free air policies on suicide deaths. They concluded that an added $1 excise tax on a pack of cigarettes translates to a 12.4% reduction in suicide risk. This shows that government policies are effective in nudging individuals into healthy behaviors.
The second implication is that mood symptoms should not be ignored, and seeking treatment is part of an individual’s duty of self-care. In both Norway and Canada, seeing a psychiatrist or psychologist is usually free. Unfortunately, there is no shortage of maladaptive beliefs preventing people from seeking help. Patients may fail to recognize their need for treatment or deny their illness [68, 69]. In the United States, where there is no universal health coverage, attitudinal barriers aggravate the limited access to health services . Parents usually serve as gatekeepers to mental health services for their children , so receiving proper care often hinges on parental attitudes. Parents may refuse to seek care for their children for fear of the mental illness label . They did so despite knowing that depression typically does not resolve on its own. People who have attempted suicide are more likely to seek help compared to counterparts who have a mental condition but no attempt . A reason given for not seeking help is the desire to solve the problem by themselves [70, 73]. These behaviours hinder a timely provision of mental health care and ultimately increase the risk for suicide.
The finding that counties with a higher proportion of low-income residents have higher suicide rates is consistent with a Norwegian case-control study that studied socio-economic predictors of suicide . The study reported that suicide cases were overrepresented in people earning less than 400,000 NOK annually. Likewise, suicides were overrepresented in people with compulsory education only compared to tertiary education. These results are consistent with a Danish study showing that the lowest quartile of income was associated with higher suicide risk . Low income and low socioeconomic status are known determinants of poor health outcomes . Individuals with low socioeconomic status may have more unhealthy dietary patterns, smoke more, exercise less, and more often be overweight or obese, resulting in poorer physical and mental health. Further research is required to understand why these social determinants apply also to generous welfare regimes such as in the Scandinavian countries .
There were other variables that were protective for males only (higher waist-hip ratio) or females only (married). Overall, a higher waist-hip ratio is a risk for obesity, which two previous studies found to be associated with lower suicide risk [34, 35]. This finding needs to be further studied because obesity is an inflammatory condition and inflammation is associated with mood disorders . Our results also identified risk factors for females only (hours in smoke-filled rooms) or males only (triglycerides). Both need to be studied further and it would be premature to interpret them at this time.
Imminent suicide prevention
The main implication of our Saskatoon result is that predicting the timing of suicide is not feasible with hospital-based diagnosis alone and with small numbers of suicide cases. Note that our clinical sample was of reasonable size (N=13,892). However, this presumably high-risk sample had too few suicide deaths (n=80) for ML to be effective. There were in fact 149 suicides in Saskatoon during the study period, a number that approximates Canada’s suicide incidence rate of 11.5 per 100,000 people per year . The other 69 people who died by suicide never visited a Saskatoon hospital, so we had no information about them. They may have had records from general practitioners, the police, social services, and forensic settings. Unfortunately, linking data across these settings is a formidable task in Canada because of inconsistent standards across provinces. Researchers have huge barriers to overcome before being granted access to research data, purportedly for privacy reasons . It is possible that resources with (1) a wider range of candidate variables, (2) data coming from various agencies, and (3) aggregation over longer periods can enable the prediction of imminent suicide with greater accuracy. The SHRINE project in the USA is one such repository. SHRINE aggregates data about various diseases and makes electronic records available for research while preserving the privacy of patients .
Accuracy of our prediction models
Our accuracy measures for the Cohort of Norway (Table S4) and Saskatoon (Table S6) were dismal overall. The areas under the ROC curve for the Cohort of Norway models ranged from 0.38 to 0.57, roughly comparable to guessing the outcome of a coin toss. The positive predictive values (PPV) of our Cohort of Norway models were mostly 0, reaching a maximum of about 16 percent for male suicides (Cox model). Compared to the PPVs of 11 studies meta-analyzed by Belsher , ours was second best . In contrast, the AUCs of our Cohort of Norway models were below those of 7 studies that reported AUCs. The Saskatoon clinical sample models uniformly had PPVs close to zero. This is not surprising since there were no significant predictors aside from male sex and age, and few suicide cases. More surprising is that several studies [18, 19, 21, 83] with millions of patients also had PPVs close to zero. This does not imply that suicide prediction models are an exercise in futility. PPV depends on disease prevalence, and with suicide being rare, identifying true positives is extremely difficult. In effect, ML models may improve dramatically but their PPV is constrained ultimately by the problem of class imbalance.
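The dependence of PPV on prevalence follows from Bayes' rule and can be shown with a short calculation. The sensitivity and specificity of 0.90 below are hypothetical values chosen for illustration, not results from our models; the prevalence is the share of suicide deaths in the Cohort of Norway over follow-up (319 of 173,275).

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


prevalence = 319 / 173_275  # about 0.18 percent over the follow-up period
print(f"{ppv(0.90, 0.90, prevalence):.3f}")  # 0.016
print(f"{ppv(0.90, 0.90, 0.50):.3f}")        # 0.900
```

Even a hypothetical model with 90 percent sensitivity and specificity would have a PPV under 2 percent at this prevalence, whereas the same model would reach 90 percent if cases and non-cases were equally common.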
The present study is subject to several limitations. First, we did not have data that included both self-reported health measures and healthcare utilization records. Having data that includes variables from both domains would help elucidate how primary and secondary preventive factors interact. Second, although we had a range of other variables in the Saskatoon clinical data, such as area-level deprivation, aboriginal status, highest level of education, and marital status, the missing rates were unacceptably high, so we decided not to use them as predictors. Third, we deviated from the usual practice in suicide studies of combining suicide deaths with those that the coroner ruled as “undetermined intent”, so our outcome variable excludes suicides that the coroner could not ascertain. Fourth, non-fatal self-harm episodes (X60-X84) do not distinguish between events with and without an intent to die . This means that emergency room visitors who cut themselves as a form of coping are assigned the same ICD code as visitors who hanged themselves but survived. This most likely diluted the predictive value of self-harm for future suicides. Finally, there may have been some leakage of information from the test to the training set during imputation. This may have resulted in a slight inflation of prediction accuracy.
Suicide prevention probably requires individual actions with governmental incentives. The prediction of imminent suicide remains highly challenging, but machine learning can identify early prevention targets.
Availability of data and materials
The data that support the findings of this study are available from the Norwegian Institute of Public Health and the Saskatchewan Health Authority but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. With the permission of the Norwegian Institute of Public Health and the Saskatchewan Health Authority, the corresponding author will make the data available upon reasonable request.
Bennett J, Stevens G, Mathers C, Bonita R, Rehm J, Kruk M, Riley L, Dain K, Kengne A, Chalkidou K, et al. Ncd countdown 2030: worldwide trends in non-communicable disease mortality and progress towards sustainable development goal target 3.4. Lancet. 2018; 392(10152):1072–88.
Rivera B, Casal B, Currais L. Crisis, suicide and labour productivity losses in spain. Eur J Health Econ. 2017; 18(1):83–96.
Yavorsky J, Kamp Dush C, Schoppe-Sullivan S. The production of inequality: The gender division of labor across the transition to parenthood. J Marriage Fam. 2015; 77(3):662–79.
Mann J, Apter A, Bertolote J, Beautrais A, Currier D, Haas A, Hegerl U, Lonnqvist J, Malone K, Marusic A, et al. Suicide prevention strategies: a systematic review. Jama. 2005; 294(16):2064–74.
Overholser J, Braden A, Dieter L. Understanding suicide risk: Identification of high-risk groups during high-risk times. J Clin Psychol. 2012; 68(3):349–61.
Brodsky B, Spruch-Feiner A, Stanley B. The zero suicide model: applying evidence-based suicide prevention practices to clinical care. Front Psychiatry. 2018; 9:33.
Harrod C, Goss C, Stallones L, DiGuiseppi C. Interventions for primary prevention of suicide in university and other post-secondary educational settings. Cochrane Database Syst Rev. 2014; 10:1–64.
Zalsman G, Hawton K, Wasserman D, van Heeringen K, Arensman E, Sarchiapone M, Carli V, Höschl C, Barzilay R, Balazs J, et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry. 2016; 3(7):646–59.
Arnett D, Blumenthal R, Albert M, Buroker A, Goldberger Z, Hahn E, Himmelfarb C, Khera A, Lloyd-Jones D, McEvoy J, et al. 2019 acc/aha guideline on the primary prevention of cardiovascular disease: a report of the american college of cardiology/american heart association task force on clinical practice guidelines. J Am Coll Cardiol. 2019; 74(10):177–232.
Cook N, Cohen J, Hebert P, Taylor J, Hennekens C. Implications of small reductions in diastolic blood pressure for primary prevention. Arch intern Med. 1995; 155(7):701–9.
Lewis G, Hawton K, Jones P. Strategies for preventing suicide. Br J Psychiatry. 1997; 171(4):351–4.
Harvey S, Øverland S, Hatch S, Wessely S, Mykletun A, Hotopf M. Exercise and the prevention of depression: results of the hunt cohort study. Am J Psychiatry. 2018; 175(1):28–36.
Wang Y, Bhaskaran J, Sareen J, Bolton S-L, Chateau D, Bolton J. Clinician prediction of future suicide attempts: a longitudinal study. Can J Psychiatry. 2016; 61(7):428–32.
Woodford R, Spittal M, Milner A, McGill K, Kapur N, Pirkis J, Mitchell A, Carter G. Accuracy of clinician predictions of future self-harm: a systematic review and meta-analysis of predictive studies. Suicide Life Threat Behav. 2019; 49(1):23–40.
Bruffaerts R, Demyttenaere K, Borges G, Haro J, Chiu W, Hwang I, Karam E, Kessler R, Sampson N, Alonso J, et al. Childhood adversities as risk factors for onset and persistence of suicidal behaviour. Br J Psychiatry. 2010; 197(1):20–7.
Nock M, Millner A, Joiner T, Gutierrez P, Han G, Hwang I, King A, Naifeh J, Sampson N, Zaslavsky A, et al. Risk factors for the transition from suicide ideation to suicide attempt: Results from the army study to assess risk and resilience in servicemembers (army starrs). J Abnorm Psychology. 2018; 127(2):139.
Just M, Pan L, Cherkassky V, McMakin D, Cha C, Nock M, Brent D. Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nat Hum Behav. 2017; 1(12):911–9.
Kessler R, Stein M, Petukhova M, Bliese P, Bossarte R, Bromet E, Fullerton C, Gilman S, Ivany C, Lewandowski-Romps L, et al. Predicting suicides after outpatient mental health visits in the army study to assess risk and resilience in servicemembers (army starrs). Mol Psychiatry. 2017; 22(4):544–51.
Kessler R, Warner C, Ivany C, Petukhova M, Rose S, Bromet E, Brown M, Cai T, Colpe L, Cox K, et al. Predicting suicides after psychiatric hospitalization in us army soldiers: the army study to assess risk and resilience in servicemembers (army starrs). JAMA Psychiatry. 2015; 72(1):49–57.
Passos I, Mwangi B, Cao B, Hamilton J, Wu M-J, Zhang X, Zunta-Soares G, Quevedo J, Kauer-Sant’Anna M, Kapczinski F, et al. Identifying a clinical signature of suicidality among patients with mood disorders: A pilot study using a machine learning approach. J Affect Disord. 2016; 193:109–16.
Simon G, Johnson E, Lawrence J, Rossom R, Ahmedani B, Lynch F, Beck A, Waitzfelder B, Ziebell R, Penfold R, et al. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry. 2018; 175(10):951–60.
Walsh C, Ribeiro J, Franklin J. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci. 2017; 5(3):457–69.
Mitchell T, et al. Machine Learning. New York: McGraw-hill; 1997.
Sun R. Robust reasoning: integrating rule-based and similarity-based reasoning. Artif Intell. 1995; 75(2):241–95.
Evins A, Korhonen T, Kinnunen T, Kaprio J. Prospective association between tobacco smoking and death by suicide: a competing risks hazard analysis in a large twin cohort with 35-year follow-up. Psychol Med. 2017; 47(12):2143–54.
Lucas M, O’Reilly E, Mirzaei F, Okereke O, Unger L, Miller M, Ascherio A. Cigarette smoking and completed suicide: results from 3 prospective cohorts of American adults. J Affect Disord. 2013; 151(3):1053–8.
Peters E, John A, Bowen R, Baetz M, Balbuena L. Neuroticism and suicide in a general population cohort: results from the UK Biobank project. BJPsych Open. 2018; 4(2):62–8.
Campbell-Sills L, Kessler R, Ursano R, Sun X, Heeringa S, Nock M, Jain S, Stein M. Nicotine dependence and pre-enlistment suicidal behavior among US Army soldiers. Am J Prev Med. 2019; 56(3):420–8.
Evans S, Prossin A, Harrington G, Kamali M, Ellingrod V, Burant C, McInnis M. Fats and factors: lipid profiles associate with personality factors and suicidal history in bipolar subjects. PLoS ONE. 2012; 7(1):29297.
Lee H-J, Kim Y-K. Serum lipid levels and suicide attempts. Acta Psychiatrica Scandinavica. 2003; 108(3):215–21.
Svensson T, Inoue M, Sawada N, Charvat H, Mimura M, Tsugane S, Group J, Iwasaki M, Sasazuki S, Yamaji T, et al. High serum total cholesterol is associated with suicide mortality in Japanese women. Acta Psychiatrica Scandinavica. 2017; 136(3):259–68.
Nanri A, Mizoue T, Poudel-Tandukar K, Noda M, Kato M, Kurotani K, Goto A, Oba S, Inoue M, Tsugane S. Dietary patterns and suicide in Japanese adults: the Japan Public Health Center-based Prospective Study. Br J Psychiatry. 2013; 203(6):422–7.
Milner A, Page A, LaMontagne A. Long-term unemployment and suicide: a systematic review and meta-analysis. PLoS ONE. 2013; 8(1):51333.
Geulayov G, Ferrey A, Hawton K, Hermon C, Reeves G, Green J, Beral V, Floud S, Collaborators M, et al. Body mass index in midlife and risk of attempted suicide and suicide: prospective study of 1 million UK women. Psychol Med. 2019; 49(13):2279–86.
Mukamal K, Kawachi I, Miller M, Rimm E. Body mass index and risk of suicide among men. Arch Intern Med. 2007; 167(5):468–75.
Perera S, Eisen R, Dennis B, Bawor M, Bhatt M, Bhatnagar N, Thabane L, de Souza R, Samaan Z. Body mass index is an important predictor for suicide: results from a systematic review and meta-analysis. Suicide Life Threat Behav. 2016; 46(6):697–736.
Shim R, Koplan C, Langheim F, Manseau M, Powers R, Compton M. The social determinants of mental health: An overview and call to action. Psychiatric Ann. 2014; 44(1):22–6.
Coope C, Donovan J, Wilson C, Barnes M, Metcalfe C, Hollingworth W, Kapur N, Hawton K, Gunnell D. Characteristics of people dying by suicide after job loss, financial difficulties and other economic stressors during a period of recession (2010–2011): A review of coroners records. J Affect Disord. 2015; 183:98–105.
Valenstein M, Kim H, Ganoczy D, McCarthy J, Zivin K, Austin K, Hoggatt K, Eisenberg D, Piette J, Blow F, et al. Higher-risk periods for suicide among VA patients receiving depression treatment: prioritizing suicide prevention efforts. J Affect Disord. 2009; 112(1-3):50–8.
DelPozo-Banos M, John A, Petkov N, Berridge D, Southern K, Lloyd K, Jones C, Spencer S, Travieso C. Using neural networks with routine health records to identify suicide risk: feasibility study. JMIR Ment Health. 2018; 5(2):10144.
Sanderson M, Bulloch A, Wang J, Williams K, Williamson T, Patten S. Predicting death by suicide following an emergency department visit for parasuicide with administrative health care system data and machine learning. EClinicalMedicine. 2020; 20:100281.
Spittal M, Pirkis J, Miller M, Carter G, Studdert D. The repeated episodes of self-harm (RESH) score: A tool for predicting risk of future episodes of self-harm by hospital patients. J Affect Disord. 2014; 161:36–42.
Belsher B, Smolenski D, Pruitt L, Bush N, Beech E, Workman D, Morgan R, Evatt D, Tucker J, Skopp N. Prediction models for suicide attempts and deaths: a systematic review and simulation. JAMA Psychiatry. 2019; 76(6):642–51.
Simon G, Shortreed S, Coley R. Positive predictive values and potential success of suicide prediction models. JAMA Psychiatry. 2019; 76(8):868–9.
Veltri G. Big data is not only about data: The two cultures of modelling. Big Data Soc. 2017; 4(1):2053951717703997.
Szasz T. Fatal Freedom: The Ethics and Politics of Suicide. Westport, CT: Praeger; 1999.
O’Brien R, Ishwaran H. A random forests quantile classifier for class imbalanced data. Pattern Recog. 2019; 90:232–49.
Norwegian Institute of Public Health. Suicide by age and manner of death. Norwegian Institute of Public Health. 2021. Available from: http://statistikkbank.fhi.no/dar/. Accessed 15 July 2021.
Pedersen A, Ellingsen C. Data quality in the causes of death registry. Tidsskrift for Den norske legeforening. 2015; 135(8):768–70.
Kelsall D, Bowes M. No standards: medicolegal investigation of deaths. Can Med Assoc J. 2016; 188(3):169.
Næss Ø, Søgaard A, Arnesen E, Beckstrøm A, Bjertness E, Engeland A, Hjort P, Holmen J, Magnus P, Njølstad I, et al. Cohort profile: Cohort of Norway (CONOR). Int J Epidemiol. 2008; 37(3):481–5.
Derogatis L, Lipman R, Rickels K, Uhlenhuth E, Covi L. The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behav Sci. 1974; 19(1):1–15.
Carrozzino D, Vassend O, Bjørndal F, Pignolo C, Olsen L, Bech P. A clinimetric analysis of the Hopkins Symptom Checklist (SCL-90-R) in general population studies (Denmark, Norway, and Italy). Nordic J Psychiatry. 2016; 70(5):374–9.
Dalgard O, Mykletun A, Rognerud M, Johansen R, Zahl P. Education, sense of mastery and mental health: results from a nation wide health monitoring study in Norway. BMC Psychiatry. 2007; 7(1):1–9.
Statistics Norway. Persons in private households with annual after-tax income per consumption unit, below different distances to the median income. EU-scale and OECD-scale (M) (UD) 2005 - 2018 (Table 06947). Statistics Norway. 2016. Available from: https://www.ssb.no/en/statbank/table/06947. Accessed 20 Feb 2019.
Liao S, Lin Y, Kang D, Chandra D, Bon J, Kaminski N, Sciurba F, Tseng G. Missing value imputation in high-dimensional phenomic data: imputable or not, and how? BMC Bioinforma. 2014; 15(1):1–12.
Altmann A, Toloşi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics. 2010; 26(10):1340–7.
Kessler R, Berglund P, Borges G, Castilla-Puentes R, Glantz M, Jaeger S, Merikangas K, Nock M, Russo L, Stang P. Smoking and suicidal behaviors in the National Comorbidity Survey-Replication. J Nerv Ment Dis. 2007; 195(5):369.
Hughes J. Smoking and suicide: a brief overview. Drug Alcohol Depend. 2008; 98(3):169–78.
Hemmingsson T, Kriebel D. Smoking at age 18–20 and suicide during 26 years of follow-up—how can the association be explained? Int J Epidemiol. 2003; 32(6):1000–4.
Miller M, Hemenway D, Rimm E. Cigarettes and suicide: a prospective study of 50,000 men. Am J Public Health. 2000; 90(5):768.
Wootton R, Richmond R, Stuijfzand B, Lawn R, Sallis H, Taylor G, Hemani G, Jones H, Zammit S, Smith G, et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol Med. 2020; 50(14):2435–43.
Balbuena L, Tempier R. Independent association of chronic smoking and abstinence with suicide. Psychiatric Serv. 2015; 66(2):186–92.
Covey L, Berlin I, Hu M-C, Hakes J. Smoking and suicidal behaviours in a sample of US adults with low mood: a retrospective analysis of longitudinal data. BMJ Open. 2012; 2(3).
Yaworski D, Robinson J, Sareen J, Bolton J. The relation between nicotine dependence and suicide attempts in the general population. Canadian J Psychiatry. 2011; 56(3):161–70.
Hammond D, Fong G, McNeill A, Borland R, Cummings K. Effectiveness of cigarette warning labels in informing smokers about the risks of smoking: findings from the International Tobacco Control (ITC) Four Country Survey. Tobacco Control. 2006; 15(suppl 3):19–25.
Grucza R, Plunk A, Krauss M, Cavazos-Rehg P, Deak J, Gebhardt K, Chaloupka F, Bierut L. Probing the smoking–suicide association: do smoking policy interventions affect suicide risk? Nicotine Tobacco Res. 2014; 16(11):1487–94.
Bedford N, David A. Denial of illness in schizophrenia as a disturbance of self-reflection, self-perception and insight. Schizophrenia Res. 2014; 152(1):89–96.
Mojtabai R, Olfson M, Mechanic D. Perceived need and help-seeking in adults with mood, anxiety, or substance use disorders. Arch Gen Psychiatry. 2002; 59(1):77–84.
Mojtabai R, Olfson M, Sampson N, Jin R, Druss B, Wang P, Wells K, Pincus H, Kessler R. Barriers to mental health treatment: Results from the National Comorbidity Survey Replication (NCS-R). Psychol Med. 2011; 41(8):1751.
Stiffman A, Pescosolido B, Cabassa L. Building a model to understand youth service access: The gateway provider model. Ment Health Serv Res. 2004; 6(4):189–98.
Pescosolido B, Jensen P, Martin J, Perry B, Olafsdottir S, Fettes D. Public knowledge and assessment of child mental health problems: Findings from the National Stigma Study-Children. J Am Acad Child Adolesc Psychiatry. 2008; 47(3):339–49.
Pagura J, Fotti S, Katz L, Sareen J. Help seeking and perceived need for mental health care among individuals in Canada with suicidal behaviors. Psychiatric Serv. 2009; 60(7):943–9.
Puzo Q, Mehlum L, Qin P. Socio-economic status and risk for suicide by immigration background in Norway: a register-based national study. J Psychiatric Res. 2018; 100:99–106.
Qin P, Agerbo E, Mortensen P. Suicide risk in relation to socioeconomic, demographic, psychiatric, and familial factors: a national register–based study of all suicides in Denmark, 1981–1997. Am J Psychiatry. 2003; 160(4):765–72.
Braveman P, Gottlieb L. The social determinants of health: it’s time to consider the causes of the causes. Public Health Rep. 2014; 129(1_suppl2):19–31.
Bambra C. Health inequalities and welfare state regimes: theoretical insights on a public health ‘puzzle’. J Epidemiol Commun Health. 2011; 65(9):740–5.
Rosenblat J, Cha D, Mansur R, McIntyre R. Inflamed moods: a review of the interactions between inflammation and mood disorders. Prog Neuro-Psychopharmacol Biol Psychiatry. 2014; 53:23–34.
Navaneelan T. Health at a glance. Suicide rates: an overview. Catalogue no. 82-624-X. Ottawa: Statistics Canada; 2012.
Council of Canadian Academies. Accessing Health and Health-Related Data in Canada. Ottawa: Council of Canadian Academies; 2015.
McMurry A, Murphy S, MacFadden D, Weber G, Simons W, Orechia J, Bickel J, Wattanasin N, Gilbert C, Trevvett P, et al. SHRINE: enabling nationally scalable multi-site disease studies. PLoS ONE. 2013; 8(3):55811.
Amini P, Ahmadinia H, Poorolajal J, Amiri M. Evaluating the high risk groups for suicide: a comparison of logistic regression, support vector machine, decision tree and artificial neural network. Iranian J Public Health. 2016; 45(9):1179.
McCarthy J, Bossarte R, Katz I, Thompson C, Kemp J, Hannemann C, Nielson C, Schoenbaum M. Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US Department of Veterans Affairs. Am J Public Health. 2015; 105(9):1935–42.
We express our gratitude to the Saskatchewan Health Authority, the provincial coroner for Saskatchewan, and the Norwegian Institute of Public Health for access to the data. We are grateful to Dr James Bolton for suggesting relevant papers and to Professor Jon Godwin for statistical advice.
This research was supported by grants to the first author from the Department of Psychiatry, University of Saskatchewan, the Saskatchewan Health Research Foundation, the Royal University Hospital Foundation Community Mental Health Fund, the Google Cloud Platform, and Compute Canada.
Ethics approval and consent to participate
This study received ethical approval from the University of Saskatchewan and the Regionale Komiteer for Medisinsk og Helsefaglig Forskningsetikk. All Cohort of Norway participants provided written consent to link their responses with government registers. Consent to participate was waived for the Saskatoon data by the University of Saskatchewan (Approval number: Bio 17-11). Handling of both the Norway and Saskatoon data adheres to the Declaration of Helsinki.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Balbuena, L.D., Baetz, M., Sexton, J.A. et al. Identifying long-term and imminent suicide predictors in a general population and a clinical sample with machine learning. BMC Psychiatry 22, 120 (2022). https://doi.org/10.1186/s12888-022-03702-y