The reliability of suicide statistics: a systematic review

Background: Reliable suicide statistics are a prerequisite for suicide monitoring and prevention. The aim of this study was to assess the reliability of suicide statistics through a systematic review of the international literature.
Methods: We searched for relevant publications in EMBASE, Ovid Medline, PubMed, PsycINFO and the Cochrane Library up to October 2010. In addition, we screened related studies and reference lists of identified studies. We included studies published in English, German, French, Spanish, Norwegian, Swedish and Danish that assessed the reliability of suicide statistics. We excluded case reports, editorials, letters, comments, abstracts and statistical analyses. All three authors independently screened the abstracts, and then the relevant full-text articles. Disagreements were resolved through consensus.
Results: The primary search yielded 127 potential studies, of which 31 studies met the inclusion criteria and were included in the final review. The included studies were published between 1963 and 2009. Twenty were from Europe, seven from North America, two from Asia and two from Oceania. The manner of death had been re-evaluated in 23 studies (40-3,993 cases), and there were six registry studies (195-17,412 cases) and two combined registry and re-evaluation studies. The study conclusions varied, from findings of fairly reliable to poor suicide statistics. Thirteen studies reported fairly reliable suicide statistics or under-reporting of 0-10%. Of the 31 studies during the 46-year period, 52% found more than 10% under-reporting, and 39% found more than 30% under-reporting or poor suicide statistics. Eleven studies reassessed a nationwide representative sample, although these samples were limited to suicide within subgroups. Only two studies compared data from two countries.
Conclusions: The main finding was that there is a lack of systematic assessment of the reliability of suicide statistics. Few studies have been done, and few countries have been covered. The findings support the general under-reporting of suicide. In particular, nationwide studies and comparisons between countries are lacking.


Background
In recent decades, research on suicide and suicidal behaviour has expanded. Preventing suicide and reducing suicidal behaviour are important targets of the World Health Organization (WHO) [1]. The WHO has estimated that, worldwide, about one million people die by suicide every year, representing a global annual suicide rate of 16 per 100,000 people [2]. In addition, suicide attempts are about 10-15 times more frequent than suicides [3,4]. These suicide estimates are based on national mortality statistics, with suicide rates ranging from no suicides per 100,000 people per year in countries such as Egypt, Haiti and Honduras, to more than 30 suicides per 100,000 people per year in Belarus, the Russian Federation and Lithuania [5].
Most countries in the industrialized world started to register the cause and manner of deaths at the end of the 19th or the beginning of the 20th century. WHO member states use the International Classification of Diseases (ICD) to classify diseases and death certificates. The first edition, known as the International List of Causes of Death, was adopted in 1893. Even with this long tradition of classification, it is difficult to compare statistics between countries and periods because of differences between countries in methods of classification and registration, and because the manner of registration has changed over time.
National mortality registers have been used in the past few decades for surveillance and research on suicide, and can be used to examine the effects of preventive strategies and priorities in health policy. Epidemiological or socio-demographic theories about suicide and the effects of intervention depend on reliable suicide statistics. Many scientists have pointed out this challenge [6,7], but to our knowledge, no systematic research has been done in this field.
The aim of this study was to assess the reliability of suicide statistics through a systematic review of the international literature.

Search strategy
The first author (IMT) searched for relevant literature up to June 2009 in five databases: EMBASE (from 1980), Ovid Medline (from 1950), PsycINFO (from 1806), the Cochrane Library (from 1993) and PubMed (from 1950). The search strategy included subject headings/MeSH terms and free text. MeSH headings and free text included the terms "suicide" combined with "reliability", "test reliability", "validity", "test validity", "reproducibility", "reproducibility of results", "cause of death" and "death certificates". The search was restricted to humans. The search was not restricted by language, publication type or study design (Additional file 1). In addition, related studies and reference lists of identified studies were screened. Update searches were performed in October 2010, but no new studies were found.

Study selection
All abstracts identified using the above search strategy were reviewed. The first author (IMT) excluded studies that were obviously irrelevant to this review. Then, the three authors screened the abstracts for relevance, and independently reviewed the abstracts of all potentially relevant studies. Studies were included if they aimed to study the reliability of suicide statistics and were published in English, German, French, Spanish, Norwegian, Swedish or Danish.
Studies were excluded if they were case reports, editorials, letters, comments, statistical analyses or studies only presented as abstracts. Any disagreements or differences in the extracted data between authors were resolved through consensus. If there were any doubts, we included the abstract and read the full text of the article. After excluding articles based on the abstracts, the authors performed a second, stricter screening by examining full-text reports of the remaining records. Disagreements regarding the eligibility were resolved through consensus. Reasons for exclusion were documented. We included all studies on the reliability of suicide statistics. The process of study inclusion is shown in Figure 1.
We did not attempt a meta-analysis because of methodological differences across studies. We assessed the methodological quality of the included studies using six criteria: area studied, population studied, cause and manner of death studied, how the reliability was assessed, the information the re-evaluations were based on, and number of cases included. The criteria for assessing methodological quality are shown in Figure 2.
Three ways of assessing the reliability of suicide statistics
There are three ways to assess the reliability of suicide statistics: re-evaluation studies, registry studies and statistical analyses. Re-evaluation studies are studies where the manner and cause of death were re-evaluated. Registry studies are studies where two cause-of-death registers are compared. Statistical analyses are studies where the suicide rate is calculated by adding other categories of manner and cause of death, usually undetermined deaths, open verdicts, and unintentional poisoning and drowning. We excluded statistical analyses from this review because the choice of which other categories of manner or cause of death to include often relies on registry or re-evaluation studies.

Analyses
Some studies did not calculate the percentage of under-reporting, but calculated a study suicide rate and compared it with the official suicide rate. In those studies in which it was possible to estimate the percentage of under-reporting, we calculated it by dividing the difference between the study and official suicide rates by the official suicide rate: under-reporting = (study suicide rate - official suicide rate) / official suicide rate.
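The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `under_reporting` and the example rates are hypothetical and are not taken from any of the included studies.

```python
def under_reporting(study_rate: float, official_rate: float) -> float:
    """Return under-reporting as a fraction of the official suicide rate.

    Implements: (study suicide rate - official suicide rate) / official suicide rate.
    Both rates must use the same unit, e.g. suicides per 100,000 people per year.
    """
    if official_rate <= 0:
        # The ratio is undefined when the official rate is zero.
        raise ValueError("official_rate must be positive")
    return (study_rate - official_rate) / official_rate


# Hypothetical rates per 100,000 people per year (illustrative values only).
fraction = under_reporting(study_rate=13.2, official_rate=12.0)
print(f"Estimated under-reporting: {fraction:.1%}")
```

Note that the ratio cannot be computed when the official suicide rate is zero, so the sketch raises an error in that case rather than returning a value.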

Study selection
The primary search yielded 127 potential studies. Of these, 31 studies met the inclusion criteria, with a population of 46,401 cases in the final review. Three of the studies did not describe the exact number of cases, and were excluded from the total number of cases. Of the 96 excluded articles, 76 were excluded because they did not study the reliability of suicide statistics, 12 were statistical analyses, two were letters, one was a comment and five were excluded because of language (Romanian, Portuguese, Czech, Serbian, Dutch).

Study characteristics
Methodologies and sample size
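Of the 31 included studies, 23 were re-evaluation studies, with a total population of 11,795 cases (range: 40-3,993), six were registry studies (range: 195-17,412 cases) and two were combined registry and re-evaluation studies. Eleven studies evaluated a nationwide sample, and only two studies compared data from two or more countries. Characteristics of the included studies are presented in Table 1.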

Year and location of studies
The included studies were published between 1963 and 2009. Fourteen studies were published between 1963 and 1989, ten between 1990 and 1999 and seven after 2000. Twenty were from Europe, seven from North America, two from Asia, and two from Oceania.

Characteristics of the study population
Of the included studies, one re-evaluated the reliability of suicide statistics within the military system. Two studies examined the causes of death in cohorts: one of young males conscripted for military service and one of twins. The other studies included all deaths within defined time periods, locations or subgroups according to the manner of death. Some studies evaluated only suicides, whereas others included homicides, accidents and undetermined deaths.

Analysis of the included studies
The main conclusions of the studies varied, with findings ranging from fairly reliable suicide statistics to considerable under-reporting. Thirteen studies (42%) reported fairly reliable suicide statistics or under-reporting of 0-10%. Of the 31 studies from the 46-year period, 52% (16 of 31 studies) found more than 10% under-reporting, and 39% (12 of 31 studies) found more than 30% under-reporting or poor suicide statistics. A summary of the conclusions of the included studies is presented in Table 2.

Figure 2. Criteria for assessment of methodological quality
Area studied: Village = 1 p; City/county = 2 p; Country/nationwide = 3 p; More than one country = 4 p
Population studied: A selected group = 1 p; All = 2 p
Cause and manner of death studied: Only one death category = 1 p; Two death categories = 2 p; Three or more death categories = 3 p
How the reliability is assessed (i.e., re-evaluation or registry studies): Registry studies = 1 p; Re-evaluation studies = 2 p
The information the re-evaluations are based on (i.e., death certificates, police reports, autopsy reports, etc.): Death certificate = 2 p; Other information = 1 p per source
Number of cases included
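As a rough illustration of how a quality sum score could be built from the point values listed for Figure 2, the following Python sketch adds up the points per criterion. The function and dictionary names are hypothetical. The points for the number of cases are not recoverable from the text, so they are passed in as an explicit argument (`case_points`), and the handling of combined registry and re-evaluation studies is not specified in the text and is therefore not modelled.

```python
# Point values transcribed from the Figure 2 criteria; all names are illustrative.
AREA_POINTS = {
    "village": 1,
    "city/county": 2,
    "country/nationwide": 3,
    "more than one country": 4,
}
POPULATION_POINTS = {"selected group": 1, "all": 2}
METHOD_POINTS = {"registry": 1, "re-evaluation": 2}


def quality_sum_score(area, population, n_death_categories, method,
                      uses_death_certificate, n_other_sources, case_points):
    """Sum the points for one study under the assumptions stated above."""
    if n_death_categories < 1:
        raise ValueError("at least one death category is required")
    score = AREA_POINTS[area] + POPULATION_POINTS[population]
    score += min(n_death_categories, 3)        # three or more categories = 3 p
    score += METHOD_POINTS[method]
    score += 2 if uses_death_certificate else 0
    score += n_other_sources                   # 1 p per additional source
    score += case_points                       # scale not given in the text (assumption)
    return score


# Hypothetical nationwide re-evaluation study using death certificates
# plus two other sources, with the case-number points left at zero.
print(quality_sum_score("country/nationwide", "all", 3, "re-evaluation",
                        True, 2, 0))
```

The sketch only shows the arithmetic of summing criteria; the thresholds the authors used to label a sum score as good or poor are not stated in the text and are not assumed here.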

Summary of main results
The main finding was that few studies on the reliability of suicide statistics have been done in recent years, and few countries have been covered. There were only two studies from Asia and none from Africa, where a large proportion of the global population resides. Thirteen of the 31 studies included in this review reported fairly reliable suicide statistics or under-reporting of 0-10%. Of the 31 studies from the 46-year period, 52% found more than 10% under-reporting, and 39% found more than 30% under-reporting or poor suicide statistics. Eleven studies evaluated a nationwide sample, and only two studies compared data from two or more countries. Only three studies achieved a good quality sum score. Studies with a high quality sum score tended to report fairly reliable suicide statistics or under-reporting of 0-10%, whereas studies with a poorer quality sum score tended to report more than 30% under-reporting or poor suicide statistics, but too few studies have been done to draw a firm conclusion.
We have put most emphasis on the studies with the best methodological quality. These studies support our main findings. We cannot draw any conclusions about the reliability of suicide statistics based only on the lack of research. Theoretically, the reliability might be good in spite of the lack of studies. In countries with official suicide rates close to zero, one might argue that the reliability was good. It is important to study the reliability of suicide statistics, and since the data differ greatly between countries, we find it important to study both validity and reliability. As there are few studies, and about half of them found under-reporting of suicide, we think that our main finding, that the reliability of suicide statistics is questionable and calls for more studies, is fair.
Reliability does not necessarily imply validity. A reliable measure measures something consistently, but not necessarily what it is intended to measure. In the present study, the causes of death were assessed by official statistics and the researchers had the intention of measuring the same phenomenon. Accordingly, we consider that comparing official suicide statistics and external assessments reflects both reliability and validity.
Studying the reliability of suicide statistics is a complex task. First, some suicides might have been missed in the administrative processes of national mortality statistics. Second, in some cases, determining the manner of death (i.e., suicide, accident, undetermined/open verdict, or natural death) requires subjective interpretation of the intention of the deceased.
Different methodologies used in the included studies need to be considered, including the main difference between re-evaluation and registry studies, the variations in the cause and manner of deaths studied, the quality of the compared registers, the competence of the re-evaluators, the number of re-evaluations of each case and the information the re-evaluations are based on (i.e., death certificates, police reports, autopsy reports, etc.). One can imagine that a greater number of suicides could be found by also examining undetermined deaths/open verdicts and accidents. Some studies (statistical studies) have studied the under-reporting of suicide by comparing the suicide rate with the rate of deaths of undetermined intent [39,40], and in recent years, the UK has added injury/poisoning of undetermined intent and sequelae of intentional self-harm/event of undetermined intent to the official suicide rate, in the belief that this will provide a more reliable suicide rate [41]. We excluded statistical studies from this review, but from a national and longitudinal perspective these studies are important for indicating the reliability of suicide statistics and, further, the effects of suicide prevention. Studies included in this review were published in many different countries between 1963 and 2009. Hence, different editions of the ICD were used in these studies, which may also affect the results to a certain extent [42].

Strengths and limitations
Some limitations of the present study should be considered. The search strategy, including literature search and reference list screening, was developed by one of the authors, and this search strategy may not have captured all relevant studies. Manually searching reference lists located 33 further studies not captured in the database searches. The selection of keywords and MeSH terms that were used may not have covered all published articles on the reliability of suicide statistics. The choice of databases also needs to be considered. The five selected databases may not have indexed all potential studies, and some relevant studies may not have been included. Medline is the largest component of PubMed, and both databases were selected in the present study because they do not have the same MeSH terms, and therefore more studies were found by searching both (http://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html). In retrospect, it is conceivable that using only one of these databases might have saved time, and we might have found more studies by selecting a different database or manually searching relevant journals. It is possible that our search did not identify all of the relevant original studies [43]; for example, the publications of national statistics bureaus are not indexed in the databases, but we are confident that our search strategy was good enough to identify the majority of the relevant original studies. Even though we may have missed some studies, we find it unlikely that this would have changed our main conclusions. For practical reasons, only published studies were sourced, but it seems unlikely that publication status would be a source of bias in the present study.
One strength of this review is that related studies and all reference lists of the included studies were screened, minimizing the number of potentially missed studies. Another strength is that the three authors independently screened all abstracts and full-text articles, minimizing the chance of a relevant study being excluded.

Future studies
Because few studies have been published in recent years, further studies are clearly needed, particularly nationwide studies, studies in countries with low suicide rates and other under-investigated countries, and studies including comparisons between countries.

Conclusion
There are only a few studies on the reliability of suicide statistics, and based on those studies, we cannot draw firm conclusions about the reliability of existing suicide statistics. Few studies have been published in recent years. Nationwide studies in particular are lacking, and only two studies compared data between countries.
This systematic review conforms to the PRISMA statement [44].

Additional material
Additional file 1: Search terms.