Data
Prior to commencement of the study, ethics approval was granted by the Institutional Review Board at John Jay College of Criminal Justice. All methods were performed in accordance with our protocol and all other relevant guidelines and regulations. Informed consent was obtained from participants before the study began.
We compiled data on police use of force, suspect injury, and suspect mental health status from a total of nine police departments. Data from six departments came from the National Justice Database, a large repository of police data housed at the Center for Policing Equity. Participation in this repository is confidential and none of these six departments is named in this manuscript. All departments included in the analysis are located in moderately sized cities, which ranged in population from approximately 300,000 to approximately 1,000,000 people in 2017. In addition to data from these six anonymous police departments, the analysis included publicly available data on police use of force, suspect injury, and suspect mental health status from New Orleans, Louisiana; Dallas, Texas; and Los Angeles, California. In total, our dataset comprised 28,649 police use of force events occurring between 2011 and 2017.
Police departments in all sample cities had implemented trainings designed to improve police interaction with mentally ill suspects. Literature suggests that when officers have been trained to identify PwSMI they do so with a moderate to high degree of accuracy, and that crisis intervention training (CIT) significantly improves their ability to identify symptoms associated with SMI [11, 12]. Nevertheless, our reliance on officer classification introduces error into our estimation of disparities. We note, however, that police are more likely to fail to identify PwSMI than to attribute SMI to individuals without it [11]. This suggests that an estimated disparity showing that individuals with SMI are more likely to experience use of force will be conservative, because official police data undercount use of force against individuals with SMI and overcount use of force against those without it.
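The direction of this bias can be illustrated with a short worked example. The sketch below uses entirely hypothetical population and event counts (none are drawn from our data) and makes the simplifying assumption that officers miss a fixed fraction of PwSMI and never attribute SMI to someone without it.

```python
# Hypothetical numbers chosen only to illustrate the direction of the bias.
def rate_ratio(events_smi, pop_smi, events_other, pop_other):
    """Use-of-force rate for PwSMI divided by the rate for everyone else."""
    return (events_smi / pop_smi) / (events_other / pop_other)

pop_smi, pop_other = 5_000, 95_000              # assumed resident counts
true_smi_events, true_other_events = 100, 400   # assumed true event counts

# Assumption: officers miss 30% of PwSMI and record no false positives,
# so missed events are logged as involving people without SMI.
missed = round(0.30 * true_smi_events)
recorded_smi_events = true_smi_events - missed
recorded_other_events = true_other_events + missed

true_ratio = rate_ratio(true_smi_events, pop_smi,
                        true_other_events, pop_other)
recorded_ratio = rate_ratio(recorded_smi_events, pop_smi,
                            recorded_other_events, pop_other)

print(f"true ratio: {true_ratio:.2f}, recorded ratio: {recorded_ratio:.2f}")
```

Under these assumptions the recorded ratio is smaller than the true ratio, which is the sense in which a disparity estimated from officer classifications is conservative.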
We gathered survey data on SMI from the National Comorbidity Survey Replication (NCS-R), a nationally representative multi-stage survey on the prevalence and correlates of mental disorders in the United States [13, 14]. Interviews were conducted face-to-face between 2001 and 2003 among English-speaking heads of household aged 18 and over. While the NCS-R surveyed 9282 respondents, this analysis utilized a subset of those respondents (N = 5493) who completed the long version of the survey. The longer version of the survey included additional questions measuring demographic characteristics that were used to develop the multivariate logistic regression of SMI.
Aggregate city-level and Census tract-level data were pulled from the American Community Survey (ACS). Throughout the analyses, yearly ACS data were pulled to correspond to the year for which disparities in police use of force and suspect injury were being analyzed. For example, disparities estimated from 2016 police data were generated using 2016 ACS data.
Operationalization of serious mental illness
Our operationalization of SMI from the NCS-R followed methodologies from previous research that coded individuals as suffering from SMI if they met criteria for diagnosis, functional impairment, and duration [15, 16]. Accordingly, individuals were coded as suffering from SMI if they met each of the following three criteria. First, individuals must have met the Composite International Diagnostic Interview (CIDI) diagnosis for any of the following mental illnesses: bipolar I, bipolar II, mania, major depressive disorder, agoraphobia, generalized anxiety disorder, post-traumatic stress disorder, hypomania, specific phobia, social phobia, or seriously attempted suicide within the past 12 months. Second, individuals diagnosed with any of these conditions must have (a) reported being unable to function due to that condition for at least 120 out of the past 365 days, or (b) rated the magnitude of their impairment at work, at home, in relationships, and in their social lives due to the condition as at least 7 out of 10. Third, individuals must have reported having the disorder for at least 12 months.
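The three criteria can be expressed as a single coding rule. The following is a minimal sketch, not the code used in the analysis: the `Respondent` fields and the function name are hypothetical, and the text above does not state whether the impairment rating must reach 7 in every domain or in any one domain, so the sketch assumes every domain by default and exposes the alternative reading through the `combine` argument.

```python
from dataclasses import dataclass, field

# The four impairment domains named in the second criterion.
DOMAINS = ("work", "home", "relationships", "social")

@dataclass
class Respondent:
    # Hypothetical field names; the NCS-R variable names differ.
    has_listed_diagnosis: bool        # met CIDI criteria for a listed condition
    days_unable_to_function: int      # out of the past 365 days
    impairment_ratings: dict = field(default_factory=dict)  # domain -> 0-10
    months_with_disorder: int = 0

def meets_smi_criteria(r, combine=all):
    """Return True when diagnosis, impairment, and duration criteria all hold.

    combine=all requires a rating of 7+ in every domain (assumed reading);
    combine=any requires 7+ in at least one domain.
    """
    impaired = (
        r.days_unable_to_function >= 120
        or combine(r.impairment_ratings.get(d, 0) >= 7 for d in DOMAINS)
    )
    lasted = r.months_with_disorder >= 12
    return r.has_listed_diagnosis and impaired and lasted
```

For example, `meets_smi_criteria(Respondent(True, 130, {}, 24))` is `True` because the 120-day threshold alone satisfies the impairment criterion, while the same respondent with only 6 months of duration would be coded as not having SMI.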
Variable coding
The small area estimation strategy used here required that the variables used to predict SMI in the individual-level data set also be contained in the aggregate ACS data and be coded in the same fashion. Based on extant work, we predicted SMI as a function of age, gender, race, marital status, employment status, educational attainment, and poverty index [15]. Codings are presented in the online Additional file 1. Identical data were pulled from the ACS and were coded in the same fashion as the individual data.
Small area estimation
Statistical estimation of disparities in police use of force relies on a baseline against which to measure disproportionality. Observed disparities in police use of force may reflect the greater exposure that these populations have to the police or other ecological factors (such as crime rates), and statistical estimation of disparity must control for this possibility. No federal agency, however, conducts a census of the share of the population with SMI at the state, county, or tract level. Moreover, while nationally representative surveys provide reliable estimates of the prevalence of SMI across the United States, the samples drawn for these surveys rarely contain enough respondents from any particular city or county to estimate the number of PwSMI at that level.
We used synthetic estimation to approximate the share of the population with and without SMI at the neighborhood, police precinct, and city levels. Synthetic estimation is a procedure for producing estimates at relatively small geographies that applies data from nationally representative surveys to large administrative data [17]. We employed a model-based approach, which first estimated prevalence of SMI among subpopulations defined by race, age, marital status, and other socioeconomic and demographic characteristics using data from a nationally representative survey and then projected these estimates onto aggregate data from the US Census Bureau [15, 16, 18, 19, 20].
Small area estimation of the share of the population with and without SMI proceeded as follows. First, a multivariate logistic regression using the individual level predictors of SMI was generated from the NCS-R. Regression coefficients from this model were then applied to police-precinct and Census tract level data to generate logit scores. Logit values for these areal units were then converted to probabilities, which were then multiplied by the ACS estimate of the total population over the age of 18 to produce estimates of the population with SMI.
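The projection step described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the analysis: the coefficient values and predictor names are hypothetical (the fitted NCS-R coefficients are not reported in this section), and each areal unit is represented by the population shares in each predictor category plus its ACS adult population.

```python
import math

# Hypothetical coefficients standing in for the fitted NCS-R logistic model.
COEFS = {
    "intercept": -3.0,
    "share_female": 0.4,
    "share_unemployed": 0.9,
    "share_below_poverty": 1.1,
}

def estimate_smi_population(area, coefs=COEFS):
    """Apply survey coefficients to one areal unit's aggregate data.

    Returns (estimated adults with SMI, estimated adults without SMI).
    """
    # Step 1: linear predictor (logit score) from the aggregate predictors.
    logit = coefs["intercept"] + sum(
        coefs[name] * area[name] for name in coefs if name != "intercept"
    )
    # Step 2: convert the logit score to a probability.
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Step 3: multiply by the ACS adult population to get a head count.
    with_smi = probability * area["adults_18_plus"]
    return with_smi, area["adults_18_plus"] - with_smi

# Hypothetical tract used only to show the call.
tract = {
    "share_female": 0.52,
    "share_unemployed": 0.08,
    "share_below_poverty": 0.20,
    "adults_18_plus": 3_500,
}
with_smi, without_smi = estimate_smi_population(tract)
```

The two returned quantities correspond to the estimated populations with and without SMI in each areal unit that serve as the exposure measure in the disparity models.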
Estimation of disparities
Disparities in use of force and injury between individuals with and without SMIs were estimated using hierarchical negative binomial models, estimated in a Bayesian framework. Specifically, we model:
$$ y_{vti} \sim \mathrm{Negative\ Binomial}\left( n_{vti}\, e^{u + \alpha_v + \delta_t + \beta_i},\ \phi \right) $$
$$ \beta_i \sim \mathrm{Normal}\left( 0,\ \sigma_{\beta}^2 \right) $$
where \( y_{vti} \) is the number of use of force events/injuries for the population defined by vulnerability level v (with or without SMI) in year t and tract/police precinct i. We model \( y_{vti} \) as following a negative binomial distribution and set \( n_{vti} \), the share of the population with and without SMI at each geography in each year, as the corresponding offset or measure of exposure. The parameter \( \phi \) controls the shape of the negative binomial distribution and is estimated from the data. The term u is the overall intercept. Our key coefficient \( \alpha_v \) measures the effect that status as a PwSMI has on the likelihood of use of force/injury, while \( \delta_t \) measures year-specific effects and \( \beta_i \) captures location-specific effects. We assume these location-specific effects follow a normal distribution with mean 0 and variance \( {\sigma}_{\beta}^2 \), which is estimated from the model. We set weakly informative priors for all model coefficients and estimate the posterior distribution of model parameters in the R programming language using the brms package [21]. Convergence statistics are displayed in the online Additional file 1.
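To make the likelihood concrete, the sketch below evaluates the model's expected count and the negative binomial log-probability for a single cell. It is an illustration only and does not reproduce the brms estimation: the function names are hypothetical, and the negative binomial is assumed to use the mean and shape parameterization, in which the variance equals the mean plus the squared mean divided by the shape.

```python
import math

def expected_count(exposure, u, alpha, delta, beta):
    """Model mean: exposure times exp(intercept + group + year + location)."""
    return exposure * math.exp(u + alpha + delta + beta)

def neg_binomial_logpmf(y, mean, phi):
    """Log-probability of count y under a negative binomial with the given
    mean and shape phi (assumed parameterization: var = mean + mean**2/phi)."""
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + y * math.log(mean / (mean + phi))
        + phi * math.log(phi / (mean + phi))
    )

def rate_ratio(alpha_group, alpha_reference=0.0):
    """Multiplicative difference in per-capita rates between two groups
    that share the same year and location effects."""
    return math.exp(alpha_group - alpha_reference)

# Hypothetical parameter values, used only to show the calls.
mu = expected_count(exposure=250.0, u=-4.0, alpha=1.2, delta=0.1, beta=-0.3)
log_prob = neg_binomial_logpmf(y=3, mean=mu, phi=2.0)
ratio = rate_ratio(1.2)
```

Because the exposure enters multiplicatively, exponentiating a difference in group coefficients gives the ratio of per-capita rates between the two groups, holding year and location fixed.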
In addition to estimating the disparity in police use of force and injury for PwSMI, we also estimate disparities in police use of force against Black, Latinx, and White residents, so as to better contextualize the size of the disparity. We estimate these disparities using a model similar to the one above, in which \( y_{rti} \) is the number of use of force events/injuries for the population defined by race r, \( n_{rti} \) is the share of the population belonging to each racial group, and \( \alpha_r \) measures the effect of race on the likelihood of use of force/injury.