Sample
The NSS was administered to representative samples of new U.S. Army soldiers beginning Basic Combat Training (BCT) at Fort Benning, GA, Fort Jackson, SC, and Fort Leonard Wood, MO between April 2011 and November 2012. Recruitment began by selecting weekly samples of 200–300 new soldiers at each BCT installation to attend an informed consent presentation within 48 h of reporting for duty. The presenter explained study purposes, confidentiality, and voluntary participation, then answered all attendee questions before seeking written informed consent to complete a self-administered computerized questionnaire (SAQ) and neurocognitive tests and to link these data prospectively to the soldier’s administrative records. These study recruitment and consent procedures were approved by the Human Subjects Committees of all Army STARRS collaborating organizations. The 21,790 NSS respondents considered here represent all Regular Army soldiers who completed the SAQ and agreed to administrative data linkage (77.1% response rate). Data were doubly weighted to adjust for differences in survey responses between respondents who did and did not agree to administrative record linkage, and for differences in administrative data profiles between the latter subsample and the population of all new soldiers. More details on NSS weighting are reported elsewhere [18]. The sample size decreased with duration of follow-up both because of attrition and because of variation in the time between the survey and the end of the follow-up period. The sample included 18,838 men (decreasing to 16,479 by 12 months, 15,306 by 24 months, and 3729 by 36 months) and 2952 women (decreasing to 2300 by 12 months, 2094 by 24 months, and 687 by 36 months).
Measures
Outcomes
Outcome data were abstracted from Department of Defense criminal justice databases through December 2014 (25–44 follow-up months after NSS completion). Dependent variables were defined as first occurrences of each of the three outcomes for which predictive models had previously been developed from both administrative data and NSS data: major physical violence (i.e., murder-manslaughter, kidnapping, aggravated arson, aggravated violence, or robbery) perpetration by men, sexual violence perpetration by men, and sexual violence victimization of women, each coded according to the Bureau of Justice Statistics National Corrections Reporting Program classification system [19]. The perpetration outcomes were defined from records of “founded” offenses (i.e., where the Army found sufficient evidence to warrant full investigation). The victimization outcome was defined using any officially reported victimization regardless of evidence.
Predictors
As reported in previous publications, separate composite risk scores for each outcome were developed based on models from either the STARRS Historical Administrative Data System (HADS) [8, 9, 12] or the NSS [13]. The details of building the models that generated these scores are reported in the original papers and will not be repeated here other than to say that they involved the use of iterative machine learning methods [20] with internal cross-validation to predict the outcomes over a one-month risk horizon in a discrete-time person-month data array [21]. The HADS models were developed using all of the nearly 1 million soldiers on active duty during the years 2004–2009 and were estimated for all years of service rather than only for the first few years of service, whereas the NSS models were developed using the NSS sample. We then applied the coefficients from these models to the data from the soldiers in the present samples to generate composite prediction scores. Thus, each person-month had a single score from each model representing the predicted log odds of the outcome occurring (note that this score changed each month for the HADS models, but remained the same within each person for the NSS models because the NSS was administered only once). Each score was then standardized to a mean of 0 and a variance of 1 in the total sample. These composite prediction scores were used as the input in the current analysis. In other words, for each of the models reported here, there were two possible independent variables (plus their transformations and interactions): the standardized log odds of the event occurring according to the HADS model and the standardized log odds of the event occurring according to the NSS model.
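The scoring step described above can be illustrated with a minimal sketch. It assumes a plain linear predictor on the log-odds scale and an unweighted standardization; the intercept, coefficient names, and values are hypothetical placeholders, not the published HADS or NSS coefficients, and the actual analysis may have applied survey weights when standardizing.

```python
from statistics import fmean, pstdev


def log_odds(intercept, coefs, features):
    """Linear predictor: predicted log odds for one person-month."""
    return intercept + sum(coefs[name] * value for name, value in features.items())


def standardize(scores):
    """Rescale scores to mean 0 and variance 1 (unweighted, population SD)."""
    mean = fmean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance and cannot be standardized")
    return [(s - mean) / sd for s in scores]


# Hypothetical coefficients and person-month records, for illustration only.
intercept = -6.0
coefs = {"age": -0.05, "prior_offense": 1.2}
person_months = [
    {"age": 19, "prior_offense": 0},
    {"age": 22, "prior_offense": 1},
    {"age": 25, "prior_offense": 0},
]

raw = [log_odds(intercept, coefs, pm) for pm in person_months]
z = standardize(raw)
```

After standardization, a one-unit change in a score corresponds to one standard deviation of predicted log odds, which puts the HADS and NSS scores on a comparable scale.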
The potential predictors selected for inclusion in the iterative model-building process for the HADS and NSS models operationalized 8 classes of variables found in prior studies to predict the outcomes: socio-demographics (e.g., age, sex, race-ethnicity), mental disorders (self-reported Diagnostic and Statistical Manual of Mental Disorders, 4th edition [DSM-IV] disorders in the NSS and medically recorded International Classification of Diseases [ICD] disorders in the HADS models), suicidality/non-suicidal self-injury (self-reported in the NSS and medically recorded in the HADS models), exposure to stressors (assessed in detail in the NSS models with questions about childhood adversities, other lifetime traumatic stressors, and past-year stressful life events and difficulties; assessed in the HADS models with a small number of available markers of financial, legal, and marital problems, information about deployment and stressful career experiences, and military criminal justice records of prior experiences with crime perpetration and victimization), military career information (for new soldiers, Armed Forces Qualification Test [AFQT] scores; physical profile system [PULHES] scores used to indicate medical, physical, or psychiatric limitations; enlistment military occupational specialty classifications; and a series of indicators of enlistment waivers; and for the HADS models, increasing information over the follow-up period about promotions, demotions, deployments, and other career experiences), personality (only in the NSS models), and social networks (only in the NSS models). Results of performance-based neurocognitive tests administered in conjunction with the NSS were also included in the NSS models [22]. More detailed descriptions of the HADS and NSS predictors, the final form of each model (i.e., the variables that were ultimately selected for inclusion by the algorithms), and predictive performance are presented in the original reports [8, 9, 12, 13].
Analysis methods
Analysis was carried out remotely by Harvard Medical School analysts on the secure University of Michigan Army STARRS Data Coordination Center server. Given that respondents differed in number of months of follow-up, we began by inspecting observed outcome distributions by calculating survival curves using the actuarial method [23] implemented in SAS PROC LIFETEST [24]. We projected morbid risk to 36 months, even though some new soldiers were followed for as long as 44 months, because the number followed beyond 36 months was too small for stable projection. Discrete-time survival analysis with person-month as the unit of analysis and a logistic link function [21] was then used to estimate a series of nested prediction models for first occurrence of each outcome. Models were estimated using SAS PROC LOGISTIC [24].
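As a rough illustration of the actuarial (life-table) method, the sketch below computes cumulative risk across successive intervals, treating soldiers censored within an interval as being at risk for half of it. The function name and the input counts are hypothetical, and the sketch ignores survey weights; it is not a reproduction of the SAS procedure used in the analysis.

```python
def actuarial_cumulative_risk(n_at_start, intervals):
    """Cumulative risk by the actuarial (life-table) method.

    n_at_start: number of people at risk at the start of follow-up.
    intervals:  list of (events, censored) counts, one pair per interval.
    Returns the cumulative risk at the end of each interval.
    """
    surviving = 1.0
    at_risk = n_at_start
    risks = []
    for events, censored in intervals:
        # Censored cases are assumed to be at risk for half the interval.
        effective = at_risk - censored / 2.0
        q = events / effective if effective > 0 else 0.0
        surviving *= 1.0 - q
        risks.append(1.0 - surviving)
        at_risk -= events + censored
    return risks


# Hypothetical counts: 100 people, two follow-up intervals.
example_risks = actuarial_cumulative_risk(100, [(2, 10), (1, 20)])
```

The half-interval adjustment is what lets respondents with shorter follow-up contribute to the estimate without being treated as event-free for the full period.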
The model-testing process involved two steps: first, determining the best model using the HADS risk score only, and then finding the optimal strategy for combining NSS data with the best model from the first step. Specifically, we began with a model including only the composite predicted risk score based on the HADS (expressed as a predicted log odds standardized to have a mean of 0 and a variance of 1), controlling (as in all subsequent models) for time in service; we then estimated models including a quadratic effect of HADS risk score, an interaction of the risk score with time, and their combination. In the second step, we tested the effect of adding the NSS composite predicted risk score to the best HADS model, followed by combinations of a quadratic NSS term, an interaction of NSS score with HADS score, and an interaction of NSS risk score with historical time. Importantly, whereas the values of the NSS composite risk score did not change with time in service because the NSS was administered only once, the values of the HADS composite risk score did change due to the addition of new administrative data each month. We tested the significance of interactions between the composite risk scores and time in service to evaluate the assumption that the HADS composite risk score might become more important over time and the NSS composite risk score less important. Design-based Wald χ² tests based on the Taylor series method [25] were used to select the best-fitting model for each outcome. This method took into consideration the weighting and clustering of the NSS data in calculating significance tests.
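One way to write the nested models described above is shown here. The notation is an assumption introduced for illustration: $h_{it}$ is the hazard of first occurrence for soldier $i$ in month $t$, $H_{it}$ is the standardized (time-varying) HADS score, $N_i$ is the standardized (time-invariant) NSS score, and $\alpha(t)$ stands for the time-in-service control, whose functional form is not specified in the text.

```latex
% Step 1: HADS-only models (base model, then optional extra terms)
\operatorname{logit}(h_{it}) = \alpha(t) + \beta_1 H_{it}
  \;\bigl[\, + \beta_2 H_{it}^2 \,\bigr]
  \;\bigl[\, + \beta_3 H_{it}\, t \,\bigr]

% Step 2: add NSS terms to the best step-1 model, denoted \eta^{\mathrm{HADS}}_{it}
\operatorname{logit}(h_{it}) = \eta^{\mathrm{HADS}}_{it} + \gamma_1 N_i
  \;\bigl[\, + \gamma_2 N_i^2 \,\bigr]
  \;\bigl[\, + \gamma_3 N_i H_{it} \,\bigr]
  \;\bigl[\, + \gamma_4 N_i\, t \,\bigr]
```

Bracketed terms are the optional additions tested singly and in combination; each would be retained only if its design-based Wald test supported it.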
Once the best-fitting model for each outcome was selected, we exponentiated the logistic regression coefficients, and the confidence limits implied by their design-based standard errors, to create odds ratios (ORs) and 95% confidence intervals (95% CIs). We then divided the sample into 20 separate groups (ventiles), each representing 5% of respondents ranked in terms of their risk scores in the best-fitting models, and calculated concentration of risk for each ventile: the proportion of all observed cases of the outcome that occurred in that ventile. If the models were strong predictors, we would expect high concentration of risk in the upper ventiles. Concentration of risk was calculated and compared not only for the best-fitting models but also for the HADS-only models to determine the improvement in prediction strength achieved by adding information from the NSS rather than relying exclusively on HADS risk scores. We also calculated concentration of risk for the NSS-only models for comparative purposes. Finally, we calculated positive predictive value: the proportion of soldiers in each ventile who had the outcome over the follow-up period. As with morbid risk, positive predictive value was projected to 36 months using the actuarial method to adjust for the fact that the follow-up period varied across soldiers.
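The ventile calculations can be sketched as follows. This is a simplified, unweighted illustration with hypothetical data: it ranks people by risk score, splits them into equal-sized groups, and reports the share of all cases in each group (concentration of risk) and the crude proportion of each group with the outcome. The positive predictive values in the analysis were instead projected to 36 months with the actuarial method, which this sketch does not attempt.

```python
def concentration_of_risk(scores, outcomes, n_groups=20):
    """Rank by score (highest first), split into n_groups, summarize each group.

    Returns one dict per group with:
      concentration: share of all observed cases that fall in the group
      ppv:           crude proportion of group members with the outcome
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_cases = sum(outcome for _, outcome in ranked)
    summary = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        cases = sum(outcome for _, outcome in group)
        summary.append({
            "concentration": cases / total_cases if total_cases else 0.0,
            "ppv": cases / len(group) if group else 0.0,
        })
    return summary


# Hypothetical data: 40 people, three cases (scores 39, 38, and 10).
demo_scores = list(range(40))
demo_outcomes = [1 if s in (39, 38, 10) else 0 for s in demo_scores]
demo_ventiles = concentration_of_risk(demo_scores, demo_outcomes)
```

In this toy example the top ventile holds two of the three cases, so its concentration of risk is two thirds even though it contains only 5% of the people.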