The Avon Longitudinal Study of Parents and Children (ALSPAC) is a prospective birth cohort study investigating genetic and environmental influences on health and development across the lifespan. All pregnant women residing in the Avon catchment area in South-West England, with an estimated delivery date between April 1991 and December 1992, were eligible for inclusion. Individuals were recruited through media information, community outreach, and promotional material supplied at routine antenatal and maternity health services. Out of 20,248 eligible pregnancies, 14,541 (71.8%) were initially recruited. Of those, 69 had no known birth outcome. The remaining 14,472 pregnancies consisted of 14,676 fetuses, with 14,062 live births, of whom 13,988 were alive at age 12 months. The current sample was restricted to singletons or first-born twins, resulting in an overall sample size of 13,793 participants (51.6% boys). Prior to 2014, questionnaires were sent out to parents/carers by post. If a response was not received within 7 days, two reminder letters were sent, after which participants were called or visited at their homes. Questionnaires from 2014 onwards were available online or in paper format, and were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol. Participants were sent four reminders at three-week intervals. Further details on the cohort can be found elsewhere [32, 33].
Measures: conduct problems
Conduct problems were measured at ages 4, 7, 8, 10, 12, 13, and 17 years, using the parent-rated conduct problems subscale of the Strengths and Difficulties Questionnaire (SDQ) [34, 35]. This widely used scale consists of five items asking about the child’s behavior over the last six months: (1) “often has temper tantrums or hot tempers”; (2) “generally obedient, usually does what adults request” (reverse coded); (3) “often fights with other children or bullies them”; (4) “often lies or cheats”; and (5) “steals from home, school or elsewhere”. All items are rated on a 3-point scale (0 = not true, 1 = somewhat true, 2 = certainly true), yielding overall scores ranging from 0 to 10. Previously reported developmental trajectories of conduct problems from ages 4–13 years in ALSPAC dichotomized the conduct problems subscale as ‘high risk’ versus ‘not high risk’. To maximize variability in conduct problems while accounting for the highly skewed distribution, we used the updated 4-band categorization that has been validated for ages 4–17 years, with scores of 0–2 classified as ‘close to average’, 3 as ‘slightly raised’, 4–5 as ‘high’, and 6–10 as ‘very high’. The mean internal consistency was modest (α = 0.54, range = 0.50–0.59), which may be attributed in part to the scale’s attempt to cover a wide range of problem behaviors across childhood and adolescence. Nonetheless, in their review, Stone et al. reported a similar value of α = 0.58, and demonstrated acceptable reliability and validity of the SDQ conduct problems subscale on the basis of a more rigorous psychometric assessment.
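The scoring and banding described above can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function names and the assumption that items arrive in the order listed in the text are ours.

```python
def sdq_conduct_score(items):
    """Sum the five conduct items (each rated 0-2), reverse-coding item 2.

    `items` is assumed to follow the order in the text:
    tantrums, obedient, fights, lies/cheats, steals.
    """
    if len(items) != 5 or any(v not in (0, 1, 2) for v in items):
        raise ValueError("expected five item ratings coded 0, 1, or 2")
    tantrums, obedient, fights, lies, steals = items
    # Item 2 ("generally obedient") is positively worded, so it is reversed.
    return tantrums + (2 - obedient) + fights + lies + steals


def sdq_conduct_band(score):
    """Map a 0-10 total score onto the 4-band categorization."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score <= 2:
        return "close to average"
    if score == 3:
        return "slightly raised"
    if score <= 5:
        return "high"
    return "very high"
```

For example, a child rated as certainly obedient with one "somewhat true" rating on temper tantrums and no other problems scores 1 and falls in the ‘close to average’ band.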
Validation of derived conduct problem trajectories
We used the Edinburgh Study of Youth Transitions and Crime (ESYTC) questionnaire to validate the derived conduct problem trajectories. The ESYTC was administered via self-report at ages 14 (N = 5604) and 18 (N = 3743) years, and included six items asking, for example, whether the participant “deliberately damaged or destroyed property” or had “broken into a car or van with intention of stealing something out of it”. Items are rated on a 4-point scale: not at all, just once, 2–5 times, and 6 or more times. Cronbach’s alphas were 0.52 and 0.45 at ages 14 and 18 years, respectively. Because the distribution was highly skewed, we dichotomized this measure: antisocial behavior was considered either ‘present’ (a response of at least just once on one or more items) or ‘absent’ (not at all on all items).
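The dichotomization rule can be written in a few lines. In this sketch the numeric coding of the response options (0 = not at all through 3 = 6 or more times) and the function name are assumptions for illustration; the text does not specify how missing items were handled, so the sketch simply rejects incomplete input.

```python
def antisocial_present(items):
    """Return True if any of the six ESYTC items was endorsed at least once.

    Assumed coding: 0 = not at all, 1 = just once, 2 = 2-5 times,
    3 = 6 or more times.
    """
    if len(items) != 6 or any(v not in (0, 1, 2, 3) for v in items):
        raise ValueError("expected six item ratings coded 0 to 3")
    return any(v >= 1 for v in items)
```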
Measures: child abuse
We measured physical, psychological, and sexual abuse occurring in childhood (defined as before age 11 years) and adolescence (defined as between ages 11 and 17 years) at age 22 years by retrospective self-report. The measure has been used previously in the Growing Up Today Study, a US population-based cohort. Since we were interested in time-dependent associations between child abuse and conduct problem trajectories, continuous scales had to be converted into binary variables. Similar to prior research examining the developmental timing of abuse in relation to conduct problems, which distinguished between abuse occurring up to age 11 years and between ages 12–17 years [26, 27, 28, 29], we created three abuse exposure categories: childhood-only (i.e., only before the age of 11 years), adolescence-only (i.e., only between ages 11 and 17 years), and ‘persistent’ abuse (i.e., abuse in both developmental periods). For our primary analysis, we computed an aggregate measure of any abuse (i.e., physical, psychological, or sexual abuse), because preliminary analyses indicated high correlations between abuse subtypes (see Supplementary Fig. 1 for the correlation matrix) and low frequencies of some abuse subtypes. Nonetheless, we also performed exploratory analyses testing for associations between abuse subtypes and conduct problem trajectories to examine whether certain subtypes were more influential than others.
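As a minimal sketch of how the timing categories and the aggregate ‘any abuse’ measure relate to the binary indicators, consider the following. The function names and the ‘none’ label for the unexposed reference group are assumptions made for illustration.

```python
def abuse_timing(childhood, adolescence):
    """Classify exposure timing from two binary indicators (True = abused)."""
    if childhood and adolescence:
        return "persistent"
    if childhood:
        return "childhood-only"
    if adolescence:
        return "adolescence-only"
    return "none"  # assumed label for the unexposed reference group


def any_abuse_timing(childhood_subtypes, adolescence_subtypes):
    """Aggregate subtype indicators within each period, then classify timing.

    Each argument holds the (physical, psychological, sexual) indicators
    for one developmental period.
    """
    return abuse_timing(any(childhood_subtypes), any(adolescence_subtypes))
```

A participant reporting only psychological abuse before age 11 and only physical abuse in adolescence would therefore count as ‘persistent’ on the aggregate measure, but as childhood-only and adolescence-only, respectively, in the subtype analyses.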
We used two items to assess physical abuse, asking whether an adult in the family “hit you so hard it left you with bruises or marks?” or “actually kicked, punched, or hit you with something that could hurt you, or physically attacked you in another way?”. Items were rated on a 5-point scale from never to rarely, sometimes, often, and very often. In line with previous studies [40, 41], physical abuse was coded as ‘present’ or ‘absent’.
Four items were used to assess psychological abuse, asking participants whether an adult in the family “shouted at you?”; “said hurtful or insulting things to you?”; “punished you in a way that seemed cruel?”; and “threatened to kick, punch, or hit you with something that could hurt you or physically attack you in another way?”. Again, items were rated on a 5-point scale (0–4), from never to very often. Considering the complex nature of psychological abuse, we followed Roberts et al. and computed a sum score ranging from 0 to 16, with participants scoring in the top decile (i.e., scores of ≥7 in our sample) being classified as having experienced psychological abuse.
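The sum score and cutoff can be illustrated as follows. The function names are hypothetical, and the default cutoff of 7 is the sample-specific top-decile threshold reported above; a different sample would need its own threshold.

```python
def psychological_abuse_score(items):
    """Sum four items, each rated 0 (never) to 4 (very often); range 0-16."""
    if len(items) != 4 or any(v not in (0, 1, 2, 3, 4) for v in items):
        raise ValueError("expected four item ratings coded 0 to 4")
    return sum(items)


def psychological_abuse_present(items, cutoff=7):
    """Classify as abused when the sum score reaches the top-decile cutoff."""
    return psychological_abuse_score(items) >= cutoff
```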
We used two items to assess sexual abuse, including “Were you touched in a sexual way by an adult or an older child or were you forced to touch an adult or older child in a sexual way when you did not want to?” and “Did an adult or an older child force you or attempt to force you into any sexual activity by threatening you or holding you down or hurting you in some way when you did not want to?”. In line with previous work, sexual abuse was coded as ‘present’ or ‘absent’.
Measures: covariates
Information on all covariates was collected by maternal self-report during pregnancy, except for child sex, which was obtained from the birth certificate. Housing tenure was assessed at 8 weeks gestation. Participants were asked whether their house was bought/mortgaged, owned, rented, or other. We dichotomized this variable into ‘mortgaged/owned’ or ‘other’. Maternal severe depression was assessed at 12 weeks gestation. Participants were asked whether they had ever had severe depression. The responses ‘Yes, had it recently’ and ‘Yes, in the past, not now’ were coded as ‘yes’, and ‘No, never’ was coded as ‘no’. At 18 weeks gestation, mothers were asked whether they had smoked tobacco in the first 3 months of pregnancy. The responses ‘Cigarettes’, ‘Cigars’, ‘Pipe’, and ‘Other’ were coded as ‘yes’, and ‘No’ was coded as ‘no’. Maternal education was assessed at 32 weeks gestation using educational qualifications in common use at the time in the UK. To aid comparison across countries with different school systems, we coded this variable as ‘no high school’ (CSE/none or vocational), ‘high school’ (O-level), or ‘beyond high school’ (A-level or degree).
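The covariate recoding amounts to a set of lookup tables, sketched below. The response labels are taken from the description above, but their exact spelling in the raw data is an assumption, as are the dictionary and function names.

```python
# Lookup tables mapping raw responses to analysis categories.
HOUSING = {
    "bought/mortgaged": "mortgaged/owned",
    "owned": "mortgaged/owned",
    "rented": "other",
    "other": "other",
}
DEPRESSION = {
    "Yes, had it recently": "yes",
    "Yes, in the past, not now": "yes",
    "No, never": "no",
}
SMOKING = {"Cigarettes": "yes", "Cigars": "yes", "Pipe": "yes",
           "Other": "yes", "No": "no"}
EDUCATION = {
    "CSE/none": "no high school",
    "vocational": "no high school",
    "O-level": "high school",
    "A-level": "beyond high school",
    "degree": "beyond high school",
}


def recode(value, mapping):
    """Return the recoded category, or None for a missing or unknown response."""
    return mapping.get(value)
```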
Data analysis plan
We applied latent class growth analysis (LCGA) to identify developmental trajectories of conduct problems, using a bias-adjusted 3-step approach [42, 43]. This method accounts for misclassification error rates in latent class membership when estimating the effect of covariates [42, 43].
First, an unconditional latent class model was estimated (i.e., the meaning of classes was exclusively based on the SDQ conduct problems subscale, without being influenced by covariates). We addressed missing data in this model using a full information maximum likelihood estimator with robust standard errors (i.e., parameters were estimated using all available data). In contrast to listwise deletion, this missing data method has been shown to produce unbiased parameter estimates, especially when data are missing at random and rates of missing data are higher. We modeled linear, quadratic, and cubic patterns of change, each with between one and six class solutions. The following model fit indices were used to select the optimal class model: the Bayesian Information Criterion (BIC) and sample size adjusted BIC (SSABIC), which are used to reduce the risk of overfitting the model to a single sample (lower values indicate a better model fit), and the Lo-Mendell-Rubin Likelihood Ratio Test (LMR-LRT), adjusted LMR-LRT, and Bootstrapped Likelihood Ratio Test (BLRT), which compare two adjacent class models (significant p-values indicate a better fit of the k-class model compared to the (k − 1)-class model). We further considered entropy values (0.40, 0.60, and 0.80 represent low, medium, and high class separation, respectively), sample size of the smallest class, and interpretability of each class trajectory.
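For readers less familiar with these indices, the sketch below shows how BIC, SSABIC, and relative entropy are conventionally computed from a fitted model's log-likelihood, number of free parameters, sample size, and posterior class probabilities. It illustrates the standard formulas only; the function names are ours, and the study's own values would come from the latent class software rather than from code like this.

```python
import math


def bic(loglik, n_params, n):
    """Bayesian Information Criterion: -2*LL + p*ln(n). Lower is better."""
    return -2.0 * loglik + n_params * math.log(n)


def ssabic(loglik, n_params, n):
    """Sample size adjusted BIC: replaces n with (n + 2) / 24 in the penalty."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24)


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 indicate clear class separation.

    `posteriors` is a list with one row per participant, each row holding
    that participant's posterior probabilities for the K classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("entropy is undefined for a one-class model")
    # Terms with p == 0 contribute nothing (limit of p*ln(p) as p -> 0).
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))
```

Because both criteria penalize additional parameters, a model with more classes is preferred only when its gain in log-likelihood outweighs the penalty.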
Second, after the best-fitting model was identified, the class membership information (i.e., most likely class) of each participant and misclassification error rates of each latent class were retrieved.
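This step can be illustrated with a short sketch that assigns each participant to their most likely class and estimates the misclassification probabilities from the posterior class probabilities. It shows one common estimator used in bias-adjusted 3-step approaches, under the assumption that every class has non-zero total posterior weight; it is not the study's own code, and the function names are hypothetical.

```python
def modal_class(row):
    """Index of the class with the highest posterior probability."""
    return max(range(len(row)), key=lambda k: row[k])


def classification_error_matrix(posteriors):
    """Estimate q[t][s] = P(assigned to class s | true latent class t).

    For each true class t, the posterior probabilities of class t are summed
    over participants assigned to class s and divided by the total posterior
    weight of class t, so every row sums to 1.
    """
    k = len(posteriors[0])
    weight = [[0.0] * k for _ in range(k)]
    for row in posteriors:
        s = modal_class(row)
        for t in range(k):
            weight[t][s] += row[t]
    return [[v / sum(weight[t]) for v in weight[t]] for t in range(k)]
```

Off-diagonal entries are the misclassification error rates; with perfectly separated classes the matrix is the identity.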
Third, to preserve the class membership information of the unconditional latent class model (step 1), we used the misclassification error rates obtained in step 2 when examining associations between child abuse and conduct problems trajectory membership. We addressed missing data in this conditional model using inverse probability weighting (IPW). Complete-case analysis may produce biased estimates if excluded cases are systematically different from those which were included. IPW can minimize this bias by allocating sampling weights to complete cases and thereby restoring total sample estimates. IPW has been recommended over other techniques for handling missing data (e.g., multiple imputation) when participants have missing data on entire assessment waves, as opposed to single items, which is especially common in longitudinal research (see Supplementary Table 1 for information on how weights were derived). We used multinomial logistic regression to estimate the association between childhood-only, adolescence-only, and ‘persistent’ abuse and latent classes of conduct problems. Multinomial logistic regression estimates multinomial odds ratios (or relative risk ratios); however, we refer to effects as odds ratios (usually used for two exhaustive categories) throughout the results section for clarity. We primarily focused on the ‘any abuse’ category, but subsequently tested for associations between abuse subtypes and conduct problem trajectories. All analyses were adjusted for child sex, housing tenure, maternal severe depression, maternal smoking, and maternal education.
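The idea behind IPW can be shown with a deliberately simplified sketch. Here the probability of being a complete case is estimated as the observed proportion of complete cases within a single stratum (for example, child sex), and each complete case is weighted by the inverse of that probability. This stratified estimate, the record layout, and the field names are assumptions for illustration; the weights actually used in the study are described in Supplementary Table 1 and may be derived differently.

```python
from collections import defaultdict


def ipw_weights(records, stratum_key, complete_key="complete"):
    """Return one weight per record: 1 / P(complete | stratum), or 0 if incomplete.

    `records` is a list of dicts; `stratum_key` names the field that defines
    the strata and `complete_key` the boolean complete-case indicator.
    """
    totals = defaultdict(int)
    completes = defaultdict(int)
    for r in records:
        totals[r[stratum_key]] += 1
        if r[complete_key]:
            completes[r[stratum_key]] += 1

    weights = []
    for r in records:
        if r[complete_key]:
            p_complete = completes[r[stratum_key]] / totals[r[stratum_key]]
            weights.append(1.0 / p_complete)
        else:
            weights.append(0.0)  # incomplete cases drop out of the analysis
    return weights
```

Complete cases from strata with more dropout receive larger weights, so the weighted complete-case sample again resembles the full sample on the stratifying variable; in this sketch the weights sum to the total number of records whenever every stratum contains at least one complete case.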
The conduct problems trajectory model was based on 10,648 participants (77.2% of the total ALSPAC sample; 51.4% boys), with missing data addressed using full information maximum likelihood. Complete data for physical, psychological, and sexual abuse and all covariates were available for 3127 participants (29.4% of those included in the conduct problems trajectory model; 35.9% boys). Participants with missing data on child abuse and/or covariates showed higher rates of conduct problems across all time points than those without missing data, albeit with small effect sizes (rs ranging from 0.08 to 0.09, all ps < .001). Furthermore, participants with missing data were more likely to be male (OR 2.47), more likely to be classified as early-onset persistent (OR 1.56) or childhood-limited (OR 1.24), and less likely to be classified in the low conduct problems trajectory (OR 0.78) than participants without missing data (all ps < .01; see Supplementary Table 2 for all pairwise comparisons). The sample sizes in adjusted analyses for any, physical, psychological, and sexual abuse were 3172, 3275, 3295, and 3279, respectively. See Supplementary Figure 2 for the retention flow chart across measures/analyses.