Study design and population
EMBLEM (European Mania in Bipolar Evaluation of Medication) was a 2-year prospective, observational study on the outcome of pharmacological treatment of mania across 14 European countries. A total of 3459 eligible in- and outpatients were enrolled at the discretion of the treating psychiatrist. Patients were eligible for participation if they were at least 18 years old and they initiated/changed oral medication for treatment of acute mania in bipolar disorder (antipsychotics, anticonvulsants and/or lithium; not antidepressants or benzodiazepines) within the standard course of care. During the acute treatment phase, assessments took place at baseline and 1, 2, 3, 6 and 12 weeks after baseline. The maintenance phase consisted of assessments at 6, 12, 18, and 24 months after baseline. ERB approval and patient informed consent were obtained according to local legal requirements. The study design has been described in detail in previous reports [10, 11].
Investigators were requested to assess the presence and severity of parkinsonism, akathisia, dystonia and TD that they judged to be associated with medications used to treat bipolar disorder. This assessment was based on the investigator's clinical experience and judgment and was rated as follows: 0 = not present; 1 = present, but does not significantly interfere with patient's functioning; 2 = present, and significantly interferes with patient's functioning. Guided by previous analyses using these measures, movement disorder variables were analyzed as dichotomous indicators (0 = not present versus 1 = present; the latter combining scores of '1' and '2'). These assessments were performed at baseline and all subsequent visits.
The scales used to measure extrapyramidal symptoms were simple and did not include instructions on or specific anchors for differential diagnosis. Therefore, in order to avoid diagnostic misclassification, only persistent dystonia and TD were used in the current analyses. Using the persistence measure ensured differentiation from the acute syndromes: acute dystonia, for example, would be unlikely to persist over two consecutive visits, whereas tardive dystonia would. Persistence was defined as the presence of the individual syndrome at 2 or more consecutive visits. Thus, the variable "persistent dystonia" was rated as follows: 0 = no or acute/incidental dystonia; 1 = persistent dystonia. Similarly, "persistent TD" was rated as 0 = no or incidental TD; 1 = persistent TD. Persistent TD and persistent dystonia were analyzed together as a single group (hereafter TDD: 0 = no persistent TD or persistent dystonia present, 1 = persistent TD and/or persistent dystonia present). The rationale for combining TD and dystonia comes from (i) their strong association [12, 13], (ii) shared risk factors and mechanisms [14, 15] and (iii) the fact that existing scales measuring tardive syndromes do not differentiate between TD and tardive dystonia [16, 17]. Parkinsonism and akathisia were also combined into a single variable, hereafter named EPS (0 = neither parkinsonism nor akathisia present, 1 = parkinsonism and/or akathisia present).
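The variable definitions above can be illustrated with a minimal sketch. It assumes each patient has a complete, ordered list of 0/1/2 ratings per syndrome, and that persistence is established at the second of two consecutive visits with the syndrome present; the function names are hypothetical and not taken from the study, and the handling of missed visits is not addressed here.

```python
from typing import List, Optional


def dichotomize(score: int) -> int:
    """Collapse the 0/1/2 rating into 0 = not present, 1 = present."""
    return 1 if score >= 1 else 0


def first_persistent_visit(scores: List[int]) -> Optional[int]:
    """Return the 0-based index of the visit at which persistence is first
    established (second of two consecutive 'present' visits), else None."""
    present = [dichotomize(s) for s in scores]
    for i in range(1, len(present)):
        if present[i - 1] == 1 and present[i] == 1:
            return i
    return None


def persistent(scores: List[int]) -> int:
    """1 = persistent syndrome, 0 = absent or acute/incidental only."""
    return 1 if first_persistent_visit(scores) is not None else 0


def combine(a: int, b: int) -> int:
    """Compound indicator: 1 if either component is 1 (used for TDD and EPS)."""
    return 1 if (a == 1 or b == 1) else 0


# Illustrative (made-up) ratings across six visits for one patient.
dystonia = [0, 1, 0, 0, 2, 1]   # single positive visit, then two in a row
td = [0, 0, 0, 0, 0, 0]

tdd = combine(persistent(dystonia), persistent(td))
```

In this example the isolated rating at the second visit counts as incidental, and TDD becomes 1 only because dystonia is present at the last two visits.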
Incidence of TDD
Incidence rates of TDD were determined by allocating each patient person-time for the TDD outcome according to the interval from baseline to the visit in which a patient was diagnosed with TDD. If no such diagnosis was made, the interval covered baseline to the final visit of each patient. Nine time bands were constructed (baseline – week 1; week 1 – week 2; week 2 – week 3; week 3 – 6 weeks; 6 weeks – 3 months; 3 months – 6 months; 6 months – 12 months; 12 months – 18 months; 18 months – 24 months). The incidence of TDD was calculated by dividing the total number of incident cases of TDD by the total person-years. The same procedures were followed for calculating separate incidence rates for tardive dystonia and TD. All analyses were conducted in the risk set of patients free of TDD at baseline.
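As a rough illustration of the person-time calculation, the sketch below assigns each patient the time from baseline to the visit of TDD diagnosis, or to the final visit if no diagnosis was made, and divides the number of incident cases by the total person-years. The conversion of visit weeks to years (52 weeks per year) and all names are assumptions made for the example, not details reported by the study.

```python
from typing import List, Optional, Tuple

# Scheduled visit times in years since baseline (assumed conversion:
# weeks / 52 for the acute phase; 6, 12, 18, 24 months thereafter).
VISIT_YEARS = [0.0, 1 / 52, 2 / 52, 3 / 52, 6 / 52, 12 / 52, 0.5, 1.0, 1.5, 2.0]

# One patient = (index of visit at which TDD was diagnosed, or None;
#                index of the patient's final visit).
Patient = Tuple[Optional[int], int]


def person_years(event_visit: Optional[int], last_visit: int) -> float:
    """Time at risk: baseline to diagnosis visit, or to the final visit."""
    idx = event_visit if event_visit is not None else last_visit
    return VISIT_YEARS[idx]


def incidence_rate(patients: List[Patient]) -> float:
    """Incident cases divided by total person-years at risk."""
    cases = sum(1 for event_visit, _ in patients if event_visit is not None)
    total = sum(person_years(ev, last) for ev, last in patients)
    if total <= 0:
        raise ValueError("no person-time at risk")
    return cases / total


# Made-up cohort: one case diagnosed at 12 months, two patients without TDD.
cohort = [(None, 9), (7, 9), (None, 6)]
rate = incidence_rate(cohort)  # 1 case / (2.0 + 1.0 + 0.5) person-years
```

The same function could be applied separately to tardive dystonia and TD by supplying the corresponding diagnosis visits.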
Associations between clinical factors and incident TDD
Cox proportional hazards regression was used to assess survival time without TDD in relation to various time-varying clinical variables. The following clinical measures were used as proxies for DA dysregulation:
(i) Psychotic and manic symptoms may be associated with high DA transmission in the mesolimbic pathway [5, 7]. Depression may be associated with lower DA transmission in the same tract, even though different receptor classes or sub-regions may be involved [5, 8]. In the current analyses, the CGI-BP severity of mania, CGI Hallucinations/delusions and CGI-BP depression were regarded as proxy measures for altered DA transmission within the mesolimbic DA tract. In addition, the CGI-BP overall illness was used as an overall measure of dysregulation in this tract. All CGI scores were rated for severity on a seven-point scale and were used at each visit, including baseline.
(ii) Both amenorrhea and sexual dysfunction are associated with elevated prolactin levels induced by low DA transmission [19, 20] originating in the tuberoinfundibular DA tract. This link is likely stronger for amenorrhea, as sexual disturbances in patients with schizophrenia are of multifactorial origin, and are therefore only in part attributable to illness- or medication-related prolactin levels [19, 20]. Presence of amenorrhea and sexual dysfunction were used as proxy measures for altered DA transmission in the tuberoinfundibular tract (both defined as 0 = not present; 1 = present; measured at each visit).
(iii) Extrapyramidal symptoms, including TD, have been hypothesized to reflect low DA transmission in the nigrostriatal DA tract. Research indicates that EPS (defined as parkinsonism, akathisia and acute dystonia) represents a vulnerability to develop tardive movement disorders, in particular tardive dyskinesia, in patients with schizophrenia. Therefore, the presence of EPS, as a proxy measure for dysfunctional DA transmission in the nigrostriatal tract, was tested for association with incident TDD.
(iv) Use of antipsychotics (APs) is known to affect dopamine transmission and, in addition, is strongly associated with TD. AP use was assessed at each visit and included in the analyses (0 = no AP use, 1 = first generation antipsychotic (FGA), 2 = second generation antipsychotic (SGA)).
The four clusters of proxy measures for DA dysfunction (bipolar symptoms, prolactin-related adverse effects, EPS and use of antipsychotics) were individually included as independent variables in the Cox models in order to determine associations with incident TDD. Finally, all variables were entered simultaneously in the model in order to determine which associations persisted independently of other factors. Effect sizes were expressed as hazard ratios (HRs) with 95% confidence intervals. The two-sided significance level was 5%.
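For orientation, the general form of a Cox model with time-varying covariates, and the way a hazard ratio and its confidence interval follow from a fitted coefficient, can be written as below. The notation is introduced here for illustration only, and the Wald-type interval shown is the conventional choice rather than something the text specifies.

```latex
% Hazard for patient i at time t, given time-varying covariates x_i(t)
h\bigl(t \mid x_i(t)\bigr) = h_0(t)\,\exp\!\bigl(\beta^{\top} x_i(t)\bigr)

% Hazard ratio for a one-unit increase in covariate j
\mathrm{HR}_j = \exp(\beta_j)

% Conventional Wald-type 95% confidence interval (assumed form)
\mathrm{CI}_{95\%} = \exp\!\bigl(\hat{\beta}_j \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\bigr)
```

Here $h_0(t)$ is the unspecified baseline hazard, so an HR above 1 indicates a shorter expected time to incident TDD for higher values of the covariate.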
Adjustment by propensity score
Analyses were performed with and without adjustment for confounders (adjusted and unadjusted analyses, respectively). For each analysis, all patients with non-missing values on the dependent and independent variables (and, for adjusted analyses, on all confounding variables) were included. Confounders were based on a review of the literature within patient populations diagnosed with bipolar disorder, schizophrenia or psychotic disorders in general. The following confounders were introduced in the Cox regression models: socioeconomic status (SES, expressed as educational achievement; 1 = no education, 2 = primary school, 3 = secondary school lower, 4 = secondary school upper, 5 = post-secondary vocational training, 6 = university), country, compliance (0 = no medication prescribed or always complies; 1 = never complies or complies 50% of the time), age per decade [22, 23], age of onset in years, gender [23, 25] and duration of illness in years. As a decrease in TDD incidence over time was anticipated, analyses were also adjusted for visit number.
It is common practice to increase the dosages of antipsychotics or lithium in response to increased symptom severity. It is widely accepted that the use of antipsychotics and lithium is in itself associated with an increased risk of developing movement disorders and other adverse effects. Thus, associations between higher symptom severity or the presence of adverse effects and a higher incidence of TDD may represent a confounding effect of AP or lithium use or dose burden. Therefore, except when testing associations between TDD and the use of antipsychotics, multiple treatment-related variables were included as confounders in order to eliminate spurious results for dopamine abnormalities related to (changes in) the use of antipsychotics and lithium. The following treatment-related time-varying variables were included as confounders: (i) use of APs (dichotomous variable: 0 = no use of AP, 1 = use of FGA and/or SGA); (ii) dichotomous variables indicating use (0 = no use, 1 = use) of the following individual treatments: amisulpride, clozapine, haloperidol, olanzapine, quetiapine, risperidone, ziprasidone, other AP or lithium; (iii) dose of treatment used, expressed as dose equivalents; (iv) change in dose of treatment with respect to the previous visit, expressed as dose equivalents. In addition, except when testing for an association between the CGI-BP Overall illness and incident TDD, change in CGI-BP Overall illness score relative to the previous visit was included as a confounder.
As many confounding variables were included in the models, traditional control for confounding by inclusion of covariates in the model may not be sufficient, as the degree of 'control' afforded by such models depends on the overlap in characteristics between the two outcome groups. The use of the propensity score has been suggested as a means to obtain more complete control in these circumstances. The propensity score for an individual, defined as the conditional probability of (in this case) developing TDD given the individual's covariates, can be used to balance the covariates in observational studies, and thus reduce bias. In other words, by using propensity scores, a collection of covariates is replaced by a single covariate that is a function of the original ones, which minimizes the loss of degrees of freedom. Because the variable 'haloperidol' prevented the propensity score model from achieving sufficient balance between the groups, haloperidol was excluded from the propensity score model and adjusted for separately. Each adjusted model therefore contained the dependent variable, the independent variable under investigation, haloperidol, and the propensity score representing the other specified confounders.
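The adjustment strategy described above can be summarized in symbols. The notation is illustrative only: $c_i$ denotes the specified confounders other than haloperidol, $z_i(t)$ the independent variable under investigation, and $\mathrm{halo}_i(t)$ haloperidol use. How the propensity score was estimated is not stated in the text; a logistic regression of the outcome indicator on $c_i$ would be one common choice and is mentioned here only as an assumption.

```latex
% Propensity score as defined in the text: probability of developing TDD
% given the covariates (haloperidol excluded)
e_i = \Pr\bigl(\mathrm{TDD}_i = 1 \mid c_i\bigr)

% Adjusted Cox model: variable of interest, haloperidol, and the
% propensity score standing in for the remaining confounders
h\bigl(t \mid z_i(t), \mathrm{halo}_i(t), e_i\bigr)
  = h_0(t)\,\exp\!\bigl(\beta_1 z_i(t) + \beta_2\,\mathrm{halo}_i(t) + \beta_3 e_i\bigr)
```

Under this formulation, $\exp(\beta_1)$ is the adjusted hazard ratio for the variable under investigation, and the many confounders cost a single degree of freedom through $e_i$.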
Not all countries participated in the maintenance phase of the study (12 weeks onwards), resulting in a decrease in sample size after 12 weeks (Switzerland, Denmark, Germany and Spain only participated in the acute phase). Apart from the decrease in overall sample size, the samples for the individual analyses varied somewhat depending on the availability of complete data for the variables included in each model. All analyses were performed using Stata, version 10.0.
TDD validity: sensitivity analyses
Additionally, sensitivity analyses were conducted using a stricter criterion for incidence in order to exclude bias due to carry-over effects of factors acting during the period before baseline. To this end, a stricter risk set was defined as the sample of patients free from dystonia or TD at baseline as well as at visit 2 (one week post-baseline). The first occurrence of any incident tardive syndrome could therefore be at visit 3 (two weeks post-baseline), while for the purpose of the current analyses incidence could first occur at visit 4, due to the requirement of persistence of symptoms for at least 2 consecutive visits. Consequently, misclassification of the acute form of dystonia, which usually has an onset within 5 days of new antipsychotic treatment, could be ruled out with even more confidence.
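The stricter risk set and its consequence for the earliest possible incident visit can be sketched as follows. This is a minimal illustration that assumes visits are numbered from 1 (baseline) and that each patient has an ordered list of 0/1/2 ratings per syndrome; the function names are hypothetical.

```python
from typing import List, Optional


def in_strict_risk_set(dystonia: List[int], td: List[int]) -> bool:
    """True if neither dystonia nor TD is rated present at baseline
    (visit 1) or at visit 2 (one week post-baseline)."""
    if len(dystonia) < 2 or len(td) < 2:
        return False  # both visits are needed to apply the criterion
    return all(score == 0 for series in (dystonia, td) for score in series[:2])


def first_persistent_visit_number(scores: List[int]) -> Optional[int]:
    """1-based visit number at which a syndrome has been present at two
    consecutive visits for the first time, or None if that never happens."""
    for i in range(1, len(scores)):
        if scores[i - 1] >= 1 and scores[i] >= 1:
            return i + 1
    return None


# Made-up patient: symptom-free at visits 1 and 2, dystonia from visit 3 on.
dystonia = [0, 0, 1, 1]
td = [0, 0, 0, 0]

eligible = in_strict_risk_set(dystonia, td)
earliest = first_persistent_visit_number(dystonia)
```

For a patient in the stricter risk set, the earliest first occurrence is at visit 3, so the persistence requirement places the earliest incident case at visit 4, as in the example.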