The present paper deals with the analysis of behavioural data of the International Multi-centre ADHD Genetics (IMAGE) project. The main focus is on the impact of the multi-centre design and the diagnostic procedure on the homogeneity of the data. Aggregating data from several recruiting centres is an important research strategy for enlarging sample sizes and, thus, for increasing the statistical power needed to generalise results, i.e., to achieve the required level of significance.
Sample size is particularly essential in genetic linkage analyses of complex traits like ADHD when searching for markers that contribute only to a small extent to the risk of ADHD. While statistical power can be increased by enlarging the sample size, it may also be reduced by factors that compromise sample homogeneity by introducing uncontrolled or uncontrollable variance. The following discussion of the results will mainly focus on issues of sample homogeneity.
Despite the identical inclusion criteria (in terms of the numbers of ADHD symptoms) for children of all ages, we found a negative correlation between age and the mean number of hyperactive symptoms in the proband sample: older probands had lower numbers of symptoms than younger probands. At first sight, this could be interpreted as decreasing disease severity with age in our sample.
The interpretation of this age effect, however, must take into account the interplay between population characteristics and the diagnostic procedure. Previous studies have shown that age is an important factor moderating the symptoms of ADHD, resulting in a general symptom decline. This is clearly underlined by normative sample data used in the IMAGE project, e.g., those of the CTRS: a six-year-old girl with a score of thirteen on the CTRS DSM-IV hyperactivity scale deviates two standard deviations from the mean (T = 70), whereas a sixteen-year-old girl with the same score deviates twice as much, i.e. four standard deviations, from the mean (T = 90). Age effects in inattentive scores of the normative sample are less pronounced, but in the same direction. Thus, many adolescents probably had more hyperactive/impulsive or inattentive symptoms when they were younger.
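The relation between T-scores and standard deviations used in this example can be made explicit with a short sketch. It assumes the conventional T-score scaling (normative mean 50, standard deviation 10), which is consistent with the values quoted above; the function name `t_to_sd` is hypothetical and serves illustration only.

```python
# Illustrative sketch: convert a T-score into its deviation from the
# normative mean, assuming the conventional scaling (mean 50, SD 10).
T_MEAN = 50.0
T_SD = 10.0


def t_to_sd(t_score: float) -> float:
    """Return the deviation from the normative mean in SD units."""
    return (t_score - T_MEAN) / T_SD


# Same raw score (13), evaluated against the age-specific norms quoted above.
print(t_to_sd(70))  # six-year-old girl: 2.0
print(t_to_sd(90))  # sixteen-year-old girl: 4.0
```

The identical raw score thus maps onto twice the standardised deviation in the older girl, which is the sense in which severity relative to age norms is not captured by the symptom count alone.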
These age effects have two important consequences in terms of disease severity: First, adolescents may deviate to a stronger extent from the normative mean than young children with the same number of ADHD symptoms present. Second, some evidence for genotype differences in groups differing in age but not in the number of ADHD symptoms can be derived. If ADHD is seen as a quantitative trait, a probabilistic positive association may be assumed between the degree to which a phenotype (e.g. measured by dimensional questionnaire scores) deviates from the population mean and the number of alleles present that are associated with the trait [19, 64]. If a sample of individuals with an identical number of ADHD symptoms, but of different age, is virtually retraced to the age of five years, the mean number of ADHD symptoms in the older individuals would probably become smaller due to the negative correlation between age and symptoms in the population. In this virtual sample, as the quantitative trait hypothesis states, individuals with lower symptom numbers, i.e. a less deviating phenotype, have a lower probability of carrying an allele associated with the trait. Because this is true for all susceptibility genes, it may be argued that individuals with fewer symptoms have a lower overall probability of carrying alleles associated with ADHD than individuals with more symptoms, irrespective of genetic interactions and environmental factors. As a final consequence, adolescents in our proband sample may have a higher genetic risk for ADHD than young probands with the same number of ADHD symptoms.
This implicit age effect inferred from epidemiological studies and a normative sample was moderated by the (small) negative correlation between age and the number of hyperactivity symptoms in our sample. As a consequence of these two features, one has to assume that, on the one hand, adolescents in our proband sample differed on average to a smaller extent from young children than inferred from the normative sample. On the other hand, individual differences between adolescents and young children with an identical number of symptoms remained. Therefore, we must conclude that the disease severity in terms of the deviation from normality increased with age, and that this effect was not represented in the number of ADHD symptoms present.
We found no age differences between boys and girls in the proband sample. Among the 11 centres, however, there were ten significant pairwise differences in mean age ranging from 1.4 to 2.6 years, with effect sizes between 0.5 and 0.9. As a consequence, one would expect that centres with rather young probands, e.g. ESP_V, would have a relatively lower mean number of hyperactive symptoms, indicated by a lower rank after age correction, and vice versa (see Figure S2 in Additional file 2). In fact, the change of rank position was consistent with this hypothesis regarding the direction of rank change in only two centres (ESP_V and GER_E). However, the mean number of hyperactivity symptoms in these two centres was almost identical, so that the centre effects on hyperactive symptoms were moderated only marginally by age, probably at least partly due to the restricted range of symptom numbers in probands. Again, the effect of age differences between centres on differences in hyperactive symptom numbers between centres was smaller than expected due to the moderating effect of declining hyperactive symptom numbers with age in our proband sample.
Gender was an additional source of heterogeneity with respect to ADHD symptoms and the comorbid conditions, which are usually more frequent in boys. Again, the normative sample underlying the DSM-IV scores of the CTRS illustrates the differences attributable to gender: a T-score of 70 in the DSM-IV inattention scale is associated with a raw score of 16 in six-year-old boys, but with a raw score of only 8 in girls of the same age; in the DSM-IV hyperactivity/impulsivity scale the corresponding scores are 18 in boys but only 10 in girls.
The proband sample of the present study had a homogeneous gender structure due to an absence of age differences between boys and girls and equal gender ratios across centres. Consequently, we can rule out that centre effects or gender effects on dependent variables were confounded by age effects.
The investigation of gender effects in the probands revealed no direct effects in most of the variables associated with the diagnostic procedure (i.e., the number of hyperactive symptoms, the age at inattention and hyperactivity detection, medication, all PACS ADHD symptoms, sixteen out of eighteen CPRS symptoms, and three of the four comprehensively assessed comorbidities, namely CD, ODD, and ANX). Exceptions were higher frequencies of inattentive symptoms (PACS and CTRS combined) in boys compared to girls and higher frequencies in boys for two thirds of the ADHD symptoms in the CTRS.
These differences were consistent with a meta-analysis reporting higher ADHD symptoms in boys compared to girls. Concordant with gender differences in the normative sample, we conclude that girls in our proband sample deviated to a greater extent from normality than boys, even though the girls' symptom counts were similar to or slightly lower than those of the boys, and that the use of equal symptom criteria in boys and girls introduced heterogeneity into the proband sample.
The multi-centre design is another possible source of sample heterogeneity. Analyses of the proband sample showed that centre effects played a more important role than age or gender effects. The centres differed significantly in age, in both inattentive and hyperactive/impulsive symptom numbers, in age of detection of both inattentive and hyperactive/impulsive symptoms, in fifteen out of eighteen ADHD symptoms in the PACS interview, in seventeen out of eighteen ADHD symptoms in the CPRS, in all ADHD symptoms in the CTRS, in five out of nine combined (PACS and CTRS) inattentive, and six out of nine combined hyperactive/impulsive symptoms, and in all comprehensively assessed comorbid conditions (CD, ODD, ANX, and MOOD).
Even if we assumed that centres did not differ with respect to genetic, socio-cultural, and methodological aspects, differences in gender and age ratios between centres, combined with differing sample sizes, could increase the sample variance and introduce additional heterogeneity due to variables that were associated with gender or age. Such indirect effects were, however, either absent in the proband sample (gender), or only played a minor role with respect to ADHD symptoms (age), as discussed above. Consequently, we must assume that other factors caused the differences in psychopathology measures between centres (e.g. genotype differences, socio-cultural population differences, regional demographic factors, or specific health care structures leading to specific recruiting strategies). Last but not least, different implicit normative backgrounds associated with socio-cultural factors may have led to different ratings of objectively identical behaviour.
The hypothesis of a genotypic north-south factor had no evident phenotypic equivalent in the probands with respect to symptom numbers, age of symptom detection, and frequencies of individual diagnostic symptoms. Centre differences in any of these variables did not form recognisable geographic patterns, and neighbouring centres did not cluster more than distant centres did. Even national clusters were not recognisable to an extent that would justify dividing our sample into units of countries instead of centres. For example, there were some considerable and significant age differences (adjusted for multiple testing) between the two centres from both Germany and Israel, within-country differences in the mean number of inattentive symptoms in Israel and the Netherlands, differences in the mean number of hyperactive symptoms (Netherlands, Israel, Germany), and differences in age at symptom detection (Israel).
In summary, there were notable differences between centres in ADHD and comorbid symptoms. Although the variations of ADHD symptoms across centres remained within the diagnostic boundaries of ADHD-CT, we conclude that the significant centre differences result in a broader phenotypic range compared to a hypothetical sample of the same size recruited at a single centre. In particular with respect to genetic analyses and analyses of endophenotypes, power is, on the one hand, increased by expanding the sample size but, on the other hand, decreased by using a multi-centre recruiting strategy. Choosing a single-centre strategy, even if more time is needed for recruiting, is probably still the preferable strategy with respect to statistical power.
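The trade-off between sample size and heterogeneity can be illustrated with a rough power calculation. The following sketch is not an analysis of the IMAGE data: it assumes a two-sided two-sample z-test with equal group sizes and known variance, treats between-centre differences simply as additional variance added to the within-centre variance, and uses purely hypothetical numbers (`delta`, `within_sd`, `between_var`, and the group sizes).

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    delta: true mean difference; sigma: common SD (assumed known).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardised shift of the test statistic under the alternative.
    shift = delta / (sigma * sqrt(2 / n_per_group))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


delta = 0.3        # hypothetical true group difference
within_sd = 1.0    # hypothetical within-centre SD
between_var = 1.5  # hypothetical variance added by centre differences

single_centre = power_two_sample(100, delta, within_sd)
multi_homogeneous = power_two_sample(200, delta, within_sd)
multi_heterogeneous = power_two_sample(
    200, delta, sqrt(within_sd**2 + between_var)
)

print(f"single centre, n=100 per group:       {single_centre:.2f}")
print(f"multi-centre, n=200, no extra variance: {multi_homogeneous:.2f}")
print(f"multi-centre, n=200, extra variance:    {multi_heterogeneous:.2f}")
```

With these made-up values, doubling the sample raises power only if the additional centres do not inflate the variance; whether the gain or the loss dominates in practice depends entirely on the actual magnitudes.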
We investigated the differential influence of diagnostic symptoms in the diagnostic process. The higher discriminatory weights of hyperactive symptoms compared to inattentive symptoms in a binary logistic regression indicate that only a few, predominantly hyperactive, symptoms were needed to discriminate between probands and controls. Concordant with other findings, this result challenges the diagnostic system of the DSM-IV, which weights all symptoms equally in an additive algorithm [66–68]. Interpretations going beyond this general statement, however, are neither appropriate nor intended due to methodological restrictions of the present study. For example, the 79 siblings of the control sample were part of the 339 siblings who underwent the full diagnostic procedure due to suspected ADHD. Even if they did not reach the diagnostic threshold, many of them were probably subclinical cases, as indicated by a mean ADHD-symptom frequency of 51%, which corresponded to 9 positive symptoms out of 18 on average.
To investigate informant effects and instrument effects, we compared the two diagnostic sources, PACS and CTRS, with each other and with a third source, the CPRS, which was not part of the diagnostic procedure but was used for screening. We found higher symptom frequencies in the parents' ratings compared to the teachers' ratings in the proband sample, independent of whether the CPRS or the PACS was compared to the CTRS. These findings are in accordance with known contrast effects often seen in parent ratings, which result from the direct comparison between two children and lead to a relative overestimation of the probands' symptoms compared to the siblings' symptoms [69, 70]. In addition, medication may play a role, because some of the children were medicated continuously at school, but not at home.
Within the parents' ratings, the interview led to higher frequencies than the questionnaire for 13 of the 18 symptoms. This higher sensitivity of the PACS for roughly two thirds of the symptoms may be a consequence of the more objective diagnostic conceptualisation of the PACS, which assesses symptoms less by an implicit deviance rating, as the questionnaire does, and more by asking how frequently and how intensely a symptom occurs. In contrast to this general tendency, the symptom concerning the ability to sustain attention (IA2) was recorded much more frequently by the CPRS (84% in boys, 79% in girls) than by the PACS (46%, 42%). A comparison between symptoms and between the three diagnostic sources suggests that the sensitivity of the PACS is too weak for this symptom. The higher general sensitivity of the interview compared to the questionnaire, however, does not imply a lack of utility of questionnaires as screening instruments. In general, the cut-off criteria of screening instruments were set far below the diagnostic threshold (we used a T-score of 63), so that all subjects who reached the diagnostic threshold of the interview, but not of the questionnaire, were nevertheless screened positive.
Informant and contrast effects were analysed and discussed on the basis of continuous data in the second part of this contribution, which concentrates on questionnaire scores.
Some further comments on the diagnostic procedure with respect to heterogeneity in the proband sample should be made. The diagnostic procedure of the IMAGE study used a teachers' questionnaire (CTRS) and a parental interview (PACS) in combination, counting a symptom as present if it was present either in the CTRS or in the PACS. To prevent diagnoses based on a single informant only, at least two of the symptoms had to be present in both settings. This algorithm allows children to be positively diagnosed even if their symptom level is below the diagnostic threshold at home, at school, or even in both settings. This effect becomes evident when the frequencies of combined (PACS or CTRS) diagnostic symptoms (Table 3) are compared to the frequencies of each source alone (Figure 1). Whereas the most infrequent symptom occurs in about 80% of the probands when PACS and CTRS are combined, the lowest frequencies in PACS alone (<50%), CPRS alone (<60%), and CTRS alone (<40%) are clearly lower. The applied procedure may have excluded children without pervasive problems, but, on the other hand, it also broadened the variety of symptom patterns in the proband sample and included some cases classified as subclinical in one or both settings. If we assume that some genetic variants interact differentially with the environmental conditions (home or school), the applied diagnostic algorithm, which classifies symptoms independently of their environmental condition and the type of informant, may have introduced uncontrolled variance in the phenotype.
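The combination rule described above can be sketched in a few lines. The sketch reflects one literal reading of the description (a symptom counts if it is present in either source, and at least two symptoms must be present in both); the threshold of six out of nine symptoms per dimension follows the DSM-IV criteria and is an assumption here, as are the function name and the example symptom sets.

```python
def evaluate_dimension(pacs, ctrs, threshold=6, min_shared=2):
    """Apply an either-source symptom rule to one symptom dimension.

    pacs, ctrs: sets of symptom numbers (1-9) rated present by the
    parental interview and the teacher questionnaire, respectively.
    threshold and min_shared are assumptions made for illustration.
    """
    combined = pacs | ctrs  # symptom present in PACS or CTRS
    shared = pacs & ctrs    # symptom present in both settings
    return {
        "pacs_alone": len(pacs) >= threshold,
        "ctrs_alone": len(ctrs) >= threshold,
        "combined": len(combined) >= threshold and len(shared) >= min_shared,
    }


# Hypothetical child: five symptoms at home, four at school, two shared.
pacs = {1, 2, 3, 4, 5}
ctrs = {4, 5, 6, 7}
result = evaluate_dimension(pacs, ctrs)
print(result)
# {'pacs_alone': False, 'ctrs_alone': False, 'combined': True}
```

In this hypothetical case the child falls below the threshold in each setting taken separately but exceeds it once the sources are merged, which is the mechanism by which the combined frequencies can be higher than those of either source alone.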
Finally, some comments should be made concerning the sibling sample of the present study. The sample of 339 diagnosed siblings was heterogeneous in various ways. The subtypes differed across centres and between genders. There were, in particular, higher rates of ADHD-CT in boys than in girls. The mean number of both inattentive and hyperactive/impulsive symptoms differed between genders (higher frequencies in boys) and between centres. Differences in implementing the criteria for conducting the sibling interview (e.g. 'clinical suspicion of ADHD') may have introduced a bias leading to the large differences in subtype frequencies across centres. Additionally, differences in personnel resources, in combination with the declared purpose of the sibling interview (excluding ADHD cases in the sibling sample), may have introduced a centre bias. For these reasons, we do not further discuss the findings on the selected siblings but refer to the second part of this contribution, which deals with analyses of the complete sample of 1446 siblings based on questionnaire data.