Association study in the 5q31-32 linkage region for schizophrenia using pooled DNA genotyping
© Zaharieva et al. 2008
Received: 31 October 2007
Accepted: 25 February 2008
Published: 25 February 2008
Several linkage studies suggest that chromosome 5q31-32 might contain risk loci for schizophrenia (SZ). We wanted to identify susceptibility genes for schizophrenia within this region.
We saturated the interval between markers D5S666 and D5S436 with 90 polymorphic microsatellite markers and genotyped two sets of DNA pools consisting of 300 SZ patients of Bulgarian origin and their 600 parents. Positive associations were followed up with SNP genotyping.
Nominally significant evidence for association (p < 0.05) was found for seven markers (D5S0023i, IL9, RH60252, 5Q3133_33, D5S2017, D5S1481, D5S0711i), which were then individually genotyped in the trios. The predicted associations were confirmed for two of the markers: D5S2017, localised in the SPRY4-FGF1 locus (p = 0.004), and IL9, localised within the IL9 gene (p = 0.014). Fine mapping was performed using single nucleotide polymorphisms (SNPs) around D5S2017 and IL9. In each region four SNPs were chosen and individually genotyped in our full sample of 615 SZ trios. Two SNPs showed significant evidence for association: rs7715300 (p = 0.001) and rs6897690 (p = 0.032). Rs7715300 is localised between the TGFBI and SMAD5 genes and rs6897690 is within the SPRY4 gene.
Our screening of 5q31-32 implicates three potential candidate genes for SZ: SMAD5, TGFBI and SPRY4.
Schizophrenia (SZ) is a common, severe and disabling disorder that in most cases requires long-term medical and social care. The lifetime risk for SZ in the population worldwide is around 1%. Family, adoption and twin studies have shown conclusively that a genetic component plays the most important role in its aetiology. At present, the number of susceptibility loci, the disease risk conferred by each locus and the degree of interaction between them remain unknown. The mode of transmission is complex and non-Mendelian and probably involves a small number of genes of moderate effect, many genes of small effect, or a mixture of the two.
We concentrated upon a minimal region of interest between markers D5S666 and D5S436 as it includes five of the regions showing linkage to schizophrenia [5–9]. We chose this region rather than the full linkage interval because the concentration of five linkage findings makes it the most likely to contain susceptibility genes, and because we could not have provided similarly dense coverage of the whole interval with the funding we received for this project. This interval is ~14 Mb long and contains ~330 genes (UCSC build 35, May 2004), of which 52 constitute the protocadherin α, β, γ clusters. Protocadherins are expressed throughout the nervous system and are involved in synapse formation, specification and maintenance, which makes them potential candidate genes for schizophrenia. This group of genes and their relation to schizophrenia has been investigated by several research groups [10–13]. Other promising candidate genes in this region are NRG2 (Neuregulin 2) and IL9 (P40 cytokine). The neuregulins are a family of growth and differentiation factors with a wide range of functions in the nervous system. Neuregulin signalling plays an important role in many neurological disorders including multiple sclerosis, traumatic brain and spinal cord injury, peripheral neuropathy, and possibly schizophrenia [14–16]. According to the glial growth factor deficiency and synaptic destabilization hypothesis of SZ, functional deficiency of glial growth factors and of growth factors such as neuregulin, insulin-like growth factor I, insulin, epidermal growth factor, neurotrophic growth factors, erbB receptors and others is among the distal causes in the genotype-to-phenotype chain leading to the development of SZ. Cytokines are key molecules regulating immune/inflammatory reactions. They are involved in brain development, regulation of dopaminergic and GABAergic differentiation, and synaptic maturation. Certain cytokines are postulated to have a central role in the neurodevelopmental defects in SZ [18, 19].
The systematic association analysis of complex disorders requires genotyping of numerous genetic markers over particular genomic regions, or more recently the entire genome, in large samples. The cost of such studies is prohibitive for most laboratories. DNA pooling is a way to decrease the cost, time and labour involved in large-scale genotyping. Briefly, in DNA pooling equimolar amounts of DNA are taken from each individual and mixed to form two sets of pools, cases and controls. Allele frequencies are then estimated from the signal intensities produced by each pool. DNA pooling is capable of detecting loci with small effect sizes and decreases the cost of the analysis by orders of magnitude. The power of pooling studies is approximately the same as for individual genotyping of affected and non-affected individuals, with a mean error rate of <2% reported for pooled analysis across different pooling techniques [20–23].
In the present study, the initial DNA pooling and individual genotyping were performed using microsatellite markers, and fine mapping was performed with single nucleotide polymorphisms (SNPs). We reasoned that microsatellite markers have several advantages over SNPs for initial screening: they are highly polymorphic, with a correspondingly high degree of heterozygosity (on average ~70%), and they can detect linkage disequilibrium (LD) over larger distances than SNPs (~100 kb range compared to ~30 kb range for SNPs) [24, 25], probably because their high mutation rate makes it possible for certain of their alleles to capture associations of more recent origin. We therefore reasoned that we could cover the 14 Mb interval with fewer microsatellite markers than SNPs, making it possible to conduct the study within the budget of our research project.
We chose an initial screening step of pooled analysis of polymorphic microsatellites spanning 5q31-32, followed by individual genotyping, in the same samples from which the pools were constructed, of markers showing nominally significant evidence for association (p < 0.05). Regions containing a confirmed nominally significant microsatellite marker (p < 0.05) were then examined further with SNPs in order to refine the position of the association signal.
For the initial screen, we used two parent-proband DNA pools: one from 300 SZ patients and the second one from their 600 parents. All trios are of Bulgarian origin. The patients were either inpatients from five different psychiatric hospitals, or outpatients from four of the largest psychiatric dispensaries in Bulgaria. All had a history of hospitalization for a schizophrenic episode. Each proband was interviewed with an abbreviated version of the SCAN instrument (Schedules for Clinical Assessment in Neuropsychiatry). Consensus best-estimate diagnoses were made according to DSM-IV criteria (Diagnostic and Statistical Manual of Mental Disorders, 4th edn., 1994) by two raters using information from the interview and hospital notes. All patients included in the study met DSM-IV criteria for schizophrenia. Local ethics committee approval was obtained from all the regions where patients were recruited. All patients and their parents were given information sheets and provided written informed consent for participation in genetic association studies. DNA was extracted from peripheral blood by a standard phenol-chloroform method. The microsatellite markers selected for individual genotyping were genotyped in the same sample of 300 trio families that were included in the pools, while the SNPs were genotyped in the full sample of 615 SZ trios that have been recruited by the team in Bulgaria. To inform our choice of SNPs for fine mapping, we used data from another pooling study on 574 SZ trios (explained in more detail below). This larger pool includes the original pool of 300 trios used in the main microsatellite study, and a second pool of 274 trios from our full sample of 615 trios.
Microsatellite markers were selected from the JBIRC database, developed on the basis of analysis of microsatellite markers covering the whole human genome, and the UCSC Genome Browser (build 35, May 2004). We aimed to place markers at an average density of one every ~100 kb. We first selected validated polymorphic markers within our region of interest using all available information from JBIRC. Where the distance between validated markers was greater than 100 kb, we selected non-validated di-, tri-, tetra- and penta-nucleotide repeats from UCSC and validated them ourselves by examining whether they were polymorphic. We selected a total of 140 microsatellite markers from the two databases. Fifty markers were not included in the analysis because they produced heavy stutter bands or were not polymorphic. We successfully analysed 90 markers, of which 24 were chosen from JBIRC and 66 from the UCSC database (39 are novel markers that have not been validated before). The average distance between the 90 analysed polymorphic markers was ~150 kb; 72% of the microsatellite markers were di-nucleotide repeats and the remainder were tri-, tetra- or penta-nucleotide repeats.
In order to improve our chances of identifying the most significant SNPs around significant microsatellites for the fine mapping, we selected SNPs from the pooling data of a genome-wide association (GWA) study of schizophrenia in the same Bulgarian trios, conducted by the Department of Psychological Medicine, Cardiff University towards the end of the microsatellite project (manuscript in press). The results of that study, run on Illumina arrays, became available just as we started selecting SNPs for fine mapping. We wanted to saturate an extended area of ~400 kb around positive microsatellites, in order to identify the peak of maximum significance in each region. The latest versions of the HapMap database contain hundreds of SNPs in these regions, making it impossible for us to provide dense coverage with the funding available. This is why we decided to make use of our Illumina data. We reasoned that SNPs which were not significantly associated in that study of overlapping samples were most likely to remain negative after individual genotyping; we therefore targeted only those SNPs that produced significant results (at a predicted p-value of ≤ 0.05). The GWA pooling study was carried out with Illumina HumanHap550 Genotyping BeadChip technology using pooled DNA from 574 SZ patients and a pool from all the parents of the cases. The pool of 574 trios includes the pool of 300 trios prepared for the microsatellite association study, plus an additional pool of 274 SZ trios that was prepared from the available SZ trios recruited by our team in Bulgaria. Although the complete number of trios available for individual genotyping was 615, only 574 trios are included in the pools, for the following reasons. 1) A small number of samples had lower DNA concentration, making them unsuitable for pooling. 2) A number of families are multiply affected, providing more than one trio for the analysis of individual genotyping; however, only one trio per family was included in the pools.
For microsatellites we designed flanking primers labelled at their 5' end with a fluorescent dye. PCR primers were designed with the Primer 3 program. All primers were checked with BLAT in order to make sure their sequences were unique. Fragments were amplified using a standard PCR touch-down protocol on thermal cycler DNA Engine Tetrad machines (MJ Research, USA). Fragments were separated on an ABI3100 capillary sequencer (Applied Biosystems, USA). Microsatellite markers for individual genotyping were amplified in a multiplex PCR reaction using the same protocol as above.
SNPs were genotyped using the Sequenom MassARRAY™ iPlex™ chemistry (Sequenom, San Diego, California, USA) or Amplifluor™ SNP Genotyping Systems (Serologicals Corporation, USA) [33, 34] according to the recommendations of the manufacturers.
The principles of pooled DNA genotyping have been described before [21, 22]. Briefly, an equimolar amount of DNA was taken from every individual in one sample set and put into a single tube (one pool). The DNA from every sample was quantified using the PicoGreen dsDNA Quantification Reagent (Molecular Probes, Eugene, Oregon, USA). Each pooled DNA was amplified in triplicate and the PCR products were separated on an ABI3100 capillary sequencer. If any of the replicates gave more than 3% difference in any one allele, the experiment was repeated, or that replicate was excluded. The peak heights of the signal representing each of the alleles were measured using GENOTYPER 2.5 software. The peak heights were used to estimate the relative allele frequencies in the pools, assuming that peak height is directly proportional to the concentration of that allele in the pool. Thus, the frequency of each allele was estimated as its peak height divided by the sum of the peak heights for all alleles. We did not apply correction for stutter bands and differential amplification, as described before.
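The frequency estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the GENOTYPER workflow used in the study: the function names, the example peak heights, and the reading of the "3% difference" rule as an absolute difference of 0.03 in estimated allele frequency between replicates are all assumptions made for the example.

```python
from typing import Dict, List


def allele_frequencies(peak_heights: Dict[str, float]) -> Dict[str, float]:
    """Estimate allele frequencies as peak height / sum of all peak heights."""
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return {allele: height / total for allele, height in peak_heights.items()}


def replicates_agree(replicates: List[Dict[str, float]],
                     max_diff: float = 0.03) -> bool:
    """Check that no allele differs by more than max_diff between replicates.

    ASSUMPTION: the 3% rule is interpreted as an absolute difference in
    estimated frequency; the source does not spell out the exact definition.
    """
    freqs = [allele_frequencies(r) for r in replicates]
    return all(
        max(f[a] for f in freqs) - min(f[a] for f in freqs) <= max_diff
        for a in freqs[0]
    )


# Hypothetical peak heights for one two-allele marker, amplified in triplicate.
triplicate = [
    {"allele_1": 1200.0, "allele_2": 800.0},
    {"allele_1": 1220.0, "allele_2": 780.0},
    {"allele_1": 1180.0, "allele_2": 820.0},
]

print(allele_frequencies(triplicate[0]))
print(replicates_agree(triplicate))
```

No stutter or differential-amplification correction is applied here, in line with the uncorrected estimate described in the text.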
For analysis of pooling results we used the CLUMP program to compare the predicted allele frequencies of probands with the non-transmitted parental alleles (our pseudo-controls). For each marker we ran 1000 simulations and estimated the nominal p-value. Individual genotyping results were analyzed with the Extended Transmission Disequilibrium Test (ETDT) for multiallelic markers and the Transmission Disequilibrium Test for SNPs [36–38]. Analyses of linkage disequilibrium (LD) between SNPs (r² and D') were performed using Haploview [39, 40].
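For a biallelic SNP, the Transmission Disequilibrium Test reduces to a McNemar-type statistic on transmissions from heterozygous parents. The sketch below shows that standard textbook form only; it is not the software used in the study, the function name is hypothetical, and the counts in the example are invented for illustration.

```python
import math
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Biallelic TDT.

    transmitted:   heterozygous parents who passed the test allele to the proband
    untransmitted: heterozygous parents who passed the other allele
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only heterozygous parents contribute, which is why the test is robust to population stratification in a trio design.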
Table 1. Pooling and individual genotyping results for microsatellite markers showing suggestive evidence for association. Columns include: position in bp (UCSC, May 2004); number of repeats; individual genotyping p-value.
Table 2. Summarized individual genotyping results for SNPs and TDT results. Columns include: total sample (N trios); individual genotyping total sample p-value; individual genotyping 300 trios pool p-value.
Individual genotyping in the total sample of 615 parent-proband trios confirmed one SNP in each region as significantly associated with SZ: rs7715300 (p = 0.001) and rs6897690 (p = 0.032). One more SNP in each region approached significance (p = 0.06). Rs7715300 is located between two genes: 30 kb 3' of TGFBI (transforming growth factor, beta-induced, 68 kDa) and 39.5 kb 5' of SMAD5 (SMAD, mothers against DPP homolog 5). Rs6897690 is located in intron 1 of SPRY4 (sprouty homolog 4). All markers were in Hardy-Weinberg equilibrium. No significant linkage disequilibrium (LD) between the studied SNPs was observed (r² < 0.1). To provide the full information, we also present in Table 2 the p-values obtained in the 300 trios of the original pools used for the microsatellite study, which show some fluctuation from those in the full sample.
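The two LD measures mentioned here, r² and D', have simple closed forms once the four two-SNP haplotype frequencies are known. The sketch below assumes phased haplotype frequencies are given directly; Haploview estimates them from unphased genotypes, a step that is not reproduced here. The function name and the example frequencies are hypothetical.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_aB: float, p_Ab: float, p_AB: float) -> Tuple[float, float]:
    """Return (r squared, |D'|) for two biallelic SNPs.

    Arguments are the frequencies of the four haplotypes formed by
    alleles a/A at the first SNP and b/B at the second; they must sum to 1.
    """
    if abs(p_ab + p_aB + p_Ab + p_AB - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # allele A frequency at the first SNP
    p_B = p_AB + p_aB  # allele B frequency at the second SNP
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B
    r2 = d * d / denom
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d != 0 else 0.0
    return r2, d_prime


# Invented frequencies: complete LD, then complete equilibrium.
print(ld_measures(0.5, 0.0, 0.0, 0.5))
print(ld_measures(0.25, 0.25, 0.25, 0.25))
```

An r² below 0.1, as reported for the SNPs in this study, means that one SNP carries very little information about the other.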
Pooling is a fast and cost-effective approach for systematic screening of complex disease associations, where many markers need to be genotyped in large samples. Pooling has approximately the same power as individual genotyping and has proved to be an accurate method for detecting allele frequency differences using microsatellite markers or SNPs, with a mean error of <2% for the pooled analysis [21–23, 42, 43].
In the present study we wanted to investigate the chromosomal region 5q31.1-q32 between markers D5S666 and D5S436 because it includes five reported linkage regions for SZ [5–9] and a number of good candidate genes. We started by saturating the 14 Mb region with microsatellite markers using DNA pooling, an approach proven successful for other complex diseases [25, 44, 45]. Based on the knowledge that the average length of LD around microsatellite markers is approximately 100 kb, we covered the region with 140 microsatellite markers, of which 90 turned out to be polymorphic and gave reliable traces. We found suggestive evidence for association with SZ for seven markers. Individual genotyping confirmed two of them to be significantly associated: D5S2017 (p = 0.004) and IL9 (p = 0.014). Marker IL9 is located within the IL9 gene and is the same microsatellite previously reported by Schwab et al. to produce a maximum LOD score of 1.8 in 14 SZ pedigrees.
We then performed fine mapping with SNPs within an area of ~400 kb surrounding the IL9 and D5S2017 microsatellites. In order to improve our chances of identifying significant association, we chose to genotype only promising SNPs from another pooling study in SZ on Illumina HumanHap550 arrays. That study used an extended sample of 574 Bulgarian trios and was finished just as we were selecting SNPs to follow up (all 300 trios used in the microsatellite stage of the study were part of that larger sample). We selected 8 SNPs (4 for each region) that had shown nominal significance (p ≤ 0.05) in pools hybridised on Illumina arrays. This considerably reduced the cost of our project, as the two regions contained nearly 200 SNPs on the Illumina arrays (and even more in the HapMap database). Individual genotyping of the SNPs in the full 615 parent-proband trios confirmed the pooling results for two SNPs: rs7715300 (p = 0.001) and rs6897690 (p = 0.032). SNPs rs17169180 and rs7443175 showed only trends toward association (p = 0.06). To show the validity of the pooling approach, we also report data on the 300 trios in our original pools in Table 2. However, our aim was to identify susceptibility loci for SZ, which is clearly best achieved by genotyping as large a sample as possible; the full data on the 615 trios therefore constitute our primary analysis.
Clearly, the strongest signal identified in our study is at rs7715300, located between two genes: 30 kb 3' of TGFBI and 39.5 kb 5' of SMAD5 (Figure 2). We cannot determine whether rs7715300 is itself a causal variant or is in LD with a variant within either of the two genes (although no SNP within the two genes is in high r² with rs7715300). A Bonferroni correction for 90 microsatellite markers and 8 SNPs would, however, render this result non-significant (corrected p ≈ 0.1), indicating the need for replication in other samples, especially from populations that have shown linkage to this region. No Bulgarian sample has been investigated for linkage to SZ, and we do not know whether linkage to 5q is present in that population.
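The Bonferroni arithmetic behind this statement is straightforward: the nominal p-value is multiplied by the number of tests, here 90 microsatellites plus 8 SNPs. A minimal sketch follows; the function name is hypothetical, and the p-values are those reported above.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


n_tests = 90 + 8  # microsatellite markers + SNPs tested in this study

print(bonferroni(0.001, n_tests))  # rs7715300
print(bonferroni(0.032, n_tests))  # rs6897690
```

The corrected value for rs7715300 is 0.098, which rounds to the p ≈ 0.1 quoted in the text, while the weaker SPRY4 signal is capped at 1.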
SMAD proteins are intracellular mediators in the bone morphogenetic protein (BMP) signalling pathway. SMADs are the only known BMP receptor substrates capable of signal transduction. Once activated by phosphorylation, SMADs move to the nucleus, where they assemble complexes directly involved in the control of gene expression. SMAD5 is expressed in early differentiated granule neurons of the developing cerebellar cortex. BMP2 signalling activity is directly mediated by SMAD5 and its expression is sufficient to trigger granule cell precursor differentiation [46, 47]. It can be speculated that dysfunctional SMAD genes can disrupt the BMP signalling pathway, which is directly linked to the transcriptional control of oligodendrogenesis.
Defects in TGFBI are the cause of several types of corneal dystrophies. TGFBI protein binds to different types of collagen and is important for cell-collagen interactions in cartilage. The function of the protein and its low expression in the brain make TGFBI a less plausible candidate gene for schizophrenia.
The second region includes rs6897690 and rs7443175, which are located in intron 1 and intron 2 of the SPRY4 gene. Admittedly, the signal we identify is not strong, but it points to a plausible candidate gene. SPRY4 encodes a SPROUTY protein, which is widely expressed, including in brain. SPRY4 was identified as an evolutionarily conserved target of the WNT/β-catenin signalling pathway, which is involved in numerous processes during vertebrate CNS development.
It is possible that we have failed to detect a stronger association within the 5q31-33 region, as our coverage with microsatellite markers was still quite sparse and relied on the assumption of strong, uniformly spread LD. We have 13 intervals with an intermarker distance over 300 kb, due to the absence of polymorphic markers or the failure of the chosen ones to be analysed confidently. We might therefore have missed a disease susceptibility locus lying in a gap wider than the assumed extent of LD between adjacent markers.
In summary, our screening of the 5q31-32 linkage region using DNA pooling implicates three possible candidate genes: SMAD5, TGFBI and SPRY4. While none of the findings would survive correction for multiple testing, these results were found by a systematic analysis of one of the strongest schizophrenia linkage regions, making them good candidates to study in other samples, especially samples that have shown linkage to this region.
SNPs = single nucleotide polymorphisms
LD = linkage disequilibrium
GWA = genome-wide association
TDT = transmission disequilibrium test
ETDT = extended transmission disequilibrium test
CNS = central nervous system
This work was funded by the International Centre for Genetic Engineering and Biotechnology, Trieste grant to the Department of Medical Genetics, Sofia (ref CRP/BUL04-01), a Schizophrenia programme grant from the MRC and by an NIMH Silvio O Conte Centre For The Neuroscience of Mental Disorders Grant to the Department of Psychological Medicine, School of Medicine, Cardiff University (ref G9309834). The collection of families in Bulgaria was funded by the Janssen Research Foundation, Belgium/USA. We would like to thank the psychiatrists in Bulgaria involved in the recruitment and the patients and their families who took part in the study.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.