Application of microarray outlier detection methodology to psychiatric research

Background Most microarray data processing methods negate extreme expression values or alter them so that they do not lie outside the mean level of variation of the system. While microarrays generate a substantial amount of false positive and spurious results, some of the extreme expression values may be valid and could represent true biological findings. Methods We propose a simple method to screen brain microarray data to detect individual differences across a psychiatric sample set. We demonstrate in two different samples how this method can be applied. Results This method targets high-throughput technology to psychiatric research on a subject-specific basis. Conclusion Assessing microarray data for both mean group effects and individual effects can lead to more robust findings in psychiatric genetics.


Background
Currently used psychiatric nosology is based on a compilation of clinical symptoms into categories based primarily on symptom clustering and course. Diagnostic systems such as the current version of the DSM, allow for certain flexibility in the definition of diagnostic categories, with no assumption that each category of mental disorder is a completely discrete entity. As such, individuals diagnosed under a certain diagnostic class are not clinically homogeneous, there are no clear boundaries between classes, and different classes are not mutually exclusive. It is, therefore, unrealistic to expect that all subjects diagnosed with a given disorder will share a common psychopathological process, which would be associated with a common underlying biological process.
Most research efforts in psychiatry are directed towards the identification of group effects, negating the fact that significant etiological heterogeneity may exist. This limitation is particularly true for microarray research in psychiatry, where gene expression from different brain areas has been assessed comparing all affected subjects to nonaffected subjects. A possible solution would be to carry out studies aiming at the identification of biologically meaningful effects focusing on single individuals or subgroups. This approach would mimic fruitful efforts in the identification of genetic factors underlying heterogeneous conditions such as, among others, spinocerebellar ataxia [1] and Alzheimer's disease [2].
Microarray data from psychiatric subjects can be investigated for individual or subgroup effects that may be of genuine biological significance. Specifically, we hypothesize that specific subgroups can be identified through microarray data screening for extreme expression values. Three previous studies have described how microarray data can be investigated on a subject-specific basis when analyzing data from cancer studies [3][4][5]. We suggest here that microarray experiments using brain-gene expression levels from psychiatric experiments (e.g. schizophrenia group vs non-schizophrenia group) can utilize microarray data not only for group mean effects (i.e. standard microarray analysis) but also, whenever possible, should evaluate expression levels by individual subjects.
Most microarray projects in psychiatry involve examining more than one neural region [6][7][8][9]. Specifically, researchers studying gene expression in brain tend to analyze more than one brain region on more than one array. This leaves researchers with gene expression data from multiple brain regions for each subject. This offers the possibility of using data from different arrays as confirmations of findings which may appear to be outliers. Any outlier on one chip that is also an outlier on a different chip may represent a valid finding. Human brain has been categorized in two main ways: either by gross anatomical structure, the preference of imaging specialists, or by Brodmann region, the preference of neuro-anatomists. Irrespective of how the human brain is categorized anatomically, what is less obvious is whether gene expression varies between neighboring regions. Two recent, replicating studies suggest that brain gene expression of samples from the same individual, while non-identical, are biologically-related [9,10]. This sharing of similar expression patterns across samples allows for the exclusion of extreme values in the microarray data due to noise. This provides a potential to validate microarray data, particularly for variables that are extreme values, across chips. Those extreme values present across chips for the same probe set and the same individual may represent a true biological effect.
We have designed a method that can assess extreme values and utilizes expression data across chips from the same individual. The method will allow for the detection of any subjects that have probe set values that differ drastically from a mean and outside of a certain threshold (e.g. Standard deviations from a mean). This method, termed Extreme Values Analysis (EVA), takes into account the complex and heterogeneous nature of psychiatric diseases. We illustrate this approach in two different situa-tions. First, in a publicly available sample where extreme values were simulated, and second, in a sample of subjects who died by suicide and sudden death controls. EVA functions to screen microarray data individual-by-individual in search for any extreme values that may signify some abnormality.

Methods
We used a publicly available data set to first evaluate EVA. The data set comprises 9 subjects screened over 20 regions of the CNS and can be found here [9]. In this dataset, one of the authors (AB) inputted simulated extreme values for two subjects across all CNS regions for one randomly selected probe set each. The expression values were multiplied by 4 (1+0.25Z) for one of the probe sets and by 0.25 (1+0.25Z) for the other, where Z is a standard normal random variable which was generated independently for each CNS region. Another of the authors (CE), blinded to the experimental manipulation, applied the method to detect the inputted value(s). The rationale for this experiment is to determine if EVA can detect an artificially generated extreme value in one probe set from > 11 million different data points (10 subjects X 20 regions X~55,000 probe sets).
We assessed EVA in a second sample that comprised a group of suicide completers and sudden death controls. Information on the subjects, clinical variables, and microarray data quality of the suicide and sudden death controls can be found in Sequeira et al., [6].
EVA can be applied under a control:experimental design (suicide and sudden death controls example) or in a one sample design (CNS screening example). We describe the control:experimental setting, although the description applies also to the one sample design. In the one sample design, all individual values are compared to the group to which they belong.
The mean and standard deviation (SD) of log 2 -transformed expression level is computed in the experimental group for all probe sets in every region. In our example, this was done in 2 cortical brain regions from suicide subjects. Log transformation stabilizes the variance, allowing comparison of SD across probe sets. After this step, the probe sets with the highest SD values were selected for further analysis. We used only those probe sets in the top 5% of SD values. We reasoned that these probe sets likely have individual values that are extreme, which accounts for a high SD value.
To buffer against detecting mathematical artifacts, EVA selects only those probe sets with high SD values in all regions. In our example, we selected probe sets that were common across both cortical regions. Next, we assess whether the same subject is responsible for the high SD value across brain regions. We set as criterion for an extreme expression a value of ± 3 fold greater than the mean expression level of the specific probe set among the control group (in our example, the sudden death controls). This approach operates on the assumption that neighboring brain regions are not discrete units and that gene expression should not vary widely from one cortical region to another. Even if brain region-specific expression is more common, it is not expected that a subject that is an outlier in one region is necessarily an outlier in a neighboring region. In other words, extreme values that are detected across multiple brain regions are more likely to represent real biological phenomena. We note that this method is conservative.
Individual expression values also have to be outside of 1.5 SD's of the control group, after having met the above criteria. While we selected 1.5 SD's from the mean of the opposite group, this number can be changed depending on the false discovery level acceptable to the experimenter. Manipulating the SD threshold establishes the false discovery rate (FDR) of the experiment.
The statistical significance of each identified outlier can be assessed by computing the p-value of the subject's expression values for a probe set in the multiple brain regions compared to the multivariate distribution of the expression values in the control group. The null distribution of the log 2 -transformed probe set-specific expression is estimated by fitting a normal mixed model where the subject effect is random. Letting, X ij be the probe set-specific expression of the i th subject in the j th brain region, and Y ij = log 2 (X ij ), this model has the form: where μ j is the region-specific mean expression, a i is the subject random effect and e ij is the residual. We fit such a model by restricted maximum likelihood (REML) using the maanova package [11] for the R statistical software [12]. The subject random effect captures the expected correlation between expression in different brain regions of the same subject. The p-value for the observed deviation of the log 2 -transformed expression level of the i th subject from the mean of the group of reference , j = 1,..., J (or observed fold change on the original scale) is given by which we compute using a multivariate t-distribution with the covariance matrix estimated under the normal mixed model.

EVA in partially simulated data
We tested EVA in a sample data set that included 20 different CNS regions [9]. This dataset was selected because A) we could test how the method works with the RMA algorithm and B) we could demonstrate the method in a onesample case.
We began by computing the standard deviation (SD) for three of the 20 CNS regions described in this data set. The probe sets in the top 5% of SD values was selected for each of three regions and those probe sets that were common to all regions were selected. Five hundred forty-five probe sets were common to all three regions. Next, we screened for any individual values that lay outside of ± 1.5 SD's and was three-fold different from the mean. There were 14 genes that were found to be 3-fold greater than the mean and outside of +1.5SD's and 245 values that were three fold below the mean and outside of -1.5SDs. Each of these values was then cross-referenced across all 20 CNS regions. Two probe sets were found that met all criteria (1 above the mean for one subject and one below the mean for another subject). These were the probe sets that had been artificially altered ( Figure 1).

EVA in real microarray data
To demonstrate this technique, we used a sample that included 6 control subjects and 8 suicide completers with microarray data from BA 8/9 and BA 11. We first screened all expression values for MAS 5.0 present/absent call leaving 14,896 probe sets in BA 8/9 and 14,412 probe sets in BA 11. We next calculated a standard deviation for all probe sets from suicide subjects. This was done using log 2transformed expression values. We then selected the probe sets with the highest SD values (top 5%) from both BA 8/9 and BA11.
Any probe sets that was identical to both BA 8/9 and BA 11 after SD filtering was selected. There were 180 probe sets that were common to both regions. Next, to account for the variability of expression in control values, we searched the data for any suicide data point greater than 3-fold from the control mean and outside of 1.5 SD's. We reasoned that an extreme value across all regions for the same subject(s) could represent a biologically relevant event.
Beginning in BA 8/9, we filtered the 180 probe sets for those probe sets from suicide completers outside of 3-fold from the control mean. There were 20 probe sets where X ij (a particular expression value from a particular subject in a given brain region) was not outside of 3-fold from the control mean. From the 160 remaining probe sets, 108 probe sets were also outside of 1.5 SD's in BA 8/9. Probe sets from BA 11 were then filtered for these probe sets.
Microarray expression values demonstrating two subjects who passed all EVA criteria in a one sample case Figure 1 Microarray expression values demonstrating two subjects who passed all EVA criteria in a one sample case. A) Extreme low expressor (light blue trace) compared to other subjects in the same sample for one probe set identified across multiple CNS regions. Each subject is represented by a letter (A, B, C...). B) Extreme high expressor (purple trace) compared to other subjects in the same sample for one probe set across multiple CNS regions.   Table 1 lists the probe sets across the eight suicide completers and the individual p-values associated with each subject. From 108 probe sets that passed all EVA criteria in BA 8/9, 69 passed all EVA criteria in BA 11. Included in this list of probe sets are a number of genes that have been linked to suicide before including the FGF family [13], NTRK2 [14], and members of the ubiquitin family [15]. Of note, from the table, is that for a number of probe sets there is more than one subject who has an extreme expression value reaching a significance level.

Discussion
The extreme values analysis, or EVA, is a method to detect individual or subsets of outliers for a given probe set in microarray experiments. The rationale for this type of experiment is that psychopathology is not necessarily group specific but more likely sub-group or subject specific. The method outlined here uses log-transformed data to determine which probe sets have the highest variance and screens out those probe sets with little variation. This step is intended to select those probe sets with values that deviate widely from the mean. Next the method compares individual data points to a control mean, and searches for any 3-fold changes. Selected values also have to be outside of 1.5 SD's from the mean. These values were considered extreme expression values. These extreme expression values were next verified in one other cortical region to determine if they were extreme expression values in other cortical regions as well. We reasoned that the use of other cortical regions functioned as replicate experiments and enforced the finding.
We also evaluated this method in a one sample case after inputting artificial values for one probe set across all CNS regions in RMA data. EVA was able to detect the inputted value; the only difference between the control:experimental case and one sample case is the mean value used: In the one sample case the mean used includes the extreme value while in the control:experimental case it does not.
The use of multiple cortical regions as within-subject replicates is a way to detect true extreme expression values in individual subjects. Operating under the assumption the gene expression in one cortical region is generally similar in neighboring cortical regions, we propose that different chips for the same subject can be used as replicate experiments, if probe set outliers on an individual specific basis are being investigated. If an observed outlier is a real biological event, it is very probable that the same subject on the same probe set will also be an outlier in a neighboring region. Consider, for example, the family with a deletion in the MAOA gene [16]. Had this family undergone postmortem microarray analysis as a part of a larger sample of subjects, EVA would have detected the MAOA decrease in expression whereas microarray analysis using mean group effects would not have. Using multiple brain regions as replicates does undermine the idea that gene expression is different across different brain regions, which it is [10,17]; however, it means that if an effect is detected, it is likely real and robust.

Comparison to PPST method
The PPST method [5] counts the number of subjects in both control and experimental group outside of the 95 th percentile of the opposite group. The FDR is therefore controlled by altering the percentile threshold. EVA uses the SD from the opposite group and counts the number of subjects that are outside a given SD value (± 1.5 SDs in this study). Selecting more stringent SD values allows for direct manipulation of the FDR. In this study a liberal cutoff was chosen (outside of 1.5SD's). The FDR among the detected outliers could be estimated from the p-value of the subject's expression values using standard methods such as that of Reiner et al. [18] Comparison to COPA method Cancer outlier profile analysis (COPA) is another outlier detection that has proved fruitful in the past [4]. This technique normalizes all probe sets (one sample design) and  Table 1: HG-U133 plus 2 probe sets that met all EVA criteria. Numbers represent p-values generated for each probe set across each subject (S). Subjects who met EVA criteria have p-values underlined. Note that p-values are generated from the number of SD's from the mean, therefore some subjects with very small p-values may be outside of a given number of SD's but < 3-fold different than the mean. (Continued) calculates the 75 th , 90 th , and 95 th percentiles for each probe set and rank-orders them by percentile score. A prioritized list of probe set with some subjects that have extreme expression values is then investigated. Tibshirani and Hastie [3] introduce the outlier-sum statistic in their paper to improve on the COPA method. Their method differs from COPA by the standardization procedure of each probe set expression level using the median and median absolute deviation.
There are some caveats to be aware of before proceeding with this approach to screen microarray data. Firstly, the method is very conservative and likely has a high beta error rate. It is very likely that there were a number of true positives that were not detected because of the rigidity of the design. Some parameters may need to be adjusted to allow more probe sets to pass filtering (e.g. top 10% of SD values instead of the top 5% being used). Second, this method has the disadvantage of requiring a number of replicates per individual, a component that could be costprohibitive. Third, the method can only be used to study genes whose expression levels are similar across brain regions. Finally, we note that all probe-level microarray algorithms dampen extreme values at the scanner. This method is conservative and could only be used to investigate extreme values after initial processing.
Our view for this technique is as another analysis technique to further explore microarray data, in conjunction with more mainstream techniques [19]. This method, termed Extreme Values Analysis, can detect extreme differences in gene expression on a subject-by-subject basis from microarray data across different chips. The method uses high-throughput technology in a non-biased way to understand psychiatric disease for each subject investigated.