We conducted an extensive psychometric evaluation of the GHQ-28 subscales in a large, multi-ethnic community maternal cohort in the UK. Our results are important because this is the first large-scale investigation in both a maternal population and in South Asian women, where there is uncertainty about measurement equivalence of mental health [33–36]. For each subscale an item reduction strategy was necessary to fit all our defined subpopulations, and there was evidence of differential item functioning (DIF) in many of the pairwise comparisons. Exploration of the factor structure indicates that this was caused by variation in the concepts being measured, with the most obvious differences between women who completed the questionnaire in English and those who completed it in another language. For example, Anxiety and Insomnia in the Urdu respondents and Severe Depression in the Punjabi respondents did not appear to be related to the same item clusters as in women of any ethnicity completing the questionnaire in English. The implication is that the meaning of the underlying concepts for some items differs according to language of administration and between ethnic groups; this may be related to any number of factors such as acculturation, translation, or cultural differences in concept or interpretation. Our goal was to define a measurement model to compare symptom severity in each domain across subgroups; our findings indicate that, due to lack of invariance, we cannot recommend such comparisons across this cohort.
Research indicates the concept (if not the nomenclature) of postnatal distress has recognition and relevance globally (e.g. [37, 38]). However, internal construction of causality, symptom experience and illness resolution can vary greatly between cultures. For example, in one UK study, women originating from the Punjab who had ‘life troubles’ reported symptoms of sadness and grief that tallied with the notion of depression, but conceptualised their problems as an illness manifesting physically as ‘heavy in the heart’. Notably, there have been few studies exploring the meaning of depression in pregnant, not postnatal, South Asian women.
Given such potential for variation, it is perhaps unsurprising that we found differences between the groups in our sample in the attribution of specific symptoms to a particular construct of mental distress. Our results raised several interesting points about the relationship between symptoms and mental health during the maternal period, and about how this relationship differs between ethnic groups.
Irrespective of cultural background, it is common for people with depression to initially present with somatic symptoms (e.g. [14, 41]). Somatisation of psychological distress is of interest in maternal populations where new and perhaps unfamiliar bodily changes coincide with any onset of distress. Such simultaneous physical and hormonal changes may complicate both self-recognition and clinical recognition of potential affective distress. For example, somatic dysfunction might be construed as causative of distress, distress could be overshadowed by physical symptoms that may be considered to have more serious implications for the baby’s health, or body symptoms may simply co-exist alongside distress. Neither is the concept of somatisation uni-dimensional. Simon et al. define three different presentations: patients with psychological distress who initially present with somatic symptoms, those distressed who present with medically unexplained somatic symptoms, and those who present with somatic symptoms and deny psychological distress. Bhui et al. add a fourth: presentation of somatic symptoms made significantly worse by feeling low, stressed or anxious. The topic has generated much theoretical interest for South Asian cultures, where somatisation has sometimes, but not universally [13, 41], been reported to be more frequently endorsed as a symptom of depression. Indeed, some data indicate that initial presentation with somatic symptoms might be a function of the patient-doctor interaction rather than a cultural phenomenon.
Our data show that, broadly, across the maternal population two concepts related to somatic symptomatology were evident: the first comprised generalised somatic symptoms and the second comprised symptoms related to the head. A principal components evaluation of a non-maternal European sample with rheumatoid arthritis found a similar split in structure, but a study of pregnant Nigerian women reported that all seven somatic items clustered together. Although there are differences in methodology, this indicates that the split between general and specific somatic symptoms may be related to factors other than maternity or female gender, and in our study these elements appear stable regardless of ethnic background, language of administration or pregnancy/postnatal status. We suggest that this hypothesis is tested in other population samples.
Anxiety and insomnia subscale
Antenatal anxiety commonly co-occurs with depression and is antecedent to postnatal anxiety and depression [9, 44–46], and our EFA implicated this factor as the largest symptom cluster for most groups. However, the invariance testing indicated some significant problems with comparisons involving the Urdu group, which the EFA revealed was likely due to a split in the underlying concept.
Social dysfunction subscale
For all groups except the Urdu language groups, the concept of Social Dysfunction was related to all its hypothesised items, confirming the findings in a Nigerian antenatal sample. Excluding comparisons with the Urdu group, this factor also appeared to indicate pairwise invariance. However, the clinical relevance of this subscale is not well researched, which limits its usefulness in distinguishing psychiatric morbidity from the range of normal changes during pregnancy.
Severe depression subscale
As noted, anxiety and depression are commonly co-morbid and these two GHQ-28 factors are unsurprisingly correlated, although the depression subscale has been found to garner some additional information. Here it is noteworthy that this subscale measures severe depression with three questions relating to suicidal ideation; notably absent are enquiries into dysphoric mood. Measurement of such a dimension is of cross-cultural interest: Bhugra and colleagues have reported that in London, young South Asian women are at higher risk of presenting with attempted suicide than White women [48, 49], with cultural and family conflict the actual and perceived causes of such attempts [48, 50]. However, the utility of this subscale for measuring suicidality might be limited: although for the antenatal English language and Urdu respondents the questions seemed unified and the factor important, this was not the case in the Mirpuri group, and there was evidence of non-invariance between groups. Furthermore, only one of the suicidality questions (item 25) was invariant between groups. Model estimation difficulties that may have been related to low endorsement of these severe items precluded analysis of postnatal data.
After reducing items to create factors which appeared to have reasonable fit across all the subpopulations, the iterative process of invariance testing revealed systematic differences in how the different subpopulations rated themselves on the measurement scales. We would be able to correct for these systematic differences in scale response if, as in most CFAs, there were just two populations to compare. However, due to both cultural and language variation we identified five distinct groups, and because the DIF varied from one subgroup pair to another, systematic correction is unfeasible. While some of the differences are small and would have a negligible impact on mean scores, some differentials are up to half a point (on a four-point scale), which has the potential to lead to spurious conclusions when groups are compared.
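To give a sense of how an item-level differential of this size could distort a group comparison, the following is a minimal illustrative sketch and not part of our analysis. It assumes two hypothetical groups with identical underlying distress, a hypothetical seven-item subscale scored 0 to 3, and an invented half-point shift on two items in one group; the function name, the distributions and all numbers are assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_subscale_mean(item_shifts, n=5000, seed=1):
    """Return the mean subscale total for one hypothetical group.

    item_shifts holds an assumed per-item shift (the DIF), in scale points.
    Latent distress is drawn from the same distribution for every group, so
    any difference in totals between groups comes from the shifts alone.
    """
    rng = random.Random(seed)  # same seed gives the same respondents per group
    totals = []
    for _ in range(n):
        latent = rng.gauss(1.0, 0.5)  # hypothetical latent distress level
        total = 0
        for shift in item_shifts:
            raw = latent + shift + rng.gauss(0.0, 0.5)  # item-level noise
            total += min(3, max(0, round(raw)))  # clamp to the 0-3 responses
        totals.append(total)
    return mean(totals)


N_ITEMS = 7
no_dif = [0.0] * N_ITEMS                        # reference group: no shift
with_dif = [0.5, 0.5] + [0.0] * (N_ITEMS - 2)   # half-point shift on two items

reference_mean = simulate_subscale_mean(no_dif)
shifted_mean = simulate_subscale_mean(with_dif)
spurious_gap = shifted_mean - reference_mean

print(f"reference group mean total: {reference_mean:.2f}")
print(f"shifted group mean total:   {shifted_mean:.2f}")
print(f"gap with identical latent distress: {spurious_gap:.2f}")
```

Under these assumptions the two groups differ by roughly one point on the subscale total even though their latent distress is identical by construction, which is the kind of spurious difference that uncorrected DIF could produce.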
Interpretation of the analysis of any systematic differences in structure between antenatal and postnatal administration was limited by difficulties with model estimation, particularly in the Severe Depression subscale.
Strengths and limitations
Our sample is representative of the maternal community in Bradford, and included a large number of South Asian minority women for whom relatively little is known about mental health in pregnancy. Further, we applied a rigorous approach to our analysis; however, our study does have some shortcomings.
Ethnic and cultural classifications
We used limited classifications of ethnicity which may be overly general [14, 51] and can only serve as a proxy for more defined distinctions of culture and custom. Such is the compromise when epidemiological rather than anthropological methods are used to classify people. Analysing at the level of an arbitrary subgroup may lead to category fallacy, with loss of subtle individual effects such as acculturation and financial and social resources; indeed, there may be as much variation within groups as there is between them. In particular, we combined women of all Other ethnicities into one heterogeneous reference group, which limits decomposition by ethnicity and culture. We split our sample into five (BiB cohort) and four (BiB1000) reference groups by ethno-cultural classification and language of questionnaire, although women within these groups were likely to have different levels of acculturation. Without a specific measure of acculturation it is impossible to assess the values, beliefs, expectations, norms and practices of the new culture, the extent of their acquisition, and how much of the original culture is retained. Acculturation may have affected how women answered the GHQ-28 questions; for example, it may have imposed some unmeasured variation in our estimates, or it could have explained some of the differences we found.
Ethno-cultural instrument adaptation
The participatory translation process was rigorous and the translated versions had good semantic, content and conceptual equivalence to the English instrument. An Urdu translation of the GHQ-28 assessed in a bilingual (English and Urdu) population in Pakistan found reasonable semantic, conceptual and scale validity. However, in our study there was no formal assessment of criterion or technical equivalence, which is necessary to establish whether the GHQ-28 performs similarly across cultures regardless of whether it is administered verbally or on paper, or whether the interpretation of measurement of mental health remains the same when compared to norms of both cultures. We did not know which women were fluent in both languages; had we known, we could have used their choice of language as a basis to disentangle any variance associated with the translation from that associated with cultural differences in interpretation and differential item functioning. Of note, there may have been unmeasured administration bias: the questionnaire was administered verbally to non-English speakers, and responses may have been audible to family members or friends accompanying the women, which may have affected the way these women answered the questions.
As discussed in the analysis section, we treated Likert scale data as continuous for the purposes of analysis. Whilst this has the advantages that we described in that section, it is problematic in that DIF cannot be described in terms of the scoring of the scale [28, 29]. However, such an approach may be more appropriate for determining invariance in the underlying psychological constructs. In CFA, one item in a factor must be held constant (mean of 0 and variance of 1), and because this item’s variability is not calculated, it can lead to spurious conclusions of invariance if the reference item is the source of DIF. This may be relevant as we held the first item in each cluster as the reference item. In addition, the lack of a standardised diagnostic interview to confirm or exclude depression limits interpretation of the relevance of the subscales to clinical criteria in this maternal population.
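The reference-item issue noted above can be illustrated with a small worked sketch. It is not the model we fitted: it assumes a simplified constraint (unit loadings and a reference intercept fixed at 0), four hypothetical items, and invented item means in which only the first item carries a true half-point shift in group B. The function names and all values are assumptions for illustration only.

```python
def constrained_intercepts(item_means, reference_index):
    """Re-express item means relative to a chosen reference item.

    Simplifying assumption: unit loadings, so each item mean equals
    intercept + latent mean. Fixing the reference intercept at 0 makes the
    estimated latent mean equal to the reference item's mean.
    """
    latent_mean = item_means[reference_index]
    intercepts = [m - latent_mean for m in item_means]
    return latent_mean, intercepts


def apparent_dif(means_a, means_b, reference_index):
    """Return the per-item intercept differences (group B minus group A)."""
    _, intercepts_a = constrained_intercepts(means_a, reference_index)
    _, intercepts_b = constrained_intercepts(means_b, reference_index)
    return [round(b - a, 2) for a, b in zip(intercepts_a, intercepts_b)]


# Hypothetical item means; both groups share the same true latent mean.
group_a_means = [1.0, 1.2, 0.8, 1.1]
group_b_means = [1.5, 1.2, 0.8, 1.1]  # true +0.5 shift on item 0 only

# Item 0 (the shifted item) as reference: its own shift cannot be seen.
dif_with_biased_reference = apparent_dif(group_a_means, group_b_means, 0)

# Item 1 (an unshifted item) as reference: the shift shows up on item 0.
dif_with_clean_reference = apparent_dif(group_a_means, group_b_means, 1)

print("reference = item 0:", dif_with_biased_reference)
print("reference = item 1:", dif_with_clean_reference)
```

In this sketch, choosing the shifted item as the reference hides its own differential and makes the three unshifted items appear to differ by half a point in the opposite direction, whereas an unshifted reference attributes the differential to the correct item. This is one reason the choice of the first item in each cluster as reference may matter for the conclusions drawn.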