Assessing the validity of the ICECAP-A capability measure for adults with depression

Background Effectiveness and cost-effectiveness are increasingly important considerations in determining which mental health services are funded. Questions have been raised concerning the validity of generic health status instruments used in economic evaluation for assessing mental health problems such as depression; measuring capability wellbeing offers a possible alternative. The aim of this study is to assess the validity of the ICECAP-A capability instrument for individuals with depression. Methods Hypotheses were developed using concept mapping. Validity tests and multivariable regression analysis were applied to data from a cross-sectional dataset to assess the performance of ICECAP-A in individuals who reported having a primary condition of depression. The ICECAP-A was collected alongside instruments used to measure: 1. depression using the depression scale of the Depression, Anxiety and Stress Scale (DASS-D of DASS-21); 2. mental health using the Kessler Psychological Distress Scale (K10); 3. generic health status using a common measure collected for use in economic evaluations, the five level version of EQ-5D (EQ-5D-5L). Results Hypothesised associations between the ICECAP-A (items and index scores) and depression constructs were fully supported in statistical tests. In the multivariable analysis, instruments designed specifically to measure depression and mental health explained a greater proportion of the variation in ICECAP-A than the EQ-5D-5L. Conclusion The ICECAP-A instrument appears to be suitable for assessing outcome in adults with depression for resource allocation purposes. Further research is required on its responsiveness and use in economic evaluation. Using a capability perspective when assessing cost-effectiveness could potentially re-orientate resource provision across physical and mental health care services. Electronic supplementary material The online version of this article (doi:10.1186/s12888-017-1211-8) contains supplementary material, which is available to authorized users.


Background
For health services to prioritise mental health interventions, both effectiveness and cost-effectiveness are key considerations. Regulatory bodies such as the National Institute for Health and Care Excellence (NICE) in England and Wales have evolved methods for establishing cost-effectiveness across both health and mental health interventions with the use of economic evaluations [1]. The recommended approach in measuring quality of life patient benefits for economic evaluations has been to use generic, health focused patient reported outcome measures like EQ-5D [2,3]. Once patients have completed such measures and population preferences have been attached to patient reported health states, health related quality of life (HRQoL) scores are used to generate quality-adjusted life years (QALYs), a composite measure accounting for the benefits of improved morbidity and reduced mortality [4].
Depression results in the second largest disease burden in terms of disability-adjusted life years (DALYs) globally (DALYs are a similar, but slightly different measure to QALYs [5]), with the condition also a major determinant in suicide and associated with ischemic heart disease [6]. Therefore, it is crucial for any provision of health care that instruments used in deciding how to allocate resources are able to assess the impact of depression on a person's quality of life and the improvements from providing treatment and preventing the illness. Although recent findings show HRQoL measures fare better in patients with depression compared to other mental health groups such as patients with bipolar disorder and schizophrenia, there is an acknowledgement of the restricted coverage of themes important to all mental health patients in the most commonly used HRQoL measures: EQ-5D and SF-6D [7].
There are many critiques of QALYs. These include worries about the impact on equity of using QALY maximisation as the objective for economic evaluation [8][9][10] and concerns about whether QALYs are fully able to capture broader patient value from health care [11][12][13][14]. This latter topic has received particular focus in the past decade with unease about a sole reliance of just health related aspects of quality of life, emanating from a number of different research groups [15][16][17]. Nevertheless, the ability to use QALYs to compare across very different health situations has meant that it has retained its position of prominence, in particular QALYs using EQ-5D [18]. A recent study into the methods for assessing cost-effectiveness by NICE found that the current standard approach leads to a negative impact on QALYs forgone in a number of areas including forgone gains from mental health services [19].
An alternative, relatively new approach for assessing outcomes for individuals has emerged which focuses on people's capabilities [20]. The capability approach, developed most notably by Nobel prize winning economist Amartya Sen, is primarily concerned with the evaluation of individual advantage based on a person's ability to achieve 'functionings' in life that are valuable to them [20]. Examples of 'functionings' in the capability approach range from basic attainments such as nourishment to more complex attainments such as having self-respect. In essence, the capability approach attempts to provide a more encompassing picture of individuals when assessing the impact of policy decisions upon them, than approaches that prefer to focus on "objects of convenience", such as income when assessing household and national wellbeing [20].
The capability approach has been promoted by bioethicists [21,22], philosophers [23,24], and health service researchers [25,26] who see the approach as a method to assess impacts on an individual in an alternative evaluative space than that of preference-based HRQoL measures used to produce QALYs. This perspective has also proved helpful in conceptualising mental health problems. Hopper [27] found the capability perspective accommodating when trying to determine social recovery for patients with schizophrenia [27]. More recently, other researchers have assessed Community Treatment Orders for patients with mental health disorders (predominantly schizophrenia) through a capability lens by developing a 16 item questionnaire [28] based on a list of ten central human capabilities conceptualised by eminent philosopher Martha Nussbaum [29].
The capability approach has gained recognition by NICE, which has added capability measures to its reference case for conducting economic evaluations where non-health benefits are likely to accrue [1]. Currently, two capability measures are recommended by NICE, an instrument for social care known as Adult Social Care Outcomes Toolkit (ASCOT) [30], and one of a generic family of instruments developed originally from a grant called Investigating Choice Experiences for the Preferences of Older People (ICEpop), that has developed capability (CAP) instruments for Adults (ICECAP-A), Older people (ICECAP-O) and a Supportive Care Measure (ICECAP-SCM) for those in need of palliative care (www.birmingham.ac.uk/icecap), with the earliest developed measure (ICECAP-O) currently recommended. If the same principles of comparability across diseases are used for justifying the need for generic HRQoL instruments when allocating resources across a health service, a similar argument can be made for using the generic ICECAP-A, the only index of capability currently developed that could be applied across a broad range of patient groups from populations of adults aged 18 years and older [31].
As a newly developed measure, evidence for the measurement properties of the ICECAP-A in patient groups is so far limited. It is important that any new outcome measure, including those used in economic evaluation, demonstrates satisfactory measurement properties [32]. One test of whether the measure is valid (i.e. it measures what it purports to measure) is "construct validity". This involves testing mini-theories that are developed to explain the relationship between the characteristic of interest (in this case, capability) and other relevant characteristics of the individual [33].
The overall aim of this work is to assess the validity of the ICECAP-A measure in a sample of individuals suffering from depression. Validity in this study is assessed in three ways. First, concept mapping, a qualitative construction of pathways or constructs between capability and depression related items is developed based on a synthesis of qualitative research in a mental health population [34]. The concept mapping framework then allows for the formation of hypotheses about where a relationship is expected between capability and depression related items. Second, discriminant validity, the ability of a measure to discriminate between identifiable groups, is tested by using information about the ICECAP-A overall score in relation to clinical cutoffs on two condition-specific questionnaires. Finally, multivariable regression is conducted to compare the explanatory power of mental health items and overall scores attached to the measure of capability (ICECAP-A) and the NICE recommended HRQoL measure for economic evaluation, the EQ-5D-5L [3]. This enables an assessment of how a generic measure of capability compares with the recommended HRQoL measure in economic evaluation. Even though the measures are developed using different underlying constructs, regression analysis provides a means of comparing to current methodology recommended by NICE for economic evaluation. We hypothesise that more capability (through ICECAP-A) than health status (through EQ-5D-5L) will be explained by depression related items, because we expect the items on a capability measure capturing "psychosocial well-being" will have more in common for individuals with depression than the "physical functioning" focused EQ-5D [35]. The three methods of validity assessment use two measures closely related to depression, the depression anxiety and stress scale (DASS-21) [36] and the Kessler (K10) psychological distress scale [37].

Data set and collection
The study uses data from a Multi Instrument Comparison (MIC) cross sectional survey of individuals in eight health categories (http://www.aqol.com.au) conducted between November 2011 and May 2012. The survey was conducted in six countries of which the four English speaking nations (Australia, Canada, United Kingdom and United States) were chosen for the present analyses (as the choice of these countries did not raise issues associated with translation of the measures). As well as a healthy population, seven broad health condition populations were targeted. This study utilises data from individuals who reported depression as their primary condition, as well as the healthy population, from the four English speaking nations [38].
The MIC survey was conducted online with panel members using a global survey company, CINT Pty Ltd. The personal and medical details recorded by the company were used to recruit individuals from the health condition groups. Quota sampling was conducted for health condition groups to reach a total number in each health condition group irrespective of age, sex and education. The survey sought a sample of 150 individuals in each health condition area per country to ensure statistical power. Individuals were asked to complete a relevant disease specific questionnaire (two instruments for those with a diagnosis of depression) to confirm the existence of the illness and to measure its severity. Choice of clinical measures for each health condition was informed by expert opinion of commonly used measures in the source country of the MIC survey (i.e. Australia).
Ethics approval was obtained from Monash University Human Research Ethics Committee (MUHREC Approval CF11/1758: 2011 00074). At the start of the MIC survey, a Participant Information and Consent form was provided. Proceeding with the survey was deemed as provision of consent by the participant.

Measures
Capability measure ICECAP-A The descriptive system for the ICECAP-A instrument was developed in the UK using qualitative methods [31]. The five capabilities captured by ICECAP-A are phrased in terms of "being able to be" or "can have" and are stability ("settled and secure"), attachment ("love, friendship and support"), autonomy ("independent"), achievement ("achieve and progress") and enjoyment ("enjoyment and pleasure") (ICECAP-A available from www.birmingham.ac.uk/icecap). These five items make up the battery of questions in the ICECAP-A and they attempt to capture broad concepts related to people's capability to live a life that they value. The stability item concerns informants' desire for continuity in their lives when it came to friends, work and location. The attachment item emphasises how informants placed emphasis on love, support and social contact. The autonomy attribute came out of a desire to be one's own person and not a liability to others. The achievement attribute represents how informants placed value on moving forward in life and attaining their goals. Finally, the enjoyment attribute tried to capture everyday enjoyment that people want to be able to have in their lives [31]. The attributes aim to capture the capabilities that people value as distinct from the factors that determine capability (e.g. income, health) [31]. Each capability item has four levels of responses. Once a response level for each attribute is selected by the patient, general population values can then be attached to the patient's current state. The use of population values, as opposed to patient values, is the preferred approach for health guidance bodies such as NICE [1]. The general population approach to valuing states means that all possible individual states across a health service can be, in theory, compared to one another. UK population values have been developed for the ICECAP-A [39]. The measure is anchored at 1 (full capability) and 0 (no capability). Values can range from 0 to 1.
The ICECAP-A has been validated for the general adult UK population [40]. Additionally, a qualitative 'thinkaloud' investigation has been conducted to assess the ease of interpretation and completion of the attributes of capability captured on the ICECAP-A [41]. Content validity [42] and test-retest reliability of the instrument in the general population have also been examined [43]. Responsiveness of the ICECAP-A has been assessed in an osteoarthritis patient group [44]. As of January 2017, the ICECAP-A was registered for use in 64 studies across 10 countries: Australia, Canada, China, Denmark, Ireland, Mexico, the Netherlands, New Zealand, the UK and the USA. Originally developed in English, studies have or are attempting to translate the measure into seven other languages (Chinese, Dutch, French, German, Spanish, Turkish and Welsh).

Condition-specific measures
Depression Anxiety and Stress Scale 21 item (DASS-21) The Depression Anxiety and Stress Scale 21 item (DASS-21) is an abbreviated version of the original DASS 42 item questionnaire [36]. It is a set of three self-report scales (7 items per scale) that was designed to measure the negative emotional states of depression, anxiety and stress over the past week. The DASS-21 has been validated in clinical [45] and non-clinical [46] samples. The depression sub-scale (DASS-D), which is of primary interest in this study, contains seven questions relating to the dimensional syndromes associated with depression: dysphoria (state of unease), hopelessness, devaluation of life, self-depreciation, lack of interest/involvement, anhedonia (inability to experience pleasure) and inertia [36]. Each item is rated on a 4-part Likert scale, with scores ranging from 0 to 3 per item, with higher scores reflecting more severe states. Five severity levels (normal, mild, moderate, severe and very severe) have been developed  and those who score more highly display a number of characteristics, such as being self-disparaging; dispirited, gloomy and blue; convinced that life has no meaning; pessimistic about the future; unable to experience enjoyment and satisfaction; unable to become interested or involved; and slow, lacking in initiative [36].
Kessler Psychological Distress Scale (K10) The Kessler Psychological Distress Scale 10 item (K10) was developed for the US National Health Interview Survey to identify people with a serious mental illness [37]. The questionnaire consists of 10 items measuring 'psycho-social distress' [37]. There is also a shortened 6 item version (K6) [37]. There are five response levels for each question, with scores ranging from 1 to 5 and higher scores representing higher psychological distress. There are four severity levels on the K10 (well, mild depression and/or anxiety disorder, moderate depression and/or anxiety disorder and severe depression and/or anxiety disorder). While primarily used in non-clinical settings [47,48], the K10 has also been tested for detecting depression and anxiety disorders in primary care [49].

Generic health status measure
EQ-5D-5L The EQ-5D-5L is an updated version of the original EuroQol (EQ-5D) generic measure of health status [2,50]. The instrument is recognised as one of the most widely used generic measures of health status. It has been translated into 169 different languages. EQ-5D data are routinely collected in some countries as well as being used to inform healthcare decision-making [51]. The instrument has five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression). Each dimension has five response levels [3]. Preliminary general population values for the EQ-5D-5L have been developed from the three level version for a number of countries [52], with research ongoing for new value sets for the five level version. The measure is anchored at 1 (best health) and 0 (death), with a minimum value of -0·594 for the UK value set. Values thus range from -0·594 to 1. There has been one major study to date assessing the validity of EQ-5D-5L compared to the original instrument across a variety of chronic conditions including depression [53].

Concept mapping from condition-specific and capability items
Concept mapping is an approach used to organise and analyse qualitative research findings and has been previously applied in mental health research [54]. We built a conceptual framework to establish 'pathways' between depression items and capability items and used concept mapping to organise qualitative information into testable quantitative hypotheses based on instrument completion by people with depression. Data collected on capability (using ICECAP-A) and depression (using DASS-D and K10 condition-specific measures) from people reporting depression as their primary condition were then used to test these hypotheses and establish how depression was likely to influence capability. The pathways between condition-specific items and ICECAP-A items were informed by a systematic review of qualitative research in people with mental health problems [34], to identify how mental health affected quality of life and a person's capability in people with depression. In their study, Connell and colleagues [34] identified six themes of quality of life that were of importance for adults with mental health problems: well-being and ill-being; control, autonomy and choice; self-perception; belonging; activity; hope and hopelessness [34]. These six themes each comprised 3 to 10 attributes that contributed to these respective aspects of quality of life. For example, the self-perception domain consisted of four related attributes: selfidentity/sense of self; self-efficacy; self-esteem; and selfacceptance/self-stigma [34]. To establish whether these intermediary mental health themes identified by Connell et al. [34] related to capability and depression, we used judgement informed by expert opinion (SB, WK). Based on the qualitative findings from the development of the ICECAP-A, three to four key attributes per item were identified from Connell and colleagues' synthesis and attached to the five capability items by PM, HA and JC (see Table 1) [31]. The items of DASS-D and K10 were also compared to Connell's 6 themes. PM used these qualitative themes to develop hypotheses about likely relationships between the ICECAP-A and the condition-specific instruments.
First, the ICECAP-A, DASS-D and K10 items were individually compared to the quality of life themes and attributes of the themes for people with mental health problems identified by Connell and colleagues [34].
Step 1 assigned pathways between attributes from quality of life themes and items on measures that were determined by PM to be related to one another.
Step 2 drew upon the mappings established in Step 1 from Connell's mental health themes to the three instrument items, by then mapping between the ICECAP-A items with the seven DASS-D items and the ten items on the K10. A pathway between a depression item and a capability item was hypothesised when a depression item and capability item were linked to at least half of intermediaries in common as identified by Connell and colleagues. So, for example, an ICECAP-A item that conceptually mapped onto two of the six themes identified by Connell and colleagues [34], say 'well-being, ill-being' and 'belonging' , would have an expected relationship with a depression item on DASS-D mapped onto three mental health themes, where two of these three themes included the same two themes as mapped onto the ICECAP-A item (80% or 4 of 5 themes in common). A relationship between items on different measures would not hold if the depression item, mapped onto three themes, only had one of two themes in common with the capability item (40% or 2 of 5 themes in common) (detailed explanation in Additional file 1). This process was conducted initially by PM and checked by SB and WK to assess whether the face validity of these relationships would hold in a clinical  sample of patients with depression. The associations between all ICECAP-A items and items on the conditionspecific questionnaires were assessed using chi-squared tests (categorical variables) to assess these hypothesised pathways.

Discriminant validity
Discriminant validity is used to test the ability of the ICECAP-A to differentiate individuals classified with different existing severity levels on the condition-specific questionnaires. It was expected that the ICECAP-A index score would differ significantly between those classified in different depression severity levels on the DASS-D (5 levels) and the K10 (4 levels). The difference in means between severity groups is tested using the Mann-Whitney U test.

Multivariable regression analysis
Finally multivariable regression analysis, using Ordinary Least Squares (OLS), was used to assess the degree to which overall capability (ICECAP-A) and health utility (EQ-5D-5L) was explained by the items of (i) the DASS-D; (ii) K10 and (iii) K6 (derived from the K10 responses).
It was expected that, if capability is better at capturing the depression-related attributes of adults with depression than the EQ-5D-5L, the resultant R 2 would be higher for the overall capability models than for the health status models. The R 2 coefficient is a statistic that shows, in this case, how well depression-related items explain capability or health utility on a 0-1 scale, where 1 is capability fully explained by depression-related items and 0 represents no relationship between depression items and capability.

Results
In total, 617 individuals who reported depression as their primary condition were included in the analysis. Table 2 shows the distribution of the ICECAP-A responses across the sample. As a comparator, responses from a representative healthy population sample across the same four countries (n = 965) are also included. In terms of percentages, a higher proportion of the depression sample recorded responses on the lowest two levels of capability across all five ICECAP-A items and a higher proportion of the healthy population recorded responses in the top two levels. Table 3 presents socio-demographic information for this sample. The average ICECAP-A score (0.637) is  considerably lower for people with depression in this sample than that from a general UK adult population (≈0.83) [39,40], and also lower than the "healthy population" sample collected simultaneously in the same dataset across the four countries (≈0.89) [55].

Concept mapping
In Fig. 1, based on the qualitative mapping exercise (see Additional file 1 for full details), the expected pathways between the five ICECAP-A items and the 7-item depression scale on the DASS-21 questionnaire are shown. Out of the 35 potential mappings from the depression scale onto the ICECAP-A attributes (7x5 = 35), 32 were expected to produce significant relationships (Table 4). A similar process was conducted for the K10 and ICECAP-A attributes. From the 50 potential relationships, 40 were predicted through the mapping process to produce significant relationships ( Table 5). All 85 potential relationships were tested using chi-squared tests, with all 85 relationships resulting in significant results reporting a p-value of less than 0.05. In Tables 6 and 7, the models of capability and health status from items on the DASS-D (Table 6), K10 and K6 (Table 7) are presented. The R 2 as a method of explaining capability and health status reveals that the condition-specific items contribute more to the estimation of capability scores/values than health status scores/values. In particular, 42.7% of total capability value can be explained through the DASS-D items, whereas the same DASS-D items explains 30.2% of variation in health status in terms of the EQ-5D-5L overall index score.

Discriminant validity
Scoring less well on the anhedonia and selfdepreciation items on DASS-D appears to contribute the most to both poorer capability and health status values. Scoring lower on the hopelessness item and higher on the devaluation of life item also makes an important contribution to the capability score. The highest and lowest scores on the inertia item significantly contribute to overall health status scores.

Discussion
This paper has attempted to assess the validity of a capability instrument for use in economic evaluation for adults with depression, using two depression-related instruments and a widely used generic health status instrument. All conceptually mapped relationships between condition-specific items and capability attributes were found to hold in statistical tests. The ICECAP-A scores were also able to differentiate individuals with depression of different severity, using clinical cut-offs from the normal to mild levels and moderate to severe levels. Further, the depression specific items appeared to more fully explain the ICECAP-A capability values than the health status values obtained through the EQ-5D-5L, as demonstrated in the multivariable regression analysis. Taken together, these analyses suggest that using capability measures provides one appropriate means of assessing depression. This study represents the first attempt to assess the validity of using the ICECAP-A instrument specifically for individuals with depression. The data are drawn from four different nations, suggesting these results are generalisable, at least across English speaking countries. The study does have some limitations. The data collected contain only information on individuals at one point in time, so the responsiveness of the ICECAP-A instrument could not be tested here. Due to the quota sampling strategy undertaken in this study, it is unclear whether the individuals analysed here are representative of people with depression in the four countries included, but they do provide a sample with a broad range of severities of depression, which is helpful in assessing validity. This study does not conceptually map onto the EQ-5D items, as the main aim of this study was to validate the use of ICECAP-A in people with depression. The performance of EQ-5D in patients with depression and mental health more generally has been comprehensively studied previously [7]. Indeed, the key mental health themes we conceptually map the ICECAP-A items on are influenced by a study that was conducted in response to perceived inadequacy of commonly used generic health measures like EQ-5D [34]. Nonetheless, it is a limitation of this study that we did not perform a similar concept mapping exercise onto EQ-5D items.
One difficulty with a newly developed questionnaire outside traditional outcome measures, focusing on broadly defined concepts such as capability, is that there is no "gold standard" against which to assess criterion validity. Measures of capability wellbeing, like ICECAP-A, ultimately have a different normative basis to measures of health, like EQ-5D, and any decision about what measure to use to guide healthcare resource allocation must take this normative basis into account, as well as the performance of the measure. A slight concern could be raised with the similar scores seen for mild and moderate depression in adults, although there is no reason why changes in capability wellbeing should be linearly related to depression. Indeed, an alternative interpretation could be that these data indicate where the greatest impacts on capability are faced by people with depression: between no depression and mild depression; and between moderate depression and severe depression. Further research is required on the responsiveness of the ICECAP-A measure for assessing potential benefits of new interventions for patients with depression. It would also be of benefit to assess the validity of the ICECAP-A against alternative commonly used measures of depression (e.g. Beck Depression Inventory [56]) and against other aspects of mental health. Based on the findings from the multivariable regression analysis presented here, further research into the extent to which generic capability and health status instruments provide complementary information would be useful [35,57]. Finally, how measures of capability wellbeing, such as the ICECAP-A, can be used in economic evaluation to aid decision-making has been subject to recent research [58,59]. Even though progress has been made with bodies like NICE now recommending capability measures for economic evaluations in social care and in the Netherlands for long-term conditions [60], more research is necessary for such measures to be used both within and beyond social care and also in applied economic evaluations. There appears to be growing recognition of the need to move beyond the current QALY approach in health economics [17], with the capability approach offering one viable alternative.

Conclusion
The ICECAP-A measure of capability wellbeing appears to be suitable for assessing outcome in adults with depression. This study offers a number of possible policy implications. If a capability perspective were adopted in economic evaluations across a health service, this study suggests that changes in depression are more likely to be captured and valued using a generic capability instrument than the commonly used generic health status instrument, the EQ-5D (Tables 6 and 7). Provision for services preventing and treating depression, and mental health service provision more generally, is felt to play a secondary and unequal role to physical health service provision globally [61]. Using a capability perspective when assessing cost-effectiveness could potentially reorientate resource provision across physical and mental health care services.