The reliability and validity of the E-ADL-Test were examined using a sample of 139 residents in 5 nursing homes in Bavaria who were recruited to investigate the MAKS therapy, a non-drug intervention for dementia patients. For this reason, dementia patients were excluded if they were blind, deaf, bedridden, or severely dependent on care (3rd care level). With respect to validity, this limitation of the current study might lead to an underestimation of the validity parameters because the exclusions restricted the variance in the sample.
The results differed in part from those of the initial validation study. The correlation coefficients with the NOSGER subtests largely agreed, but the correlation coefficient with the MMSE was considerably lower, probably in large part because of the different inclusion criteria (i.e., the initial study had no exclusion criteria based on severity of illness).
To date, the everyday practical capabilities of dementia patients have mostly been assessed by others (e.g., family members or nurses). This procedure must, however, be considered critically, especially for research but also for care. First, assessments by others always depend on the rater. There is evidence of systematic over- or underestimation, particularly of the everyday practical capabilities of dementia patients [7–9]. Second, blinding is impossible when the data are recorded by others. Blinding is, however, one of the main quality criteria in clinical research and can only be achieved using independent performance tests. The great advantage of assessments by others over performance-based tests is that the latter cover only one time point and therefore depend on context variables and on the condition of the patient on that day. Nevertheless, performance tests for recording cognitive capabilities have already become completely established; the more the focus falls on independence in everyday practical capabilities, the greater the need for the development of performance tests in this area.
Existing tests offer a broad selection of relevant tasks, mostly for patients with milder forms of dementia, but they have methodological deficiencies. The validation studies are based on small samples of 12 to 27 dementia patients who usually suffered only from mild to moderate dementia [10–12], or else the tests were not developed specifically for dementia patients but rather for “elderly people”, as were the revised DAFS and the ILS.
Compared to other validation studies of performance tests for dementia patients, we thus have a relatively large sample [11, 32, 33]. The calculated Cronbach’s alpha of .70 is in the range of other procedures (TEFA subscales: .61-.94; DAFS: between .23 and .67; revised: .67). Apart from the TEFA, all other performance tests that are used to examine the everyday practical capabilities of dementia patients take from 40 [10, 33] to 90 minutes and are thus hardly suitable for use in routine care or in research. With an average performance time of 8 minutes, the E-ADL-Test is the only procedure that is characterised by great test economy with similarly high reliability (see above) and validity (correlations of the TEFA and DAFS with other ADL assessments: TEFA: .41; DAFS: .61). Administration of the TEFA takes only about 15 minutes and can thus be considered economical, but its very high correlation with the MMSE (.90) and its low correlation with an instrument assessed by others for recording everyday practical capabilities in dementia patients give rise to the assumption that it measures the cognitive component of everyday practical capabilities to a greater degree than the everyday practical capabilities themselves. The development and validation of the E-ADL-Test thus closes an important gap in current research on performance tests by measuring everyday practical capabilities in dementia patients.
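For readers who want to reproduce a reliability coefficient of this kind, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the example scores are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items score matrix.

    scores: list of rows, one row per person, each row holding k item scores.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(scores[0])
    # Transpose so that each tuple holds all persons' scores on one item.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scores for 4 persons on 5 items (illustration only).
example = [
    [6, 5, 4, 5, 3],
    [6, 4, 3, 4, 2],
    [5, 3, 3, 2, 1],
    [6, 6, 5, 6, 4],
]
print(round(cronbach_alpha(example), 2))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why a short 5-item scale can still reach values comparable to those of longer procedures.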
Detailed item analysis shows, however, that the E-ADL-Test delineates in particular the deficits in ADL of persons with moderate to severe dementia. The difficulty indices of the 5 E-ADL-Test items range from .34 to .77 for patients with severe dementia, and from .67 to .88 for patients with moderate dementia; the discrimination power lies between .33 and .61. For patients with mild dementia, the items tend to be too easy (.83-.91). This is reflected in a low discrimination power (−.05 to .37). In particular, Item 1 seems to be too easy for all degrees of severity. This leads to a ceiling effect, which is also reflected in the poor discrimination power of Item 1 for mild dementia. The difficulty indices of all other items decrease as the degree of dementia severity increases. Measured against a recommended discrimination power of r > .3, their discrimination power indices are acceptable, given a dementia severity that is at least moderate.
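The two item statistics discussed above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the difficulty index is taken as the mean item score divided by the maximum attainable item score (so higher values mean easier items), and the discrimination power is taken as the correlation between an item and the sum of the remaining items (a part-whole corrected item-total correlation). The function names and the `max_score` parameter are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def difficulty_index(item_scores, max_score):
    """Mean score as a proportion of the maximum; values near 1 mean 'easy'."""
    return mean(item_scores) / max_score


def discrimination_power(scores, j):
    """Correlation of item j with the sum of all other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical data: 4 persons, 3 items, each item scored 0 to 6.
scores = [[6, 4, 2], [6, 5, 3], [5, 3, 1], [6, 6, 4]]
print(difficulty_index([row[0] for row in scores], max_score=6))
print(discrimination_power(scores, 1))
```

An item on which nearly everyone reaches the maximum has a difficulty index close to 1 and little variance left to correlate with the rest of the scale, which is the mechanism behind the ceiling effect and the low discrimination power described for mild dementia.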
Development of more difficult tasks to expand the E-ADL-Test for valid measurement of deficits in ADL in mild dementia would be a meaningful extension. Because all items are too easy for mild dementia, developing an additional test for mild dementia could also be useful. In any case, one should pay special attention to a good operationalisation of cognition and IADL to diminish the amount of overlap. Additionally, an examination of the interrater reliability of the E-ADL-Test is still missing; this should be carried out in the next study. Another limitation of the E-ADL-Test, as of every performance test, is that the personal interests and habits of the dementia patient, the variation across time, and variations in the social and physical environment are not covered. Research on ADL instruments should focus on these problems.
The hypotheses on criterion-related validity were supported: the results of the E-ADL-Test showed a correlation of .54 with care level after 22 months. In addition, persons with and without an increase in care level differed significantly in their 12-month difference in the E-ADL score. A decrease in the E-ADL-Test score thus has high predictive power for an increase in the need for care. This makes it possible to identify and provide support for persons who are at risk for a future decrease in everyday practical capabilities. In addition to the obvious benefits for the residents resulting from a maintenance of ADL functioning [see ], this also results in cost savings for the health care system. In the German health care system, the “care level” is assessed by trained raters of the MDK (Health Insurance Medical Service) who visit the patient at home or in a nursing home. They assess the amount of time each individual needs for help with 21 specifically described tasks in the areas of personal hygiene, nutrition, mobility, and housekeeping. The amount of help needed, in minutes, defines the care level as I, II, or III. The strength of the criterion “care level” is that it was registered completely independently of our study and of the E-ADL measurement. The limitations are that it is ordinally scaled and that no empirical data are available with regard to interrater reliability.
The hypotheses on construct validity were confirmed for 8 of the 10 available parameters. Convergent and discriminant validity were verified by high correlation coefficients with other ADL/IADL measures and by low correlation coefficients with measures of mood and disturbing behaviour. This is independent of whether the reference test was a performance test (ADAS-cog) or an instrument for the assessment by others (NOSGER and BI). Only the correlation coefficient with the Barthel-Index was lower than expected. This may be explained by the fact that each test had a different focus: The Barthel-Index mainly records fundamental ADLs (e.g., urinary control, bed-chair transfer), and half of these ADLs depend on the ability to walk or stand. For the E-ADL-Test, only the upper extremities need to be used, which was also an inclusion criterion. Therefore, people with a low Barthel-Index were able to perform some E-ADL tasks. This hypothesis should be revisited in future studies.
The second outlier concerns the NOSGER subscale “Social Behaviour”. Although, as expected, the E-ADL-Test was not correlated with mood and disturbing behaviour (the confidence intervals included 0), there was a correlation of .39 with the “Social Behaviour” subscale, which is as high as the correlations with the cognitive parameters. This becomes plausible when the inter-correlations of the NOSGER subscales are considered. Here, the subscale “Social Behaviour” was more highly correlated with the subscales “Memory”, “ADL”, and “IADL” (each at .6) than with the subscales “Disturbing Behaviour” and “Mood” (each at .2). Thus, cognitive and everyday practical capabilities appear to be included in the subscale “Social Behaviour”, too.
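The criterion "the confidence interval includes 0" can be illustrated with a confidence interval based on the Fisher z transformation. This is a minimal sketch under assumptions: the source does not state how its intervals were computed, the function name `fisher_ci` is hypothetical, the sample size of 139 is assumed to apply to every correlation (missing data may have reduced it), and the value r = .10 is an invented example of a low correlation.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Transforms r to Fisher's z, builds a normal-theory interval with
    standard error 1 / sqrt(n - 3), and transforms the bounds back.
    """
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# r = .39 is the reported correlation with "Social Behaviour";
# n = 139 is assumed to be the number of cases behind it.
low, high = fisher_ci(0.39, 139)
print(low > 0)  # the interval excludes 0

# r = .10 is a hypothetical low correlation for comparison.
low, high = fisher_ci(0.10, 139)
print(low < 0 < high)  # the interval includes 0
```

With a sample of this size, a correlation has to exceed roughly .17 in absolute value before its interval excludes 0, which separates the "Social Behaviour" result from correlations of the size expected for mood and disturbing behaviour.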
In addition, the E-ADL-Test enables a moderately reliable differentiation of the severity of the dementia syndrome. This concurs with the classification criteria of the dementia syndrome in ICD-10 and DSM-IV-R, which include the decline of IADL/ADL functioning as an important criterion for differentiating the severity of the dementia syndrome.
Validation in a representative sample of an expanded E-ADL-Test including items with lower difficulty indices (i.e., items that are more difficult for patients with mild dementia) is thus recommended for future research.