Reliability and validity of the Japanese version of the INSPIRE measure of staff support for personal recovery in community mental health service users in Japan

Background Supporting personal recovery in people with mental health difficulties is central to mental health services. This study aimed to develop the Japanese versions of INSPIRE and Brief INSPIRE, measures of staff support for personal recovery, and to evaluate their reliability and validity. Methods A questionnaire survey was conducted from October to December 2015. The authors invited users of 14 community mental health services in the Kanto region of Japan to participate in the survey. The service users completed self-administered questionnaires that included the Japanese version of INSPIRE, the Recovery Assessment Scale, the Client Satisfaction Questionnaire, the patient version of the Scale to Assess Therapeutic Relationships in Community Mental Health Care, and the Short Form Health Survey. Internal consistency was assessed using Cronbach's alpha coefficient, and test-retest reliability was assessed using the intraclass correlation coefficient (ICC) and weighted kappa. Convergent validity was examined by assessing correlations with the other scales. Factor validity was evaluated by exploratory factor analysis (EFA) with the generalized least-squares method and oblimin rotation. In addition, confirmatory factor analysis (CFA) was used to check the fit of the theoretical factor structure and of the structure derived from the EFA. Results A total of 195 of 212 users gave written informed consent and participated in the study. Data from 190 respondents were analyzed (response rate 89.6%). INSPIRE, Brief INSPIRE, and the subscales all showed Cronbach's alpha coefficients of 0.78 or higher. The ICC and weighted kappa were 0.92 or higher for the subscales and Brief INSPIRE. These values indicated good reliability. Brief INSPIRE and the subscales were significantly positively correlated with the other scales, supporting convergent validity. EFA extracted a factor structure that differed from that of the previous study. Both factor structures were checked by CFA, but the fit indices were not good for either. Therefore, the factor analysis did not show goodness of fit. Conclusions This study found the Japanese versions of INSPIRE and Brief INSPIRE to be reliable and valid for use among community mental health service users in Japan.


Background
The concept of recovery began to be introduced in Japan in 1998 [1][2][3]. Supporting personal recovery is a key aim for mental health services in many countries, including Japan [4,5]. In 2004, the Headquarters for Mental Health and Welfare of the Ministry of Health, Labour and Welfare proposed a shift from hospital-based medical treatment to community-based care as a reform vision for mental health welfare [6]. Various community-based mental health services are currently being provided in Japan (e.g., day care, psychiatric home-visit nursing, transition support for employment) [7,8]. Around the same time, a program of self-empowerment, Tojisya Kenkyu, was initiated at Urakawa Bethel House in Japan [9]. Additionally, recovery-oriented programs similar to those developed in the West, such as Assertive Community Treatment, Wellness Recovery Action Plan, Illness Management and Recovery, and Recovery College, are widely practiced in Japan [10][11][12][13][14]. For the evaluation of practice, the psychometric properties of the Japanese versions of recovery-related scales, with which service users rate their personal recovery and service providers rate their recovery knowledge and attitudes, have been verified [15][16][17][18][19][20]. As described above, programs that incorporate personal recovery are in use. There are also teaching materials and lectures that include the idea of personal recovery, such as Assertive Community Treatment training for multidisciplinary teams, psychiatric nursing process training using the strengths model, and lectures on recovery-oriented services by service users and peer staff [21][22][23]. However, there is no specialized training program in Japan for mental health practitioners to support service users' personal recovery. In addition, although various services exist, their recovery orientation has not been evaluated.
CHIME was developed as a conceptual framework of personal recovery consisting of five factors: connectedness; hope and optimism about the future; identity; meaning in life; and empowerment (giving the acronym CHIME) [24]. Importantly, it is valuable to evaluate mental health services from a service user's point of view [25]. INSPIRE was developed for this purpose and is the only measurement tool that fits well with CHIME, is sufficiently reliable and validated, and allows the service user to evaluate the recovery orientation of the service [26][27][28].
This study aimed to examine the internal consistency reliability, test-retest reliability, and convergent validity of the Japanese versions of INSPIRE and Brief INSPIRE, and the construct validity (including the factor validity) of INSPIRE for users of community mental health services in Japan.

Methods
The aims and procedures of this study were approved by the Ethical Committee of the Graduate School of Medicine at the University of Tokyo in Japan (submission #10890-(1)).
We explained verbally and in writing the aims, procedures, the voluntary nature of participation, anonymity, and assurance that there was no disadvantage in nonparticipation. Written informed consent was obtained from all participants.

Participants
We sent requests to organizations in the Kanto area between August and November 2015. Organizations were selected via opportunistic sampling. A questionnaire survey was given to users with mental health difficulties at 14 community mental health service centers in the Kanto region of Japan. The services offered included rehabilitation, employment transition support, support for continuous employment, and community activity support centers. A self-administered questionnaire was provided to service users in their center, and questions covered the level of support they received from mental health care workers. To maintain confidentiality, workers in the center did not collect the questionnaires; instead, the researcher collected the completed questionnaire directly or gave the participant a stamped return envelope with instructions to send the questionnaire back to the researcher. Data were collected from October to December 2015.
Service users were eligible if they met the following inclusion criteria during the survey period: (1) use of community mental health services and (2) age 18 years or older. Before explaining the request for participation in the study, we asked the service center's staff whether there were users who would feel burdened by the explanation. For example, users who were over 18 years of age and present at the service center on the day of the study, but who were in the resting room, were not asked to participate. The first to fifth authors brought questionnaires to the service centers and explained the study. After written informed consent was obtained, the participants completed the questionnaire themselves. The directors of two centers agreed to cooperate in a re-test to verify the test-retest reliability. We asked users at these two centers to fill out the questionnaire again 2 weeks after the first response.

Development of the Japanese version of INSPIRE
There are two versions of INSPIRE: a 27-item full version (INSPIRE) and a 5-item short version (Brief INSPIRE). INSPIRE is a 27-item assessment of a service user's experiences of the professional support they receive in their recovery [27]. INSPIRE comprises two subscales: a 20-item support subscale and a 7-item relationship subscale. The items in the support subscale cover five domains: connectedness (items S1-S4), hope (items S5-S8), identity (items S9-S12), meaning and purpose (items S13-S16), and empowerment (items S17-S20). Support items, e.g., "I feel supported by other people," are first rated as to whether the individual considers them important for their recovery (yes/no). For the items that are important (i.e., yes), the amount of support received from a mental health worker is then rated on a 5-point Likert scale, ranging from 0 (not at all) to 4 (very much). Relationship subscale items, e.g., "I feel listened to by my caseworker," are rated on a 5-point Likert scale, ranging from 0 (strongly disagree) to 4 (strongly agree). No total score is given for INSPIRE; instead, a score is calculated for each subscale, with 20 items for support and 7 items for relationship. The support subscale score can be calculated when at least one item is rated, and the relationship subscale score only when all 7 items are rated. The score for each subscale is the mean of all the item ratings, converted to a percentage ranging from 0 to 100. Higher scores for the support subscale indicate more support, and higher relationship subscale scores indicate greater helpfulness for personal recovery.
Brief INSPIRE is a 5-item tool used to assess the level of support for recovery provided by a mental health care worker. Five items are selected from each of the different domains (connectedness, hope, identity, meaning and purpose, and empowerment). Unlike INSPIRE, Brief INSPIRE does not ask if each item is important to the respondent and can be calculated only when all items are answered. Both versions of INSPIRE were found to be valid and reliable in the UK [27].
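The scoring rules described above can be illustrated with a minimal sketch. The function names are hypothetical, and the conversion to a percentage (mean rating divided by the maximum rating of 4, multiplied by 100) is an assumption inferred from the stated 0-4 item range and 0-100 score range; it is not taken from the official scoring manual.

```python
from statistics import mean
from typing import Optional, Sequence

MAX_RATING = 4  # Likert ratings run from 0 to 4


def to_percentage(ratings: Sequence[int]) -> float:
    # Assumed conversion: mean rating rescaled to the 0-100 range.
    return mean(ratings) / MAX_RATING * 100


def support_score(ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Support subscale: None marks an item that was not rated
    (judged not important, or left blank). At least one rating is needed."""
    rated = [r for r in ratings if r is not None]
    if not rated:
        return None
    return to_percentage(rated)


def complete_case_score(ratings: Sequence[Optional[int]],
                        n_items: int) -> Optional[float]:
    """Relationship subscale (n_items=7) and Brief INSPIRE (n_items=5):
    a score is produced only when every item has been answered."""
    if len(ratings) != n_items or any(r is None for r in ratings):
        return None
    return to_percentage(ratings)
```

In this sketch, a respondent who rates only two support items as important still receives a support score, whereas a single missing relationship item yields no relationship score.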
The English version of the 27-item full version was translated into Japanese in five steps, in accordance with guidelines for the translation and adaptation of psychometric scales [29]. (1) Forward translation: after permission to translate and use INSPIRE was obtained from the original authors, three of the researchers carried out independent translations of INSPIRE from English to Japanese. (2) Reconciliation: 11 mental health professionals (nurses, psychiatrists, and social workers) who are also mental health researchers reached a consensus on a draft Japanese translation of INSPIRE that best reflected the literal and conceptual content of the original INSPIRE.
(3) Back-translation: a professional translator, a native English speaker, who did not know about the original English version of INSPIRE, did a back-translation of the Japanese version into English. (4) The back-translation was reviewed and harmonized: the original INSPIRE developer and the authors reviewed the back-translations against the source instrument and ensured the translation was conceptually equivalent to the original. Additionally, the original INSPIRE developer suggested the service user rate the person they see most often or have the closest working relationship with, instead of naming a particular worker, because most mental health service users in Japan do not have a dedicated provider. We adopted this suggestion for improved ease of use in the Japanese population.
(5) Cognitive debriefing and finalization: two people who were using community mental health services in Japan tested INSPIRE, and the research team confirmed the level of comprehensibility and cognitive equivalence of the translation. The wording of INSPIRE is deliberately generic [27]. Similarly, the Japanese version of INSPIRE was translated to be simple and comprehensible, so as to enhance the usability of INSPIRE across other services in all processes (Additional file 1).

Recovery assessment scale
Mental health recovery was assessed using the Recovery Assessment Scale (RAS), a 24-item measure of self-reported recovery. Items such as "I have a desire to succeed" are rated on a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The total score ranges from 24 to 120, with higher scores indicating greater recovery [30,31].

Client satisfaction questionnaire
Service user satisfaction was assessed using the 8-item Client Satisfaction Questionnaire (CSQ-8), with higher scores indicating greater satisfaction. Items such as "How would you rate the quality of service you have received," are rated on a 4-point Likert scale, ranging from 1 (poor) to 4 (excellent). The total score ranges from 8 to 32 [32].

Patient version of the scale to assess therapeutic relationships in community mental health care
The relationship between service user and mental health worker was assessed using the patient version of the Scale to Assess Therapeutic Relationships in Community Mental Health Care (STAR-P), a 12-item self-report measure of level of relationship. Items such as "My clinician speaks with me about my personal goals and thoughts about treatment" are rated on a 5-point Likert scale ranging from 0 (never) to 4 (always). The total score ranges from 0 to 48, with higher scores indicating a stronger relationship between the user and the worker [33]. There was no Japanese version of STAR-P. After obtaining permission from the original authors, we replaced the word 'clinician' with 'staff worker' and translated STAR-P into Japanese.

Short form health survey
Quality of life (QOL) was assessed using Short Form 8 (SF-8), an 8-item measure of general aspects of health-related QOL. The SF-8 is comprised of a physical component summary (PCS) and a mental component summary (MCS). Summary scores are calculated in accordance with scoring rules [34], with higher scores indicating better QOL.

Demographic variables
Demographic variables included sex, age, marital status, cohabitation, diagnosis (schizophrenia, mood disorder, or other), and length of current service use.

Statistical analysis
Responses with at least one completed item were included in the analysis. For reliability, the internal consistency for the support subscale as a whole and for each of its five domains, as well as for the relationship subscale and Brief INSPIRE, was assessed using Cronbach's alpha [37]. Alpha coefficients greater than or equal to 0.70 were considered satisfactory [38].
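As an illustration of the internal consistency calculation, Cronbach's alpha can be sketched from the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The analyses in this study were run in SPSS; the function below is only a hypothetical stand-in that assumes complete cases (every respondent has a rating on each of the k items).

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows: one list of k item scores per respondent (complete cases only)."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("each respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed score across respondents.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def is_satisfactory(alpha: float, threshold: float = 0.70) -> bool:
    # Threshold used in this study: alpha >= 0.70 is considered satisfactory.
    return alpha >= threshold
```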
Test-retest reliability was assessed in a subsample of respondents who were surveyed a second time 2 weeks later. For the support subscale items and the Brief INSPIRE items, agreement in the ratings of importance (yes/no) was tested using the kappa statistic, and agreement in the Likert scale scores was tested using the weighted linear kappa for each item. For the relationship subscale, a weighted linear kappa was calculated for each item. A kappa statistic less than or equal to 0.20 was considered as indicating poor to slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement, and greater than 0.80 almost perfect agreement [39]. The intraclass correlation coefficient (ICC) was calculated for the total scores of the support subscale, the relationship subscale, and Brief INSPIRE. An ICC greater than 0.80 was considered as indicating excellent agreement [40].
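The weighted linear kappa and the agreement bands above can be sketched as follows. This is a minimal illustration using the standard definition with linear disagreement weights |i − j| / (k − 1); the function names are hypothetical, and ratings are assumed to be integers from 0 to k − 1 (0 to 4 for the Likert items).

```python
from typing import Sequence


def weighted_linear_kappa(first: Sequence[int], second: Sequence[int],
                          n_categories: int = 5) -> float:
    """Linear weighted kappa between test and re-test ratings coded 0..k-1."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    k, n = n_categories, len(first)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    obs_dis = 0.0  # observed weighted disagreement
    exp_dis = 0.0  # disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_dis += w * observed[i][j] / n
            exp_dis += w * row_tot[i] * col_tot[j] / n ** 2
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1 - obs_dis / exp_dis


def kappa_agreement(kappa: float) -> str:
    """Agreement bands as stated in the text above."""
    if kappa <= 0.20:
        return "poor to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Under these bands, a kappa of 0.48 falls in the moderate range and 0.39 in the fair range.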
The convergent validity of the support subscale was assessed by correlation with the CSQ-8J and RAS, the relationship subscale by correlation with STAR-P and RAS, and Brief INSPIRE by correlation with CSQ-8J, STAR-P, and RAS using the Pearson product-moment correlation [41]. Pearson's correlations were classified as poor (≤ 0.40), moderate (0.40-0.70), or strong (> 0.70).
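The correlation and its classification can be sketched as below. The helper names are hypothetical, and applying the bands to the absolute value of r is an assumption, since the text states the bands only for positive coefficients.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def classify_correlation(r: float) -> str:
    """Bands from the text: poor (<= 0.40), moderate (0.40-0.70), strong (> 0.70)."""
    size = abs(r)  # assumption: bands applied to the magnitude of r
    if size <= 0.40:
        return "poor"
    if size <= 0.70:
        return "moderate"
    return "strong"
```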
For further analysis, the factor validity of each of the two INSPIRE subscales was assessed using the responses of participants who answered yes to all 20 items in the support subscale and of participants who answered all seven items in the relationship subscale. The factor validity of each subscale was assessed using exploratory factor analysis (EFA). The suitability of the data for factor analysis was first examined for each subscale using the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity, with a p value less than 0.05 indicating significance. The KMO value was then compared with adequacy standards (values above 0.80 are considered meritorious) [42]. The EFA for each subscale was performed on the Likert scale ratings of these respondents using the generalized least-squares method and an oblimin rotation, with eigenvalues > 1.00 used to determine the number of factors. Because INSPIRE was developed based on a theory, the generalized least-squares method was selected [43]. In addition, oblimin rotation was chosen due to the presumption that the five domains of the support subscale are correlated with one another. Confirmatory factor analysis (CFA) was implemented to test the fit of the data to the theoretical factor structure and to the factor structure extracted from the EFA. Based on theoretical notions, five underlying factors were expected. This 5-factor model was defined as model 1, and the model extracted by the EFA in this study was defined as model 2. Model fit was assessed using a combination of fit indices, including the ratio of χ2 to df (≤ 2), the Goodness of Fit Index (GFI; > 0.95), the Adjusted Goodness of Fit Index (AGFI; > 0.95), the Comparative Fit Index (CFI; > 0.95), the root mean square error of approximation (RMSEA; < 0.07), and the Akaike Information Criterion (AIC), where smaller is better [44,45]. Statistical analyses except for CFA were conducted using SPSS, version 22.0 for Windows, and CFA was conducted using Amos, version 22. Two-tailed values of p less than 0.05 were considered statistically significant.
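The fit criteria listed above can be expressed as a simple checklist. The sketch below is illustrative only: the function names are hypothetical, and the index values in the usage note are made-up numbers, not results from this study.

```python
from typing import Dict


def check_fit(chi2: float, df: int, gfi: float, agfi: float,
              cfi: float, rmsea: float) -> Dict[str, bool]:
    """Return, for each criterion stated in the text, whether it is met."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return {
        "chi2/df <= 2": chi2 / df <= 2,
        "GFI > 0.95": gfi > 0.95,
        "AGFI > 0.95": agfi > 0.95,
        "CFI > 0.95": cfi > 0.95,
        "RMSEA < 0.07": rmsea < 0.07,
    }


def preferred_by_aic(aic_model_1: float, aic_model_2: float) -> str:
    """Smaller AIC is better; AIC is only meaningful as a comparison."""
    if aic_model_1 == aic_model_2:
        return "tie"
    return "model 1" if aic_model_1 < aic_model_2 else "model 2"
```

For example, with hypothetical values `check_fit(chi2=240.0, df=160, gfi=0.80, agfi=0.75, cfi=0.90, rmsea=0.08)`, only the χ2/df criterion is met, which is the pattern described as a nonoptimal fit.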

Results

Respondent characteristics
Five of the 195 responses were excluded because there was no response to the INSPIRE questionnaire. The remaining 190 responses were included in the analysis (89.6% of the initial 212 service users). We returned to the two centers 2 weeks later, and 15 users agreed to fill out the re-test questionnaire. The 10 users whose re-test responses could be linked to their initial questionnaires were included in the test-retest reliability analysis. The sociodemographic data and average score for each scale of the respondents are shown in Table 1. There were more male than female respondents. The age range was 18-75 years (mean ± SD 42.5 ± 11.5), and more than 70% had never been married. Half of the participants (50%) had received a diagnosis of schizophrenia, and 30% had been diagnosed with a mood disorder. About 65% of participants had used the current service for longer than 1 year.

Descriptions of ratings in INSPIRE
The item-level ratings of the support subscale and the relationship subscale are shown in Table 2 and Table 3, respectively. Four support subscale items (S4, S9, S11, and S12) were rated as not important for recovery by more than 15% of respondents. More nonresponses were found for item S12 ("Having my ethnic/cultural/racial/identity respected") than for other items. There were few unanswered items in the relationship subscale. All INSPIRE items except two (S9 and S12) showed ceiling effects. For these items, most participants agreed that they received recovery-oriented support from their workers, and the item score distributions were skewed toward higher values.

Test-retest reliability
The ICCs for the total scores of the support subscale, relationship subscale, and Brief INSPIRE were 0.95, 0.96, and 0.92, respectively. The kappa statistics for change in rating of importance (yes/no) for the support subscale and Brief INSPIRE were 0.48 and 0.39, respectively. The weighted linear kappas for the support subscale, relationship subscale, and Brief INSPIRE were 0.96, 0.92, and 0.96, respectively.

Convergent validity
The total scores of the support subscale, relationship subscale, and Brief INSPIRE were significantly positively correlated with STAR-P, CSQ-8J, and RAS (Table 4).

Factor validity
To assess the factor validity of the INSPIRE support subscale, the 106 responses in which yes was given to all 20 items in the support subscale (50.0% of the initial 212) were utilized. Similarly, for the relationship subscale, the 183 responses that answered all 7 items in the relationship subscale (86.3% of the initial 212) were utilized.
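The proportions reported for the analysis samples follow directly from the 212 service users initially approached. A short check, using a hypothetical helper name, reproduces the arithmetic:

```python
INVITED = 212  # service users initially approached


def share_of_invited(n: int, invited: int = INVITED) -> float:
    """Percentage of the initially approached users, rounded to one decimal."""
    return round(100 * n / invited, 1)


analyzed = share_of_invited(190)      # responses included in the analysis
support_efa = share_of_invited(106)   # "yes" to all 20 support items
relation_efa = share_of_invited(183)  # all 7 relationship items answered
```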
For the support subscale, the KMO score was 0.93 and Bartlett's test of sphericity was significant (χ2 = 1544.57, df = 190, p < 0.001), thereby indicating that the factor analysis was appropriate. EFA was conducted among the 106 respondents and yielded three factors based on the criterion of eigenvalues greater than 1.00 (Table 2). The three factors were considered to represent connectedness; hope and internal value; and meaning, purpose, and empowerment in life. Items S3, S8, S9, S12, S13, and S19 had factor loadings greater than 0.30 for two factors (Table 2). CFA showed that both models met the χ2/df criterion (1.5 for model 1 and 1.6 for model 2), but the other goodness-of-fit indices indicated nonoptimal fit for each model. In a comparison of the two models, model 1 was better than model 2 on all indices; in particular, the AIC of model 1 was smaller than that of model 2 (Table 5). The KMO score for the relationship subscale was 0.87, and Bartlett's test of sphericity was significant (χ2 = 765.83, df = 21, p < 0.001), showing that factor analysis was appropriate. EFA was conducted among the 183 respondents and yielded a 1-factor solution (eigenvalue 4.5) explaining 59.3% of the variance (Table 3).
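The degrees of freedom reported for Bartlett's test of sphericity can be checked against the number of items, since the test has p(p − 1)/2 degrees of freedom for p variables. The helper name below is hypothetical.

```python
def bartlett_df(n_items: int) -> int:
    """Degrees of freedom of Bartlett's test of sphericity for p variables:
    the number of distinct off-diagonal correlations, p(p - 1) / 2."""
    if n_items < 2:
        raise ValueError("at least two variables are required")
    return n_items * (n_items - 1) // 2


support_df = bartlett_df(20)       # 20-item support subscale
relationship_df = bartlett_df(7)   # 7-item relationship subscale
```

Both values agree with the degrees of freedom reported above (190 and 21).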

Discussion
This study evaluated the reliability and validity of the Japanese version of INSPIRE and Brief INSPIRE among users of community mental health services in Japan. INSPIRE and Brief INSPIRE were found to have high internal consistency reliability, test-retest reliability, and convergent validity, as well as reasonable factor validity, among users of mental health services in Japan. However, the CFA did not show goodness of fit.

Table notes: Bold figures indicate factor loadings > 0.3. a Items of Brief INSPIRE (5 items: S1, S8, S10, S14, S17). b Respondents who answered "Yes" to all 20 items in the support subscale.

Reliability of the INSPIRE and brief INSPIRE
Internal consistency reliability was found to be acceptable (Cronbach's alpha range 0.78-0.96) [38]. These coefficients were also found to be good in previous studies [27,46]. As in a previous study in Sweden, the alpha coefficient of the total score of the support subscale (0.96) was extremely high [46]. This suggests that the support subscale contains redundant items and that the number of items could be reduced [47]. In the development of the original INSPIRE, an item not considered important for recovery by more than half of the respondents was deleted [27]. In this study, as shown in Table 2, each item was rated as important for recovery by more than 80% of respondents. Importantly, personal recovery is deeply personal [1]. Thus, we did not delete any items, to ensure that service users could choose what was important for their recovery. For test-retest reliability, the ICC and weighted linear kappa showed excellent to almost perfect agreement for the total score of each subscale and Brief INSPIRE [39,40].

Validity of INSPIRE and brief INSPIRE
Convergent validity was found to be moderate to strong (Pearson's correlation 0.49-0.80). These coefficients were similar to those of the previous study in the UK [27]. For the support subscale, a 3-factor structure was extracted using EFA. This 3-factor structure differed from the theory-based structure in the previous study in which the original INSPIRE was tested [27]. CFA, conducted to test the fit of the data to each factor structure, revealed both models to be nonoptimal fits. In a comparison of the two models, model 1 (theoretical 5-factor model) was better than model 2 (3-factor model); according to the AIC scores, model 1 showed a better fit than model 2. Two possible reasons may underlie the nonoptimal fit. First, almost all of the items had ceiling effects. We confirmed a ceiling effect for all items of INSPIRE except S9 and S12. This may indicate bias in the data, and the exclusion of items was considered before factor analysis. However, factor analysis was performed without deleting any items because higher scores are the desired outcome on INSPIRE and the items were created from the CHIME framework [27,48]. Second, four support subscale items (S4, S9, S11, and S12) were rated as not important for recovery by more than 15% of respondents. These items belong to the domains of connectedness and identity. While Asian cultures focus on building harmonious interdependence with others, American culture tries to maintain independence by paying attention to oneself and through the discovery and expression of one's own intrinsic inner attributes [49]. Thus, people in Western and Asian cultures may perceive relationships and identity differently. Further studies are needed to investigate the construct of personal recovery in Japan.
In the relationship subscale, the 1-factor structure was extracted using EFA. This was consistent with the structure of the original INSPIRE [27]. Further studies are needed to clarify factor structure validity with a larger number of participants.

Limitations
There are four main limitations to this study. First, the estimate of test-retest reliability is unstable because of the small sample size (n = 10). Future studies with a larger sample size are needed to clarify test-retest reliability. Second, the convergent validity of the relationship subscale was assessed using a Japanese version of STAR-P whose psychometric properties had not been examined. As such, there is a limit to the validity evaluation of the relationship subscale. After completion of this study, the psychometric properties of the Japanese version of STAR were verified in 2019 [50]. Further studies are needed to validate the relationship subscale against the Japanese version of STAR-P with verified psychometric properties. Third, generalization of the findings should be done with caution because the participants were selected only from specific areas and were using specific, limited types of services. Therefore, further research is required and should include diverse services such as visiting care. Fourth, some service centers that cooperated with this study might provide a higher level of support for personal recovery than others, and the participants responded to the questionnaire in accordance with the level of service they received. This could have skewed the data toward overly positive ratings.

Research and clinical implications
Research on how mental health service providers can support personal recovery is developing [51,52]. The original INSPIRE developer suggested using INSPIRE "as a benchmarking tool for comparison between groups of service users" [27]. INSPIRE will be a valuable tool to determine how a user feels about the services provided. Moreover, when a service provider and a user use INSPIRE to review the service together, this will be an opportunity to develop better services and relationships. INSPIRE measures the quality of services and provides a tool to compare Japan's services internationally. Brief INSPIRE includes the concept of CHIME and can be used as a simple evaluation. However, the brief version does not include individual preference for different types of support or assess the relationship with staff. Therefore, we recommend the brief version for simple evaluations and INSPIRE for more comprehensive evaluations. In Japan, there are previous studies on the evaluation of service users' personal recovery and on the recovery knowledge and attitudes of professionals [53][54][55]. However, the recovery orientation of services has not yet been evaluated by service users. INSPIRE could contribute to a framework in which service users are able to evaluate the recovery-oriented focus of mental health services in Japan. Moreover, INSPIRE can facilitate the development of training programs for mental health practitioners to support the personal recovery of service users in Japan. In addition to evaluating individual mental health services, INSPIRE scores can be compared across a variety of mental health services both in Japan and abroad and thus can be useful for research to improve the recovery orientation of mental health services.

Conclusion
This study confirmed the internal consistency, test-retest reliability, and convergent and factor validity of the Japanese version of INSPIRE and Brief INSPIRE among users of community mental health services in Japan (Additional files 1 and 2). INSPIRE and Brief INSPIRE may be useful as patient self-report measures of staff support for personal recovery among Japanese people using community mental health services.

Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s12888-020-2467-y.

YM, and NK were involved in drafting the manuscript. All authors gave final approval of the version to be published and agreed to be accountable for the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Funding
This study was partially supported by JSPS KAKENHI Grant Numbers JP16H06395, JP16H06398, JP16H06399, JP16K21720, JP19K10923, and 26671003. This research was also partially supported by Research and Development Grants for Comprehensive Research for Persons with Disabilities from the Japan Agency for Medical Research and Development, AMED (Grant Numbers: 18dk0307066 and 18dk0307059), and the Practical Research Project for Rare/Intractable Diseases from AMED (Grant Number: 18ek0109369). This work was also supported in part by the UTokyo Center for Integrative Science of Human Behavior (CiSHuB) and by the International Research Center for Neurointelligence (WPI-IRCN) at The University of Tokyo Institutes for Advanced Study (UTIAS). The funders had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Availability of data and materials
The datasets generated and analyzed during the current study are not publicly available as permission was not obtained from the participants to publicly share anonymized participant data but are available from the corresponding author on reasonable request.