Using video-annotation software to identify interactions in group therapies for schizophrenia: assessing reliability and associations with outcomes

Background Research has shown that interactions in group therapies for people with schizophrenia are associated with a reduction in negative symptoms. However, it is unclear which specific interactions in groups are linked with these improvements. The aims of this exploratory study were to i) develop and test the reliability of using video-annotation software to measure interactions in group therapies in schizophrenia and ii) explore the relationship between interactions in group therapies for schizophrenia and clinically relevant changes in negative symptoms. Methods Video-annotation software was used to annotate interactions from participants selected across nine video-recorded out-patient therapy groups (N = 81). Using the Individual Group Member Interpersonal Process Scale, interactions were coded from participants who demonstrated either a clinically significant improvement (N = 9) or no change (N = 8) in negative symptoms at the end of therapy. Interactions were measured from the first and last sessions of attendance (>25 h of therapy). Inter-rater reliability between two independent raters was measured. Binary logistic regression analysis was used to explore the association between the frequency of interactive behaviors and changes in negative symptoms, assessed using the Positive and Negative Syndrome Scale. Results Of the 1275 statements that were annotated using ELAN, 1191 (93%) had sufficient audio and visual quality to be coded using the Individual Group Member Interpersonal Process Scale. Rater agreement was high across all interaction categories (>95% average agreement). A higher frequency of self-initiated statements measured in the first session was associated with improvements in negative symptoms. A higher frequency of questions and of giving advice in the first session of attendance was also associated with improvements in negative symptoms, although only at the level of a statistical trend. 
Conclusion Video-annotation software can be used to reliably identify interactive behaviors in groups for schizophrenia. The results suggest that proactive communicative gestures, as assessed by the video-analysis, predict outcomes. Future research should use this novel method in larger and clinically different samples to explore which aspects of therapy facilitate such proactive communication early on in therapy. Electronic supplementary material The online version of this article (doi:10.1186/s12888-017-1217-2) contains supplementary material, which is available to authorized users.


Background
Negative symptoms of schizophrenia are categorized along two dimensions: a reduction in expression, including lack of speech (alogia) and reduced facial expression (blunt affect); and a deficit in experiencing motivation (asociality) and pleasure (anhedonia) [1,2]. A meta-analysis [3] found that psychological interventions delivered in groups improve these symptoms compared to treatment as usual. It was therefore concluded that interactions in a group format are clinically advantageous in the treatment of negative symptoms of schizophrenia and should be explored further [3]. If specific helpful or unhelpful group interactions linked to changes in negative symptoms can be identified, these interactions can be encouraged or discouraged to optimize the effectiveness of group therapies.
Little research has sought to investigate interactions on a moment-to-moment basis in group therapies for schizophrenia, an approach referred to as 'interactional analysis' (IA) [4]. In a study by Kanas and colleagues [5], the Hill Interaction Matrix [6] was used to code group interactions from verbal statements. In doing so, they identified expression of emotions, reality testing and advice giving as beneficial interactions fostered within the group. However, this study had at least three methodological limitations. First, measurement reliability was not tested, as ratings were made by a single researcher. Second, no attempt was made to statistically explore the relationship between interactions and clinical outcomes. Third, ratings were made in real time through a one-way mirror, so a more fine-grained analysis of group interactions was not possible.
Beck and Lewis [4] argue that the accuracy and practicality of IA can be improved by new video technologies. Research on psychiatrist-patient communication [7,8] highlights the benefit of using video-annotation software such as ELAN [9]. This free software can be used flexibly to annotate digitally recorded data from small, inexpensive and commercially available 2D video recording devices. Crucially, annotations in ELAN can be made with a precision of up to 50 frames per second. It is therefore reasonable to expect that, using ELAN, verbal interactions can be accurately annotated and rated in the context of subtle nonverbal cues associated with clinical outcomes in schizophrenia [10][11][12][13].
There were two aims of this study. The first was to develop and test the reliability of combining IA and ELAN video-annotation software to measure interactions in group therapies for schizophrenia. To do so, we combined ELAN with the Individual Group Member Interpersonal Process Scale (IGMIPS) [14][15][16]. The second aim was to assess the link between group interactions and changes in negative symptoms. This aspect of the study was exploratory: interactions were compared between participants with either a clinically relevant improvement or no change in negative symptoms, and there were no fixed predictions about which IGMIPS categories would predict outcomes.
Readily available video recordings of diagnostically homogeneous group therapies for individuals with schizophrenia were used in this study. These included Body-Oriented Psychotherapy (BPT) groups and Cognitive Behavioral Therapy (CBT) groups. BPT is a manualized intervention broadly aimed at reducing negative symptoms by refocusing cognitive and emotional awareness towards the body to stimulate activity [17,18]. CBT is also a manualized intervention, broadly aimed at addressing negative symptoms by targeting the thoughts, demoralized feelings and behaviors that lead to social isolation and social apathy [19]. The high audio and visual quality of the digitally recorded data was expected to contribute to high agreement between the ratings of two independent raters.

Methods
This study used video recordings and pre-post symptom measurements from a pool of 81 participants who attended nine separate group therapies for schizophrenia. Data from participants attending eight of the nine groups were collected as part of the NESS trial, a multi-centre randomized controlled trial (RCT) that examined the effectiveness of BPT for negative symptoms of schizophrenia in comparison to an active control group (trial registration ISRCTN84216587). Data from participants attending the remaining group, a CBT group, were collected specifically for this study. The BPT groups consisted of 20 sessions over 10 weeks and the CBT group of 14 sessions over 10 weeks. All sessions lasted 90 min. Further details of the BPT and CBT treatment groups are outlined elsewhere [17][18][19][20].

Sample
Participants were recruited from out-patient mental health services between 2012 and 2014. Participants were aged between 18 and 65 years, had a diagnosis of schizophrenia, scored 18 or above on the negative subscale of the Positive and Negative Syndrome Scale (PANSS) [21], were willing to participate and were able to provide written informed consent.
Interactions were assessed in two groups of participants: those who met the criteria for being an 'improver' and those who met the criteria for being a 'no-changer'. Improvers were participants who had a clinically relevant reduction of at least 20% [22] on the PANSS negative symptom subscale from the baseline to the end-of-treatment assessment. No-changers were participants whose PANSS negative subscale score changed by at most one point in either direction between the baseline and end-of-treatment assessments (i.e. no change, a one-point reduction or a one-point increase).
Participants were excluded if they had an insufficient command of English, had a physical disability that impaired participation in groups, did not attend at least one of the first five and at least one of the last five sessions of the group, or did not meet the criteria for an 'improver' or a 'no-changer'. Participants who did not show a reduction of at least 20% in negative symptoms on the Clinical Assessment Interview for Negative Symptoms (CAINS) [2], a more specific measure of negative symptoms, were also excluded from the 'improver' category. Participants who showed a reduction of more than 10% in negative symptoms on the CAINS were excluded from the 'no-changer' category.
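The selection rules above can be expressed as a small decision function. This is a minimal sketch, assuming that percentage reductions are computed on raw subscale totals (the text does not say whether the PANSS minimum score is subtracted first) and reading the 'no-changer' rule as a PANSS change of at most one point in either direction; the function and argument names are hypothetical.

```python
from typing import Optional


def percent_reduction(baseline: float, end: float) -> float:
    """Percentage reduction from baseline; positive values mean fewer symptoms."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - end) / baseline


def classify_participant(panss_base: int, panss_end: int,
                         cains_base: float, cains_end: float) -> Optional[str]:
    """Return 'improver', 'no-changer', or None (eligible for neither group).

    Hypothetical helper: scores are raw PANSS negative subscale and CAINS totals.
    """
    panss_drop = percent_reduction(panss_base, panss_end)
    cains_drop = percent_reduction(cains_base, cains_end)
    # Improver: >= 20% PANSS negative reduction, confirmed by >= 20% on the CAINS
    if panss_drop >= 20 and cains_drop >= 20:
        return "improver"
    # No-changer: PANSS negative score moved by at most one point either way,
    # and the CAINS reduction did not exceed 10%
    if abs(panss_end - panss_base) <= 1 and cains_drop <= 10:
        return "no-changer"
    return None


print(classify_participant(24, 18, 30, 22))  # improver (25% and ~27% reductions)
print(classify_participant(22, 21, 28, 27))  # no-changer (one-point PANSS change)
```

Participants returning None would fall outside both categories and, under the criteria described above, be excluded.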

Outcome measures
Positive and Negative Syndrome Scale [21] The PANSS is a semi-structured interview consisting of 30 items designed to measure positive, negative and general symptoms of schizophrenia. The PANSS negative symptom subscale was used for the primary outcome, 'improver status', a binary outcome indicating whether participants were improvers or no-changers. The negative symptom subscale includes seven items related to difficulty in abstract thinking, poor rapport, emotional withdrawal, passive social withdrawal, lack of speech, stereotyped thinking and blunt affect.
Individual Group Member Interpersonal Process Scale [14] The IGMIPS [14] provides a structured coding format with which the interactive behaviors of individual group members are rated. 'Frequency ratings', indicating the presence of each interactive behavioral category, were the primary independent variable of interest. A summary of the IGMIPS categories is outlined in Table 1. Frequency ratings for each category were recorded as binary (yes or no). 'Significance ratings', made on a Likert scale, were also used to determine the intensity of a given interactive behavior. Furthermore, 'where' ratings described the locational focus of a statement, and 'who' ratings recorded to whom and/or about what the statement was made.
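One way to hold these four kinds of rating for a single statement is a small record type. The sketch below is illustrative only: the class and field names are hypothetical, the category labels are examples taken from this article rather than the full IGMIPS list in Table 1, and the Likert range is not checked because it is not stated here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CodedStatement:
    """Hypothetical record for one statement coded with IGMIPS-style ratings."""
    participant_id: str
    session: str                     # e.g. "first" or "last" session attended
    text: str                        # transcribed statement
    # Frequency ratings: a yes/no flag per interactive category
    frequency: Dict[str, bool] = field(default_factory=dict)
    # Significance ratings: Likert-scale intensity for categories rated present
    significance: Dict[str, int] = field(default_factory=dict)
    where: Optional[str] = None      # locational focus, e.g. "inside group"
    who: Optional[str] = None        # e.g. "self", "therapist", "group as a whole"

    def present(self, category: str) -> bool:
        """A category that was not rated counts as absent."""
        return self.frequency.get(category, False)


s = CodedStatement("P01", "first", "Has anyone else tried that?",
                   frequency={"questions": True, "gives advice": False},
                   significance={"questions": 3},
                   where="inside group", who="group as a whole")
print(s.present("questions"), s.present("gives advice"))  # True False
```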

Procedure
Negative symptoms were measured before and after treatment. Baseline assessments were made within 4 weeks of the BPT and CBT groups starting, and end-of-treatment assessments were made within 4 weeks of the groups finishing. Assessments were conducted by six independent research assistants who had no involvement in the treatment. The trial through which the BPT groups were set up was a double-blind RCT; hence, the researchers who collected these data were blind to treatment allocation, clinical outcomes and group attendance. Assessments for participants attending the CBT group were made by an unblinded researcher (SO). Therefore, a portion of these assessments was video-recorded and re-assessed by an independent blinded assessor. Inter-rater reliability between the two assessors was high (>0.82), and this was established before the analysis of group interactions began.
The analysis of group interactions occurred in two stages. In stage one, individual statements were transcribed from participants who were identified as improvers and no-changers. These verbal statements were the primary source of material from which interactions were measured. A single statement was defined as an utterance of three or more words, bounded either by a pause of more than 10 s or by an interruption by another group member. All statements were transcribed from the video-recorded group sessions using ELAN annotation software [9]. Two video recordings, made from opposite angles in the group therapy room, were used when transcribing individual statements (see Fig. 1). Each video recording had its own audio track, and the two tracks were used interchangeably in ELAN depending on the audio quality at each angle.
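The statement definition above amounts to a simple segmentation rule. The following sketch applies it to a time-ordered list of words; it assumes word-level start and end times with speaker labels are available, a data format the article does not describe, and it treats any change of speaker (including the therapist) as an interruption. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_PAUSE_S = 10.0  # a pause longer than this ends a statement
MIN_WORDS = 3       # utterances shorter than this are not counted as statements


@dataclass
class Word:
    speaker: str
    text: str
    start: float  # seconds from session start
    end: float


def segment_statements(words: List[Word]) -> List[List[Word]]:
    """Split a time-ordered word stream into statements of >= MIN_WORDS words."""
    statements: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current:
            interrupted = w.speaker != current[-1].speaker
            long_pause = w.start - current[-1].end > MAX_PAUSE_S
            if interrupted or long_pause:
                statements.append(current)
                current = []
        current.append(w)
    if current:
        statements.append(current)
    # Discard utterances of fewer than three words
    return [s for s in statements if len(s) >= MIN_WORDS]


words = [Word("A", "I", 0.0, 0.3), Word("A", "feel", 0.4, 0.7), Word("A", "tired", 0.8, 1.2),
         Word("A", "today", 12.0, 12.4),  # 10.8 s pause: a new statement starts
         Word("A", "and", 12.5, 12.7), Word("A", "bored", 12.8, 13.2),
         Word("B", "why", 13.5, 13.8),    # another speaker: interruption
         Word("A", "not", 14.0, 14.3), Word("A", "sure", 14.4, 14.8)]
for stmt in segment_statements(words):
    print(" ".join(w.text for w in stmt))
# I feel tired
# today and bored
```

The one-word and two-word fragments in the example are dropped by the three-word minimum.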
In the second stage of analysis, group interactions were judged from the transcribed statements using the structured coding format outlined in the IGMIPS. An adapted version of the original IGMIPS-III manual was developed for this study by the first author (SO); see Additional file 1: Table S1 for a detailed outline of this adapted manual. Using the video recordings of the sessions, statements were rated in the context of nonverbal cues such as tone, gaze and hand gestures. Individual statements were rated according to whom the statement was addressed (i.e. self, therapist, group as a whole or other group member), the locational focus of the statement (i.e. life outside the group vs. inside the group) and whether the statement was self-initiated or elicited.
All interactions were first coded by SO. Thirty percent of statements were then rated by an independent researcher (SA), who received 3 days of training. The statements rated by SA were chosen at random and were stratified by participant. During this period a sufficient inter-rater reliability (over 80% rater agreement) was achieved.
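The 30% reliability subsample, drawn at random within each participant, could be produced as follows. This is a sketch under assumptions: rounding each participant's share to the nearest whole statement (with at least one statement per participant) and the fixed random seed are choices made here for reproducibility, not details reported in the article.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Statement = Tuple[str, str]  # (participant_id, statement_id): hypothetical keys


def stratified_sample(statements: List[Statement], fraction: float = 0.30,
                      seed: int = 0) -> List[Statement]:
    """Randomly select about `fraction` of each participant's statements."""
    rng = random.Random(seed)  # fixed seed so the subsample can be reproduced
    by_participant: Dict[str, List[Statement]] = defaultdict(list)
    for stmt in statements:
        by_participant[stmt[0]].append(stmt)
    sample: List[Statement] = []
    for pid in sorted(by_participant):
        items = by_participant[pid]
        k = max(1, round(fraction * len(items)))  # at least one per participant
        sample.extend(rng.sample(items, k))
    return sample


stmts = [("P01", f"s{i}") for i in range(10)] + [("P02", f"s{i}") for i in range(4)]
print(len(stratified_sample(stmts)))  # 4 (3 from P01, 1 from P02)
```

Stratifying by participant ensures that every participant, including those who spoke rarely, contributes to the reliability check.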

Statistical analysis
All analyses were conducted in STATA/SE version 12.0 (StataCorp, 2011), with p < 0.05 taken to indicate statistical significance and p < 0.10 taken to indicate a statistical trend. Frequency ratings, indicating the presence of interactive behaviors, were calculated as sample population percentage scores for each improver and no-changer. The sample population percentage was calculated by dividing the total number of statements rated 'yes' (i.e. present) for each IGMIPS category by the total number of statements made by each participant. Scores were calculated for statements made in the first and last sessions of attendance, each of which was 90 min long.
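The percentage score described above can be computed per participant and per session as in the sketch below. It assumes each coded statement is available as a (participant, session, frequency-flags) tuple; this layout and the category label are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (participant_id, session, {category: rated present?}): hypothetical layout
Rating = Tuple[str, str, Dict[str, bool]]


def category_percentages(ratings: Iterable[Rating],
                         category: str) -> Dict[Tuple[str, str], float]:
    """Percentage of each participant's statements in a session rated 'yes'."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    present: Dict[Tuple[str, str], int] = defaultdict(int)
    for pid, session, frequency in ratings:
        totals[(pid, session)] += 1
        if frequency.get(category, False):  # an unrated category counts as 'no'
            present[(pid, session)] += 1
    return {key: 100.0 * present[key] / n for key, n in totals.items()}


ratings = [
    ("P01", "first", {"self-initiated": True}),
    ("P01", "first", {"self-initiated": False}),
    ("P01", "first", {"self-initiated": True}),
    ("P01", "first", {}),
    ("P02", "first", {"self-initiated": False}),
]
pct = category_percentages(ratings, "self-initiated")
print(pct[("P01", "first")], pct[("P02", "first")])  # 50.0 0.0
```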
Rater agreement between SO and SA was calculated to ensure sufficient coding reliability: this was calculated as the percentage of categories coded by both SO and SA as 'present'. An alternative measure of inter-rater reliability that accounts for agreement occurring by chance, such as Cohen's kappa, was considered. However, the type of data varied across the IGMIPS categories, including binary data (yes/no for the 'frequency of interactive behavior' ratings), continuous data (for the 'significance' ratings) and nominal data (for the 'who' ratings). Therefore, no single statistical test of inter-rater reliability that could accommodate this variation was deemed suitable. Furthermore, the actual percentage agreement was deemed more informative than a chance-corrected statistic, whose value depends on the prevalence of the measure (i.e. the number of statements rated per person).
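The trade-off discussed above can be illustrated for a single yes/no category. The sketch computes simple percentage agreement (the proportion of statements given the same code by both raters, a common definition that may differ in detail from the calculation reported here) alongside Cohen's kappa, using invented ratings in which 'yes' is very prevalent. It shows how high raw agreement can coexist with a modest kappa, which is the prevalence dependence referred to above.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of items given the same code by both raters."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal proportions
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters use a single code
    return (p_obs - p_exp) / (1.0 - p_exp)


# Invented ratings of one yes/no category for 20 statements ('yes' is prevalent)
a = ["yes"] * 18 + ["no"] * 2
b = ["yes"] * 17 + ["no", "yes", "no"]
print(percent_agreement(a, b))       # 90.0
print(round(cohens_kappa(a, b), 2))  # 0.44
```

Here both raters say 'yes' 18 times out of 20, so chance agreement is already 82%, and the two disagreements pull kappa well below the raw 90% agreement.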
The relationship between the frequency of interactive behavior and improver-status was explored through a series of binary logistic regression analyses. Separate analyses were conducted for each category outlined in the IGMIPS. Furthermore, separate analyses were conducted
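Each of these analyses regresses a binary outcome (improver = 1, no-changer = 0) on one frequency score. The analyses in the article were run in Stata, whose `logistic` command reports odds ratios directly; the pure-Python sketch below only illustrates the underlying computation, fitting the model by Newton-Raphson and forming a Wald 95% confidence interval for the odds ratio per one-unit (here, one percentage point) increase in the predictor. The data are invented and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def logistic_or(x: List[float], y: List[int], max_iter: int = 50,
                tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns the odds ratio per one-unit increase in x and its Wald 95% CI.
    Hypothetical helper for illustration; not the Stata procedure itself.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g) and information matrix (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            # Numerically stable logistic function
            if eta >= 0:
                p = 1.0 / (1.0 + math.exp(-eta))
            else:
                e = math.exp(eta)
                p = e / (1.0 + e)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise RuntimeError("singular information matrix (possible separation)")
        # Newton step = inverse(information) @ score, using the 2x2 inverse
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    else:
        raise RuntimeError("did not converge (possible separation)")
    se_b1 = math.sqrt(h00 / det)  # (1,1) element of the inverse information
    z = 1.959964  # 97.5th percentile of the standard normal distribution
    return math.exp(b1), math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1)


# Invented data: percentage of self-initiated statements and improver status
x = [10, 20, 30, 40, 50, 60]
y = [0, 0, 1, 0, 1, 1]
or_, lo, hi = logistic_or(x, y)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

With only a handful of participants per group such intervals are wide. If a category is never rated present for one group, as happened for some first-session categories in this study, the data are separated and the maximum-likelihood estimate does not exist; such categories cannot be analysed this way at all, and the sketch makes no attempt to handle them.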

Sample characteristics
Of the 81 participants recruited to the nine therapy groups included in this study, 17 participants met the inclusion criteria. Nine participants met the criteria for 'improvers' and eight participants met the criteria for 'no-changers'. The characteristics of these participants are outlined in Table 2.
In total, 1275 individual statements were coded in accordance with the categories outlined in the IGMIPS. Eighty-four of these statements (7%) were excluded from the analysis because they were rated as 'less than 50% audible', i.e. less than 50% of the statement could be annotated because it was inaudible. The frequency of interactive behaviors is summarized in Table 3.

Inter-rater reliability
Across all IGMIPS categories, there was 96% agreement between the two independent ratings made by SO and SA. There was 92% agreement for all frequency categories rated either 'yes' or 'no'. In addition, there was 98, 99 and 95% agreement for significance, locational and who items respectively. See Table 4 for a summary of the percentage agreement for each of the categories.

Linking frequency ratings of interactive behaviors and improver-status
Findings from the binary logistic regression analyses, which explored the relationship between the frequency of interactive behaviors and improver-status, are summarized in Table 5.
There was a significant association between self-initiated statements and improver-status in the first session of attendance (95% CI = 1.00 to 1.13, p < 0.05), where a higher frequency of self-initiated statements in the first session of attendance was associated with being an 'improver' at the end of treatment. For statements occurring in the last session of attendance, there was a statistical trend between the frequency of self-initiated statements and improver-status (OR = 1.08, 95% CI = 0.99 to 1.18, p = 0.083). There was also a trend for an association between improver-status and the frequency of questions (95% CI = 1.00 to 1.14, p = 0.067) and of giving advice (95% CI = 1.00 to 2.07, p = 0.067), where a higher frequency of statements involving questions or giving advice in the first session of attendance was associated with being an 'improver' at the end of treatment.
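There was no statistically significant evidence for a relationship between these variables in the last session of attendance. There was no statistically significant evidence for an association between any of the other IGMIPS categories and improver-status. Logistic regressions were not possible for the 'disconnection' and 'enhanced awareness' categories in the first session, because no statements made by participants in the no-changer category were rated in these categories.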
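As a consistency check on reported results, an odds ratio and its 95% confidence interval can be converted back into an approximate Wald p-value, assuming the interval was built symmetrically on the log-odds scale. The sketch below applies this to the last-session result for self-initiated statements (OR = 1.08, 95% CI = 0.99 to 1.18); the function name is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.959964):
    """Approximate two-sided Wald p-value from an OR and its 95% CI.

    Assumes the CI was computed as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


se, z, p = p_from_or_ci(1.08, 0.99, 1.18)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
# The OR should sit near the geometric midpoint of its interval
print(f"geometric midpoint = {math.sqrt(0.99 * 1.18):.3f}")
```

The recovered p-value comes out close to the reported p = 0.083; a small discrepancy is expected because the OR and interval limits are rounded to two decimals.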

Summary of findings
In line with our first hypothesis, the overall agreement between the two researchers was high. This supports the feasibility of combining a group behavioral coding scale, such as the IGMIPS, with ELAN video-annotation software. There was minimal disagreement between raters on coding the IGMIPS categories, further supporting its feasibility in measuring a range of individual group member interactions.
There was also evidence that group interactions, measured on a moment-to-moment basis using video-annotation software, were predictive of negative symptom outcomes. A higher frequency of self-initiated statements in the first session of attendance was associated with a clinically significant improvement in negative symptoms following participation in group therapy. There was also a statistical trend linking more frequent questions and advice giving in the first session of attendance with higher odds of a clinically significant improvement in negative symptoms.

Strengths and limitations
This study had a number of strengths. First, stringent measures were taken to ensure that observer-rated group interactions were measured reliably, with minimal rater bias. Participant statements were transcribed by two trained researchers to minimize errors in annotating individual statements. Interactions were measured from group sessions that were recorded in high definition with two microphone sources, for optimal visual and audio quality. Furthermore, coding reliability was assessed on a randomly selected proportion of statements coded by an independent researcher. Second, assessments of negative symptoms for 15 of the 17 participants were conducted by blinded researchers as part of a randomized controlled trial. Third, interactions were measured across multiple group psychological therapies with differing therapeutic orientations. Fourth, the impact of group interactions was explored in participants with clinically relevant outcomes, selected from a large pool of participants who attended multiple groups for schizophrenia. One limitation is multiple testing: because separate logistic regression analyses were conducted for each of the IGMIPS categories, the risk of at least one false-positive finding was high. A further limitation was the small sample included in the logistic regression analyses, which meant that these analyses lacked the statistical power to detect changes in the odds of being an improver rather than a no-changer. Hence, where there was no statistical evidence for an effect of the frequency of interactions on improver status, it cannot be concluded with confidence that no such effect exists. Despite careful planning, there was also inevitably a degree of opportunism, in that the data included in this study depended on the availability and quality of data that could be obtained from the NESS trial. 
Furthermore, the generalizability of the findings may be limited, since eight of the nine groups from which data were collected were BPT groups. Because therapist behavior was not rated, it is not known how the therapeutic orientation of the intervention affected group interactions.
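The multiple-testing concern noted above can be quantified. If every null hypothesis were true and the tests were independent, the probability of at least one false positive among k tests at level alpha is 1 - (1 - alpha)^k. The sketch below evaluates this for a hypothetical k = 20 (the exact number of tests is not stated in this section), together with the corresponding Bonferroni-adjusted threshold; because the IGMIPS categories are correlated, the independence assumption makes this only a rough guide.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) over independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the familywise error rate <= alpha."""
    return alpha / n_tests


k = 20  # hypothetical number of separate regressions
print(f"{familywise_error_rate(0.05, k):.2f}")  # 0.64
print(f"{bonferroni_threshold(0.05, k):.4f}")   # 0.0025
```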

Interpretation of findings
The results from this study suggest that modern video-recording devices and ELAN video-annotation software can be used to identify moment-to-moment group interactions [4]. Researchers can improve the feasibility of this approach by focusing their resources on the interactive categories identified as being most important in this study. Annotating the verbal statements from the video-recorded sessions was notably the most time-consuming process. To date, clinical research has focused on advancing technologies that automatically annotate nonverbal interactive behaviors, for example 3D motion detection [23] or motion energy analysis [24]. Until recently, these approaches could only be used in laboratories or other restricted spaces, and have therefore lacked ecological validity. We therefore propose that future research would benefit more from exploring whether automatic voice recognition and transcription technologies [25] can be used with the approach described in this study, and in particular whether prosodic features of speech shown to be linked to negative symptoms of schizophrenia [26] can improve the accuracy and feasibility of assessing important group interactions. Nonetheless, the research conditions, in particular the video-recording equipment, did not appear to affect the clinical integrity of the treatment. Whilst this supports the feasibility of using this approach in routine clinical practice, future research is needed to test how easily the method described can be transferred to other settings.
The results from this study support the hypothesis that interactions between group members in group therapies are linked to improvements in negative symptoms [3]. Negative symptoms are difficult to treat with conventional psychological and pharmacological treatments and are linked with poor quality of life and impaired social functioning [27][28][29]. Hence our findings, which give insight into what type of group interactions are linked to clinically relevant improvements in this symptom domain, have important clinical implications. For example, clinicians may want to consider emphasizing activities aimed at promoting interactions related to initiating statements, asking questions or giving advice in group therapies.
The finding that self-initiation, advice giving and question asking are associated with improvements in negative symptoms is in line with research within the field of conversational analysis in schizophrenia. Within this literature, studies have found that 'proactive' communicative behaviors are associated with outcomes [7,8,23]. For example, in doctor-patient consultations, proactive gestures and asking questions have been linked with improved clinical decision making and treatment adherence [7]. Based on this literature and the results from this study, future research should explore the clinical impact of actively enhancing these types of interactions in the treatment of schizophrenia.
The method described may also be useful in identifying beneficial group interactions from the very first session of therapy. In accordance with research on group therapeutic processes [30,31], individualized psychotherapy [32] and pharmacological treatment [33], the findings from this study highlight the importance of an initial positive response to therapy. Future research should therefore explore the impact of promoting beneficial group interactions from the very first session. In doing so, baseline participant characteristics shown to be related to clinical outcomes in schizophrenia, for example cognitive performance [34], should also be measured and accounted for.

Conclusions
This study highlights the reliability of using video-annotation software to assess moment-to-moment interactions in a naturalistic group therapy setting for schizophrenia. Moreover, the findings suggest that behaviors assessed by this novel method are relevant for outcomes in therapies for patients with negative symptoms of schizophrenia. In particular, proactive communication identifiable from the very first session, including self-initiated (rather than elicited) statements, advice giving and asking questions, appeared to be linked with clinically significant improvements at the end of treatment. Clinicians may therefore want to consider emphasizing activities aimed at promoting interactions related to proactive communication. Future research should explore what aspects of therapy facilitate such proactive communication early on in therapy.

Additional file
Additional file 1: Table S1.