- Research
- Open Access
Reduced learning bias towards the reward context in medication-naive first-episode schizophrenia patients
BMC Psychiatry volume 22, Article number: 123 (2022)
Abstract
Background
Reinforcement learning has been proposed to contribute to the development of amotivation in individuals with schizophrenia (SZ). Accumulating evidence suggests dysfunctional learning in individuals with SZ in Go/NoGo learning and expected value representation. However, previous findings might have been confounded by the effects of antipsychotic exposure. Moreover, reinforcement learning also relies on the learning context. Few studies have examined learning performance in the reward and loss-avoidance contexts separately in medication-naïve individuals with first-episode SZ. This study aimed to explore the behavioural profile of reinforcement learning performance in medication-naïve individuals with first-episode SZ, including contextual performance, Go/NoGo learning and expected value representation.
Methods
Twenty-nine medication-naïve individuals with first-episode SZ and 40 healthy controls (HCs), who did not differ significantly in age or gender, completed the Gain and Loss Avoidance Task, a reinforcement learning task involving stimulus pairs presented in both the reward and loss-avoidance contexts. We assessed group differences in accuracy in the reward and loss-avoidance contexts, in Go/NoGo learning and in expected value representation. Correlations between learning performance and negative symptom severity were also examined.
Results
Relative to HCs, individuals with SZ showed significantly lower accuracy when learning under the reward context than under the loss-avoidance context. Accuracy under the reward context (90% win - 10% win) in the acquisition phase was significantly and negatively correlated with the Scale for the Assessment of Negative Symptoms (SANS) avolition scores in individuals with SZ. On the other hand, individuals with SZ showed spared Go/NoGo learning and expected value representation.
Conclusions
Despite our small sample size and relatively modest findings, our results suggest a possibly reduced learning bias towards the reward context among medication-naïve individuals with first-episode SZ. Reward learning performance was correlated with amotivation symptoms. This finding may facilitate our understanding of the mechanisms underlying negative symptoms. Reinforcement learning performance under the reward context may be important for better predicting and preventing the development of negative symptoms, especially amotivation, in patients with schizophrenia.
Background
Amotivation is a core negative symptom of schizophrenia (SZ) [1] and is closely correlated with poor clinical and functional outcomes [2,3,4]. Reinforcement learning (RL) involves assigning values to stimuli in order to drive motivated behaviours, and is believed to contribute to the mechanisms underlying amotivation in SZ [5, 6].
The prediction error (PE) signal and the expected value (EV) representation are essential to the operation of RL [7]. PE signals indicate the difference between the expected reward value and the received reward value [7]. Dopamine (DA) neurons located in the basal ganglia pathway generate PE signals by increasing phasic firing rates if the actual outcome is better than the expected outcome (positive PE), or decreasing phasic firing rates if the actual outcome is worse than the expected outcome (negative PE). After a series of trials, PE signals can enhance (Go learning) or reduce (NoGo learning) the association strength between the stimulus and the action [7]. Defective operation of PE signals may result in dysfunctional value assignment. Indeed, patients with SZ have been found to exhibit altered Go learning but preserved NoGo learning [8,9,10,11]. Neuroimaging studies have also demonstrated blunted neural responses towards positive PEs in the striatum, the midbrain and other limbic regions [12,13,14]. Impaired Go learning, coupled with intact NoGo learning, appears to characterize the underpinning of amotivation in SZ [6].
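As a rough illustration of how PE signals could strengthen or weaken a stimulus-action association, the sketch below applies a generic delta-rule update with separate learning rates for positive and negative PEs. The function name, the learning rates and the outcome sequence are hypothetical; they are not taken from the present study, which did not fit a computational model.

```python
def update_value(value, outcome, alpha_go=0.3, alpha_nogo=0.3):
    """One delta-rule step; returns (new_value, prediction_error)."""
    pe = outcome - value  # PE: received value minus expected value
    # Positive PEs drive Go learning, negative PEs drive NoGo learning.
    alpha = alpha_go if pe > 0 else alpha_nogo
    return value + alpha * pe, pe


value = 0.0
for outcome in [1, 1, 0, 1]:  # hypothetical sequence: 1 = reward, 0 = no reward
    value, pe = update_value(value, outcome)
    print(f"outcome={outcome}  PE={pe:+.3f}  value={value:.3f}")
```

Under these assumptions, lowering `alpha_go` while leaving `alpha_nogo` unchanged would mimic the "impaired Go, intact NoGo" pattern described above.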
The flexible inner representations of the expected value of the stimuli mainly involve the prefrontal cortex [7]. Clinical patients such as those with SZ have impaired prefrontal functions, and may assign the same EV to all positive PEs regardless of whether PEs are associated with reward or loss-avoidance [15]. In medicated patients with SZ, impaired EV representation has been found at both the behavioural [16,17,18,19] and brain functioning levels [20].
Despite the important role of Go/NoGo learning and the EV representation, the context in which RL is initiated is also an important factor in determining RL performance, since the context value sets the “reference point” to which an outcome would be compared, for updating and modifying value assignment [21]. For example, in contexts which entail an overall negative value (i.e., losses), successful trials of loss-avoidance will result in positive PEs. Evidence from behavioural sciences suggests that the different weightings of loss and reward are taken as a hardwired feature of people’s decision making [22]. Various biological mechanisms have been found to underlie reward−loss asymmetry, including genotypes [23], hormonal levels [24] and brain activation during reward processing [25]. Consequently, if one fails to adopt a context-dependent strategy, dysfunctional RL may occur. Indeed, one functional Magnetic Resonance Imaging (fMRI) study found reduced PE responses in unmedicated SZ patients in reward but not loss contexts within various regions including the medial prefrontal cortex, the striatum, and the medial temporal lobe [21].
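The role of the context as a reference point can be sketched in a few lines. The context values below are assumed for illustration only (they are not estimates from any study); the point is that the same outcome of zero yields a positive PE in a loss context and a negative PE in a reward context.

```python
def context_relative_pe(outcome, context_value):
    """PE computed against the value of the current context (the reference point)."""
    return outcome - context_value


# Hypothetical context values: the average outcome expected in each context.
reward_context_value = 2.5   # assumed, e.g. outcomes are either +5 or 0
loss_context_value = -2.5    # assumed, e.g. outcomes are either -5 or 0

print(context_relative_pe(0, loss_context_value))    # avoided loss: positive PE
print(context_relative_pe(0, reward_context_value))  # missed reward: negative PE
```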
The above studies suggest that SZ patients may have impaired RL performance and that these impairments may contribute to amotivation symptoms. However, findings on RL in SZ patients have been confounded by the effects of antipsychotic medications, which are DA-blocking agents. Evidence indicates that exposure to antipsychotic medication can affect the DA system and thus RL. Eisenegger et al. [26] demonstrated that sulpiride, a D2-like DA antagonist, can disrupt approach behaviour towards rewards in healthy volunteers, whereas loss-avoidance behaviour was unaffected. This is also supported by one fMRI study showing that SZ patients receiving higher dosages of antipsychotic medications exhibited lower PE signals in the basal ganglia [27]. Moreover, previous studies on medicated patients with SZ revealed an association between negative symptoms and RL impairment [6, 8, 11, 17, 28], whereas studies recruiting unmedicated patients with SZ failed to find an association between negative symptoms and RL [29].
To address these limitations, this study examined RL performance in medication-naïve patients with first-episode SZ, using the well-validated Gain and Loss-Avoidance (GLA) task [17]. The GLA task taps into all three of the above aspects of RL. It should be noted that, in the majority of previous studies, the Go/NoGo learning index was conflated with the reward/loss-avoidance context: Go learning was associated with reward receipt (reward context) and NoGo learning was associated with loss (loss-avoidance context) [9]. The GLA task enables us to disentangle these two indices. Given that the previous studies which found impaired Go learning but intact NoGo learning failed to differentiate the effects of the reward and loss-avoidance contexts on RL, we hypothesized that SZ patients would show impaired RL in the reward context coupled with intact RL in the loss-avoidance context, rather than impaired Go learning coupled with intact NoGo learning. For EV representation, based on previous findings in medicated samples [6, 8, 16, 30], we hypothesized that medication-naïve patients with first-episode SZ would exhibit deficits in representing EV. We also hypothesized that greater RL impairment would be correlated with more severe negative symptoms, predominantly in the amotivation dimension, in SZ patients.
Methods
Participants
Twenty-nine medication-naïve patients with first-episode SZ, diagnosed according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM–IV) [31] criteria, were recruited from the Shanghai Mental Health Centre. None of the clinical participants were taking any antipsychotic medications. The exclusion criteria were 1) history of other psychiatric disorders; 2) history of any neurological disorders; 3) acute exacerbations of psychotic symptoms; and 4) history of substance abuse in the past 30 days. Forty healthy individuals were recruited as healthy controls (HCs) from the community via fliers and social media platforms. The exclusion criteria for HCs were 1) history of psychosis or neurological disorders; 2) family history of psychotic disorder; and 3) lifetime history of substance abuse. The study was approved by the Ethics Committee of the Shanghai Mental Health Centre (2017-19R). All participants provided written informed consent.
Gain and loss-avoidance task
The adapted version of the GLA task was developed based on the paradigm of Gold and colleagues [17]. In our GLA paradigm, eight landscape pictures (Figure 1) were used as stimuli.
Figure 1. Stimuli and feedback in the acquisition phase of the GLA task. (a) Feedback delivered after a correct choice (indicated by a red border) in the reward trials. (b) Feedback delivered following an incorrect choice in the reward trials. (c) Feedback delivered following a correct choice in the loss-avoidance trials. (d) Feedback delivered following an incorrect choice in the loss-avoidance trials.
These pictorial cues were chosen as stimuli because, when we tested the arousal and valence ratings they induced in 16 college students, the 8 pictures were comparable (see Additional file 1: Table S1).
The GLA task comprised two phases: the acquisition phase and the transfer phase. In the acquisition phase, four different pairs of cues were presented pseudo-randomly. Two pairs were associated with potential rewards, the other two with potential losses. Once presented with a pair of cues, participants were instructed to select the picture that was most likely to either (1) earn money (reward trials) or (2) avoid losing money (loss-avoidance trials). Feedback regarding the outcome was delivered based on the designated reinforcement property of each cue (e.g., Frequent-winner, 90% win: 90% chance of winning ¥5 and 10% chance of getting ¥0 (Figure 1)). Each pair of cues was presented 10 times in each block. There were four blocks, which resulted in a total of 160 trials. In the transfer phase, the four previously learned pairs of cues and 24 novel pairs of cues were pooled together and presented randomly (see Additional file 2: Table S2). Participants were instructed to select the optimal cue in each pair. In this phase, no feedback was delivered. Each original pair was presented four times, and each novel pair was presented twice. Monetary reward was calculated based on task performance in the transfer phase, and could range from ¥30 to ¥50 (US$4-7).
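The trial counts implied by this design can be checked with simple arithmetic. The acquisition total of 160 is stated above; the transfer total is not stated and is derived here from the presentation counts. The check that 4 + 24 pairs equals every possible pairing of eight cues is an observation about the numbers, not a statement from the authors.

```python
from math import comb

# Acquisition phase: 4 cue pairs, each shown 10 times per block, 4 blocks.
acquisition_trials = 4 * 10 * 4

# Transfer phase: 4 original pairs shown 4 times, 24 novel pairs shown twice.
transfer_trials = 4 * 4 + 24 * 2

# 4 original + 24 novel pairs matches the number of unordered pairs of 8 cues.
all_pairs_of_eight_cues = comb(8, 2)

print(acquisition_trials, transfer_trials, all_pairs_of_eight_cues)
```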
Cognitive and clinical measures
All participants completed the information, arithmetic, similarities and digit span (forward and backward) subtests of the Wechsler Adult Intelligence Scale–Chinese version (WAIS-RC) for estimation of intelligence quotient (IQ). We also administered the Positive and Negative Syndrome Scale (PANSS) [32] and the SANS [33] to patients.
Statistical analysis
Task performance accuracy was calculated as the percentage of correct responses, i.e., choices of the item in a pair that would generate more reward or avoid more loss, in the acquisition and transfer phases. Trials with a response time (RT) shorter than 100 ms (0.5% of trials on average) were deemed invalid and excluded from the analysis.
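A minimal sketch of this scoring rule is given below. The 100 ms threshold comes from the text; the trial layout (a list of `(rt_ms, correct)` tuples), the function name and the example trials are assumptions made for illustration.

```python
MIN_RT_MS = 100  # trials faster than this are treated as invalid (from the text)


def accuracy(trials, min_rt_ms=MIN_RT_MS):
    """Percentage of correct responses among valid trials.

    `trials` is a list of (rt_ms, correct) tuples; this layout is assumed.
    """
    valid = [correct for rt_ms, correct in trials if rt_ms >= min_rt_ms]
    if not valid:
        return float("nan")  # no valid trials to score
    return 100.0 * sum(valid) / len(valid)


example = [(450, True), (80, True), (620, False), (510, True)]  # made-up trials
print(accuracy(example))  # the 80 ms trial is dropped before scoring
```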
Participants’ performance accuracy in the acquisition phase was analyzed using a four-way repeated-measures Analysis of Variance (ANOVA), with Context (reward vs. loss-avoidance), Probability (80% vs. 90%), Block (1 to 4) and Group (SZ vs. HC) as independent variables.
During the transfer phase, two indices (i.e., Go learning and NoGo learning) were generated, based on the accuracy of performance across contexts. Pairs containing a most frequently reinforced item (i.e., 90% win / 90% loss-avoidance) but not a most infrequently reinforced item (i.e., 10% win / 10% loss-avoidance) were defined as “Go learning” pairs, which included 90% win vs 80% win, 90% win vs 20% win, 90% loss-avoidance vs 80% loss-avoidance and 90% loss-avoidance vs 20% loss-avoidance. Pairs containing a most infrequently reinforced item (10% win / 10% loss-avoidance) but not a most frequently reinforced item (i.e., 90% win / 90% loss-avoidance) were defined as “NoGo learning” pairs (i.e., 10% win vs 80% win, 10% win vs 20% win, 10% loss-avoidance vs 80% loss-avoidance and 10% loss-avoidance vs 20% loss-avoidance). Based on the accuracy of the Go and NoGo learning indices, we conducted a repeated-measures ANOVA with Go/NoGo (Go vs. NoGo) as the within-subject factor and Group as the between-subject factor. To further examine differences between Go and NoGo learning, the difference scores between Go and NoGo learning accuracy were also extracted and tested for group differences.
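The pair definitions above can be expressed as a small classification rule. The cue encoding as `(context, probability)` tuples and the function name are hypothetical. Restricting the rule to pairs drawn from a single context is an inference from the eight pairs listed above, all of which are within-context.

```python
def classify_pair(cue_a, cue_b):
    """Label a transfer pair as "Go", "NoGo", or None.

    Each cue is a (context, probability) tuple such as ("win", 90).
    Only within-context pairs are labelled (assumed from the listed pairs).
    """
    if cue_a[0] != cue_b[0]:
        return None
    probs = {cue_a[1], cue_b[1]}
    has_best, has_worst = 90 in probs, 10 in probs
    if has_best and not has_worst:
        return "Go"      # contains a 90% item but no 10% item
    if has_worst and not has_best:
        return "NoGo"    # contains a 10% item but no 90% item
    return None


print(classify_pair(("win", 90), ("win", 20)))
print(classify_pair(("avoid", 10), ("avoid", 80)))
```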
For the reward context index, we averaged accuracy across all novel pairs in which both cues were winning cues (i.e., 90% win vs 80% win, 90% win vs 20% win, 80% win vs 10% win and 10% win vs 20% win). Similarly, for the loss-avoidance context index, four pairs (90% loss-avoidance vs 80% loss-avoidance, 90% loss-avoidance vs 20% loss-avoidance, 80% loss-avoidance vs 10% loss-avoidance and 10% loss-avoidance vs 20% loss-avoidance) were averaged. A repeated-measures ANOVA was then conducted with Context (Reward vs. Loss-avoidance) as the within-subject factor and Group as the between-subject factor. The difference scores between reward and loss-avoidance context accuracy were also calculated in order to examine the contextual learning bias towards either context. Group differences were tested using independent t tests.
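The contextual bias score and a group comparison can be sketched as follows. All accuracy values are made up, the function names are hypothetical, and the test shown is a plain pooled-variance Student's t statistic that ignores the covariates used in the actual analyses.

```python
from math import sqrt
from statistics import mean, variance


def context_bias(reward_acc, loss_acc):
    """Reward-minus-loss-avoidance accuracy; positive values favour the reward context."""
    return reward_acc - loss_acc


def pooled_t(group_a, group_b):
    """Student's independent-samples t statistic with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Made-up (reward accuracy, loss-avoidance accuracy) pairs, in percent.
hc = [context_bias(r, l) for r, l in [(85, 75), (90, 78), (80, 74), (88, 80)]]
sz = [context_bias(r, l) for r, l in [(76, 78), (80, 79), (72, 75), (78, 76)]]
print(hc, sz, pooled_t(hc, sz))
```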
To estimate participants’ ability to represent EV, we calculated the accuracy of performance in four types of pairs in the transfer phase (i.e., 80% loss-avoidance vs 80% win, 90% loss-avoidance vs 90% win, 20% win vs 20% loss-avoidance and 10% win vs 10% loss-avoidance). Notably, the two cues in each of these four pairs had the same valence and probability of PE, but different EVs. Given that participants likely encountered items with high probability more often than those with low probability during the acquisition phase, we generated two separate indices for EV (i.e., a high probability EV index and a low probability EV index). Moreover, given that the EV indices rely on the assumption of equal utilization of positive and negative PEs in the reward and loss-avoidance contexts, the difference score between reward and loss-avoidance context accuracy was entered as a covariate in the univariate ANOVAs used to determine group differences.
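The logic of these pairs can be made concrete with a worked example. The win amounts (¥5 or ¥0) come from the task description; the loss amount of ¥5 for the loss-avoidance cue is an assumption, since the text does not state it. Both cues deliver their better outcome 90% of the time, yet their expected values differ.

```python
def expected_value(p_good, good_outcome, bad_outcome):
    """Expected value of a cue that yields good_outcome with probability p_good."""
    return p_good * good_outcome + (1 - p_good) * bad_outcome


# 90% win cue: win 5 or get 0 (amounts from the task description).
ev_win_90 = expected_value(0.9, 5, 0)
# 90% loss-avoidance cue: get 0 or lose 5 (the loss amount is an assumption).
ev_avoid_90 = expected_value(0.9, 0, -5)

print(ev_win_90, ev_avoid_90)
```

A learner who tracks only how often each cue produced a positive PE would be indifferent between the two cues, whereas a learner who represents EV should prefer the win cue.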
Given the gender difference in RL [34] and the close relationship between working memory (WM) and RL [35, 36], participants’ gender and WM performance (backward digit span) were entered as covariates in all the analyses. We included WM performance as a covariate to ascertain the effect of diagnosis on reinforcement learning without the confounding effect of the poor WM associated with SZ. However, it is possible that covarying reduces the ability to detect diagnosis effects. Thus, we also conducted an exploratory analysis without the covariates, and the results remained significant.
Partial correlations were used to examine the relationship between RL and clinical symptoms in terms of amotivation and anhedonia severity (SANS avolition and anhedonia subscale scores) in SZ participants, while controlling for gender and WM (backward digit span). We also examined the relationship between RL indices and WM in clinical participants, while controlling for the effect of gender. False Discovery Rate (FDR) correction was applied. The Greenhouse-Geisser correction was used for results that did not meet the sphericity assumption.
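The text does not say which FDR procedure was used. As one common possibility, the sketch below implements the Benjamini-Hochberg adjustment; the choice of procedure, the function name and the p-values are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.02, 0.04, 0.30]  # made-up p-values
print(benjamini_hochberg(raw))
```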
Results
Demographics, cognitive functions and clinical characteristics
As shown in Table 1, the two groups did not differ in age, gender, education level, IQ estimates and WM performance (ps > .05).
Participants’ performance in the acquisition phase
The four-way ANOVA revealed a significant main effect of Block (F(2.57, 171.83) = 19.32, p < .001, η² = 0.22), indicating that participants’ learning accuracy improved steadily over time (Figure 2).
The main effect of Probability was significant (F(1, 67) = 6.16, p = .02, η² = 0.08), suggesting that accuracy improved as the probability increased. Both groups’ performance in Block 4 was significantly better than chance level (ps < .001). The Group-by-Context interaction failed to reach significance (F(1, 67) = 1.25, p = .27, η² = 0.02). The main effects of Context (F(1, 67) = 0.26, p = .62, η² = 0.004) and Group (F(1, 67) < 0.001, p = .99, η² < 0.001) also failed to reach statistical significance. The Context-by-Block interaction (F(2.58, 173.00) = 3.44, p = .02, η² = 0.05) and the Probability-by-Block interaction (F(2.47, 165.17) = 4.13, p = .007, η² = 0.06) were significant. None of the 3-way interactions or the 4-way interaction was significant.
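The reported effect sizes can be cross-checked against the F statistics. The sketch below assumes that the reported η² is partial eta squared, which the text does not state explicitly; under that assumption, the values recovered from F and its degrees of freedom agree with those reported for Block and Probability.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the acquisition phase.
print(round(partial_eta_squared(19.32, 2.57, 171.83), 2))  # main effect of Block
print(round(partial_eta_squared(6.16, 1, 67), 2))          # main effect of Probability
```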
Participants’ performance in the transfer phase
When the Go and NoGo learning indices were subjected to a repeated-measures ANOVA, the main effect of Group was not significant (F(1, 63) = 0.46, p = .50, η² = 0.01), suggesting that participants with SZ did not show a general learning impairment relative to controls. The Group-by-Go/NoGo interaction was not significant (F(1, 63) = 1.89, p = .17, η² = 0.03, Figure 3A), showing that the two groups had comparable performance in Go and NoGo learning. Likewise, no significant group difference was found in the difference scores between Go and NoGo learning accuracy (F(1, 63) = 1.89, p = .17, η² = 0.03). However, the difference scores between the reward and loss-avoidance contexts yielded a significant group difference (F(1, 62) = 5.60, p = .02, η² = 0.08, Figure 3C): participants with SZ showed a significantly reduced learning bias towards the reward context relative to HCs. The Group-by-Context interaction was also significant (F(1, 62) = 5.60, p = .02, η² = 0.08, see Figure 3B). Further analysis indicated that HCs, but not participants with SZ, performed better in the reward context than in the loss-avoidance context. The main effects of Context (F(1, 62) = 0.19, p = .66, η² = 0.003) and Group (F(1, 62) = 0.05, p = .83, η² = 0.001) were not significant.
When the EV indices were subjected to ANOVAs, SZ participants and HCs showed comparable EV indices in both the low (F(1, 64) = 0.001, p = .97, η² < 0.001) and high probability conditions (F(1, 64) = 1.66, p = .20, η² = 0.03).
Correlations between RL performance and clinical measures in medication-naïve participants with first-episode SZ
A significant negative correlation was found between the accuracy of learning in the reward context (10% win - 90% win) across the acquisition phase and the avolition subscale score of the SANS (r(23) = -0.54, FDR-corrected p = .004).
Discussion
The present study investigated multiple aspects of RL in medication-naïve patients with first-episode SZ. Despite the limited sample size and modest findings, we found preliminary evidence that SZ patients show a reduced contextual bias towards the reward context. Furthermore, learning performance in the reward context was correlated with avolition symptoms in individuals with SZ. We found no evidence of dysfunction in Go/NoGo learning or EV representation in patients with SZ.
In medication-naïve SZ patients, the results showed intact Go and NoGo learning relative to controls. Similarly, recent studies also found intact positive and negative PE-driven learning in patients with chronic SZ [37]. On the other hand, a few previous studies on chronic SZ reported that both the Go and NoGo learning were impaired [38, 39]. However, the proposed selectively impaired Go but intact NoGo learning was not consistently found in patients with SZ. Compared with previous evidence, our findings were unlikely to be confounded by medication effect on the DA systems, and suggested that the Go and NoGo learning in SZ patients were largely intact. The role of Go and NoGo learning in SZ could vary among different stages of schizophrenia.
Our findings indicated a possible reduction in learning bias towards the reward context in patients with SZ, which is consistent with a previous study that found more pronounced impairment in the reward context among unmedicated SZ patients [40]. Indeed, previous fMRI results showed attenuated PE responses in the medial prefrontal cortex of unmedicated patients with SZ under the reward but not the loss-avoidance context [21, 41]. Our findings also dovetail with studies using the same GLA task in medicated patients with chronic SZ, who were found to perform more poorly in reward than in loss-avoidance trials [17, 42, 43]. The attenuated learning in the reward relative to the loss-avoidance context in medication-naïve SZ patients, together with similar findings in chronic medicated SZ patients, may suggest a persistent dysregulation throughout the course of illness. Our correlation analysis also suggests a negative relationship between learning performance under the reward context and avolition symptoms. This finding may indicate that a reduced learning bias towards reward is related to more severe amotivation symptoms.
Regarding EV representation, we found no evidence for impaired EV representation in medication-naïve first-episode SZ patients, consistent with earlier results from medicated SZ samples [28]. Our participants with SZ showed preferences for reward stimuli over loss-avoidance stimuli comparable to those of controls. Similar findings of intact prefrontal activity during PE signaling and reward anticipation have been reported in individuals at ultra-high risk for SZ [44,45,46], suggesting that people at the very early stage of the SZ spectrum are capable of representing EV. EV performance and prefrontal activation while evaluating reward outcomes have repeatedly been found to be correlated with the severity of negative symptoms [29]. However, in our study, as negative symptoms were unlikely to be attributable to medication effects, this relationship was not found. A similar study on brain activity in medication-naïve SZ patients during reward anticipation also did not find any significant correlation between prefrontal activation and negative symptoms [29]. According to the theory proposed by Waltz and Gold [15], although both unmedicated and medicated SZ patients have disrupted RL, the aberrant learning observed in medicated chronic SZ may be more likely due to faulty EV representation than to dysfunctional PE utilization, while the latter mechanism may be more applicable to unmedicated SZ patients. Such an account posits that EV is strongly linked to negative symptoms in chronic SZ, leading to the persistence of these symptoms throughout the illness. Although our sample size was relatively small, our preliminary results suggest that EV may play an important role in maintaining negative symptoms rather than causing them.
This study has several limitations. First, the sample size was relatively small and many of our results were modest in magnitude, which might have limited statistical power. Moreover, given the limited sample size, the current sample may not represent the full population of medication-naïve first-episode schizophrenia patients. Our sample was biased to some degree towards highly educated and young patients, and we encourage readers to interpret the results with caution. Future studies with larger and more representative samples are needed to verify and replicate the present results. Second, our paradigm was limited by having a small number of trials. Future studies are required to verify the results with more trials. The GLA task was also monetary in nature. Future experimental paradigms embedded in social and interpersonal contexts may further promote the investigation of patients’ learning from social reward and social pleasure. Third, our sample of medication-naïve first-episode SZ patients apparently had a low level of anhedonia and amotivation. In order to understand the role of reinforcement learning in the formation of amotivation, future studies should recruit first-episode SZ patients with prominent negative symptoms.
Conclusions
In conclusion, we found preliminary evidence of a lack of learning bias towards the reward context in medication-naïve first-episode SZ patients. Performance under the reward context was negatively correlated with avolition symptoms measured by the SANS. In addition, we found that patients with SZ demonstrated preserved EV representation and Go/NoGo learning in the early stages of the disease. Impaired reinforcement learning under the reward context at this very early stage of SZ may serve as a viable starting point for better predicting and preventing the development of patients’ negative symptoms.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- ANOVA: Analysis of Variance
- DA: Dopamine
- DSM–IV: Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
- EV: Expected Value
- FDR: False Discovery Rate
- fMRI: Functional Magnetic Resonance Imaging
- GLA: Gain and Loss-Avoidance
- HCs: Healthy Controls
- IQ: Intelligence Quotient
- PANSS: Positive and Negative Syndrome Scale
- PE: Prediction Error
- RL: Reinforcement Learning
- RT: Response Time
- SANS: Scale for the Assessment of Negative Symptoms
- SZ: Schizophrenia
- WAIS-RC: Wechsler Adult Intelligence Scale–Chinese version
- WM: Working Memory
References
Strauss GP, Bartolomeo LA, Harvey PD. Avolition as the core negative symptom in schizophrenia: relevance to pharmacological treatment development. NPJ Schizophr. 2021;7(1):1–6. https://doi.org/10.1038/s41537-021-00145-4.
Strauss GP, Horan WP, Kirkpatrick B, Fischer BA, Keller WR, Miski P, et al. Deconstructing negative symptoms of schizophrenia: avolition–apathy and diminished expression clusters predict clinical presentation and functional outcome. J Psychiatr Res. 2013;47(6):783–90. https://doi.org/10.1016/j.jpsychires.2013.01.015.
Chang WC, Ho RWH, Tang JYM, Wong CSM, Hui CLM, Chan SKW, et al. Early-stage negative symptom trajectories and relationships with 13-year outcomes in first-episode nonaffective psychosis. Schizophr Bull. 2019;45(3):610–9. https://doi.org/10.1093/schbul/sby115.
Najas-Garcia A, Gómez-Benito J, Huedo-Medina TB. The relationship of motivation and neurocognition with functionality in schizophrenia: a meta-analytic review. Community Ment Health J. 2018;54(7):1019–49. https://doi.org/10.1007/s10597-018-0266-4.
Gold JM, Waltz JA, Prentice KJ, Morris SE, Heerey EA. Reward processing in schizophrenia: a deficit in the representation of value. Schizophr Bull. 2008;34(5):835–47. https://doi.org/10.1093/schbul/sbn068.
Strauss GP, Waltz JA, Gold JM. A review of reward processing and motivational impairment in schizophrenia. Schizophr Bull. 2014;40(Suppl 2):S107–16. https://doi.org/10.1093/schbul/sbt197.
Frank MJ, Claus ED. Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol Rev. 2006;113(2):300–26. https://doi.org/10.1037/0033-295X.113.2.300.
Strauss GP, Frank MJ, Waltz JA, Kasanova Z, Herbener ES, Gold JM. Deficits in positive reinforcement learning and uncertainty-driven exploration are associated with distinct aspects of negative symptoms in schizophrenia. Biol Psychiatry. 2011;69(5):424–31. https://doi.org/10.1016/j.biopsych.2010.10.015.
Waltz JA, Frank MJ, Robinson BM, Gold JM. Selective reinforcement learning deficits in schizophrenia support predictions from computational models of striatal-cortical dysfunction. Biol Psychiatry. 2007;62(7):756–64. https://doi.org/10.1016/j.biopsych.2006.09.042.
Waltz JA, Frank MJ, Wiecki TV, Gold JM. Altered probabilistic learning and response biases in schizophrenia: behavioral evidence and neurocomputational modeling. Neuropsychology. 2011;25(1):86–97. https://doi.org/10.1037/a0020882.
Yılmaz A, Simsek F, Gonul AS. Reduced reward-related probability learning in schizophrenia patients. Neuropsychiatr Dis Treat. 2012;8:27–34. https://doi.org/10.2147/NDT.S26243.
Gradin VB, Kumar P, Waiter G, Ahearn T, Stickle C, Milders M, et al. Expected value and prediction error abnormalities in depression and schizophrenia. Brain. 2011;134(6):1751–64. https://doi.org/10.1093/brain/awr059.
Murray GK, Corlett PR, Clark L, Pessiglione M, Blackwell AD, Honey G, et al. Substantia nigra/ventral tegmental reward prediction error disruption in psychosis. Mol Psychiatry. 2008;13(3):239, 267–76. https://doi.org/10.1038/sj.mp.4002058.
Waltz JA, Schweitzer JB, Gold JM, Kurup PK, Ross TJ, Jo Salmeron B, et al. Patients with schizophrenia have a reduced neural response to both unpredictable and predictable primary reinforcers. Neuropsychopharmacol. 2009;34(6):1567–77. https://doi.org/10.1038/npp.2008.214.
Waltz JA, Gold JM. Motivational deficits in schizophrenia and the representation of expected value. In: Simpson EH, Balsam PD, editors. Behavioral neuroscience of motivation. Cham: Springer International Publishing; 2016. p. 375–410. https://doi.org/10.1007/7854_2015_385.
Barch DM, Treadway MT, Schoen N. Effort, anhedonia, and function in schizophrenia: reduced effort allocation predicts amotivation and functional impairment. J Abnorm Psychol. 2014;123(2):387–97. https://doi.org/10.1037/a0036299.
Gold JM, Waltz JA, Matveeva TM, Kasanova Z, Strauss GP, Herbener ES, et al. Negative symptoms and the failure to represent the expected reward value of actions: behavioral and computational modeling evidence. Arch Gen Psychiatry. 2012;69(2):129. https://doi.org/10.1001/archgenpsychiatry.2011.1269.
Brown EC, Hack SM, Gold JM, Carpenter WT, Fischer BA, Prentice KP, et al. Integrating frequency and magnitude information in decision-making in schizophrenia: An account of patient performance on the Iowa Gambling Task. J Psychiatr Res. 2015;66–67:16–23. https://doi.org/10.1016/j.jpsychires.2015.04.007.
Hernaus D, Gold JM, Waltz JA, Frank MJ. Impaired expected value computations coupled with overreliance on stimulus-response learning in schizophrenia. Biol Psychiatr Cogn Neurosci Neuroimag. 2018;3(11):916–26. https://doi.org/10.1016/j.bpsc.2018.03.014.
Waltz JA, Xu Z, Brown EC, Ruiz RR, Frank MJ, Gold JM. Motivational deficits in schizophrenia are associated with reduced differentiation between gain and loss-avoidance feedback in the striatum. Biol Psychiatr Cogn Neurosci Neuroimag. 2018;3(3):239–47. https://doi.org/10.1016/j.bpsc.2017.07.008.
Reinen JM, Van Snellenberg JX, Horga G, Abi-Dargham A, Daw ND, Shohamy D. Motivational context modulates prediction error response in schizophrenia. Schizophr Bull. 2016;42(6):1467–75. https://doi.org/10.1093/schbul/sbw045.
Koszegi B, Rabin M. A model of reference-dependent preferences. Q J Econ. 2006;121(4):1133–65. https://doi.org/10.1093/qje/121.4.1133.
Frydman C, Camerer C, Bossaerts P, Rangel A. MAOA-L carriers are better at making optimal financial decisions under risk. Proc R Soc B. 2011;278(1714):2053–9. https://doi.org/10.1098/rspb.2010.2304.
Chumbley JR, Krajbich I, Engelmann JB, Russell E, Van Uum S, Koren G, et al. Endogenous cortisol predicts decreased loss aversion in young men. Psychol Sci. 2014;25(11):2102–5. https://doi.org/10.1177/0956797614546555.
Sokol-Hessner P, Camerer CF, Phelps EA. Emotion regulation reduces loss aversion and decreases amygdala responses to losses. Soc Cogn Affect Neurosci. 2013;8(3):341–50. https://doi.org/10.1093/scan/nss002.
Eisenegger C, Naef M, Linssen A, Clark L, Gandamaneni PK, Müller U, et al. Role of dopamine D2 receptors in human reinforcement learning. Neuropsychopharmacol. 2014;39(10):2366–75. https://doi.org/10.1038/npp.2014.84.
Insel C, Reinen J, Weber J, Wager TD, Jarskog LF, Shohamy D, et al. Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia. Cogn Affect Behav Neurosci. 2014;14(1):189–201. https://doi.org/10.3758/s13415-014-0261-3.
Chang WC, Waltz JA, Gold JM, Chan TCW, Chen EYH. Mild reinforcement learning deficits in patients with first-episode psychosis. Schizophr Bull. 2016;42(6):1476–85. https://doi.org/10.1093/schbul/sbw060.
Nielsen MØ, Rostrup E, Wulff S, Bak N, Lublin H, Kapur S, et al. Alterations of the brain reward system in antipsychotic naïve schizophrenia patients. Biol Psychiatry. 2012;71(10):898–905. https://doi.org/10.1016/j.biopsych.2012.02.007.
Brown JK, Waltz JA, Strauss GP, McMahon RP, Frank MJ, Gold JM. Hypothetical decision making in schizophrenia: the role of expected value computation and “irrational” biases. Psychiatry Res. 2013;209(2):142–9. https://doi.org/10.1016/j.psychres.2013.02.034.
American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. American Psychiatric Association; 1994.
Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr Bull. 1987;13(2):261–76. https://doi.org/10.1093/schbul/13.2.261.
Andreasen NC. The scale for the assessment of negative symptoms (SANS): conceptual and theoretical foundations. Br J Psychiatry Suppl. 1989;7:49–58.
Byrne KA, Worthy DA. Gender differences in reward sensitivity and information processing during decision-making. J Risk Uncertain. 2015;50(1):55–71. https://doi.org/10.1007/s11166-015-9206-7.
Collins AGE, Brown JK, Gold JM, Waltz JA, Frank MJ. Working memory contributions to reinforcement learning impairments in schizophrenia. J Neurosci. 2014;34(41):13747–56. https://doi.org/10.1523/JNEUROSCI.0989-14.2014.
Collins AGE, Frank MJ. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis: working memory in reinforcement learning. Eur J Neurosci. 2012;35(7):1024–35. https://doi.org/10.1111/j.1460-9568.2011.07980.x.
Albrecht MA, Waltz JA, Frank MJ, Gold JM. Probability and magnitude evaluation in schizophrenia. Schizophr Res Cognition. 2016;5:41–6. https://doi.org/10.1016/j.scog.2016.06.003.
Cicero DC, Martin EA, Becker TM, Kerns JG. Reinforcement learning deficits in people with schizophrenia persist after extended trials. Psychiatry Res. 2014;220(3):760–4. https://doi.org/10.1016/j.psychres.2014.08.013.
Fervaha G, Agid O, Foussias G, Remington G. Impairments in both reward and punishment guided reinforcement learning in schizophrenia. Schizophr Res. 2013;150(2–3):592–3. https://doi.org/10.1016/j.schres.2013.08.012.
Moran EK, Gold JM, Carter CS, MacDonald AW, Ragland JD, Silverstein SM, et al. Both unmedicated and medicated individuals with schizophrenia show impairments across a wide array of cognitive and reinforcement learning tasks. Psychol Med. 2020:1–11. https://doi.org/10.1017/S003329172000286X.
Schlagenhauf F, Sterzer P, Schmack K, Ballmaier M, Rapp M, Wrase J, et al. Reward feedback alterations in unmedicated schizophrenia patients: relevance for delusions. Biol Psychiatry. 2009;65(12):1032–9. https://doi.org/10.1016/j.biopsych.2008.12.016.
Hartmann-Riemer MN, Aschenbrenner S, Bossert M, Westermann C, Seifritz E, Tobler PN, et al. Deficits in reinforcement learning but no link to apathy in patients with schizophrenia. Sci Rep. 2017;7(1):40352. https://doi.org/10.1038/srep40352.
Barch DM, Carter CS, Gold JM, Johnson SL, Kring AM, MacDonald AW, et al. Explicit and implicit reinforcement learning across the psychosis spectrum. J Abnorm Psychol. 2017;126(5):694–711. https://doi.org/10.1037/abn0000259.
Ermakova AO, Knolle F, Justicia A, Bullmore ET, Jones PB, Robbins TW, et al. Abnormal reward prediction-error signalling in antipsychotic naive individuals with first-episode psychosis or clinical risk for psychosis. Neuropsychopharmacol. 2018;43(8):1691–9. https://doi.org/10.1038/s41386-018-0056-2.
Juckel G, Friedel E, Koslowski M, Witthaus H, Özgürdal S, Gudlowski Y, et al. Ventral striatal activation during reward processing in subjects with ultra-high risk for schizophrenia. Neuropsychobiology. 2012;66(1):50–6. https://doi.org/10.1159/000337130.
Wotruba D, Heekeren K, Michels L, Buechler R, Simon JJ, Theodoridou A, et al. Symptom dimensions are associated with reward processing in unmedicated persons at risk for psychosis. Front Behav Neurosci. 2014;8:382. https://doi.org/10.3389/fnbeh.2014.00382.
Acknowledgements
Not applicable.
Funding
This work was supported by grants from the National Natural Science Foundation of China (81671326, 32171084), the Fundamental Research Funds for the Central Universities of Shanghai Jiao Tong University (16JXRZ06), the Natural Science Foundation of Shanghai (21ZR142000), the Health Science and Technology Project of Pudong New Area Health Committee in 2020 (PW2020B-12), the Shanghai Pudong New Area Health and Family Planning Commission Key Discipline Construction Fund Project (PWZxk2017-29) and the Outstanding Clinical Discipline Project of Shanghai Pudong (PWYgy2018-10). We thank all the coordinating clinical staff. The authors declare no conflicts of interest regarding the subject of this study.
Author information
Contributions
Xiao-yan Cheng and Ling-ling Wang collected and analyzed the data and wrote the drafts of the manuscript. Xin-xin Huang, Xi-rong Sun and Jie Yuan conducted the interviews and clinical assessments. Qin-yu Lv and Hai-su Wu recruited clinical cases and helped with the literature search. Chao Yan generated the idea, wrote the protocol, interpreted the findings and commented critically on the drafts. Zheng-hui Yi designed the study, wrote the protocol, monitored the study and obtained funding for it. All authors have approved the final version of this manuscript.
Ethics declarations
Ethics approval and consent to participate
This study was conducted in accordance with the principles outlined in the Declaration of Helsinki and was approved by the Ethics Committee of the Shanghai Mental Health Centre (2017-19R). Written informed consent was obtained from the participants after the study procedures had been explained to them.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1: Table S1. Valence and Arousal of GLA Task Stimuli.
Additional file 2: Table S2. Pairs in the Gain vs Loss-Avoidance (GLA) task.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Cheng, X., Wang, L., Lv, Q. et al. Reduced learning bias towards the reward context in medication-naive first-episode schizophrenia patients. BMC Psychiatry 22, 123 (2022). https://doi.org/10.1186/s12888-021-03682-5
DOI: https://doi.org/10.1186/s12888-021-03682-5
Keywords
- Reinforcement Learning
- Reward context
- Prediction error
- Expected value
- Negative symptom
- Medication-naïve