Duloxetine compared with fluoxetine and venlafaxine: use of meta-regression analysis for indirect comparisons

Eckert, Laurent; Lançon, Christophe

doi:10.1186/1471-244X-6-30

Problems with Indirect Comparisons in Meta-Analyses

Dustin Ruff, Eli Lilly and Company

7 March 2007

In “Duloxetine compared with fluoxetine and venlafaxine: use of meta-regression analysis for indirect comparisons”, Eckert and Lançon [1] attempt to compare the overall efficacy and safety of duloxetine with that of fluoxetine and venlafaxine through indirect methods, namely by calculating effect sizes in separate studies of each of these drugs compared to placebo, and then averaging over studies while statistically controlling for some differences between the studies (“meta-regression”). The authors state “In the absence of a well-powered randomized placebo controlled direct comparison trial, meta-regression analysis offers the most rigorous evidence science can buy.” There are good reasons why regulatory agencies do not accept such indirect (non-randomized) comparisons as valid evidence of comparative efficacy. Two major problems can arise with such a “meta-regression” approach: confounding between the sets of studies and publication bias. Both of these problems are inherent in the analyses presented by Eckert and Lançon and seriously compromise the usefulness of their analyses for making statements about the relative efficacy of duloxetine, fluoxetine, and venlafaxine.

The first major problem with indirect, non-randomized, comparisons is the possibility that any observed effects may be due to confounds between the samples of studies for the different medications. In the case of the Eckert and Lançon analyses, there are confounding variables that render the indirect comparisons uninterpretable. First, most of the venlafaxine studies were conducted at an earlier time point, using a different diagnostic system (one study used DSM-III; 4 studies used DSM-III-R; only 3 studies used DSM-IV) compared to duloxetine (all studies used DSM-IV). Using the data reported in their article, the mean effect size for the 5 DSM-III/III-R venlafaxine studies is 0.63 compared to 0.36 for the 3 later DSM-IV venlafaxine studies. Whether this large difference is due to differences in the diagnostic systems, changes in research diagnostic interview methods over the years, or other differences over the 15 years since the first venlafaxine studies were published, is not known. One potential difference over this time period is that there has been a substantial increase in the use of medications to treat MDD [2]. This increase suggests that recent trials will tend to recruit a higher proportion of treatment-resistant patients, resulting in lower effect sizes. Regardless of the explanation for the decrease in effect sizes over time, it is inappropriate to compare results from older studies to those from newer studies such as the duloxetine studies, as the large effect size difference between older and newer venlafaxine studies demonstrates.

A second confound is that one of the 9 duloxetine studies [3] did not target a standard major depressive disorder (MDD) sample; the study used a sample of patients with painful physical symptoms and concurrent MDD. Although duloxetine has shown efficacy for pain, patients with pain and MDD have lower response rates than those without pain [4]. In the pain/MDD study, evidence was presented showing that the additional selection criteria based on baseline pain yielded a different population from that seen in the standard MDD trials. When adjustments were made based on prior depression history, the study results more closely resembled those seen in other trials [3].

If the pain/MDD study is excluded and only DSM-IV studies are examined, the average response rate (using numbers provided in the Eckert and Lançon article for studies that reported response rate outcomes) is 63.2% for 5 duloxetine studies and 60.2% for 3 venlafaxine studies. This is hardly evidence that venlafaxine is superior.

A third major confound is baseline severity. Although the authors apparently adjusted the calculation of the drug vs. placebo effect size for any baseline severity differences between the treatment groups within each study, no attempt was made to adjust for between-study differences in severity of the patient populations. The duloxetine studies all used the 17-item Hamilton Rating Scale for Depression (HAM-D) and required patients to have a baseline minimum score of 15 for inclusion; the venlafaxine studies all used the 21-item version and required a baseline minimum score of 20 for inclusion. For the 8 duloxetine studies (excluding the MDD/pain study), the mean baseline 17-item HAM-D was 19.8 for duloxetine treatment groups. For the 8 venlafaxine studies, the mean baseline 21-item HAM-D was 25.4 for the venlafaxine treatment groups. The 21-item HAM-D total is typically about 2 points higher than the 17-item total [5]. Thus, the venlafaxine studies would have an estimated 17-item baseline HAM-D mean of 23.4, which is 3.6 points higher than the average of the duloxetine studies, a relatively large difference in average severity. Baseline HAM-D severity has been strongly linked to the size of the antidepressant drug vs. placebo effect, with larger effect sizes associated with higher levels of baseline severity [6]. The difference in severity between the duloxetine studies and the venlafaxine studies is therefore likely to explain, in part, the different effect sizes obtained. It might be tempting to adjust statistically for these baseline severity differences in a “meta-regression”; however, because the baseline severity distributions for the two medications do not overlap, such an adjustment is not appropriate.
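The severity arithmetic above can be retraced in a few lines. This is only a sketch: the three numbers are the ones quoted in this letter, the 2-point offset is the approximate 21-item versus 17-item difference cited from [5], and the constant and function names are made up for illustration.

```python
# Values quoted in the letter; names are illustrative only.
VENLAFAXINE_HAMD21_MEAN = 25.4   # mean baseline 21-item HAM-D, venlafaxine arms
DULOXETINE_HAMD17_MEAN = 19.8    # mean baseline 17-item HAM-D, duloxetine arms
HAMD21_TO_HAMD17_OFFSET = 2.0    # approximate offset cited from [5]


def estimate_hamd17(hamd21_mean, offset=HAMD21_TO_HAMD17_OFFSET):
    """Rough 17-item equivalent of a 21-item HAM-D mean (simple subtraction)."""
    return hamd21_mean - offset


venlafaxine_hamd17 = estimate_hamd17(VENLAFAXINE_HAMD21_MEAN)
severity_gap = venlafaxine_hamd17 - DULOXETINE_HAMD17_MEAN

print(f"Estimated venlafaxine 17-item baseline: {venlafaxine_hamd17:.1f}")
print(f"Gap versus duloxetine studies: {severity_gap:.1f} points")
```

The subtraction is a crude conversion; it treats the offset as constant across studies, which is an assumption rather than something established here.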

The second major problem is publication bias, for which there is considerable evidence in the studies included in the Eckert and Lançon article. The funnel plot for venlafaxine given in Figure 4 of their article is indicative of publication bias. In a funnel plot, no bias would be indicated by an inverted funnel shape in which studies with relatively smaller sample sizes would show greater effect size variability (thus a wide base to the inverted funnel) and studies with relatively larger sample sizes would show little variability because of greater precision of the effect sizes (thus a narrow top to the inverted funnel). In Figure 4, the venlafaxine studies produce a plot where only half of a potential “funnel” shape is evident in the lower half of the figure. The funnel plot evidence suggests that there is a likely bias towards publishing small sample size venlafaxine studies that report larger effect sizes. The authors report a p-value of 0.1 as a test for publication bias, but this significance test has been shown to be considerably underpowered when evaluating a small number of trials, even when severe bias is present [7]. Given the lack of power, a p-value close to 0.1 is evidence of notable bias. Finally, the authors’ assertion that “… any bias would most likely be in favour of the newer drug and its existence would not undermine the results presented here” is unsubstantiated.
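To make the idea of funnel plot asymmetry concrete, the sketch below implements an Egger-type regression, one common asymmetry test. It is an assumption that this is comparable to the test the authors used; the letter does not say which test produced their p-value. The effect sizes and standard errors are hypothetical and are not taken from any of the trials discussed.

```python
import numpy as np


def egger_intercept(effects, std_errors):
    """Egger-type regression: standardized effect on precision.

    Returns (intercept, t_statistic). An intercept far from zero
    suggests small-study asymmetry in the funnel plot.
    """
    effects = np.asarray(effects, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    y = effects / std_errors               # standardized effects
    x = 1.0 / std_errors                   # precision
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - 2
    sigma2 = resid @ resid / dof           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS coefficient covariance
    return beta[0], beta[0] / np.sqrt(cov[0, 0])


# Hypothetical data: 8 studies, larger standard error = smaller study.
std_errors = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45])
noise = np.array([0.02, -0.02, 0.02, -0.02, 0.02, -0.02, 0.02, -0.02])

symmetric_effects = 0.4 + noise                      # unrelated to study size
asymmetric_effects = 0.3 + 1.5 * std_errors + noise  # small studies look better

for label, effects in [("symmetric", symmetric_effects),
                       ("asymmetric", asymmetric_effects)]:
    intercept, t_stat = egger_intercept(effects, std_errors)
    print(f"{label}: intercept={intercept:.2f}, t={t_stat:.2f}")
```

With only 8 studies the t statistic has 6 degrees of freedom, which illustrates why such tests have little power when the number of trials is small.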

All of these issues call into question the interpretability and validity of the final conclusions of Eckert and Lançon [1], as well as an earlier meta-analysis that used most of the same studies to reach similar conclusions [8]. While the authors acknowledge some of these issues in their Discussion section, these concerns rise to the level of “fatal flaws” rather than simply “limitations.” In the end, we fully agree with the authors’ assertion that “Superiority of one antidepressant medication relative to another needs to be established by means of prospectively designed, adequately powered, head-to-head clinical trials.” In this regard, it is particularly important that a report was available to the authors that included two large-scale, head-to-head, multicenter studies of venlafaxine versus duloxetine that were designed to be pooled for primary analyses [9]. The analysis showed no significant difference between venlafaxine and duloxetine at endpoint (12-week trials) despite the large sample size (n=667) that provided ample statistical power to detect even small differences. This finding is highly inconsistent with the indirect comparisons presented by Eckert and Lançon [1]. Therefore, until more such head-to-head trials are conducted, let’s not be misled by confounded indirect comparisons and publication biases.

Best regards,

Dustin Ruff, PhD; Head, Statistics

Michael J. Detke, MD, PhD; Medical Director

Michael J. Robinson, MD, FRCPC; Clinical Research Physician

References

1. Eckert L, Lançon C. Duloxetine compared with fluoxetine and venlafaxine: use of meta-regression analysis for indirect comparisons. BMC Psychiatry 2006, 6:30

2. Olfson M, Marcus SC, Druss B, Elinson L, Tanielian T, Pincus HA. National Trends in the Outpatient Treatment of Depression. JAMA 2002, 287: 203-209

3. Brannan SK, Mallinckrodt CH, Brown EB, Wohlreich MM, Watkin JG, Schatzberg AF. Duloxetine 60 mg once-daily in the treatment of painful physical symptoms in patients with major depressive disorder. J Psychiatr Res 2005, 39:43-53

4. Bair MJ, Robinson RL, Eckert GJ, Stang PE, Croghan TW, Kroenke K. Impact of pain on depression treatment response in primary care. Psychosom Med 2004, 66:17–22

5. O’Sullivan RL, Fava M, Agustin C, Baer L, Rosenbaum JF. Sensitivity of the six-item Hamilton Depression Rating Scale. Acta Psychiatr Scand 1997, 95:379–384

6. Khan A, Brodhead AE, Kolts RL, Brown WA. Severity of depressive symptoms and response to antidepressants and placebo in antidepressant trials. J Psychiatr Res 2005, 39:145–150

7. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000, 53:1119-1129

8. Vis PMJ, van Baardewijk M, Einarson TR. Duloxetine and venlafaxine-XR in the treatment of MDD: a meta-analysis of randomized clinical trials. Ann Pharmacother 2005, 39:1798-8

9. Perahia D, Pritchett YL, Walker D, Tran P, Raskin J, Russell J. Comparing duloxetine and venlafaxine in the treatment of major depressive disorder using a global benefit-risk approach. Presented at the New Clinical Drug Evaluation Unit (NCDEU) meeting, Boca Raton, FL, June 2005

Competing interests

All authors are employees of Eli Lilly and Company.

Archived Comments for: Duloxetine compared with fluoxetine and venlafaxine: use of meta-regression analysis for indirect comparisons

BMC Psychiatry
