A preprint of the manuscript was uploaded to PsyArXiv (https://psyarxiv.com/). The extracted data used for the meta-analysis are available at our Open Science Framework (OSF) data repository (https://osf.io/z52fc/). A PRISMA checklist documenting the meta-analysis is provided in the Appendix (Additional file 1). The meta-analysis was not pre-registered.
Studies were selected if they fulfilled the following eligibility criteria. BDD symptom severity had to be measured with a questionnaire or interview that captures symptoms as described in the fifth or fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5 or DSM-IV) [1, 32]. This comprised detailed measures of BDD symptom severity as well as shorter screening measures for BDD symptoms. Alternatively, categorical diagnostic measures of BDD based on DSM-IV or DSM-5 were also considered. Hence, the Yale-Brown Obsessive Compulsive Scale for Body Dysmorphic Disorder (BDD-YBOCS), the self-report and clinician-administered versions of the Body Dysmorphic Disorder Examination (BDDE), the Body Dysmorphic Symptoms Inventory (Fragebogen körperdysmorpher Symptome; FKS), the Questionario sul Dismorfismo Corporeo (QDC), the Dysmorphic Concern Questionnaire (DCQ), the Body Dysmorphic Disorder Questionnaire (BDDQ), and the Body Dysmorphic Disorder Diagnostic Module (BDD-DM) were included in this meta-analysis. Measures of body image or body dissatisfaction were excluded. Measures that specifically address muscle dysmorphia were also not included, because we intended to investigate BDD symptoms in general and because of the overlap between muscle dysmorphia and eating disorders. This meta-analysis relied on the definition and operationalization of self-esteem by Rosenberg. Thus, self-esteem needed to be assessed via the Rosenberg Self-Esteem Scale (RSES), the most widely used self-report measure of global self-esteem. For inclusion in the meta-analysis of partial correlations, studies were required to use a questionnaire or interview for the assessment of depressive symptom severity. The Beck Depression Inventory (BDI) [40, 41, 42], the Hamilton Depression Rating Scale (HAMD), the depression subscale of the Depression Anxiety Stress Scales (DASS), the depression subscale of the Hospital Anxiety and Depression Scale (HADS), the depression subscale of the Symptom Checklist-90 (SCL-90), and the Patient Health Questionnaire-9 Depression module (PHQ-9) were used in the studies.
Clinical, subclinical, and non-clinical samples were examined. Studies could target BDD patients, mentally healthy control participants, students, members of the community, and cosmetic surgery patients. Participants were allowed to have secondary comorbid mental disorders. However, samples with another primary mental disorder (e.g., eating disorders, social anxiety disorder) were excluded. Samples that were recruited according to the presence or absence of a physical condition (e.g., rheumatic arthritis, obesity) were not included in this analysis. Samples that were selected according to related factors (e.g., body dissatisfaction) were also not considered. No restrictions concerning the age or gender of the sample were applied. Studies could be designed as correlational surveys or intervention studies. Since we investigated the cross-sectional relationship, data on all our variables of interest had to be collected at a single measurement point. If a study had more than one measurement point, baseline measures were analyzed. Case studies were omitted. For inclusion, manuscripts were required to be written in English or German.
Several sources were used to identify relevant studies. The databases PubMed, PsycInfo, PsycArticles, Medline, Web of Science, Psyndex, and Dissertation Abstracts International were searched for eligible studies. Furthermore, ongoing trials were identified through the ClinicalTrials.gov registry (http://ClinicalTrials.gov), the Cochrane Central Register of Controlled Trials (CENTRAL), the WHO International Clinical Trials Registry Platform (ICTRP), and the ISRCTN registry. We also tried to obtain unpublished data by searching OpenGrey (http://www.opengrey.eu). The keyword-based literature search was carried out by the second author in April 2017. Subsequently published or registered studies were identified in January 2019, August 2019, and May 2020. The following search term was applied: (body dysmorphic AND self-esteem) or (dysmorphophobia AND self-esteem) or (dysmorphophobic AND self-esteem) or (body dysmorphic AND self-worth) or (dysmorphophobia AND self-worth) or (dysmorphophobic AND self-worth). The corresponding German search terms were: (körperdysmorphe AND Selbstwert) or (Dysmorphophobie AND Selbstwert) or (dysmorphophobe AND Selbstwert). Additionally, 24 well-known researchers in the field of BDD were contacted for unpublished studies in September 2019.
In a first step, the abstracts of identified studies were screened. Abstracts of studies published after April 2017 were screened by two research assistants. The abstracts were required to suggest that BDD symptoms and self-esteem were captured in the study. Subsequently, a full-text assessment was conducted by the second author (or a research assistant for studies published after April 2017) according to the eligibility criteria described above.
A coding scheme for the extraction of relevant data was developed. The coding scheme contained the following information. First, the sample was described with regard to the number of participants (in total and in the subgroups), clinical status, age, sex, education, ethnicity, sample type (e.g., students, cosmetic surgery patients), comorbidities, and other study-specific inclusion criteria (e.g., a certain cut-off on a BDD questionnaire). Second, the assessment of BDD symptom severity was specified. The interview or questionnaire used to examine BDD symptoms, diagnostic criteria, the diagnostic method (self-report vs. clinician-administered), as well as means and standard deviations of the diagnostic measure in the sample were coded. Additionally, the range of BDD symptom severity (e.g., only clinical participants) and whether the study compared two extreme groups (e.g., BDD patients versus healthy controls) were rated. Third, the mean and standard deviation of the RSES in the total sample were gathered. Fourth, information on the assessment of depressive symptoms was collected. This included the measure of depressive symptom severity, the applied diagnostic criteria, the diagnostic method, as well as the mean and standard deviation of the measure of depressive symptoms. Fifth, the reported effect size data were compiled. Preferably, the correlations between BDD symptom severity and self-esteem, between BDD symptom severity and depressive symptom severity, and between self-esteem and depressive symptom severity were gathered. Additionally, we coded whether the correlation was reported in the study or obtained from the authors afterwards. The type of correlation and the number of participants for whom the correlation was calculated were also coded. Alternatively, Cohen’s d for the difference in self-esteem and depressive symptoms between participants with BDD and participants without BDD was entered. If Cohen’s d was not reported, the mean and standard deviation of self-esteem and depressive symptom severity, and the number of participants in each comparison group were collected.
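When only group means, standard deviations, and group sizes are available, Cohen’s d can be recovered from these summary statistics. The following minimal sketch assumes the common pooled-standard-deviation definition of d; the function name and the example values are hypothetical and are not taken from any included study.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference (group 1 minus group 2) with a pooled SD.

    Assumption: the pooled-SD definition of Cohen's d is used.
    """
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


# Hypothetical RSES summary statistics (illustration only):
# group 1 = participants with BDD, group 2 = participants without BDD.
d_example = cohens_d(mean_1=15.0, sd_1=5.0, n_1=40,
                     mean_2=20.0, sd_2=5.0, n_2=40)
print(round(d_example, 3))
```

A negative value indicates lower mean self-esteem in the first group.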
Data were coded independently by the first and second author. Interrater agreement was 97% and consensus was achieved after discussion of divergent coding. If studies did not report all data that were needed for the meta-analysis, authors were asked for the missing information. Altogether, 30 authors were contacted (concerning 35 studies) and 17 authors provided the required information (for 20 studies).
The effect sizes in the individual studies might have been subject to bias. We considered the selection of the sample (e.g., clinical BDD patients versus non-clinical students) and the diagnostic method for assessing BDD symptoms (self-report versus clinician-administered) as possible sources of bias. Consequently, these aspects were included in our coding scheme and controlled for in the moderator analysis. Furthermore, we dealt with potential selective reporting by contacting all authors of studies that assessed our variables of interest without reporting an effect size for the relationship between BDD symptoms and self-esteem.
Effect sizes for the relationship between BDD symptom severity and self-esteem were calculated in three ways depending on the level of measurement of BDD symptom severity. For the majority of studies (k = 21), Fisher’s z transformed Pearson correlations between BDD symptom severity and self-esteem were analyzed. If effect sizes could not be based on a continuous measure of BDD symptom severity, we used either the point-biserial correlation (k = 1) between BDD (coded 1 for BDD and 0 for healthy controls) and self-esteem or Cohen’s d (k = 1), which was transformed to Fisher’s z [48, 49]. In this case, Cohen’s d described the difference in mean self-esteem between participants with BDD and participants without BDD. This categorical effect size is based not on the individual values of participants but on the group means. Thus, it reflects the relationship between BDD symptom severity and self-esteem at a less precise group level. Nevertheless, we preferred to integrate these categorical effect sizes in the meta-analysis to achieve an extensive overview of the field and to avoid complete loss of the information. Two studies [12, 50] followed an ordinal approach and reported correlations between the number of items endorsed on the BDDQ and self-esteem. As this represents a gain in information compared to mere nominal data, this procedure was applied for studies that used the BDDQ.
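The transformations described above can be sketched as follows. The exact conversion formulas used in the analysis are not spelled out in the text, so this sketch assumes the widely used conversion from d to r with a correction term for unequal group sizes, followed by Fisher’s r-to-z transformation with the usual large-sample variance 1/(n − 3). All function names and example values are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def fisher_z_variance(n):
    """Large-sample sampling variance of Fisher's z for a Pearson correlation."""
    return 1.0 / (n - 3)


def d_to_r(d, n_1, n_2):
    """Convert Cohen's d to a correlation.

    Assumption: the standard conversion r = d / sqrt(d^2 + a) is used,
    where a = (n_1 + n_2)^2 / (n_1 * n_2) adjusts for unequal group sizes.
    """
    a = (n_1 + n_2) ** 2 / (n_1 * n_2)
    return d / math.sqrt(d ** 2 + a)


# Hypothetical example: d = -1.0 with 40 participants per group.
r_from_d = d_to_r(-1.0, 40, 40)
z_from_d = fisher_z(r_from_d)
print(round(r_from_d, 4), round(z_from_d, 4), fisher_z_variance(80))
```

Pooled estimates on the z scale can be transformed back to the correlation metric with the hyperbolic tangent, `math.tanh(z)`.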
If possible, an effect size for the total sample (instead of separate effect sizes for the subgroups) was gathered. Still, samples with varying ranges of BDD symptom severity were examined. In some cases, this may have caused underestimation of the true effect, whereas in others the magnitude of the relationship might have been overestimated. Restriction of range in samples with reduced variance of BDD symptom severity (e.g., only clinical BDD participants) may have led to underestimation of the true effect. Enhancement of range and corresponding overestimation of effect sizes may have been produced by comparison of extreme groups (BDD patients versus healthy controls). A meta-analysis without artifact correction was conducted to describe the actual observed effects. Additionally, we attempted to correct for the artifacts. Thereby, we intended to achieve an estimate of the effect scaled on the general population without variance restrictions. For this purpose, studies with potentially restricted or enhanced range of BDD symptom severity were identified on the basis of theoretical assumptions concerning the sample. The individual correlations of these studies were adjusted before conducting a meta-analysis using standard corrections for variance restrictions. For the adjustment, an estimate of the standard deviation of the BDD symptom severity measure in the general population was used and applied to all studies included. If possible, this was drawn from studies with large community samples.
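As an illustration of such an adjustment, the sketch below assumes the classical correction for direct range restriction or enhancement, in which the observed correlation is rescaled by the ratio of the reference (general population) standard deviation to the sample standard deviation. The text does not state which specific formula was applied, so this is one plausible instance rather than a description of the actual procedure; the function name and the numbers are hypothetical.

```python
import math


def correct_for_range(r, sd_sample, sd_reference):
    """Adjust a correlation for restricted or enhanced range.

    Assumption: the classical direct range restriction formula is used,
    r_c = u * r / sqrt(1 + r^2 * (u^2 - 1)), with u = sd_reference / sd_sample.
    u > 1 (restricted sample) increases |r|; u < 1 (enhanced range,
    e.g., extreme groups) decreases |r|.
    """
    u = sd_reference / sd_sample
    return (u * r) / math.sqrt(1.0 + r ** 2 * (u ** 2 - 1.0))


# Hypothetical values: observed r = -0.30 on a BDD measure.
r_restricted = correct_for_range(-0.30, sd_sample=5.0, sd_reference=10.0)
r_enhanced = correct_for_range(-0.30, sd_sample=10.0, sd_reference=5.0)
print(round(r_restricted, 3), round(r_enhanced, 3))
```

When the sample and reference standard deviations are equal, the correlation is returned unchanged.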
For the calculation of partial correlations between BDD symptom severity and self-esteem controlling for depressive symptom severity, Pearson correlations between BDD symptom severity and depressive symptom severity, as well as between self-esteem and depressive symptom severity, were obtained and preprocessed in the same manner as described above. The partial correlations controlling for depressive symptom severity were also Fisher’s z transformed for a subsequent meta-analysis. The meta-analysis of (z-transformed) partial correlations was likewise conducted with and without artifact correction.
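A first-order partial correlation can be computed directly from the three pairwise correlations. The sketch below uses the standard formula; the sampling variance 1/(n − 3 − k) for the Fisher’s z transformed partial correlation with k covariates is a common approximation and is an assumption here, since the text does not state which variance was used. Function names and example correlations are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y controlling for z.

    Here x = BDD symptom severity, y = self-esteem,
    z = depressive symptom severity.
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_partial(r_partial, n, n_covariates=1):
    """Fisher's z and its approximate variance for a partial correlation.

    Assumption: variance = 1 / (n - 3 - n_covariates).
    """
    return math.atanh(r_partial), 1.0 / (n - 3 - n_covariates)


# Hypothetical pairwise correlations (illustration only).
r_partial = partial_correlation(r_xy=-0.5, r_xz=0.5, r_yz=-0.6)
z_partial, var_partial = fisher_z_partial(r_partial, n=104)
print(round(r_partial, 4), round(z_partial, 4), var_partial)
```

In this example, controlling for the covariate shrinks the association from -0.5 to roughly -0.29, which illustrates how shared variance with depressive symptoms can account for part of the zero-order correlation.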
A random effects meta-analysis was chosen to account for heterogeneity in effect sizes across studies. The computation was performed in R using the metafor package. Effect size variability was assessed with I² and τ. A moderator analysis was conducted to examine the influence of participants’ mean age, percentage of females, sample type, diagnostic method, and BDD diagnosis on effect sizes. An alpha level of α = .05 was applied. To visualize a potential publication bias, we created funnel plots.
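To make the quantities above concrete, the following sketch pools Fisher’s z values under a random effects model and reports τ and I². It is not a reproduction of the metafor analysis: it is written in Python for illustration and swaps in the closed-form DerSimonian-Laird estimator of τ², whereas the estimator actually used is not stated in the text. The function name and the input values are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random effects pooling with the DerSimonian-Laird tau^2 estimator.

    effects: Fisher's z values; variances: their sampling variances.
    Assumption: I^2 is computed from Cochran's Q as (Q - df) / Q.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random effects weights
    pooled_z = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled_z": pooled_z,
        "pooled_r": math.tanh(pooled_z),  # back-transformed correlation
        "se": math.sqrt(1.0 / sum(w_star)),
        "tau": math.sqrt(tau2),
        "i2": i2,
        "q": q,
    }


# Hypothetical Fisher's z values and sample sizes (illustration only).
z_values = [-0.55, -0.30, -0.62, -0.41]
sample_sizes = [120, 85, 60, 200]
result = random_effects_dl(z_values, [1.0 / (n - 3) for n in sample_sizes])
print({name: round(value, 3) for name, value in result.items()})
```

The pooled estimate is computed on the z scale and back-transformed to r only for reporting, which matches the use of Fisher’s z transformed correlations described above.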