Source of data and study design
The data for this study came from the Jimma University Medical Center, which is located in the Jimma Zone of Oromia Regional State in Ethiopia’s south west. Jimma Zone is approximately 325 km from Ethiopia’s capital city, Addis Ababa.
The patient’s registry dates to the event time or censoring time in this data, which is secondary data recorded at the hospital. As a result, after identifying patients who were admitted and followed up from September 1, 2018 to August 31, 2020, data was retrieved from the patient’s card, which contains epidemiological, laboratory, and clinical information of MDD patient’s card and information sheet. The first symptomatic recovery, which was otherwise censored, was the event for this investigation. The information on the suppressed or abridged subjects, however, is incomplete. Patients with MDD who did not have symptomatic recovery over the research period, lost, or withdrew before symptomatic recovery were censored. Patients who were admitted for follow-up of all major depressive disorders for at least three visits at Jimma University Medical Center from September 2018 to August 2020 were included in this study, which used a prospective cohort study design. A total of 366 patients with depression disorders were enrolled in this investigation.
Variables in the study
The survival time (time to first symptomatic recovery) evaluated in months from the start of treatment to the date of the patient’s recovery or censored was the dependent variable in this study. The patients' status was 1 if they recovered and 0 if they were censored during the study period. About the recovery, the psychiatrist made decision based on the psychiatric examination. The standards criterion is by using the DSM-5 diagnostic criteria when the patient is fully free from those symptoms for at least six months. There are different instrument, especially regarding to screening the patient sign and symptom to know whether suffering from specific mental illness or psychological distress, but there are only to confirming or what you call diagnostic instrument. Those are: 1) DSM-5 which stands for Diagnostic Statistical Manual version five and 2) ICD-11 which stand for International Classification of Disease version 11 for which in Ethiopia we use DSM-5.
Major depressive episode was diagnosed when at least 2 weeks of persistent depressed mood, anhedonia, or hopelessness occurred (reported by self or observed by others), plus additional symptoms from criterion A, for a total of 5 of the 9 DSM-5 major depression criteria [32] and the clinical significance criterion. Lifetime DSM-5 MDD was defined as at least one lifetime major depressive episode without full DSM-5 manic, mixed, or hypomanic episodes, [32, 33] excluding substance induced and medical-induced disorders. Those with at least one episode in the prior 12 months were classified as having 12-month MDD. Clinical validity was assessed through concordance with blinded clinician reappraisals using the Psychiatric Research Interview for Substance and Mental Disorders, DSM-5 version (PRISM-5) [34, 35]. Concordance for binary MDD diagnoses was fair [36] (κ = 0.35–0.46) and higher with corresponding DSM-5 MDD dimensional scales (intraclass correlation, 0.60–0.64) [34].
Gender, age, marital status, first onset age, educational status, other cofactors, family history of mental illness, substance abuse, religion, ethnicity, chewing khat, and employment status were all considered factors of recovery duration (independent variables) in the study.
Inclusion and exclusion criteria
All patients (12–65 years old) with major depressive disorder were included in the study, whereas children under the age of 12, pregnant or lactating women (less than 6 months), and patients with irrelevant information during the study period were excluded. MDD is less common in pre-school children (1–2%) than in adults (20%) [37], hence children under the age of 12 were excluded.
Statistical methods
Data that measures the time to a certain event of interest is referred to as survival data [26]. Estimates of the survival function and hazard function are useful for summarizing survival data. Because no explicit assumptions regarding the underlying distribution of survival times are required, this method is non-parametric or distribution frees [38]. Otherwise, survivor function estimators, such as the Kaplan–Meier (KM) survival function estimator and the log-rank test for comparing two or more groups of categorical variables, were utilized in this work.
Suppose we have a sample of independent observations, their survival times denoted by \({t}_{1}, {t}_{2}, {t}_{3}, ..., {t}_{n}\) and indicators of censoring denoting by \({\delta }_{1}, {\delta }_{2}, {\delta }_{3}, ..., {\delta }_{n}\) where
$${\delta }_{i}=\left\{\begin{array}{l}1, if\,the\,first\,symptomatic\,occur\\ 0, otherwise\end{array}\right.$$
Thus, the survival data are denoted by \({t}_{i}, {\delta }_{i}; i=1, 2, 3, .., n\). The first step to obtain the KM estimator of the survival function is to order the survival times as \({t}_{1}, {t}_{2}, {t}_{3}, ..., {t}_{n}\). Assume that \(m\le n\) events occurred at distinct m times among the n observations. The probability that an event will not occur by time t:S(t) = P(T > t) is the main quantity of interest. The survival function is estimated by Kaplan and Meier.
$${\widehat{S}}_{KM}\left(t\right)=\prod_{{t}_{i}\le t}{\left(\frac{{n}_{i}-{d}_{i}}{{n}_{i}}\right)}^{{\delta }_{i}}=\prod_{{t}_{i}\le t}{\left(1-\frac{{d}_{i}}{{n}_{i}}\right)}^{{\delta }_{i}},$$
where \({d}_{i}\) is number of patients experienced event at \({t}_{i}\) and \({n}_{i}\) is number of patients at risk before \({t}_{i}\) [38, 39].The log-rank test which is used for comparison of the survival curves of two or more categorical covariates also applied [40].
A random effects model with shared frailties is one in which the frailties are common (or shared) among groups of individuals or spells and are randomly distributed among groups. The shared frailty model is a conditional model in which all participants in a cluster share frailty [41, 42]. The multivariate frailty model is a variation of the univariate frailty model that permits people in the same cluster to have the same frailty value.
The researchers assumed that there is a clustering (frailty) effect on modeling time-to-first symptomatic recovery from MDD which might be due to the heterogeneity in district from which the patients came-from i.e. patients’ coming from the same district share similar risk factors related to MDD. Clusters with minimum median time have smaller frailties, so that these clusters are predicted to have a high hazard and more probable to first symptomatic recovery [43]. These nuisance terms modify the hazard function, so that the hazard function should be evaluated conditionally on this effect. Moreover, districts frail more are more likely to symptomatic recovery than the less frail districts (since the event is positive).
Conditional on the random term, called the frailty denoted by \({u}_{i}\), the survival times in cluster \(i (1\le i\le n)\) are assumed to be independent, the proportional hazard frailty model assumes.
$${h}_{ij} \left(\frac{t}{{X}_{ij}}, ui\right)= exp\left({\beta }^{\mathrm{^{\prime}}}{X}_{ij}+{u}_{i}\right){h}_{0}\left(t\right),$$
where \({u}_{i}\) the random term of all the subjects in cluster.
The choice of frailty distribution is critical for obtaining an accurate description of the data’s dependent structure. Gamma and Inverse Gaussian frailty distributions were used in this investigation. In both cases, the degree of independence is represented by a single heterogeneity parameter (denoted by θ).
The functional form of the one parameter gamma distribution is given by:
$${f}_{z}\left({Z}_{i}\right)= \frac{{{Z}_{i}}^{(\frac{1}{\theta })-1}\mathrm{exp}(-{Z}_{i}/\theta )}{\Gamma (\frac{1}{\theta }){ \theta }^{{}^{1}\!\left/ \!{}_{\theta }\right.}},\theta >0$$
The inverse Gaussian (inverse normal) distribution was introduced as a frailty distribution alternative to the gamma distribution by [44]. The probability density function of an inverse Gaussian shared distributed random variable with parameter \(\theta > 0\) is given by:
$${f}_{z}\left({Z}_{i}\right)=( {\frac{1}{2\pi \theta })}^\frac{1}{2}{Zi}^{-\frac{3}{2}}{\mathrm{exp}\left(\frac{-\left(zi-1\right)}{2\theta zi}\right)}^{2},\theta >0,z>0$$
The baseline hazard functions for the parametric shared frailty models were the Exponential, Weibull, and Log normal distributions.
Furthermore, the Akaike Information Criterion (AIC) was utilized to choose the optimal model for describing the data. Quantile–Quantile plots were used to examine the goodness of the fitted model, whereas Cox-Snell residuals were used to evaluate the baseline parameters. The data was analyzed using R-3.6.3 program.