The study was conducted on a telemedicine platform (Talkspace) used by independently practicing, licensed therapists in the United States. The platform is accessible through internet search, through Employee Assistance Programs, and as a behavioral health benefit through some insurance plans. Patients first meet with an intake clinician through a live messaging system for a brief, standardized intake that identifies the presenting complaint, the patient’s treatment history, and the patient’s provider preferences. This information informs a matching algorithm that prioritizes and presents three providers with the desired characteristics, from whom the patient chooses. Once the patient chooses a clinician, the provider is alerted and the patient is immediately introduced to the messaging “room” where treatment takes place. Patients complete a self-report baseline assessment, and the provider walks them through the informed consent and emergency contact process, after which treatment can begin. Observations in this study include data collected as part of organizational quality assurance and program management processes between January 1, 2016 and February 1, 2018. All patients and clinicians give written consent to the use of their data in a de-identified, aggregate format as part of the user agreement before they begin using the platform. Study procedures were approved as exempt by the institutional review board at Teachers College, Columbia University (15–426).
Participants were individuals who presented with a chief complaint of anxiety or depression, were seeking treatment through the service, and completed at least one PHQ-9 and/or GAD-7. Inclusion criteria were: (1) being an English speaker in the United States; (2) being between the ages of 18 and 65; (3) having regular internet or cellphone access; (4) receiving a depression or anxiety diagnosis from the assigned licensed mental health provider, based on a clinical intake and a live messaging or video-based interview, as recorded in the electronic medical record with ICD-10 codes; and (5) scoring 10 or higher on the PHQ-9 and/or GAD-7. Exclusion criteria were current or past diagnoses of: (1) bipolar disorder; (2) any schizophrenia spectrum or other psychotic disorder, or psychotic features; (3) any medical or neurological condition that would better account for the symptoms; (4) substance or alcohol use disorder; (5) any condition requiring hospitalization; or (6) suicidal thoughts and/or behavior sufficient to be marked “Yes” on any of questions three through six (at least thoughts about a potential suicide method) on the Columbia Suicide Severity Rating Scale Lifetime-Recent Screen, requiring a more intensive level of care that interrupted treatment on the platform. A total of 23,901 patient records were reviewed against these criteria; the final sample consisted of 10,718 patients.
Clinicians in the provider network were currently licensed in at least one state, were required to hold a master’s degree or higher, and had at least 3 years of post-licensure experience delivering mental health care. Clinicians were matched only with patients residing in a state covered by their licensure. There were a total of 1,599 clinicians: 43.7% reported 5 to 9 years of post-licensure experience, and 36.5% reported 10 or more years. Eighty-eight percent (88.0%) were female. Providers had a mean age of 40 years (SD = 10.04), and in their provider profiles they reported offering treatment based on multiple orientations: 61.0% cognitive-behavioral, 40.3% third-wave cognitive-behavioral interventions (e.g., mindfulness-based), and 25.5% psychodynamic or relational.
Methods and procedures
Clinicians and patients asynchronously exchanged text-, audio-, and video-based messages using a secure, HIPAA-compliant platform accessible on mobile devices and desktop computers. Patients could send messages at any time without limit, and all messages were stored so that clinicians could review the message history when they returned to the platform. Therapists responded to their patients’ messages at least once a day, 5 days a week. Clinicians were expected to adhere to all reporting, professional, and ethical standards for their respective fields, and appropriate referrals were provided for patients judged to need a higher level of care.
The platform automatically records the number of words exchanged between therapists and patients as metadata, regardless of medium, and these counts were used as a proxy for the extent of therapeutic interaction through the asynchronous messaging medium. Words in audio and video messages were converted to text for counting using secure, proprietary voice-to-text algorithms. Raw counts of words sent by clinicians and patients were used in supplementary analyses. Raw counts of the number of audio and video messages sent by each party were also analyzed.
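As an illustration of how per-party word counts could be turned into the words-per-week covariates used later, the Python sketch below aggregates hypothetical message records. The `Message` layout, the whitespace tokenization in `count_words`, and dividing totals by weeks in treatment are assumptions for illustration; the platform’s actual metadata schema and counting rules are not described here.

```python
from dataclasses import dataclass

PARTIES = ("therapist", "patient")


@dataclass
class Message:
    # Hypothetical record layout; the platform's real metadata schema is not described.
    sender: str  # "therapist" or "patient"
    medium: str  # "text", "audio", or "video"
    text: str    # typed text, or the voice-to-text transcript of an audio/video message


def count_words(text: str) -> int:
    # Assumed rule: whitespace-delimited tokens count as words.
    return len(text.split())


def summarize_interaction(messages: list, weeks_in_treatment: float) -> dict:
    """Total words, average words per week, and audio/video message counts per party."""
    if weeks_in_treatment <= 0:
        raise ValueError("weeks_in_treatment must be positive")
    words = {p: 0 for p in PARTIES}
    media = {p: 0 for p in PARTIES}
    for m in messages:
        if m.sender not in words:
            raise ValueError(f"unknown sender: {m.sender!r}")
        words[m.sender] += count_words(m.text)
        if m.medium in ("audio", "video"):
            media[m.sender] += 1
    return {
        "total_words": words,
        "words_per_week": {p: words[p] / weeks_in_treatment for p in PARTIES},
        "media_messages": media,
    }
```

Averaging over weeks in treatment (rather than calendar weeks with activity) is one plausible reading of “average number of words per week over the course of treatment”; other normalizations are possible.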
Patients were assessed for depression and anxiety symptoms at baseline and then every 3 weeks for the duration of treatment, or until they opted to stop receiving assessments. Assessments are introduced to patients as an important part of their care that facilitates goal setting and progress tracking. In this study, five assessments from baseline to week 12 were analyzed: baseline, week 3, week 6, week 9, and week 12.
The 9-item Patient Health Questionnaire (PHQ-9) was used to assess the clinical severity of depression. Responses on all items were given on a 4-point Likert scale (0 = Not at all to 3 = Nearly every day), for a maximum total score of 27. Scores of 10 or greater have been shown to have high sensitivity and specificity as a threshold for clinical depression, or at least moderate depression [31, 32].
Anxiety symptoms were assessed with the 7-item Generalized Anxiety Disorder questionnaire (GAD-7). Responses on all items were given on a 4-point Likert scale (0 = Not at all to 3 = Nearly every day), for a maximum total score of 21. Scores of 10 or above have been shown to have high sensitivity and specificity as a clinically significant threshold for at least moderate anxiety.
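The scoring arithmetic for both scales, and the severity criterion used for inclusion (10 or higher on the PHQ-9 and/or GAD-7), can be sketched as follows. The function and constant names are hypothetical; the item counts, the 0 to 3 response range, and the cutoff come from the text above.

```python
from typing import Optional, Sequence

PHQ9_ITEMS = 9  # maximum total 9 * 3 = 27
GAD7_ITEMS = 7  # maximum total 7 * 3 = 21
CUTOFF = 10     # at least moderate symptoms on either scale


def total_score(responses: Sequence[int], n_items: int) -> int:
    """Sum of item responses, each scored 0 (Not at all) to 3 (Nearly every day)."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    return sum(responses)


def meets_severity_criterion(phq9: Optional[Sequence[int]],
                             gad7: Optional[Sequence[int]]) -> bool:
    """True if an available PHQ-9 and/or GAD-7 total is at or above the cutoff."""
    scores = []
    if phq9 is not None:
        scores.append(total_score(phq9, PHQ9_ITEMS))
    if gad7 is not None:
        scores.append(total_score(gad7, GAD7_ITEMS))
    return any(s >= CUTOFF for s in scores)
```

Either questionnaire may be missing (passed as `None`), mirroring the “and/or” wording of the inclusion criterion.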
Patients opting to leave the platform were asked to indicate the reason for leaving. Reasons included feeling better or meeting their goals, having money concerns, not liking the therapy medium, having frustrating technical issues, not liking their therapist, deciding to continue treatment face-to-face, or no longer having the time necessary to engage in treatment.
Data analytic strategy
Outcome trajectories of anxiety and depression symptoms over the 12 weeks of treatment were analyzed using Latent Growth Modeling (LGM) in Mplus 8. LGM is used here as an unsupervised machine learning method that identifies groups with heterogeneous outcomes (e.g., responders and non-responders) and examines their differences. Whereas traditional average-effects approaches summarize change for the sample as a whole, LGM analyzes patterns of change over time to determine whether there are subpopulations within the overall group of patients. For example, it can distinguish patients who begin treatment with severe symptoms and end with low symptoms from patients who begin and end treatment with a milder symptom presentation. In the current study, LGM also distinguished patients with changes in both anxiety and depression symptoms from those improving in only one of the two conditions. Another advantage of LGM is that once patients have been grouped into different trajectories (or classes), characteristics common to each class (i.e., covariates) can be identified. For example, patients who share a remission trajectory may be far more likely to be female, or to engage with treatment more consistently, than those in another class. As such, LGM provides more information about how large groups of people respond to a specific treatment delivery than pre- and post-assessment scores for the entire sample. Covariates of interest in this study included age, education, gender, weeks in treatment, therapist words per week, and patient words per week. A more technical description of each step of the statistical procedure is provided in the next section.
Technical specifications of the LGM
Prior to the analyses, missing values for variables with approximately 40% or less missingness were iteratively imputed with random forests (500 trees, 10 iterations) using the R package missForest. Predictors were imputed while masking clinical and outcome variables, to prevent information leakage. All LGM models were estimated under the missing at random assumption using maximum likelihood estimation. Sensitivity analyses assessing the relation between missing data in symptom measures and therapists’ characteristics are reported in the supplementary materials.
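missForest is an R package; as a rough Python analogue, the sketch below pairs scikit-learn’s `IterativeImputer` with a `RandomForestRegressor`, keeps only columns at or below the 40% missingness threshold, and receives only predictor columns, so clinical and outcome variables cannot inform the imputation. This is a stand-in under stated assumptions, not the authors’ pipeline: missForest uses its own stopping rule and imputes categorical variables with classification forests, which this sketch does not. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def impute_predictors(predictors: np.ndarray, max_missing: float = 0.40,
                      n_trees: int = 500, n_iter: int = 10, seed: int = 0):
    """Random-forest iterative imputation of predictor columns only.

    `predictors` must exclude clinical and outcome variables (the masking step),
    so they cannot leak into the imputed values. Returns the imputed matrix of
    retained columns and a boolean mask of which input columns were retained.
    """
    missing_rate = np.isnan(predictors).mean(axis=0)
    keep = missing_rate <= max_missing  # drop columns above the missingness threshold
    if not keep.any():
        raise ValueError("no column is at or below the missingness threshold")
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=n_trees, random_state=seed),
        max_iter=n_iter,
        random_state=seed,
    )
    return imputer.fit_transform(predictors[:, keep]), keep
```

Observed values pass through unchanged; only missing cells are filled from the forest predictions.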
To capture changes in anxiety and depression outcomes concurrently, the LGM modeled PHQ-9 and GAD-7 scores as parallel processes. Specifically, each symptom measure was assigned its own set of intercept, slope, and quadratic growth parameters, estimating separate trajectories of anxiety and depression over five assessments (weeks 0, 3, 6, 9, and 12). Patient classes were then determined from the joint growth patterns of PHQ-9 and GAD-7 scores. The optimal number of classes was determined by comparing nested unconditional LGMs with increasing numbers of classes. The variances of the growth parameters were fixed to zero to sharpen the delineation of classes. Model fit indices examined included the Bayesian Information Criterion (BIC), sample-size adjusted Bayesian Information Criterion (SSBIC), Akaike Information Criterion (AIC), relative entropy, the Lo–Mendell–Rubin adjusted likelihood ratio test (L-M-R LRT), and the bootstrapped likelihood ratio test (BLRT). The best-fitting solution was selected based on the model fit indices as well as the explanatory properties of the solution [38, 39].
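Two pieces of the class-enumeration step can be made concrete. With growth-factor variances fixed to zero, every patient in a class shares one quadratic mean curve per measure; and AIC, BIC, SSBIC, and relative entropy follow standard formulas from a model’s log-likelihood, its number of free parameters, and the posterior class probabilities. The sketch below uses those standard definitions (Mplus reports these indices directly). Function names are hypothetical, time is assumed to be coded in weeks, and the L-M-R and bootstrapped likelihood ratio tests are not reproduced.

```python
import math
from typing import Sequence

WEEKS = (0, 3, 6, 9, 12)  # assessment schedule; coding time in weeks is an assumption


def implied_trajectory(intercept: float, slope: float, quadratic: float,
                       weeks: Sequence[float] = WEEKS) -> list:
    """Class mean curve for one measure: intercept + slope*t + quadratic*t**2.

    With growth-factor variances fixed to zero, all members of a class share this curve;
    a parallel-process class has one such curve for PHQ-9 and one for GAD-7.
    """
    return [intercept + slope * t + quadratic * t ** 2 for t in weeks]


def information_criteria(loglik: float, n_params: int, n: int) -> dict:
    """Standard AIC, BIC, and sample-size adjusted BIC (n* = (n + 2) / 24); lower is better."""
    neg2ll = -2.0 * loglik
    return {
        "AIC": neg2ll + 2 * n_params,
        "BIC": neg2ll + n_params * math.log(n),
        "SSBIC": neg2ll + n_params * math.log((n + 2) / 24),
    }


def relative_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """1 - sum(-p ln p) / (n ln K): 1 means perfectly separated classes, 0 means none."""
    n, k = len(posteriors), len(posteriors[0])
    if k < 2:
        raise ValueError("relative entropy requires at least two classes")
    h = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # 0 * ln 0 is taken as 0
                h -= p * math.log(p)
    return 1.0 - h / (n * math.log(k))
```

In practice each K-class solution would be scored this way, and a drop in BIC and SSBIC, together with acceptable entropy and significant likelihood ratio tests, would favor the larger solution.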
After determining the solution with the best relative fit, demographic variables, weeks before treatment dropout (or completion), and therapists’ characteristics were entered as covariates in a conditional LGM to analyze predictors of class membership. Categorical variables were converted into binary variables based on their modal values. Auxiliary 3-step multinomial logistic analyses for latent class predictors were then performed on the conditional model. This approach to latent class logistic regression accounts for measurement error in the most likely class assignments when estimating the predictive role of quantitative treatment delivery characteristics (i.e., the average number of words per week used by therapists and clients over the course of treatment) in determining class membership. Word counts were log-transformed to improve the dose-response interpretability of the odds ratios.
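The core idea behind the 3-step correction can be sketched in a few lines: after modal class assignment, the posterior probabilities yield an estimate of how often patients truly in class c are assigned to class k, and the third step uses that classification-error matrix to correct the covariate regression. The sketch also shows how a coefficient on log-transformed words per week converts to an odds ratio for a proportional increase, assuming a natural-log transform (the base is not stated in the text). Function names are hypothetical, and the corrected multinomial regression itself, as performed in Mplus, is not reproduced.

```python
import math
from typing import Sequence


def modal_assignments(posteriors: Sequence[Sequence[float]]) -> list:
    """Most likely class for each patient (the first class wins ties)."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in posteriors]


def classification_error_matrix(posteriors: Sequence[Sequence[float]]) -> list:
    """D[c][k]: estimated probability that a patient truly in class c is assigned to class k.

    Computed as sum_i p_ic * [modal_i == k] / sum_i p_ic, so each row sums to 1.
    """
    k = len(posteriors[0])
    counts = [[0.0] * k for _ in range(k)]
    mass = [0.0] * k
    for row, assigned in zip(posteriors, modal_assignments(posteriors)):
        for c, p in enumerate(row):
            counts[c][assigned] += p
            mass[c] += p
    if any(m == 0 for m in mass):
        raise ValueError("every class needs nonzero posterior mass")
    return [[counts[c][j] / mass[c] for j in range(k)] for c in range(k)]


def odds_ratio_per_fold_increase(beta_ln_words: float, fold: float = 2.0) -> float:
    """Odds ratio for multiplying words per week by `fold`, if the predictor is ln(words)."""
    return math.exp(beta_ln_words * math.log(fold))
```

With a log-transformed predictor, the same odds ratio applies to any doubling of words per week (for example, 100 to 200 or 500 to 1,000), which is what makes the dose-response reading convenient.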