This report explains how we determined the cost-effectiveness of group or task-shifted psychotherapy in low- and middle-income countries, using measures of subjective wellbeing and affective mental health.

November 2024: Update to our analysis

We have made a substantial update to our psychotherapy report. This 83-page report (plus appendix) details new methods as well as an updated cost-effectiveness analysis of StrongMinds and Friendship Bench. We now estimate that StrongMinds creates 40 WELLBYs per $1,000 donated (5.3 times more cost-effective than GiveDirectly cash transfers) and that Friendship Bench creates 49 WELLBYs per $1,000 donated (6.4 times more cost-effective than GiveDirectly cash transfers).
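As a quick arithmetic check (ours, not HLI's), the two reported multiples imply roughly the same WELLBYs-per-$1,000 benchmark for GiveDirectly:

```python
# Back-of-the-envelope consistency check (ours, not HLI's): the reported
# multiples imply roughly the same GiveDirectly benchmark.
strongminds_wellbys = 40        # WELLBYs per $1,000 donated
friendship_bench_wellbys = 49

benchmark_sm = strongminds_wellbys / 5.3    # implied GiveDirectly WELLBYs/$1,000
benchmark_fb = friendship_bench_wellbys / 6.4

print(round(benchmark_sm, 1), round(benchmark_fb, 1))  # 7.5 7.7
```

The small difference between the two implied benchmarks is consistent with rounding in the reported multiples.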

See our changelog for previous updates.

Summary

We estimate that psychotherapy delivered by lay-people or to groups in low-income countries (LICs) improves affective mental health by 4.3 SDs per $1,000 spent, which is 12 times (95% CI: 4, 27) more cost-effective than monthly cash transfers. The effects are reductions in recipients’ self-reported measures of affective mental health (anxiety and depression), and the costs are those an organization incurs to treat one person.

This report is part of our work evaluating the expected and potential cost-effectiveness of interventions. We are currently focused on studying micro-interventions in low- and middle-income countries. To find out more about the wider project, see Area 2.3 of our Research Agenda and Context.

1. What is the problem?

Depression is a substantial source of suffering worldwide. It makes up 1.84% of the global burden of disease according to the IHME, similar to malaria (1.83%) (GBD, 2019). This is likely an underestimate for three reasons. First, disability-adjusted life-years (DALYs) do not account for deaths caused by mental health disorders.69 Second, stigma surrounding mental health issues is widespread in low- and middle-income countries (LMICs),70 so its prevalence is likely underreported. Third, mental health appears relatively more important in terms of subjective well-being (SWB) than when using DALYs (Happiness Research Institute, 2020). We discuss this in more depth in our report on global mental health (HLI, 2021a).

The treatment of depression, as with most mental health problems, is neglected relative to other health interventions in LMICs. Government and international aid spending on mental health represents less than 1% of total health spending in low-income countries (Ridley et al., 2020; Liese et al., 2019).

2. What can be done?

Fortunately, depression is a tractable problem. Psychotherapy is a common treatment for depression (Cuijpers et al., 2020; Kappelmann et al., 2020). It also works for several other mental health disorders, including anxiety (Bandelow et al., 2017) and bipolar disorder (Chiang et al., 2017). Psychotherapy is also, surprisingly, an effective treatment for chronic pain, which is also a substantial source of disability (Majeed et al., 2018). 

Another common treatment for depression is pharmacotherapy (medications such as antidepressants). Psychotherapy has some potential advantages over drug treatments for depression and anxiety, although we do not consider the case for drug treatment in detail here.71 The benefits of psychotherapy may outlast those of common drug treatments (Cuijpers et al., 2013; Biesheuvel-Leliefeld et al., 2015). While psychotherapy is often provided by highly trained professionals, research has shown that it can be delivered by non-specialists at a lower cost72 (Chowdhary et al., 2020; Purgato et al., 2020). Unfortunately, we did not find any existing research that clarifies how much task-shifting lowers the cost of delivering psychotherapy, or how much of its effectiveness is retained.

We found few estimates of the cost-effectiveness of psychotherapy in LMICs (see MH report section 5.5), but previous work suggests that it could be a cost-effective intervention (Plant, 2016; 2018; Elizabeth, 2017; Founders Pledge, 2019). 

These factors motivate this report on psychotherapy as an intervention to improve well-being. A more general consideration is that, since we aim to improve happiness, interventions that directly target negative mental states (such as those caused by depression) seem promising.

2.1 Scope of psychotherapy for this report

For the reasons listed in the previous section and elaborated on below, we narrowed this review to a subset of psychotherapy rather than psychotherapy in general. We focus our analysis on the average intervention-level cost-effectiveness of any form of face-to-face psychotherapy delivered to groups or by non-specialists in LMICs. We measure the effect of psychotherapies as the benefit they provide to subjective well-being (SWB) or affective mental health (MHa). Next, we elaborate on what we mean by each of these criteria.

This analysis is on the intervention level, which is more granular than our cause area report on mental health (HLI, 2021a) but broader than an analysis of an organisation that implements an intervention, like our review of StrongMinds (HLI, 2021b).  

Psychotherapy is a relatively broad class of interventions delivered by a trained individual who intends to directly and primarily benefit their patient’s mental health (the “therapy” part) through discussion (the “psych” part). Psychotherapies vary considerably in the strategies they employ to improve mental health, but some common types of psychotherapy are psychodynamic (i.e., Freudian or Jungian), cognitive behavioral therapy (CBT), and interpersonal therapy (IPT).73 That being said, different forms of psychotherapy share many of the same strategies.74 We do not focus on a particular form of psychotherapy. Previous meta-analyses find mixed evidence supporting the superiority of any one form of psychotherapy for treating depression (Cuijpers et al., 2019).

We did not consider remote modes of psychotherapy (delivered digitally rather than face-to-face). Delivering psychotherapy remotely is plausibly cheaper than doing so in person.75 However, we postpone looking at remote therapies because the evidence base in LMICs is currently small (cf. Fu et al., 2020).76

There is some evidence from HICs (Barkowski et al., 2020) and LMICs (Cuijpers et al., 2019) to support the notion that group-delivered formats are at least as effective as individual formats of psychotherapy. We have no similarly direct comparisons77 between non-specialist78 and specialist-delivered psychotherapies, but we do have evidence that non-specialist psychotherapies are effective at treating depression and anxiety in LMICs (Purgato et al., 2018a; Singla et al., 2017; Vally & Abrahams, 2016).

If we assume that non-specialist delivery or group-delivered formats are only marginally less effective than one-on-one therapy provided by a specialist, then the reduction in costs should more than make up for the loss in efficacy. When we asked several experts if this intuition seemed right, they agreed, with some caveats79 (Crick Lund, personal communication, 2021; Akash Wasil, pers. comm., 2021).

We restrict our attention to LMICs for two main reasons. First, we expect the cost of hiring someone to deliver face-to-face psychotherapy to be substantially lower, particularly if delivery is task-shifted to someone with less formal training. Second, we think it is much less likely that someone treated in LMICs would otherwise have access to an alternative form of treatment.

Finally, we seek to measure the impact of any intervention in terms of subjective well-being or affective mental health. We define subjective well-being as how someone feels or thinks about their life broadly. We further describe what we mean by subjective well-being, and explain why we believe such measures are the best measures of well-being, here.

When no measure of SWB is available (as was the case for this review), we consider self-reports of affective80 mental health conditions (anxiety, depression, or distress) to be acceptable proxies. We think this is reasonable because these measures contain many questions relating to SWB. For example, measures of depression capture people’s moods and thoughts about their lives, but also ask questions about how well an individual functions.81 Whether it is reasonable to treat these measures as comparable is discussed further in Appendix A.

2.2 What does psychotherapy look like in practice?

We describe how two types of psychotherapy, Problem Management Plus and Interpersonal Group Therapy, are practiced.

In Problem Management Plus, participants meet with a lay mental health worker for 90 minutes a week for five weeks. Each week is dedicated to a different subject. In the first session, participants practice deep breathing exercises. The second focuses on creating a detailed plan to do more of the activities the participant enjoys. In the third session, the mental health worker helps them identify which problems are solvable and brainstorm solutions. In the fourth session, they identify which friends and family are supportive and propose steps for strengthening those bonds. In the final session, they review past sessions (Dawson et al., 2015).

StrongMinds delivers Interpersonal Group Therapy over 12 weeks in roughly 90-minute sessions. The 12 weeks are broken into three phases. Across all phases, members support one another, discuss their depressive symptoms and triggers, and practice coping strategies. In the first phase, the facilitator focuses on building bonds, trust, and rapport amongst the group members. In the second phase, the group focuses on discussing the problems that cause depressive episodes. In the third phase, they focus on identifying the triggers of their depression and practicing how they will respond to those triggers.

3. How effective is task-shifted psychotherapy in LMICs and what does it cost?

The following sections discuss our synthesis of the literature on the effectiveness and cost of psychotherapy. First, we discuss how we collected our data, then we summarize the methods we used for analyzing that data and present the results we found. We then use the results to estimate the total effect of psychotherapy, which we discount based on an assessment of the risk of bias in the sample of studies. We conclude by estimating a range for the cost of treating an additional person with psychotherapy, which allows us to contextualize our estimates by comparing the cost-effectiveness of psychotherapy to cash transfers, which we summarize in section 6.

3.1 Evidence of psychotherapy in LMICs

We extracted data from 39 studies that appeared to be delivered by non-specialists and/or to groups from five meta-analytic sources,82 and any additional studies we found in our search for the costs of psychotherapy. The total sample size was 29,643 individuals. These studies are not exhaustive.83 We stopped collecting new studies due to time constraints and the perception of diminishing returns.84 The studies we include are presented in Appendix B.

We aimed to include all RCTs of psychotherapy with outcome measures of SWB or MHa, but only found studies with measures of MHa. We present summary statistics of the sample of studies in Figure 1, which we elaborate on below.

These summary statistics convey a few important points. Only two studies have follow-ups more than two years after treatment ended: Tripathy et al., (2010) and Baranov et al., (2020). Sample size follows a similarly skewed distribution to follow-up delay: most studies have relatively modest sample sizes (under 500), but a few have quite large samples, such as Tripathy et al., (2010; n = 12,431).

Most forms of group psychotherapy in our sample are delivered by non-specialists. We defined a non-specialist as anyone who had not received a degree or formal training lasting more than a year to treat mental health problems. Similarly, most studies make high use of psychotherapy. We classified a study as making high use of psychological elements if psychotherapy appeared to be the primary means of relieving distress, and low use if it was not, or if relieving distress was not the primary aim of the intervention. For instance, we classified Tripathy et al., (2010) as making low use of psychotherapy because their intervention was primarily targeted at reducing maternal and child mortality through group discussions of general health problems, but still contained elements of talk therapy. We classified “use of psychotherapy” as medium if an intervention was primarily but not exclusively psychotherapy.

Figure 1: Summary statistics of key variables in the sample

Graphs showing psychotherapy outcomes, sample sizes, expertise, and depression data.

Note: The total count is above 39 because some studies contain multiple observations for different follow-ups (which themselves often differ in sample size). In the second panel, we remove the largest studies (Patel et al., 2010, n = 1,961; Tripathy et al., 2010, n = 12,431) to allow for a clearer visualization of the distribution of sample sizes. A study was classified as focusing on women if women made up most of the sample. We define expertise and use of psychological elements in the following text.

The intensity or dosage of most psychological interventions was ‘low’, by which we mean they involved ten hours or less of total time spent in therapy sessions. Roughly as many studies focused primarily on women or girls as on the general population. Finally, as the distribution of effect sizes conveys, nearly all studies find that psychotherapy has a positive impact on affective mental health. We measure the effect using Cohen’s d, a standardized mean difference, which is interpreted here as the improvement in standard deviations of MHa.
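For readers unfamiliar with the measure, here is a minimal sketch of how a standardized mean difference can be computed from trial summary statistics. The numbers are made up for illustration and are not data from any study in the sample:

```python
import math

# Made-up numbers for illustration; not data from any study in the sample.
def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    # Lower depression scores are better, so control minus treatment is the
    # improvement attributable to therapy.
    return (mean_c - mean_t) / pooled_sd

d = cohens_d(mean_t=12.0, mean_c=15.0, sd_t=6.0, sd_c=6.0, n_t=100, n_c=100)
print(round(d, 2))  # 0.5
```

A d of 0.5 would mean the treatment group scored half a standard deviation better on the depression scale than the control group.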

4. Estimating the effect of psychotherapy

We start in section 4.1 by discussing the regressions we ran on our sample of RCTs to estimate the effects at post-treatment and how long they persist. This allows us to calculate, in section 4.2, the total effect of psychotherapy on an average member of the treated population. After we estimate total effects on the individual, we consider in section 4.3 any effects that psychotherapy has on the recipient’s household and community. We conclude with section 4.4, where we discount the estimated total effect according to our assessment of the evidence’s relative bias compared to the evidence base collected for cash transfers.

4.1 Effects of psychotherapy at post-treatment and change through time

To arrive at the total individual effects, we need to estimate two parameters: the effect post-intervention, and how this effect changes over time. Combining these two parameters generates a curve of the estimated benefits over time. The total benefit is the area under the curve from the time treatment ends until the effects become zero (or very close to zero, as a curve that asymptotes to zero never reaches it). We illustrate the total benefit in Figure 2 below.

Figure 2: Total benefit of psychotherapy

Graph showing wellbeing decline and intervention impact over time in psychotherapy analysis.

To estimate the effect of psychotherapy at post-treatment and its rate of decay (or growth), we perform several regressions on the sample of studies we collected (i.e., meta-regressions). In these meta-regressions, we explain variation in effect sizes with variation in the characteristics of the studies. Our focus is first on the relationship between “years since therapy ended” and the effect size, to capture the decay or growth of the effects of psychotherapy through time. We estimate this using two models: linear decay in equation (1) and exponential decay in equation (2).
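To illustrate the mechanics of the exponential specification (this is a sketch with simulated data, not the report's actual code or sample), the decay can be estimated by regressing log effect sizes on years since therapy ended:

```python
import math
import random

# Illustrative sketch only (not the report's code or data): simulate effect
# sizes that decay exponentially, then recover the decay by regressing
# log(effect size) on years since therapy ended.
random.seed(0)
true_d0, true_retention = 0.46, 0.72   # assumed values for the demo
years = [0, 0.5, 1, 2, 3, 5] * 5       # 30 hypothetical follow-ups
effects = [true_d0 * true_retention**t * math.exp(random.gauss(0, 0.03))
           for t in years]

# OLS of log(effect) on t: slope = log(annual retention), intercept = log(d0)
log_e = [math.log(e) for e in effects]
mt = sum(years) / len(years)
my = sum(log_e) / len(log_e)
slope = (sum((t - mt) * (y - my) for t, y in zip(years, log_e))
         / sum((t - mt) ** 2 for t in years))
d0_hat = math.exp(my - slope * mt)
retention_hat = math.exp(slope)
print(round(d0_hat, 2), round(retention_hat, 2))  # close to 0.46 and 0.72
```

A real meta-regression would additionally weight each observation by its precision (inverse variance); this sketch omits weighting for brevity.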

In equation (1), the total effect is estimated by assuming that the effects do not become negative but stop at zero. The total effect is then the area of a triangle, ½bh = ½ × (d₀ / |β|) × d₀, where d₀ is the effect at post-treatment and β is the annual (linear) decay coefficient. In equation (2), the total effect is calculated by integrating the exponentiated right-hand side of the equation:

Total effect = ∫₀ᵗ d₀ e^(βτ) dτ = (d₀ / |β|)(1 − e^(βt)),

where the effect at post-treatment, d₀, changes at a rate of β for t years.

We expect that the effects of psychotherapy will decay through time, which will reflect a negative coefficient on the “time since therapy ended” term in both equation (1) and equation (2). This expectation is based on the high occurrence of relapse after treatment with common forms of psychotherapy (Wojnarowski et al., 2019). We rarely see change through time estimated in individual studies or meta-analyses since follow-ups longer than two years are rare (Steinert et al., 2014). An exception for long-term follow-ups is Wiles et al., (2016) which found a lasting effect of CBT 40 months after psychotherapy ended (n = 248). 

Several meta-analyses purport to find persistent effects of psychotherapy (Rith-Najarian et al., 2019; Bandelow et al., 2018), but they do not compare effects to a control and contain few follow-ups beyond a year (n ≈ 4). Van Dis et al. (2019), which uses the appropriate effect size drawn from comparing treatment to a control condition, finds that the effects of CBT for treating anxiety do eventually decay, but we cannot calculate the decay rate because they use broad and inconsistent categories to measure the time since treatment ended (all studies with follow-ups greater than a year are aggregated). Karyotaki et al., (2016) find a decline in the effects of psychotherapy on depression, implying that the benefit would disappear within 18 months.

We prefer the exponential model because it fits our data better and because it matches the observed pattern of decay.86 In Figure 3, we represent the extended trend through time with a black line. We give specific details on the intercept and slope of this line in Table 1. Recall that the total benefit will be the area under the black line.

An additional goal of the meta-regressions is to find features of a study that plausibly change its cost-effectiveness. These features are: whether the psychotherapy is delivered to a group or to individuals, the expertise of its deliverer, and the total time spent in therapy.

Figure 3: Effects of psychotherapy and time of follow-up

Line graph showing the decline in the standardized mean difference of psychotherapy effectiveness over years since treatment ended.

Note: Points reflect estimated effects reported in individual studies. Lines connect studies with multiple follow-ups. Larger points and lines reflect larger sample sizes. Effects appear larger immediately after treatment has ended, then decline rapidly, then appear to decline more slowly.   

4.2 Key results from meta-regression analysis

Effect at post-treatment and change through time

In Table 1, we display the estimated post-treatment effects and how long they last for the average psychotherapy intervention in our sample. The post-treatment effects are estimated to be between 0.342 and 0.611 SDs of MHa. We discuss some possible explanations for why the effects appear lower than other meta-analyses found in the last part of this section.

Table 1: Post-treatment effect and decay through time

(Effect in SDs of depression improvement)

|  | Model 1 (linear) | Model 2 (exponential) |
|---|---|---|
| Effect at post-treatment (in SDs of depression improved) | 0.574 | 0.457 |
| 95% CI | (0.434, 0.714) | (0.342, 0.611) |
| Annual decay of benefits (SDs lost in model 1; percent kept in model 2) | −0.104 | 71.5% |
| 95% CI | (−0.197, −0.010) | (53.0%, 96.5%) |
| Total effect at 5.5 yrs (end of linear model effects)* | 1.59 | 1.56 |
| Total effect at 10 yrs | 1.59 | 1.78 |
| Total effect at 30 yrs | 1.59 | 1.85 |

Note: The decay through time for model (2) can be thought of as “benefits retained per year”, such that if the benefits were 1 SD in year one, they would be 0.715 SDs in year two. The coefficients in model (2) are exponentiated for ease of interpretation. The total effect refers to the total effect the model estimates the recipient will accumulate by the year given. *Since the linear model predicts that the benefits end at 5.5 years, its total effect does not grow after that time.

The effects appear to decay in both models.87 In the linear model, this is given by a significant negative coefficient indicating that the effect diminishes by 0.104 SDs per year. In the exponential model, the decay coefficient indicates that 71.5% of the benefits are retained each year (i.e., the benefits decay by roughly 28% each year).

A cost-effectiveness analysis of psychotherapy conducted by Founders Pledge also specifies that the effects decay exponentially. Specifically, they cite Reay et al. (2012), which estimated on a small sample (n = 50) that interpersonal therapy had a half-life of about two years, i.e., that it decayed by about 30% annually. Our model predicts a very similar decay rate of 28%. However, it is possible that this drop-off in effects is overstated if studies with short follow-ups are likelier to have larger effects.88
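The half-life quoted from Reay et al. (2012) converts to an annual decay rate with simple algebra (our illustration, not their calculation):

```python
# Simple algebra (our illustration, not Reay et al.'s calculation): annual
# retention r satisfies r ** half_life = 0.5.
half_life_years = 2.0
annual_retention = 0.5 ** (1 / half_life_years)  # ~0.707
annual_decay = 1 - annual_retention              # ~0.29, i.e. about 30%/year
print(round(annual_decay, 2))  # 0.29
```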

An alternative way to estimate the effects through time is to select the subset of studies that have multiple follow-ups and take the average within-study estimate of effects through time. If we estimate this by adding study-level fixed effects, we get surprisingly similar estimated decay rates to those we display in Table 2.

We also considered estimating the decay rate from psychotherapy studies in high-income countries (HICs) and incorporating that evidence into our analysis. However, we refrained from doing so because we are in contact with academics who plan to share a comprehensive dataset of psychotherapy studies in HICs that includes detailed follow-up information. Once we receive these data, we will perform a much more robust analysis of the decay rate of psychotherapy in HICs and update our analysis accordingly.

Estimating the total individual effects using the meta-regression results

We’ve discussed the post-treatment effects and annual decay effects. Now we discuss in more detail how we arrive at the total effects.

The estimated total effect given by the linear decay model is 1.6 SD-years. We arrive at this estimate quite simply. First, we assume that once the effects diminish to zero, the decay stops. Then we apply the formula for the area of a triangle (½bh). We have been given the height (the effect at post-treatment). To solve for the base (the duration), we divide the post-treatment effect by the decay rate (0.574 / 0.104 = 5.5 years). The total effect is then 0.5 * 5.5 * 0.574 = 1.6 SD-years.

Finding the total effects of the exponential model is more involved. To find it, we integrate over the function estimated by the regression for a period starting at post-intervention and ending in 5.5, 10, and 30 years. The results for both models are shown in the foot of Table 1 for 5.5, 10, and 30 years after treatment has ended. The differences in the time we assume the effectiveness of psychotherapy persists do make a difference to the expected total effect but they are not large (25 additional years only adds 0.29 SDs). This is because, by the fourth year, the effects have shrunk to less than 0.1 SDs (and by year 10 they are 0.01 SDs).
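The two total-effect calculations can be sketched as follows. This is our own illustration using the rounded Table 1 coefficients; because of rounding and simplification, the exponential totals will not exactly reproduce the table's figures:

```python
import math

# Sketch of the two total-effect calculations using the rounded Table 1
# coefficients. Rounding (and our simplifications) means the exponential
# totals will not exactly match the report's figures.

# Linear model: effect(t) = d0 - b*t, truncated at zero -> area of a triangle.
d0_lin, b = 0.574, 0.104
duration = d0_lin / b                 # ~5.5 years until the effect hits zero
total_lin = 0.5 * duration * d0_lin   # ~1.58 SD-years

# Exponential model: effect(t) = d0 * r**t; integrate from 0 to T years.
d0_exp, r = 0.457, 0.715
lam = -math.log(r)                    # continuous decay rate, ~0.34/year

def total_exp(T):
    return d0_exp * (1 - math.exp(-lam * T)) / lam

print(round(total_lin, 2))            # 1.58
print(total_exp(30) > total_exp(10) > 0)  # True: totals keep growing, slowly
```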

Are our estimates sensitive to outliers? Baranov et al., (2020) has an unusually long follow-up. If we exclude it from our analysis, the estimated total effect is reduced by around half for both models. While we think it is generally unwise to put too much weight on any particular study, we think Baranov et al., (2020) is of higher quality than most others we use.89 The authors were careful to subject their analysis to a variety of robustness checks. Their sample does experience sizeable attrition (around 30% over seven years), but they argue convincingly that this does not bias their estimates.90 Considering these factors, we kept Baranov et al., (2020) in our sample.

What is the influence of delivery mechanism and dosage on psychotherapy’s effectiveness?

The format of the psychotherapy, the expertise of its deliverers, and the duration of the psychotherapy all very plausibly affect the cost. Therefore, we check whether those factors also influence psychotherapy’s effectiveness. 

We run five regressions with variables to indicate whether the psychotherapy was delivered to individuals instead of groups (models 1 and 1.5), whether it was delivered by experts (model 2), and how many hours of therapy were involved (model 3). In model 4, we include all of these variables. We show the results of these regressions in Table 2 below, which contains linear specifications of the additional variables discussed.

Table 2: Impact of group, expertise, and dosage on the effectiveness

(Effect in SDs of depression improvement; standard errors in parentheses)

|  | Model 1 | Model 1.5 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|---|
| Intercept | 0.787*** (0.127) | 0.863*** (0.130) | 0.541*** (0.073) | 0.389*** (0.105) | 0.580*** (0.153) |
| Follow-up delay (yrs) | −0.103* (0.047) | −0.264*** (0.059) | −0.103* (0.047) | −0.104* (0.047) | −0.104* (0.048) |
| Individual format | −0.359* (0.139) | −0.460** (0.140) |  |  | −0.261+ (0.152) |
| Individual × time |  | 0.203** (0.059) |  |  |  |
| Specialist delivered |  |  | 0.343+ (0.188) |  | 0.174 (0.220) |
| Total hours of therapy |  |  |  | 0.019+ (0.011) | 0.014 (0.009) |
| Number of studies | 39 | 39 | 39 | 39 | 39 |
| Number of outcomes | 61 | 61 | 61 | 61 | 61 |

Note: These models are linear specifications for ease of interpretation.

In our sample, we find evidence that group psychotherapy is more effective than psychotherapy delivered to individuals (by 0.36, 0.46, and 0.26 SDs in models 1, 1.5, and 4, respectively). This is in line with other meta-analyses of psychotherapy’s effects on depression (Barkowski et al., 2020; Cuijpers et al., 2019). One explanation for this superiority is that the peer relationships formed in a group provide an additional source of value beyond the patient-therapist relationship.

More specialized deliverers and more time in therapy are associated with positive but only weakly significant increases in the effectiveness of psychotherapy. These coefficients are relatively large in magnitude. Taking the estimates of model (4) at face value, ten more hours of therapy (which would double the average time spent in therapy) would improve depression by 0.14 SDs, and having a specialist deliver psychotherapy could increase its effectiveness by 0.17 SDs. This gives us some evidence that psychotherapy interventions that are task-shifted or delivered more briefly will be somewhat less effective. However, as we explain in sections 5 and 6, we think that the drop in cost more than makes up for the loss in efficacy.
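Taking Table 2's model (4) at face value, it can be read as a linear predictor. The helper below is our own illustration, not the report's code:

```python
# Hypothetical helper (not the report's code): Table 2's model (4) read as a
# linear predictor for the effect in SDs of depression improvement.
def predicted_effect(years_since, individual, specialist, hours):
    return (0.580                    # intercept
            - 0.104 * years_since    # follow-up delay (years)
            - 0.261 * individual     # 1 if one-on-one rather than group
            + 0.174 * specialist     # 1 if delivered by a specialist
            + 0.014 * hours)         # total hours of therapy

# Ten extra hours of group, non-specialist therapy at post-treatment:
gain = predicted_effect(0, 0, 0, 20) - predicted_effect(0, 0, 0, 10)
print(round(gain, 2))  # 0.14, matching the figure quoted above
```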

4.3 Household and community spillovers

We have described our estimates of the total effect on the individual recipient, but we also care about the consequences for the recipient’s household and community. In other words, we care about the spillovers to the people the recipient lives with. Unfortunately, spillovers are rarely studied for mental health interventions in general (Desrosiers et al., 2020), and rarely measured in terms of MHa or SWB in particular.

The only empirical information we have on psychotherapy’s spillovers on the community comes from Haushofer et al., (2020) and Barker et al., (2021). They found no significant community spillover effect in terms of SWB or MHa.91 

In a simulation (using Guesstimate), we performed a ‘back of the envelope’ calculation with two assumptions. First, to estimate the impact of community spillovers, we assumed that there were between 1 and 10 non-recipients in the community for every direct recipient. Second, we assumed that the spillover effects lasted between 1 and 6 years. Given these assumptions, the negative community spillover effect would not decrease the total effect by much (-0.11 SDs, 95% CI: -0.067, 0.61).

We expect that receiving psychotherapy will benefit the recipient’s household. Any intervention that makes someone happier should make their close connections happier too. We expect this to work through pure emotional spillovers, for which some longitudinal studies in high-income countries find evidence (Fowler & Christakis, 2008; Rosenquist et al., 2011). Note that these benefits should apply to all interventions that increase wellbeing, not just psychotherapy. It also seems plausible that better MHa leads to increased productivity for the recipient (Angelucci & Bennet, 2021), which in turn benefits the recipient’s household. We found a single study that captures the spillover effects of psychotherapy on the recipient’s household.92

In a non-randomized controlled trial, Mutamba et al., (2018) found that treating adult caregivers of children affected by nodding syndrome with group psychotherapy reduced the caregivers’ depression by 0.80 SDs at 1 month and 0.46 SDs at 6 months post-treatment. For the children, the effects were also sizable: 0.57 and 0.46 SDs of depression (Cohen’s d). If we assume the effects end at six months, then the children received 77% as much benefit as their parents or grandparents.

In a simulation (using Guesstimate), we performed a ‘back of the envelope’ calculation with two assumptions. First, we assumed that the benefits to the household were between 15% and 95% of the impact received by the direct recipient. Second, we assumed that the household size was four. Under these assumptions, including the household effect would approximately double the total effect, from 1.6 to 3 SDs (95% CI: 0.57 to 8.1). This appears to be a sizable increase. But what is important here for the sake of the comparison is whether the factor by which household spillovers increase the total effect differs across interventions.93
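A Monte Carlo version of this back-of-the-envelope calculation might look like the following. The uniform distributions are simplifying stand-ins for the Guesstimate inputs (which treat stated ranges as skewed 90% intervals), so this sketch illustrates the mechanics rather than reproducing the report's figures:

```python
import random

# Monte Carlo version of the household back-of-the-envelope calculation.
# The uniform ranges are simplifying stand-ins for the Guesstimate inputs,
# so this will not reproduce the report's interval exactly; it only
# illustrates the mechanics.
random.seed(1)
direct_effect = 1.6    # SD-years for the direct recipient (linear model)
household_size = 4     # assumption: recipient plus three other members

draws = []
for _ in range(100_000):
    ratio = random.uniform(0.15, 0.95)  # benefit per other household member
    draws.append(direct_effect + (household_size - 1) * ratio * direct_effect)

mean_total = sum(draws) / len(draws)
print(round(mean_total, 1))  # around 4.2 SD-years under these assumptions
```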

We have not incorporated an estimate of spillovers into our comparison between cash transfers (CTs) and psychotherapy. However, our analysis does not seem very sensitive to community spillovers. We do not think that adding community spillovers would change the magnitude of between-intervention differences in cost-effectiveness. Household spillovers appear to be highly influential. We do not include them because of the large uncertainty about the relative magnitude of household spillovers across interventions.

4.4 Biases and discounts based on the quality of the evidence

We previously discussed how we estimated the two parameters we need to calculate the total effects through time. But before we compare the total effect of psychotherapy to that of cash transfers, we adjust for the risk of bias in psychotherapy’s evidence base relative to the evidence base for cash transfers, which we judge to be of slightly higher quality. We estimate that the evidence base for psychotherapy overestimates its efficacy relative to cash transfers by 11% (0% – 40%), because psychotherapy studies have lower sample sizes on average and fewer unpublished studies, both of which are related to larger effect sizes in meta-analyses (MetaPsy, 2020; Vivalt, 2020; Dechartres et al., 2018; Slavin et al., 2016). Our specific calculations can be viewed in Tables A.2 and A.3 in Appendix C. We do not consider ‘social desirability bias’ amongst our concerns; we explain why next. We explain our general process in Appendix C.

Does ‘social desirability bias’ pose a particular problem for psychotherapy? 

One further concern is whether there is a ‘social desirability bias’ for this intervention, whereby recipients artificially inflate their answers because they think this is what the experimenters want to hear. In conversations with GiveWell staff, this has been raised as a serious worry that applies particularly to mental health interventions and raises doubts about their efficacy.

As far as we can tell, this is not a problem. Haushofer et al., (2020), a trial of both psychotherapy and cash transfers in an LMIC, perform a test of the ‘experimenter demand effect’, in which they explicitly state to participants whether they expect the research to have a positive or negative effect on the outcome in question. We take it this would generate the maximum effect, as participants would know (rather than have to guess) what the experimenter would like to hear. Haushofer et al., (2020) found no impact of explicitly stating that they expected the intervention to increase (or decrease) self-reports of depression. The results were non-significant and close to zero (n = 1,545). We take this research to suggest that social desirability bias is not a major issue for psychotherapy. Moreover, it is unclear why, if there were a social desirability bias, it would be proportionally more acute for psychotherapy than for other interventions. Further tests of experimenter demand effects would be welcome.

Other, less directly relevant evidence on experimenter demand effects also finds them to be small or close to zero. Bandiera et al. (2020; n = 5,966) studied a trial that attempted to improve the human capital of women in Uganda, and found that experimenter demand effects were close to zero. In an online experiment, Mummolo & Peterson (2019) found that “Even financial incentives to respond in line with researcher expectations fail to consistently induce demand effects.” Finally, de Quidt et al. (2018) do find experimenter demand effects, but conclude: “Across eleven canonical experimental tasks we … find modest responses to demand manipulations that explicitly signal the researcher’s hypothesis… We argue that these treatments reasonably bound the magnitude of demand in typical experiments, so our … findings give cause for optimism.”

5. Cost of delivering psychotherapy to an additional person

Up until this point, we have focused on explaining how we estimated the effectiveness of psychotherapy. Next, we turn our attention to its cost. We define cost as the average cost to the organisation of treating an individual: that is, the total cost the organisation incurs divided by the total number of people treated. 

We organised the cost information we came across in this spreadsheet. We reviewed 28 sources that estimated the cost of psychotherapy and included 11 in our summary of the costs of delivering psychotherapy. All are from academic studies except the cost figures for StrongMinds. 

Figure 3: Distribution of the cost of psychotherapy interventions

Histogram showing the distribution of psychotherapy costs per person treated (USD, market exchange rates).

Unfortunately, the cost figures reported in six of the ten academic studies are incomplete: they report the variable cost but neglect to incorporate overhead costs. We impute the complete cost for the studies that present only variable costs (i.e., that don’t include overhead expenses) by multiplying the variable cost by the ratio of complete to variable cost in the studies which provide both. The complete cost is on average 2.5 times larger than the variable cost.94 
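As a small illustration of this imputation step (the cost figures below are hypothetical, not the report’s data; see the linked spreadsheet for the real numbers):

```python
# Hypothetical figures for illustration only.
both_reported = [(40.0, 110.0), (75.0, 180.0), (120.0, 290.0)]  # (variable, complete)
variable_only = [35.0, 90.0, 150.0]  # studies reporting only the variable cost

# Average ratio of complete to variable cost among studies reporting both
ratio = sum(c / v for v, c in both_reported) / len(both_reported)

# Impute the complete cost for the variable-only studies
imputed_complete = [round(v * ratio, 2) for v in variable_only]
print(ratio, imputed_complete)
```

With the report’s data, this ratio comes out to roughly 2.5, the multiplier quoted above.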

As can be seen in Table 3 below, the variable costs range from $35 to $288, but we specify the average cost of treating an additional person with lay-delivered psychotherapy to range from $50 to $659, the latter being the second-highest complete-cost figure. We treat the average treatment cost given in Haushofer et al. (2020), $1,189, as an outlier: the authors report the total amount they paid the NGO to deliver psychotherapy, not how much it cost the NGO to deliver it (p. 32, Haushofer et al., 2020). This makes the figure less comparable to the other sources of cost information if the grant to the NGO greatly exceeded the actual implementation cost. 

Table 3: Cost of psychotherapy

                Variable cost    Complete cost
Average cost    $135.74          $359.29
SD              $95.10           $302.43
Range lower     $35.00           $50.48
Range upper     $288.27          $1,189.00


6. Cost-effectiveness analysis

6.1 Monte Carlo-based cost-effectiveness analysis using Guesstimate

In previous sections, we’ve described how we arrived at our estimates of the effects of psychotherapy on the individual, how we discount them over time, and how we estimated the cost of treating an additional person with psychotherapy. Next, we place these estimates in context by calculating the cost-effectiveness of psychotherapy and comparing it to our benchmark intervention, cash transfers (HLI, 2021c). 

We estimate the cost-effectiveness using a Monte Carlo simulation in which we assume all variables are drawn from normal distributions.95 The cost-effectiveness is given by taking the expected value of the total beneficial effect, T_d, and dividing it by the cost, C_pp, of delivering psychotherapy to an additional person: CE = T_d / C_pp. We summarize the inputs to our simulation in Table 4.
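A minimal sketch of this kind of simulation (in Python rather than Guesstimate; the input distributions are our reading of Table 4, and we assume the exponential decay model described above, so this is an approximation of the actual model, not a reproduction of it):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def normal(mean, lo, hi):
    """Normal draws with sd backed out from a 95% CI: sd = (hi - lo) / 3.92."""
    return rng.normal(mean, (hi - lo) / 3.92, N)

effect_t0 = normal(0.48, 0.35, 0.64)  # SDs of MHa improvement at t = 0
decay     = normal(0.73, 0.53, 0.98)  # yearly retention of the effect
duration  = normal(6.6, 4.0, 10.0)    # years until the effect is assumed gone
discount  = normal(0.89, 0.70, 1.10)  # study-quality adjustment
cost      = normal(360, 30, 610)      # $ per additional person treated

# Total effect T_d: discounted integral of effect_t0 * decay^t from 0 to duration
r = np.clip(decay, 0.01, 0.999)
T_d = discount * effect_t0 * (r ** duration - 1) / np.log(r)

# Cost-effectiveness: SDs of MHa improvement per $1,000
C_pp = np.clip(cost, 30, None)
ce = 1000 * T_d / C_pp
print(np.round(np.percentile(ce, [2.5, 50, 97.5]), 1))
```

The percentiles this prints will not exactly match the Guesstimate outputs, but the central estimate lands in the same region.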

Table 4: Guesstimate model explained

Variable | Estimate (95% CI) | Source | Sensitivity (R² of CE) | Explanation
Effect at t = 0 | 0.48 (0.35, 0.64) | Meta-regression | 1% | This is the intercept of model (2). We assume the meta-regression does a reasonable job at estimating the post-intervention effects.
Duration | 6.6 (4, 10) | Subjective judgement | 2% | This is a key subjective input: when we want the integral to end. Given the studies we’ve seen, we’d be surprised if the effects did not dissipate within a 4 to 10 year window. However, we think there is a small but real chance the effects last longer (up to 15 years). Two studies with 14- and 15-year follow-ups find that a drug prevention and a social development intervention have effects of 0.13 and 0.27 SDs on adult mental health service use and likelihood of a clinical disorder (Riggs and Pentz, 2009; Hawkins et al., 2009).
Yearly decay | 0.73 (0.530, 0.980) | Meta-regression | 10% | This is the parameter we took from the decay model to take the integral. It’s close to that used by Founders Pledge in their CEA of psychotherapy (2019).
Discount for study quality | 0.89 (0.7, 1.1) | Subjective judgement | 17% | We estimate, based on several characteristics of studies related to bias, that a naive analysis of the evidence base of psychotherapy would overestimate effectiveness by 11% relative to cash transfers. Note that this tool is still under construction.
Total effect | 1.3 (0.71, 4.7) | Calculation | 30% | This is the total effect on the individual recipient, equal to the definite integral96 (from time = 0 to duration) of equation 2.
Cost | $360 ($30, $610) | Subjective judgement | 8% | While the cost of implementing therapy can go up to $1,000 per person (e.g., Haushofer et al., 2020), we expect that figure to be inflated by unusually high startup costs. StrongMinds surveyed 22 NGOs treating depression and found that reported costs per person ranged from $3 to $200, but we expect these are underestimates because NGOs are likely under pressure to report low costs.
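As a check on the “Total effect” row: the definite integral of an exponentially decaying effect has a closed form, so we can plug in the central estimates from Table 4 (a sketch, assuming the decay model is d(t) = d0 · r^t):

```python
import math

d0, r, D = 0.48, 0.73, 6.6  # effect at t = 0, yearly decay, duration (Table 4)

# Integral from 0 to D of d0 * r**t dt  =  d0 * (r**D - 1) / ln(r)
total_effect = d0 * (r ** D - 1) / math.log(r)
print(round(total_effect, 2))  # 1.33, close to the table's central estimate of 1.3
```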

From the simulation, we arrive at an estimated total effect of 1.6 SDs of improvement in MHa (95% CI: 0.68, 3.6) and an estimated cost per person treated of $360 (95% CI: $30, $610). This results in an improvement of 4.3 SDs (95% CI: 1.1, 24) in MHa scores per $1,000 spent. Note that this figure represents the cost-effectiveness of a hypothetical average programme, rather than of any actual, existing programme. 

In a separate report, we calculate the cost-effectiveness of StrongMinds, a particularly efficient organisation providing such an intervention (HLI, 2021b). The point of assessing costs and effects for many programmes of a given intervention is both to estimate the expected (or average) cost-effectiveness of the intervention and to assess its possible (upper-bound) cost-effectiveness. 

6.2 Sensitivity

In Table 4, the “sensitivity” column describes how much variation (ranging from 0% to 100%) in the cost-effectiveness each input explains. The intuition here is that the more that variation in an input variable relates to the variation of an output variable, the more sensitive the output is to the input. 

However, the sensitivity given by Guesstimate (in terms of the R² of an input for explaining the cost-effectiveness) appears unreliable: Guesstimate does not give consistent sensitivity scores across simulation runs. So we take these figures as a rough ranking of the variables by their sensitivity. In future versions we will perform the sensitivity analysis in R. 

The cost-effectiveness of psychotherapy is more sensitive to the total individual effect than to the cost. Further, the estimate of the total individual effect is most sensitive to the discount we apply for the quality of evidence, and next most sensitive to the estimated decay over time. The decay over time can be estimated more precisely with the inclusion of more studies. We may be able to estimate the discount more precisely too: with more time, we can improve the decision tool we pilot for adjusting the effect for bias (discussed in Appendix C). 
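One way to reproduce this kind of sensitivity analysis outside Guesstimate is to correlate each input’s draws with the output across simulation runs (a sketch, reusing our reading of the Table 4 distributions; these one-at-a-time R² values will differ somewhat from Guesstimate’s):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50_000

def normal(mean, lo, hi):
    # sd backed out from a 95% CI: sd = (hi - lo) / 3.92
    return rng.normal(mean, (hi - lo) / 3.92, N)

inputs = {
    "effect at t=0": normal(0.48, 0.35, 0.64),
    "yearly decay":  normal(0.73, 0.53, 0.98),
    "duration":      normal(6.6, 4.0, 10.0),
    "discount":      normal(0.89, 0.70, 1.10),
    "cost":          normal(360, 30, 610),
}
r = np.clip(inputs["yearly decay"], 0.01, 0.999)
total = (inputs["discount"] * inputs["effect at t=0"]
         * (r ** inputs["duration"] - 1) / np.log(r))
ce = 1000 * total / np.clip(inputs["cost"], 30, None)

# Squared Pearson correlation of each input with the cost-effectiveness output
for name, x in inputs.items():
    print(f"{name:14s} R^2 = {np.corrcoef(x, ce)[0, 1] ** 2:.2f}")
```

Unlike Guesstimate’s reported sensitivities, this is deterministic given a fixed seed, so the ranking of inputs is stable across runs.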

6.3 Comparison of psychotherapy to monthly cash transfers

We pull the estimated effect of sending $1,000 in monthly CTs from Table 4 of the cost-effectiveness analysis of CTs (HLI, 2021c). We lay out the estimates side-by-side in Table 5.

Table 5: Comparison of monthly cash transfers to psychotherapy in LMICs

                                        Monthly cash transfers   Psychotherapy
Total benefit for the                   0.50                     1.60
individual (SWB & MHa, in SDs)          (0.22, 0.92)             (0.68, 3.60)
Cost                                    $1,277                   $360
                                        ($1,109, $1,440)         ($30, $631)
Cost-effectiveness per $1,000 spent     0.40 SDs                 4.30 SDs
                                        (0.17, 0.75)             (1.1, 24)

Note: 95% CIs are presented in parenthesis below the estimate.

We expect psychotherapy to be around 12 times more cost-effective than cash transfers for the recipient (95% CI: 4, 27). However, we do not include the effects on the household or the community in our comparison; the household spillovers are unclear because of the lack of evidence, as explained in section 4.3. We visually show the simulated differences between psychotherapy and CTs in Figure 4 below. Each point is a single run of the simulation for an intervention. Lines with a steeper slope reflect higher cost-effectiveness in terms of MHa improvement. The bold lines reflect the interventions’ cost-effectiveness and the grey lines are for reference. 
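The headline multiple can be approximated by dividing the two simulated cost-effectiveness distributions (a sketch using normal approximations to the Table 5 CIs, which is not identical to our Guesstimate model, so the percentiles will differ somewhat):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000

def normal(mean, lo, hi):
    # sd backed out from a 95% CI: sd = (hi - lo) / 3.92
    return rng.normal(mean, (hi - lo) / 3.92, N)

ct_ce = np.clip(normal(0.40, 0.17, 0.75), 0.01, None)  # SDs per $1,000, cash transfers
pt_effect = normal(1.60, 0.68, 3.60)                   # total individual effect, SDs
pt_cost = np.clip(normal(360, 30, 631), 30, None)      # $ per person treated
pt_ce = 1000 * pt_effect / pt_cost                     # SDs per $1,000, psychotherapy

ratio = pt_ce / ct_ce
print(np.round(np.percentile(ratio, [2.5, 50, 97.5]), 1))
```

The median of this ratio lands in the region of the 12x figure, with a wide interval, as expected from dividing two uncertain quantities.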

Figure 4: Comparison of cost-effectiveness between psychotherapy and monthly CTs


7. Discussion

7.1 Crucial considerations, limitations, and concerns

A potential issue with using SD changes is that the mental health (MH) scores of recipients of different programmes might have different-sized standard deviations – e.g., on a given mental health scale, the SD could be 15 for cash transfer recipients and 20 for psychotherapy recipients. We currently do not have much evidence on this. If we had more time, we would test and adjust for any bias stemming from differences in the variance of psychological distress between intervention samples by comparing the average SD for equivalent measures across intervention samples.97
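A toy example of the worry, with purely hypothetical numbers: the same raw improvement translates into different SD effects if the samples’ SDs differ.

```python
raw_change = 6.0            # hypothetical 6-point improvement on the same scale
sd_ct, sd_pt = 15.0, 20.0   # hypothetical sample SDs (CT vs psychotherapy samples)

print(raw_change / sd_ct)   # 0.4 SDs in the cash-transfer sample
print(raw_change / sd_pt)   # 0.3 SDs in the psychotherapy sample
```

An identical underlying benefit would thus look a third larger for the intervention whose sample happens to have the smaller SD.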

There may be issues with assuming that a unit improvement in depression scores is equivalent to the same unit increase in a subjective well-being measure (such as happiness or life satisfaction questionnaires). We discuss this issue in Appendix A. We think that this is a source of uncertainty that further research should work to reduce. 

Cost data for psychotherapy is sparse. Studies that report costs often leave unclear what their figures encompass, which makes comparing costs across studies more uncertain. However, this uncertainty matters less when estimating the cost-effectiveness of a specific programme, as long as the range of costs we specify is reasonable (which we think it is). If we had more time, we would search for or request cost information from more NGOs delivering psychotherapy.

Data on the long-term effects of psychotherapy, i.e. beyond 2 years, is also very sparse. As noted, the two longest follow-ups are 2.5 and 7 years after the intervention ended. Given we need to know the total effect over time, this means a key parameter – duration – is estimated with relatively little information. The situation here is worse than for cash transfers, where there are a number of studies with reports from 2 years or more after the intervention.

We do not incorporate spillover effects of psychotherapy into our main analysis. This is an important limitation of the current report that we hope to address after gathering more evidence.   

The populations studied in the RCTs we synthesize vary considerably. It’s possible that there is considerable heterogeneity within populations where the dynamics of psychotherapy remain unexplored. For instance, maybe the benefits of psychotherapy persist much longer for youth because they are more open to changing their habits of thought. However, we believe that we’ve accounted for the most important sources of heterogeneity when controlling for the format, dosage, and specialization of the deliverer. This is also a general concern with reviewing any intervention. 

7.2 Research questions raised by this work

Our work on this psychotherapy report raised some research questions we think are worth answering, beyond the previous ways we’ve mentioned to improve our analysis. Answering these questions could entail larger projects that we do not plan to pursue in the near future. We order these questions in terms of most to least perceived priority. 

What are the spillover effects of psychotherapy on the household? This is an important question because beneficial household spillovers could substantially change the total effect of psychotherapy. These could be found by pursuing original research that treats one household member but surveys the SWB and MHa of all household members. Potentially, researchers have already collected household MHa or SWB information in psychotherapy interventions, but have not used it to estimate household spillover effects. Finding this data could allow for the estimation of household spillover effects. 

What would additional tests of experimenter demand effects in psychotherapy (or any intervention) reveal? Many are concerned about social desirability as a source of bias in psychotherapy research, but little work has been done to see how much of an impact it has. We think more work in the vein of Haushofer et al. (2020) would be helpful in reducing our uncertainty about the potential for bias stemming from social desirability. 

What is the cost-effectiveness of treating depression with antidepressants in LMICs? Existing evidence appears sparse on both the effectiveness of pharmacotherapy to treat anxiety or depression and  on the cost of delivering such treatment. We think that further primary or secondary research on this topic would be valuable. 

Conclusion

This report is the first attempt we are aware of to synthesize the existing literature on psychotherapy interventions in LMICs to determine their cost-effectiveness. Specifically, the effectiveness was assessed using measures of affective mental health. While this investigation was not exhaustive, we believe it to be the most comprehensive one to date. 

The methods we employ are not new. We nevertheless believe that their combination constitutes a novel approach to performing and comparing cost-effectiveness analyses. To reiterate, we meta-analytically estimate the total effects of an intervention (not just the post-treatment effect) on MHa, then we ground our discount of the total effect based on empirical estimates of bias, and then use those estimates to simulate the comparative cost-effectiveness of psychotherapy relative to cash transfers. 

Psychotherapy appears to be around 12 times (95% CI: 4, 27) more cost-effective than monthly cash transfers. To increase our confidence in this estimate requires collecting more information on household spillovers, cost data and long-run follow-ups of psychotherapy. Our estimate would also be improved by updating and refining our tool for discounting the effectiveness of an intervention according to its relative risk of bias.

Appendix A: Converting depression scores to subjective well-being scores

At HLI, we believe that happiness is what ultimately matters. What do we do, then, if we don’t have direct measures of happiness, but we do have other subjective data, such as mental health scores? As a factual matter, depression scores seem closer to being a measure of happiness than the most popular measure of SWB, life satisfaction (LS). This comes from a quick search, which found three sources, all showing that the correlation between happiness and depression is greater than that between life satisfaction and depression. 

Using data from the most recent wave of the HILDA (n = 15,879), we find that the correlation between depression (measured by the K10) and happiness (“how often have you been happy?”) is -0.593, and -0.454 for life satisfaction (“how satisfied are you with your life?”). Brailovskaia et al. (2019), in a sample of ~2,000, found that the correlation between depression (measured by the depression section of the DASS) and happiness (measured by the SHS) was -0.53, while it was -0.41 for LS (measured by the SWLS). Margolis et al. (2021), using a sample of ~1,200, found that the disattenuated correlation (correlation / reliability) between depression and happiness (SHS) was -0.90, and -0.79 between depression and LS (measured by the RLS). 

Hence, we think that, if we don’t have happiness data, but we do have depression scores, we should use depression scores as the outcome measure, rather than convert depression scores into LS scores. 

What’s the best way to convert depression scores to SWB scores?

We think the best way to convert depression scores to SWB scores is to determine the relative impact of an intervention on both SWB and depression by looking at the comparative magnitude of SD changes. Suppose we found that therapy had (say) a 1 SD effect on depression scores and a 0.5 SD effect on LS scores; that gives us the conversion ratio: therapy has twice as big an SD impact on depression as on LS. Hence, if we had another study where we only had a depression measure, we would assume the (unmeasured) LS change for those participants was 0.5 times the (measured) depression change. 
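In code, the ratio method from this example is just (hypothetical numbers, matching the worked example above):

```python
d_dep, d_ls = 1.0, 0.5   # SD changes from a study measuring both outcomes
ratio = d_ls / d_dep      # 0.5: LS moves half as much as depression

d_dep_only = 0.8          # a new study with only a depression measure
d_ls_imputed = ratio * d_dep_only
print(d_ls_imputed)       # 0.4
```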

So what should we do if we don’t have both the conversion measures we want for a particular intervention? We could find similar types of interventions for which we do have those conversion measures, then assume the intervention we are primarily interested in works the same way.

We summarized the relative effectiveness of five different therapeutic interventions on SWB (not just LS specifically) and depression. The results are summarized in this spreadsheet. The average ratio of SWB to depression changes in the five meta-analyses is 0.89 SD; this barely changes if we remove the SWB measures that are specifically affect-based.

The second best alternative for conversion is to use the correlation between depression and SWB as an anchor point for assessing how much the scales tend to overlap.98

Correlations are useful for building a prior for how strong the relationship is between depression and LS. However, we think they are less useful when trying to answer the question: “Given that the effect of psychotherapy on depression scales was X, what will its effect on LS be?”

Correlations give us a standardized measure of how two variables vary together, not a prediction of the size of their differential response to an intervention. To put it another way: our preferred option is to ask the conditional question “Given that the impact of intervention on depression is X, we expect the impact on SWB to be Y.”  Using simple correlations asks the unconditional question “If the change in depression is X, then what is the expected change in SWB?” where variation in depression scores may come from any source. 

We think that the use of correlations as an adjustment factor between SWB and depression is only sensible if used as a lower and upper bound. Otherwise, you wouldn’t be able to convert from depression to SWB and back to depression (because correlations are always less than 1). The formula, corrected for a measure’s auto-correlation / reliability being less than one, would be:
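Our best guess at the intended correction is the standard disattenuation formula, which divides the observed correlation by the geometric mean of the two measures’ reliabilities (this is an assumption on our part, and the numbers below are hypothetical):

```python
import math

r_observed = -0.53             # e.g., a depression-happiness correlation
rel_dep, rel_swb = 0.90, 0.80  # hypothetical reliabilities of the two scales

# Disattenuated correlation: r_true = r_obs / sqrt(rel_x * rel_y)
r_true = r_observed / math.sqrt(rel_dep * rel_swb)
print(round(r_true, 3))  # -0.625
```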

Appendix B: All studies included in meta-regressions

Authors | Country | n | Group or ind. delivery | Training length (days) | Outcome | Sessions | Follow-up (months) | d | Active control | Type of deliverer | Population
Tripathy et al. 2010 | India | 12,431 | group | 7 | depression | 20 | 30 | 0.140 | 1 | local woman | mothers
Bolton et al. 2014a(i) | Iraq | 180 | ind | 14 | depression | 12 | 5.5 | 0.280 | 0 | com MH workers | survivors of violence
Bolton et al. 2014a(i) | Iraq | 180 | ind | 14 | anxiety | 12 | 5.5 | 0.240 | 0 | com MH workers | survivors of violence
Bolton et al. 2014a(ii) | Iraq | 167 | ind | 14 | depression | 12 | 5.5 | 0.380 | 0 | com MH workers | survivors of violence
Bolton et al. 2014a(ii) | Iraq | 170 | ind | 14 | anxiety | 12 | 5.5 | 0.130 | 0 | com MH workers | survivors of violence
Patel et al. 2010 | India | 1961 | ind | 60 | anxiety & dep | 8 | 6 | 0.036 | 1 | lay health counselor | primary care attendees
Bolton et al. 2014b | Thailand | 347 | ind | 10 | depression | 10 | 0 | 0.710 | 0 | lay | survivors of violence
Bolton et al. 2014b | Thailand | 347 | ind | 10 | anxiety | 10 | 0 | 0.420 | 0 | lay | survivors of violence
Baranov et al., 2020 | Pakistan, Punjab | 818 | ind | NA | Hamilton dep | 16 | 6 | 0.559 | 1 | lady health worker | pregnant mothers (rural)
Baranov et al., 2020 | Pakistan, Punjab | 704 | ind | NA | Hamilton dep | 16 | 12 | 0.452 | 1 | lady health worker | pregnant mothers (rural)
Baranov et al., 2020 | Pakistan, Punjab | 585 | ind | NA | Hamilton dep | 16 | 84 | 0.138 | 1 | lady health worker | pregnant mothers (rural)
Hughes 2009 | India | 422 | group | NA | EPDS dep | 5 | 6 | 0.180 | 1 | non-specialist | mothers
Singla et al. 2015 | Uganda | 291 | group | 14 | CESD | 12 | 3 | 0.280 | 0 | non-prof | mothers with children <3 years
Mao 2012 | China | 240 | group | NA | EPDS | 4 | 1 | 1.280 | 1 | obstetrician | mothers
Rojas et al. 2007 | Chile | 208 | group | 2 | EPDS | 8 | 3 | 0.623 | 1 | doctor, nurse | primary care attendees
Rojas et al. 2007 | Chile | 208 | group | 2 | EPDS | 8 | 6 | 0.220 | 1 | doctor, nurse | primary care attendees
Weiss et al. 2015(i) | Iraq | 149 | ind | 10 | depression | 22 | 3.5 | 0.698 | 0 | com MH workers | torture survivors
Weiss et al. 2015(i) | Iraq | 149 | ind | 10 | anxiety | 22 | 3.5 | 0.690 | 0 | com MH workers | torture survivors
Weiss et al. 2015(ii) | Iraq | 193 | ind | 7 | anxiety | 22 | 4.5 | 0.130 | 0 | com MH workers | torture survivors
Weiss et al. 2015(ii) | Iraq | 193 | ind | 7 | depression | 22 | 4.5 | 0.153 | 0 | com MH workers | torture survivors
Araya et al. 2003 | Chile | 211 | group | NA | depression | 9 | 3 | 0.884 | 1 | doctors | primary care attendees
Araya et al. 2003 | Chile | 211 | group | NA | depression | 9 | 6 | 0.900 | 1 | doctors | primary care attendees
Bolton et al. 2003 | Uganda | 284 | group | 14 | depression | 16 | 0.5 | 1.852 | 1 | local person | local men and women
Bass et al., 2006 | Uganda | 216 | group | 14 | depression | 16 | 6 | 1.608 | 1 | local person | local men and women
Bryant et al., 2017 | Kenya | 319 | ind | 8 | GHQ12 | 5 | 3 | 0.570 | 1 | lay | survived gender violence
Rahman et al., 2019 | Pakistan, Swat | 598 | group | 7 | HADs | 5 | 0.25 | 0.785 | 1 | lay | women post-conflict
Rahman et al., 2019 | Pakistan, Swat | 577 | group | 7 | HADs | 5 | 3 | 0.605 | 1 | lay | women post-conflict
Rahman et al., 2016 | Pakistan | 346 | ind | 8 | HADs | 5 | 3 | -0.830 | 1 | lay | gen. 16-60 in conflict area
Hamdani et al., 2021 | Pakistan | 198 | ind | 8 | HADs | 5 | 3 | -0.314 | 1 | lay | hospital
Haushofer et al., 2020 | Kenya | 1018 | ind | 9 | PWB | 5 | 12 | -0.010 | 0 | lay | rural
Fuhr et al., 2019 | India | 251 | group | 7 | PHQ9 | 10 | 3 | -0.340 | 1 | peers | women
Fuhr et al., 2019 | India | 251 | group | 7 | PHQ9 | 10 | 6 | -0.180 | 1 | peers | women
Meffert et al, 2021 | Kenya | 209 | ind | 10 | BDI | 12 | 3 | 0.380 | 1 | non-specialists | survived gender-violence
Nakimulu-Mpungu et al., 2020 | Uganda | 1140 | group | 5 | SRQ-20 | 8 | 6 | 0.379 | 1 | lay | people w/ HIV
Nakimulu-Mpungu et al., 2020 | Uganda | 1140 | group | 5 | SRQ-20 | 8 | 12 | 0.221 | 1 | lay | people w/ HIV
Patel et al., 2016 | India | 466 | ind | 78 | BDI-II | 7 | 3 | 0.509 | 1 | lay counselor | 18-65 Goa
Weobong et al., 2017 | India | 447 | ind | 78 | BDI-II | 7 | 12 | 0.290 | 1 | lay counselor | gen. Goa
Weobong et al., 2017 | India | 447 | ind | 78 | PHQ-9 | 7 | 12 | 0.320 | 1 | lay counselor | gen. Goa
Lund et al., 2020 | South Africa | 384 | ind | 5 | Hamilton DRS | 6 | 4 | 0.346 | 1 | chw | mothers SA
Lund et al., 2020 | South Africa | 384 | ind | 5 | Hamilton DRS | 6 | 13 | 0.274 | 1 | chw | mothers SA
Husain, 2017 | Pakistan | 216 | group | NA | depression | 6 | 3 | 1.790 | 1 | lady health workers | mothers
Husain, 2017 | Pakistan | 216 | group | NA | depression | 6 | 6 | 0.890 | 1 | lady health workers | mothers
Mukhtar, 2011 | Malaysia | 113 | group | NA | depression | 8 | 0 | 4.830 | 1 | prof | adults / gen. pop
Nakimuli-Mpungu, 2015 | Uganda | 109 | group | NA | depression | 8 | 0 | 0.050 | 1 | lay | w/ HIV
Nakimuli-Mpungu, 2015 | Uganda | 109 | group | NA | depression | 8 | 6 | 0.760 | 1 | lay | w/ HIV
Chibanda et al. 2016 | Zimbabwe | 573 | ind | 9 | PHQ9 | 6 | 6 | 0.897 | 1 | lay HW | women (urban)
Gureje et al., 2019 | Nigeria | 686 | ind | 5 | EPDS | 6 | 6 | 0.189 | 1 | primary care providers | mothers
Gureje et al., 2019 | Nigeria | 686 | ind | 5 | EPDS | 6 | 12 | 0.265 | 1 | primary care providers | mothers
Naeem et al., 2015 | Pakistan | 129 | ind | 5 | anxiety & dep | 6 | 0 | 0.860 | 1 | psych grad student | psychiatry outpatient (urban)
Naeem et al., 2015 | Pakistan | 110 | ind | 5 | anxiety & dep | 6 | 6 | 0.315 | 1 | psych grad student | psychiatry outpatient (urban)
Bass et al, 2013 | Congo | 405 | group | 5 | depression | 12 | 0 | 1.087 | 1 | psychosocial assistants | female survivors of violence
Bass et al, 2013 | Congo | 405 | group | 5 | depression | 12 | 6 | 1.000 | 1 | psychosocial assistants | female survivors of violence
Baker-Henningham et al. 2005 | Jamaica | 139 | ind | 42 | CESD | 50 | 12 | -0.412 | 1 | com health workers | mothers
Cooper et al. 2009 | South Africa | 449 | ind | 120 | depression | 16 | 6 | 0.240 | 0 | peers | mothers
Cooper et al. 2009 | South Africa | 449 | ind | 120 | depression | 16 | 12 | 0.260 | 0 | peers | mothers
le Roux et al. 2013 | South Africa | 1157 | ind | 30 | depression | 11 | 6 | 0.138 | 0 | com health workers | mothers
Richter et al. 2014 | South Africa | 543 | ind | 60 | depression | 8 | 0 | 0.501 | 1 | peer mentors | SA HIV moms
Rotheram-Borus et al. 2014a | South Africa | 1030 | group | 30 | depression | 8 | 0 | 0.501 | 1 | peer mentors | SA HIV moms
Rotheram-Borus et al. 2014a | South Africa | 766 | group | 30 | depression | 8 | 6 | 0.345 | 1 | peer mentors | SA HIV moms
Rotheram-Borus et al. 2014a | South Africa | 251 | group | 30 | depression | 8 | 12 | 0.547 | 1 | peer mentors | SA HIV moms
Rotheram-Borus et al. 2014b | South Africa | 1082 | ind | 30 | depression; EPDS | 11 | 12 | 0.116 | 1 | com health workers | SA moms

Appendix C: System for adjusting estimated effects by relative bias

We start from the assumption that researchers may try to make their results larger and more significant.99 Therefore, we assume that features of a study which reduce the researcher’s flexibility in performing research will tend to lead to smaller effects in fields where bigger effects are more exciting, which we take to be the case at least in psychotherapy, where most researchers probably have some degree of loyalty to the intervention (or else why are they studying it?). 

Is there any evidence to support this view in the general case? We searched the literature across a range of fields. Two meta-analyses find that general indicators of quality are related to smaller effects (Berkman et al., 2014; Hempel et al., 2013), but the evidence is often inconclusive. In Bialy et al. (2014), only a high risk of bias for selectively choosing outcomes led to significant overestimation of treatment effects, and in Hoppen & Morina (2020) and Hartling et al. (2014) the associations were not significant. However, in the MetaPsy database of psychotherapy’s effect on depression, each one-point decrease in risk of bias is associated with a significant 0.13 SD decrease in effect size; given that the average effect size is around 1 SD, this implies a substantial difference between the studies with the highest and lowest risk of bias. 

Also, one can compare how replicated effect sizes compare to the original effects. Tajika et al. (2015) find that the “standardised mean differences of the initial studies were overestimated by 132%.” Camerer et al. (2018) find “a significant effect in the same direction as the original study for 13 (62%) studies”. Of course, it may also be worth considering whether replicators themselves face a publication filter that pushes them to find smaller effects.

Assuming that our premise holds in general, we next compiled a list of features of a study that signify constraints on the researcher’s part. We require these features to be a) easily extractable and b) to have a consistently significant relationship with effect size that has no clear explanation other than bias. 

We will describe the features we consider, and a few we do not, when building our decision tool for adjusting an evidence base according to bias. The observable elements we compare between psychotherapy and our review of cash transfers correspond to issues of internal validity, external validity, and publication bias. 

  • Is the study an RCT? We find mixed or non-significant estimates when comparing RCTs to quasi-experimental studies. Vivalt (2020) finds that quasi-experimental studies have smaller (though not significantly smaller) effects than RCTs in a large sample of development studies. Cheung and Slavin (2016), in a sample of 645 educational interventions, find that effects are significantly higher in quasi-experimental studies than in RCTs. However, we expect effects to follow this order: RCT < quasi-experiment < panel < cross-section, so we include this as a source of bias. There is also evidence from within-study comparisons of different methods; at this time it does not add much clarity, although it seems to support our view.100 
  • If the study is an RCT, does it use an active or passive control? Is the passive control a waitlist? Using data from MetaPsy, the effects of treatment compared to a waitlist, even when controlling for other characteristics of the study, are large compared to care as usual101 (0.16 SDs, 95% CI: 0.1039, 0.2184), about 11% of the average effect size (so we’d say a non-waitlist comparison yields 89% of the effect of a waitlist comparison). We use data from MetaPsy because it’s the largest and most relevant dataset, but other studies come to similar conclusions (Furukawa et al., 2014; Michopoulos et al., 2021). However, we should arguably be comparing waitlists to “nothing” instead of care as usual, because we expect “nothing” to be the “care as usual” most people in LMICs receive for mental illness. 
  • Does the study have a large sample size? The evidence we’ve found indicates that studies with smaller sample sizes consistently tend to have larger effect sizes (Vivalt, 2020; Cheung & Slavin, 2016; Pietschnig et al., 2019). The leading explanation is that a researcher can ‘farm’ results: that is, perform many small trials and only publish those which show positive effects. Small studies are also easier to micromanage to an unrealistic degree, for instance by ensuring an unusually high quality of treatment.
  • Is the study pre-registered or unpublished? Pre-registered (Kvarven et al., 2019; Schäfer & Schwarz, 2019; Chow & Ekholm, 2018; Dechartres et al., 2016; Papageorgiou et al., 2018) and unpublished studies (Dechartres et al., 2018) display much smaller effects, presumably because they bypass the publication filter that pushes for larger effects. We suspect this deserves more weight when study sizes are small, so there’s some overlap between average sample size and publication bias. Do you expect someone to really ‘file-drawer’ their n = 10,000 RCT, or rather to think “null results be damned!”? 
  • Is the analysis only performed on those who received the treatment, i.e., is it a complete case analysis as opposed to an intention-to-treat (ITT) analysis? Analyses diverging from ITT have larger effects (Abraha et al., 2015; 310 studies), although this was not found in Døssing et al. (2016; 72 studies). Intention to treat often isn’t reported but seems to be the default analysis, because removing only the cases that did not complete treatment requires tracking them, which is harder. We generally assume that if a researcher goes to greater lengths to ensure the quality of their study, they will write about it.  

For the evidence base in general we ask:

  • Is there a large sample of studies? We think this is a restatement of concerns about publication bias, and it would be almost entirely nullified if all the studies were pre-registered and very large. The source of bias that may remain even then is that, if there are few studies, there’s a higher chance the authors share the same beliefs about the intervention, which could bias the evidence. This would be a small concern. 
  • Do the studies overlap geographically with the area of interest? If not, does this lead to an over- or underestimate of the effects? If the geographical concentration of studies mostly overlaps with the locations where the interventions would take place, we take this, in conjunction with there being a large sample of studies or well-powered pre-registered studies, as a proxy for external validity.

Some features we consider but currently do not incorporate are: 

  • Baseline differences and how they are handled. While this appears to be an important source of potential bias, it seems difficult to operationalize the degree of baseline differences and how well they are handled. 
  • Attrition and how it is handled. For both this feature and the previous one, the first element is relatively easy to extract, but the second requires a judgement call: it is often difficult to tell whether a study handled baseline differences or attrition satisfactorily. 
  • Whether participants, interviewers, or analysts were blinded. We find conflicting evidence over whether this matters in general,102 or which aspect of blinding matters. Furthermore, blinding is often difficult and time-consuming to assess. 

So how do we actually take these features into account and arrive at a precise discount? We will explain how we did it in this case, although we expect the process to change as we develop this tool. Ideally, the decision tool would rely on fewer subjective judgements and more empirical estimates of how much a proxy for bias tends to inflate or deflate the effect size. 

That being said, we first set out the bias we would expect if an evidence base consisted entirely of studies with a given characteristic, and the weight we assign to each feature. We try to base our estimated bias on estimates from the meta-analytic literature that predict whether a study with a particular characteristic tends to over- or underestimate effectiveness. We explain the sources in the column “General sources suggesting direction of bias”. Our judgement of how much to weigh a signifier of bias comes from a subjective assessment of its relative importance, based on the consistency and magnitude of findings. The results of this process can be seen in Table A.2. 

Next, in Table A.3, we assess how relatively biased the evidence for psychotherapy is compared to cash transfers. For instance, if RCTs tend to give lower effect sizes and the psychotherapy literature contains relatively more RCTs than the cash transfers literature, that leads us to inflate the effectiveness of psychotherapy in proportion to the excess of RCTs. In this case, our sample of psychotherapy studies has about twice the share of RCTs as the cash transfers sample. To put this concretely:

estimated discount = discount deserved × bias

where the discount deserved reflects how much more (or less) prevalent the signifier of bias is in the psychotherapy evidence than in the cash transfers evidence (here, psychotherapy’s roughly twofold share of RCTs).

We arrive at the total discount by taking the weighted average of the estimated discounts, which we then adjust based on our judgement of how correlated the signifiers of bias are (Table A.4). 
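The aggregation step above can be sketched in code. This is only a minimal illustration of one possible reading of the method: the numeric values assigned to the weight labels, the per-signifier discounts, and the exact form of the correlation adjustment are our assumptions for illustration, not the report’s actual model.

```python
# Sketch of the aggregation step: a weighted average of per-signifier
# estimated discounts, then shrunk toward 1 (no discount) in proportion
# to the assumed average correlation between signifiers of bias.
# All numeric inputs below are hypothetical placeholders.

WEIGHT = {"Small": 1.0, "Medium": 2.0, "Large": 3.0}  # assumed numeric weights

def total_discount(discounts, weight_labels, avg_correlation):
    """Combine per-signifier discounts (multipliers on the effect size)
    into one overall discount."""
    weights = [WEIGHT[label] for label in weight_labels]
    weighted_avg = sum(d * w for d, w in zip(discounts, weights)) / sum(weights)
    # Correlated signifiers double-count the same underlying bias, so we
    # pull the combined adjustment back toward 1 by the average correlation.
    return 1 + (weighted_avg - 1) * (1 - avg_correlation)

overall = total_discount([0.95, 0.89, 0.87], ["Small", "Medium", "Medium"], 0.22)
```

With these placeholder inputs, the correlation adjustment softens the weighted-average discount, moving it closer to 1 than the raw weighted average would be.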

Table A.2: Estimation of absolute bias predicted by signifiers of bias

| Source of bias | Proxy | Estimated bias | Weight | General sources suggesting direction of bias |
| --- | --- | --- | --- | --- |
| Internal validity: causality | % RCTs | 0.95 | Small | Since the evidence is inconclusive (see discussion above), we give this a small estimated discount and relatively little weight. This is our subjective judgement. |
| Internal validity: control worse off | % active control | 0.89 | Medium | Using data from MetaPsy, effects measured against an active control, even when controlling for other characteristics of the study, are about 11% smaller than the average effect (so an active-control comparison is roughly 89% the size of a waitlist comparison). |
| Internal validity: low take-up | % using ITT | 0.87 | Medium | Abraha et al. (2015) find that studies diverging from ITT have larger effects (in this case, a smaller odds ratio of 0.83), but due to conflicting evidence we soften the discount to 0.87. |
| Internal validity AND pub bias: power | Average sample size | 85% | Large | We’ve found across a few (3–4) meta-reviews that studies with larger samples have smaller effects (Vivalt, 2020; Cheung & Slavin, 2016; Pietschnig et al., 2019; MetaPsy). We use estimates from MetaPsy suggesting that each person added to a sample decreases the effect by 0.0003 or 0.0001 SDs, depending on the specification. To get the discount, we multiply the estimated decrease in effect size by the difference in average sample sizes between interventions, standardized by the intercept of the MetaPsy model. Given that the difference in average sample sizes between cash transfers and psychotherapy is 2,727 − 634 = 2,093, the first specification gives 1 − 0.0003 × 2,093 / 7.1 (the intercept) ≈ 91% of the average effect; a simpler specification gives 1 − 0.0001 × 2,093 / 1.001 ≈ 80%. We take the midpoint (85%) as our discount. |
| Pub bias | n pre-registered | 0.60 | Large | Pre-registered studies have lower effects (Schäfer & Schwarz, 2019: 0.44; Dechartres et al., 2016: 0.84; Kvarven et al., 2019: 0.38; Tajika et al., 2015: 0.75). We are not sure whether “registered” means pre-registered for psychology studies. Taking the mean of these estimates gives us 0.60 as a discount. |
| Pub bias | n unpublished | 0.5 | Medium | From Dechartres et al. (2018) and Cheung & Slavin (2016), we estimate a discount of around 0.5. |
| External validity | Geo. overlap | 0.8–1.2 | Medium | We are not sure whether studies in different geographic locations will differ in effects. |
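The arithmetic in the sample-size row above can be reproduced directly; the coefficients, intercepts, and sample sizes are those quoted in the table.

```python
# Reproduce the sample-size discount arithmetic from Table A.2.
# Per the MetaPsy models quoted above: the effect falls by ~0.0003
# (or ~0.0001) SDs per additional participant, with intercepts of
# 7.1 and 1.001 respectively.
diff = 2727 - 634  # difference in average sample sizes: 2,093

spec1 = 1 - 0.0003 * diff / 7.1    # ~91% of the average effect
spec2 = 1 - 0.0001 * diff / 1.001  # ~79-80% of the average effect

discount = (spec1 + spec2) / 2     # midpoint, ~85%, the figure in the table
```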

Table A.3: Estimation of relative bias based on signifiers of bias


Note: Yellow represents estimated bias. Green represents upgrades that favor psychotherapy, while red indicates a discount against psychotherapy. Weights are shown in orange where we adjusted them downwards because the tool gave us unintuitive results (i.e., we overrode it), and in blue otherwise. 

Table A.4: Subjective assessment of correlations between biases


Note: The purpose of this table is to illustrate the subjective correlations we assume between sources of bias. We then use the average correlation to discount the overall discount. The average correlation between signifiers of bias is 0.22 (after arctangent transformation).
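The note above averages correlations after a transformation; we read “arctangent transformation” as Fisher’s z-transformation (arctanh), the standard way to average correlation coefficients, though this reading is our assumption. A sketch with hypothetical correlation values:

```python
import math

def average_correlation(correlations):
    """Average correlations via Fisher's z-transformation (arctanh),
    then map the mean back to the correlation scale with tanh."""
    z_values = [math.atanh(r) for r in correlations]  # r -> z
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)                          # z -> r

# Hypothetical pairwise correlations between signifiers of bias:
avg_r = average_correlation([0.1, 0.3, 0.25])
```

Because arctanh is convex for positive correlations, this average differs slightly from the simple mean, and the difference grows as correlations approach 1.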
