What’s happening…

For anyone interested in evaluating initiatives promoting “evidence-informed policy-making”, this review of three knowledge-to-policy programmes supported by DFAT in Indonesia may be of interest:

  • https://www.dfat.gov.au/sites/default/files/final-report-independent-strategic-review-k2p-investment-indonesia.pdf

The study covers the Knowledge Sector Initiative, the Abdul Latif Jameel Poverty Action Lab (J-PAL) South East Asia and UN Pulse Lab Jakarta, and looks at what, and how, they contributed to public policy development.

It’s good to talk? Assessing the quality of dialogue

Policy dialogue has been an important element of international development efforts for some years now.  However, its importance seems only set to grow, as the development arena becomes more contested, as commitment to multilateralism weakens and individual governments align international assistance with foreign policy.  

But assessing policy dialogue is hard.  Multiple factors condition the progress and outcome of a dialogue initiative, complicating the monitoring and evaluation challenge.  That said, there seems to be plenty of scope for more rigorous, systematic and (crucially) informative strategic assessments of dialogue investments.  To date, the application of such assessments has been patchy at best.

With this in mind, I wanted to share some details about an interesting effort I was involved in with Stuart Astill and Enrique Wedgewood-Young – to assess dialogue quality. 

Why was it important to look at quality (as opposed to ‘satisfaction’ or ‘outcomes’)?  In this instance, dialogue – specifically deliberative dialogue – was a central element in the theory of change for a long-term and complex programme.  Deliberative dialogue was held to be a necessary (if not sufficient) means of shifting stakeholders’ attitudes.  Understanding whether efforts to achieve this type of dialogue were succeeding was therefore important for the strategic management of the programme.

Deliberative dialogue can be (loosely) characterised as evidence-based (drawing on trusted sources), involving a plurality of interests, conducted in a manner that builds shared understanding of different groups’ positions and the trade-offs involved, and focused on benefits and fairness.  Deliberative dialogue is seen as a means of unblocking and reframing adversarial debates and, through a process of social learning, capable of achieving ‘enlightened’ solutions, i.e. agreements that are in some sense collectively optimal and not simply the result of power relations or horse-trading (a zero-sum game).

Measuring the quality of deliberative dialogue, however, is not straightforward. To do so, we examined a number of programme-sponsored dialogue events, using a mix of:

  • pre- and post-event contextual interviews with programme staff: to ascertain the specific objectives for the event, factual information about preparatory activities, and views on success;
  • a post-event questionnaire survey of participants, to obtain their views on a number of aspects of the dialogue; and 
  • non-participant, direct observation of the dialogue events and discourse analysis. 

It is this last element that I want to highlight, because it was reasonably innovative.  It is important to note, however, that the interviews and surveys, in addition to providing useful information in themselves, were also key in triangulating the results of direct observation.

To operationalise the observation exercise, we developed an assessment framework informed by deliberative theory and discourse analysis, drawing heavily on Bächtiger et al. (2011)[1] and their adaptation of the Discourse Quality Index (originally developed by Steenbergen et al.).  The framework comprised nine ‘qualities’:

  • Participation equality: records the (time) participation by different groups attending the dialogue
  • Clarity of understanding: records instances when participants demonstrate a lack of understanding or clarity regarding the objectives of the dialogue
  • Interactivity: records instances when participants refer to other participants or their arguments
  • Framing: whether the issue presented for discussion is framed from singular or multiple perspectives, and whether trade-offs are actively considered
  • Justification rationality: the extent and quality of justification offered for positions or arguments presented
  • Story telling: instances where participants rely on personal narratives or experiences in supporting their positions
  • Respect and agreement: the nature (tone or language) of references by participants to others’ contributions
  • Common good orientation: the basis on which arguments are cast, ranging from narrow constituency interests to principles of the common good
  • Constructive politics: the ‘politics’ of solutions or conclusions offered, ranging from positional politics to mediating proposals that explicitly acknowledge and seek to address different concerns

To conduct the assessment, a team of three observers was trained in the tool, with each observer given responsibility for tracking three qualities (and tests run initially to check inter-observer reliability).  Each quality was scored either as a simple frequency count (the number of instances in which the quality was observed) or against a predefined scale measuring the degree to which it was observed.
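To make the set-up concrete, here is a minimal sketch in Python of how one observer’s record for an observation period might be represented and checked.  The nine quality names come from the framework above, but the split between count-based and scale-based qualities, and the 0–3 rating scale, are assumptions made purely for the example rather than a description of the actual instrument.

```python
# Illustrative only: which qualities were counted and which were rated,
# and the 0-3 rating scale, are assumptions made for this sketch.
COUNT_QUALITIES = {          # scored as simple frequency counts
    "participation_equality",
    "clarity_of_understanding",
    "interactivity",
    "story_telling",
}
SCALE_QUALITIES = {          # scored against a predefined ordinal scale
    "framing",
    "justification_rationality",
    "respect_and_agreement",
    "common_good_orientation",
    "constructive_politics",
}
SCALE_MIN, SCALE_MAX = 0, 3  # assumed scale bounds


def validate_scores(scores: dict) -> None:
    """Check that one observation period's record covers all nine
    qualities and uses the right scoring mode for each."""
    missing = (COUNT_QUALITIES | SCALE_QUALITIES) - set(scores)
    if missing:
        raise ValueError(f"missing qualities: {sorted(missing)}")
    for quality, value in scores.items():
        if quality in COUNT_QUALITIES:
            if not (isinstance(value, int) and value >= 0):
                raise ValueError(f"{quality} should be a non-negative count")
        elif quality in SCALE_QUALITIES:
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{quality} should sit on the {SCALE_MIN}-{SCALE_MAX} scale")
        else:
            raise ValueError(f"unknown quality: {quality}")
```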

But of course it is unrealistic to expect any dialogue to score continuously “high” in deliberative terms throughout the process.  Recognising this meant our approach needed (a) to capture the variation in the dialogue over the course of the event; and (b) to provide a basis for interpreting that variation.

In order to capture the ebb and flow and changing intensity of the discussion, we assessed each dialogue in five-minute blocks throughout each session.  Where the number of breakout groups exceeded the number of observers, we had to sample pragmatically.  Because we felt the nature of these smaller-group dialogues would also differ over their course, we divided each breakout session into three equal time periods – ‘beginning’, ‘middle’ and ‘end’ – and sampled blocks across these units.  The aim was to ensure the observers had a representative picture of the breakout discussions.
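As a rough sketch of that sampling logic (Python again; the block length follows the five-minute rule above, but the number of blocks drawn per third is simply a parameter I have invented for illustration):

```python
import random

BLOCK_MINUTES = 5  # each dialogue was assessed in five-minute blocks


def sample_breakout_blocks(session_minutes, blocks_per_third=1, seed=None):
    """Split a breakout session into 'beginning', 'middle' and 'end'
    thirds and sample five-minute observation blocks from each, so an
    observer sees a representative spread of the discussion."""
    rng = random.Random(seed)
    n_blocks = session_minutes // BLOCK_MINUTES
    thirds = {
        "beginning": list(range(0, n_blocks // 3)),
        "middle": list(range(n_blocks // 3, 2 * n_blocks // 3)),
        "end": list(range(2 * n_blocks // 3, n_blocks)),
    }
    sampled = {}
    for label, block_ids in thirds.items():
        k = min(blocks_per_third, len(block_ids))
        # record the start minute of each sampled block within the session
        sampled[label] = sorted(b * BLOCK_MINUTES for b in rng.sample(block_ids, k))
    return sampled


# e.g. one observed block from each third of a 60-minute breakout session
print(sample_breakout_blocks(60, blocks_per_third=1, seed=1))
```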

To interpret the results, we drew again on Bächtiger et al. (who in turn drew on the work of Shawn Rosenberg).  Depending on the combination, or profile, of qualities observed, we could characterise the dialogue for any period as one of five types of discourse:

  • Proto-discourse: communication to provide/share information and build social assurance; little or no focus on disagreements over validity claims between different interest groups;
  • Competitive discourse: lacks the aim of reaching a shared understanding and any noticeable cooperative spirit; actors are not prepared to be persuaded by the “better argument” but only seek to justify their own standpoint; may involve elements of high quality dialogue, but instances of respect and agreement are infrequent;
  • Conventional discourse: geared towards problem definition and problem-solving but not towards building a shared understanding; typically comprises a succession of concrete contributions intended to describe, to explain or to evaluate an aspect of the topic at hand;
  • Cooperative discourse: geared towards reaching a common understanding and problem-solving; the goal is agreement among participants and, to achieve it, diverse standpoints are evaluated; entails key elements of what many would consider high quality deliberation;
  • Collaborative (rational) discourse: the most demanding form of exchange – involves the free and equal expression of personal views and a respectful consideration of others’ perspectives, fairness and the common good; in finding solutions, the goal is preference transformation, both personal and collective, and disagreement is actively managed in productive and creative ways.
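To give a flavour of how an observed profile of qualities maps onto these discourse types, here is a deliberately over-simplified sketch in Python.  The thresholds, and the reduction of each quality to a single 0–3 score per period, are my own illustrative assumptions and not the coding rules actually used in the assessment.

```python
def classify_discourse(profile: dict) -> str:
    """Map a simplified profile of 0-3 quality scores for one observation
    period to one of the five discourse types (illustrative thresholds only)."""
    if profile["interactivity"] <= 1 and profile["justification_rationality"] <= 1:
        # information-sharing with little engagement over validity claims
        return "proto-discourse"
    if profile["respect_and_agreement"] <= 1:
        # argued and interactive, but adversarial rather than cooperative
        return "competitive discourse"
    if profile["common_good_orientation"] <= 1:
        # problem definition and problem-solving without shared understanding
        return "conventional discourse"
    if profile["constructive_politics"] <= 2:
        # oriented to agreement, weighing diverse standpoints
        return "cooperative discourse"
    # respectful, common-good oriented, disagreement actively managed
    return "collaborative (rational) discourse"


# e.g. a period with strong justification but little mutual respect
print(classify_discourse({
    "interactivity": 2, "justification_rationality": 3,
    "respect_and_agreement": 1, "common_good_orientation": 0,
    "constructive_politics": 1,
}))  # -> competitive discourse
```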

There isn’t scope here to elaborate the detailed findings, but suffice it to say that, although some sessions in all of the dialogues demonstrated deliberative characteristics, none of the sampled dialogues overall achieved the level of deliberative discourse envisaged in the programme theory of change.

That said, the granularity of the analysis enabled by detailed direct observation and follow-up survey did point to potential actions to strengthen the deliberative quality of dialogue in the future:

  • engage stakeholders earlier in the dialogue preparations by sharing more relevant, timely evidence in advance; deliberative theory places significant importance on the role of preparatory evidence, as a basis for dialogue and ensuring participants come with the appropriate mindset
  • allow sufficient time and space for discussion of issues:  by a) narrowing the scope of dialogue events; and b) using breakout groups more systematically. On the latter, our data indicated group discussions were typically more deliberative than plenary sessions (and involved greater participation by women); but, in practice, much of the value of group discussions was lost by reliance on rushed, abridged and (at times) incomplete feedback to plenary as the main means of capturing findings.  
  • finally, high quality deliberation involves tackling and resolving differences of view. This is a challenge for external agencies when the issues are highly contentious, but ways have to be found if deliberative approaches are to be effectively deployed.  This may require greater political engagement by supportive governments, in addition to, and alongside, the usual provision of funding and technical expertise.

If any of the above interests you, do get in touch! 


[1] Bächtiger, A., S. Shikano, S. Pedrini and M. Ryser, “Measuring Deliberation 2.0: Standards, Discourse Types, and Sequentialisation”, University of Konstanz and University of Bern, 2011(?).

Evaluating uncertainty

How do you assess an intervention where the outcomes are long-term and hard to predict, where there is little ‘hard’ or definitive ‘objective’ data on which to judge progress in the meantime, and where, in any case, the relationship between effort and results is non-linear and the programme’s influence is at best indirect and small (though hopefully important nonetheless)?  In short, how do you evaluate uncertainty?

A lot has been, and is being, written about applying PDIA (problem-driven iterative adaptation) approaches to programmes, particularly those of the sort I characterise above.  I won’t talk here about the (significant) implications for the design of monitoring and evaluation activities more generally.  Instead, I wanted to share some thoughts on a particular method that strikes me as highly relevant in such settings: influence maps.

The context is the use of influence maps in some recent work with IOD PARC – a great team comprising Dr Stuart Astill, Enrique Wedgewood-Young and Dr Sheelagh O’Reilly.  We’ve been evaluating a DFID initiative called the South Asia Water Governance Programme (SAWGP), which supports knowledge sharing and deliberative dialogue to promote transboundary cooperation between the seven countries covering the Indus, Ganges and Brahmaputra river basins.  Without going into the details, you will have a sense of the evaluation challenges.

Influence maps are an interesting and insightful tool for compiling and analysing actors’ beliefs about what’s happening and what’s important, using Bayesian techniques of inference (hence the more general term “Bayesian belief networks”).

Bayesian networks have been around for some time in medicine, engineering, environmental studies and, of course, IT.  I first got involved with them some 16 years ago, working with Bob Burn, a statistician from the University of Reading, to explore approaches for evaluating long-term research into natural resource management.  But they also appear to be an idea that has somewhat come of age, judging by recent articles in the journal Evaluation (e.g. this and this).

But to my knowledge, they have not yet been widely applied as a mainstream monitoring and evaluation tool in international development and, for that reason, DFID deserves credit for allowing IOD PARC to apply the technique as a central element in the SAWGP evaluation –  genuinely innovative.

I’ll try to outline the general approach, and why they are potentially so useful in contexts characterised by high degrees of uncertainty.  For those interested in learning more, I have written a slightly longer paper that provides more explanation – available on the IOD PARC website.

The first step is to elaborate and ‘map’ the programme theory of change.  In the most uncertain settings, there may be little in the way of formal, explicit ‘theory’, and the job is to unearth and structure (in a causal diagram) the actors’ mental models of how change should occur. This is standard practice for evaluators familiar with theory-based approaches.  However, one tip: ignore the programme (initially at least) and instead focus on the important factors that matter in the operating environment and the causal relationships between them.  Once you have those down, you can then add where the programme intervenes.

The process of drawing out and elaborating the key changes to the status quo that need to happen, ordering them causally and thinking about the role of the programme is an incredibly valuable exercise in itself – in building a shared understanding, but also in clarifying differences of view.

So, now you have a ‘map’ that comprises a series of boxes (or nodes) setting out the key changes required, ordered and linked by arrows to convey the (purported) causal relationships. So far, so familiar.
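In code terms, such a map is just a directed graph: a set of nodes and, for each, the nodes it depends on.  A toy sketch in Python (the node names are invented purely for illustration and are not taken from the SAWGP theory of change):

```python
# Toy influence map: each node lists its direct causal 'parents'.
# Node names are invented for illustration only.
influence_map = {
    "programme_dialogue_support": [],                   # where the programme intervenes
    "evidence_shared_across_basins": ["programme_dialogue_support"],
    "trust_between_governments": ["evidence_shared_across_basins"],
    "transboundary_cooperation": ["evidence_shared_across_basins",
                                  "trust_between_governments"],
}
```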

The next step is to define the conditional probabilities for each of the nodes identified in the map.  This uses a Bayesian way of thinking about probabilities – essentially, what are the chances of this node happening if other things have or haven’t happened.  When outlined on paper, the approach can seem clunky and artificial, but in practice it is quite intuitive (and fun!) once people get their heads around a Bayesian way of thinking.

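For readers who want to see the mechanics, here is a minimal, self-contained sketch of the Bayesian logic in Python.  The nodes mirror the toy map above, and every probability value is invented for illustration (these are emphatically not elicited SAWGP values).  Each node gets a conditional probability table giving the chance it occurs for every combination of its parents occurring or not; the chance of any outcome then follows by summing over the unknowns.

```python
from itertools import product

# Same toy map as above; every node is binary (occurs / does not occur).
PARENTS = {
    "programme_dialogue_support": [],
    "evidence_shared_across_basins": ["programme_dialogue_support"],
    "trust_between_governments": ["evidence_shared_across_basins"],
    "transboundary_cooperation": ["evidence_shared_across_basins",
                                  "trust_between_governments"],
}

# Conditional probability tables: P(node occurs | parents' states), keyed by
# a tuple of parent outcomes in the order listed above.  Values are invented.
CPT = {
    "programme_dialogue_support": {(): 1.0},   # the intervention goes ahead
    "evidence_shared_across_basins": {(True,): 0.8, (False,): 0.2},
    "trust_between_governments": {(True,): 0.6, (False,): 0.1},
    "transboundary_cooperation": {
        (True, True): 0.7, (True, False): 0.3,
        (False, True): 0.2, (False, False): 0.05,
    },
}

NODES = list(PARENTS)  # already in causal (topological) order


def joint_probability(state):
    """P(one complete assignment of all nodes): the product of each node's
    conditional probability given the states of its parents."""
    p = 1.0
    for node in NODES:
        parent_states = tuple(state[parent] for parent in PARENTS[node])
        p_occurs = CPT[node][parent_states]
        p *= p_occurs if state[node] else 1.0 - p_occurs
    return p


def probability(query, evidence=None):
    """P(query occurs | evidence), by brute-force enumeration over every
    combination of the remaining nodes (fine for small maps like this)."""
    evidence = evidence or {}
    p_query_and_evidence = p_evidence = 0.0
    for outcome in product([True, False], repeat=len(NODES)):
        state = dict(zip(NODES, outcome))
        if any(state[node] != value for node, value in evidence.items()):
            continue
        p = joint_probability(state)
        p_evidence += p
        if state[query]:
            p_query_and_evidence += p
    return p_query_and_evidence / p_evidence


print(round(probability("transboundary_cooperation"), 3))  # baseline chance
```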

Once you have compiled the model, you can manipulate it easily to analyse, in surprising richness, what actors’ beliefs reveal about causality and what is important for success.  You can explore “what if” questions by examining different outcomes for different nodes and tracing the effect on the overall chances of success.  You can look at what the model says about the influence of individual nodes (including the programme’s interventions) and simulate ‘ex post’ outcomes to see what the most likely combination of explanatory factors is. You can compare the influence of different factors with the allocation of programme resources and effort, to identify avenues to explore from a value for money perspective…
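Continuing the toy sketch above, a “what if” question is simply a change of evidence, for example asking how much the overall chance of the outcome moves if the evidence-sharing node does, or does not, come about:

```python
# 'What if' analysis on the toy model above (illustrative values only)
baseline = probability("transboundary_cooperation")
if_shared = probability("transboundary_cooperation",
                        {"evidence_shared_across_basins": True})
if_not_shared = probability("transboundary_cooperation",
                            {"evidence_shared_across_basins": False})

print(f"baseline {baseline:.2f}; "
      f"with evidence sharing {if_shared:.2f}; without {if_not_shared:.2f}")
```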

The options are extensive but, of course, all the results will typically reflect the subjective beliefs of the actors involved. Even if you conduct the exercise with more than one person and get broadly similar responses, it doesn’t make the model “right” in conventional, objective terms – all actors may be subject to the same levels of ignorance, group-think may be rife…  But in the case of programmes where decision-making is fundamentally based on managers’ subjective interpretation and judgement, is understanding and systematically analysing those beliefs of any less interest to an evaluator?  And is it really sensible to dismiss the results obtained as somehow ‘not valid’ because of their subjective underpinnings?

The outline above assumes a one-time exercise.  Even a single snapshot can provide significant insight. However, it gets even more interesting if the exercise is repeated.  We have been lucky to do this in the SAWGP evaluation – with the first exercise in 2016, a repeat exercise in 2017 and a third and final exercise scheduled for later this year. (A big thanks here is due to the participating SAWGP implementing partners for their generous contribution of time and assistance in the work).

The addition of a longitudinal dimension to the analysis, examining what has changed, why, and how that affects actors’ beliefs about the future, opens the door to even more rigorous assessment, allowing the evaluator to link specific, observed changes in the context to changes in actors’ probabilistic assessments.
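In code terms the longitudinal step is straightforward: re-run the same queries against each round’s elicited model and look at the shifts.  A final fragment of the toy sketch (the round labels and evidence are purely illustrative; in a real repeat exercise the conditional probability tables themselves would be re-elicited each time):

```python
# Re-querying the toy model round by round (illustrative only; in practice
# each round would have its own freshly elicited CPTs, not shared ones).
rounds = {
    "2016": probability("transboundary_cooperation"),
    "2017": probability("transboundary_cooperation",
                        {"evidence_shared_across_basins": True}),
}
shift = rounds["2017"] - rounds["2016"]
print(f"2016: {rounds['2016']:.2f}  2017: {rounds['2017']:.2f}  shift: {shift:+.2f}")
```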

Of course, as with any ‘new’ approach, there are what we might politely term “areas for further research”.  These include: ways of eliciting conditional probabilities that minimise bias; ways (and the value) of estimating distributions around central estimates; how to aggregate or combine multiple actors’ views; how to define, and ensure consistent interpretation of, what ‘occurrence’ and ‘non-occurrence’ of a node mean; and understanding the sensitivity of results to model design, in particular the level of detail.

And because influence maps are fundamentally acyclic models, they cannot accommodate multidirectional feedback loops and as such are not the ‘answer’ to evaluating complexity.  But for me they are a valuable tool.  I aim to continue exploring and developing the approach in my work and will keep sharing as my learning proceeds.

If you are interested, do get in touch – either with questions or your own experiences/perspectives.   I strongly believe the technique has potential for wider application in more mainstream settings – as a tool to augment conventional contribution analysis. With this in mind, I hope to write a piece soon explaining the approach in a bit more detail.  Watch this space…