The fact that 'Lies, Damned Lies and Statistics' is a phrase that has been in use for more than 100 years is perhaps an indication of the abuse to which data has been subjected in public discourse. We are by turns exasperated, deceived and baffled by the incessant quoting of percentages, putative correlations and trends by politicians, journalists and pundits, which we then sprinkle into our conversations (tweets, Facebook updates) in more or less thoughtful, balanced ways. Usually less.
If we are to harness the potential of open data to allow us to ask better questions, and make better choices, we need to consider what kinds of conversation allow us to reach an informed consensus, rather than afford 'victory' to the most skillful orator with the weightiest arsenal of 'killer stats'.
It would be futile to attempt to exhaustively catalogue the ways in which our conversations about data may go awry, so I will touch upon just two areas which seem to me to be among the most influential: definitions and significance.
My concern here is the struggle we have in maintaining the distinction between a concept and the indicators we have chosen, in a particular situation, to define that concept. For example, a child receives free school meals based on their financial circumstances: this has for some time been widely used as an indicator for 'deprivation'. Enter 'free school meals deprivation' into Google, and the first result is a blog post disputing the validity of connecting the indicator to the category; a little further down is a press release from the Department for Education announcing that half of the new free schools opened this September are in "the 30% most deprived communities". A little digging reveals that the latter claim was based on the 'Index of Multiple Deprivation' - a number of different indicators (including free school meals) combined into one overall deprivation 'score'.
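To make the idea of a combined 'score' concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three areas, the indicator names and values, and both sets of weights are made up, and the simple weighted average is not the method used by the actual Index of Multiple Deprivation. It only shows that the ranking of 'most deprived' areas can depend on how the indicators are weighted.

```python
# Hypothetical indicator values (0 to 1, higher = worse) for three made-up areas.
# None of these figures come from real data.
areas = {
    "Area A": {"free_school_meals": 0.40, "unemployment": 0.10, "poor_health": 0.20},
    "Area B": {"free_school_meals": 0.20, "unemployment": 0.30, "poor_health": 0.25},
    "Area C": {"free_school_meals": 0.10, "unemployment": 0.15, "poor_health": 0.45},
}


def composite_score(indicators, weights):
    """Weighted average of the indicator values (an illustrative choice of formula)."""
    total_weight = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight


def rank_areas(areas, weights):
    """Return area names ordered from highest to lowest composite score."""
    return sorted(
        areas,
        key=lambda area: composite_score(areas[area], weights),
        reverse=True,
    )


# Two hypothetical weightings: one emphasises free school meals, the other health.
weights_meals = {"free_school_meals": 0.6, "unemployment": 0.2, "poor_health": 0.2}
weights_health = {"free_school_meals": 0.2, "unemployment": 0.2, "poor_health": 0.6}

print(rank_areas(areas, weights_meals))
print(rank_areas(areas, weights_health))
```

With these invented numbers the two weightings put the areas in opposite orders, even though the underlying data is identical. The choice of weights is a decision about what 'deprivation' means, not something the data settles for us.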
Some combinations of indicators may be more sensible than others in different contexts, but the full content of the concept will remain stubbornly elusive. There is an inescapably ethical component to such concepts as deprivation, wealth, well-being, education, security... A conversation about what any of these 'mean' drives which indicators are chosen, but once chosen, we are prone to conflate the two, and thereby shut down a vital part of the ongoing discussion.
Our assessment of the significance of a statistic is prone to a worrying array of more or less irrational influences. An awareness of confirmation bias, "a tendency for people to favor information that confirms their preconceptions or hypotheses regardless of whether the information is true", should usefully undermine the confidence we have in our own opinions. Yet we are usually pretty keen on our opinions: much of our journalism and political discourse trades on our willingness to take sides, then boo, hiss, clap or cheer accordingly.
Daniel Kahneman's description of 'System 1' and 'System 2' thinking sets out a widely-noted distinction between our quick reactions, guided by intuition, familiar associations and narratives, and the slower process of rational analysis, which interrogates such reactions, seeks out further relevant information and attempts to reach logical conclusions. The use of data and statistics can easily give us the false impression that our conversation is resolutely 'System 2', grounded in cool logic and rationality. From this we derive the conviction that anyone who disagrees with our position has simply got their facts wrong, or rather that 'their facts' are derived from a biased position and are therefore insignificant. The reality is almost certainly that we all flip between these modes of thought, whether snatching at connections, reflecting upon new information, defending our position or identifying common ground.
So the answer is....?
I can't pretend to have a fool-proof formula for successful data-discussion: my concern is rather to highlight some of the causes of our regular failure to reach wise, inclusive decisions based on a shared understanding of the information available. If I have any prescription to offer it is merely to suggest some of the qualities which might guard against predictable sources of error. If we were to bring more humility, persistence, patience, flexibility and empathy to our discussions, and celebrate these qualities when we recognise them in others, I think that'd be a good start.