Playing the Numbers

And, no, I don't mean playing the lottery numbers! We've posted a couple of articles in the past: What Makes a Good Measure (Jan 3, 2012) and Telling Stories with Data (Jan 21, 2011). These posts discuss the challenges of working with numbers in the analytics world. I also emphasize with clients the importance of expressing the context of a report or data that people are seeing. What does it include? And, sometimes much more importantly, what does it NOT include? What filters are applied? What are the values of those filters? What time frame does the data represent? Can users see this context when reviewing a report, or are they likely to apply their own assumptions about what the data means?

A perfect example of this context issue came up while I was reading an excellent commentary in today's US edition of the Financial Times. (You can access it via this link, but you have to sign up for a free account to read the full text.) Let me summarize the gist of Steven Hill's argument about inappropriate and misleading use of numbers.

The conventional thinking, particularly as purveyed by the media and governments themselves, is that youth unemployment in Europe is at crisis levels. There are demoralizing rates of nearly 50% in Spain and Greece, and over 20% in the rest of the Eurozone. But is this really the case? Hill argues that a very flawed methodology is used to calculate those rates. Namely, the denominator does not include those youths who are in school or job training and not looking for a job anyway. Because the denominator is a much smaller number of individuals, the resulting unemployment rate is drastically overstated. So, the "unemployment rate" conveyed most often to the public does not tell the whole story.

He suggests a better measure might be the ratio of youth who are actively seeking a job and can't find one to the entire youth population 24 years old and younger (regardless of their intent to seek employment or schooling). Using that measure, Spain's youth unemployment is only 19% (vs 48.9%) and Greece's is only 13% (vs 49.3%). In the Eurozone as a whole it would be only 8.7%. That is a dramatic difference between the two measures, and it tells a different story. The adult unemployment rate has the reverse problem: because it excludes those who have given up looking for work, the rate is commonly understated.
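To make the denominator effect concrete, here is a minimal Python sketch. The head counts are made up purely for illustration (they are not from the article), and the function names are my own. The last step assumes the two published figures for a country share the same numerator, in which case dividing one by the other gives the implied share of youth who are counted in the labor force at all.

```python
def unemployment_rate(unemployed, labor_force):
    """Conventional rate: unemployed divided by those working or seeking work."""
    return unemployed / labor_force


def unemployment_ratio(unemployed, population):
    """Alternative measure: unemployed divided by the entire youth population."""
    return unemployed / population


def implied_labor_force_share(rate, ratio):
    """If both measures share a numerator, ratio / rate = labor force / population."""
    return ratio / rate


# Hypothetical counts for a group of 1,000 young people (illustration only).
population = 1000
employed = 200
unemployed = 200
labor_force = employed + unemployed  # students and others not looking are excluded

rate = unemployment_rate(unemployed, labor_force)
ratio = unemployment_ratio(unemployed, population)
print(f"rate:  {rate:.1%}")   # same 200 people, smaller denominator
print(f"ratio: {ratio:.1%}")  # same 200 people, larger denominator

# Applying the same logic to the figures quoted above for Spain and Greece.
spain_share = implied_labor_force_share(rate=0.489, ratio=0.19)
greece_share = implied_labor_force_share(rate=0.493, ratio=0.13)
print(f"Spain, implied labor force share:  {spain_share:.1%}")
print(f"Greece, implied labor force share: {greece_share:.1%}")
```

The same 200 unemployed people produce very different percentages depending on which group they are divided by, which is exactly the context a reader needs to see alongside the number.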

As I read the article, I was thinking you could create a similar ratio of students in school or job training for comparison purposes, one that shares the same denominator and thus has a common foundation. Further, you could list the ratio of those not in the labor market. This is a much better way of expressing data on common grounds with common definitions, and therefore of making better informed policy decisions. Yet, traditionally, the unemployment "rate" has always excluded those in school and not looking for work, therefore providing a frightening unemployment picture to those who wish to use it that way.
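A common-denominator breakdown like this can be sketched in a few lines of Python. The category counts below are hypothetical, and the sketch assumes each young person falls into exactly one category (in practice some students also work, so a real definition would need a rule for that).

```python
def shares_of_population(counts):
    """Divide every category by the same total so the shares are comparable."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical, mutually exclusive categories for 1,000 young people.
youth_counts = {
    "employed": 200,
    "unemployed and seeking work": 200,
    "in school or job training": 500,
    "not in the labor market": 100,
}

shares = shares_of_population(youth_counts)
for category, share in shares.items():
    print(f"{category:<30} {share:.1%}")

# Because every share uses the same denominator, they add up to 100%.
print(f"{'total':<30} {sum(shares.values()):.1%}")
```

Because every line shares one denominator, the figures can be read side by side and add up to the whole population, so nobody has to guess who was left out.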

Now take, for instance, the challenge of communicating graduation and completion rates for students. Completion is a hot button issue across all segments of higher education. The IPEDS numbers are famously unreliable since, by definition, the cohort only includes students who are attending any college for the first time. Is that really helpful? In community colleges it leaves a large population out of the denominator, since many students have attended other institutions before. Furthermore, for any institution, does the measure capture whether a student completed somewhere else after transferring? At least the Department of Education recognizes these issues and has formed a working group to address the limitations of current measure definitions.
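The same sensitivity to definitions shows up in a completion rate. The sketch below uses ten invented student records and a hypothetical `completion_rate` function; it is not the IPEDS formula, only an illustration of how the cohort rule and the treatment of transfers move the result.

```python
from dataclasses import dataclass


@dataclass
class Student:
    first_time: bool           # never attended any college before
    completed_here: bool       # finished at this institution
    completed_elsewhere: bool  # finished somewhere else after transferring


def completion_rate(students, first_time_only, count_elsewhere):
    """Completion rate under a chosen cohort rule and completion definition."""
    cohort = [s for s in students if s.first_time or not first_time_only]
    completers = [
        s for s in cohort
        if s.completed_here or (count_elsewhere and s.completed_elsewhere)
    ]
    return len(completers) / len(cohort)


# Ten hypothetical students: 4 first-time, 6 with prior college experience.
students = (
    [Student(True, True, False)] * 2
    + [Student(True, False, True)]
    + [Student(True, False, False)]
    + [Student(False, True, False)] * 4
    + [Student(False, False, True)]
    + [Student(False, False, False)]
)

for first_time_only in (True, False):
    for count_elsewhere in (False, True):
        rate = completion_rate(students, first_time_only, count_elsewhere)
        print(
            f"first-time cohort only: {first_time_only!s:<5} "
            f"count completions elsewhere: {count_elsewhere!s:<5} "
            f"rate: {rate:.0%}"
        )
```

Four different "completion rates" come out of the same ten students, which is why the cohort definition and the transfer rule belong on the report next to the number.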

What similar scenarios might you have in your institutional measurement structures? Are your rates and ratios on a common foundation? Do people know the whole context of the ratios they are seeing? What might need to change to address the shortcomings of what is communicated? Discuss!