‘Readability metrics are not worth the paper they’re not written on.’
This was a quote from Professor Geoffrey Pullum at the Society for Editors and Proofreaders (SfEP) annual conference in September 2017, and it’s the only thing Geoff Pullum has ever said that I disagree with.
In November 2016 I wrote a blog explaining the main readability metrics, Flesch Reading Ease, Flesch-Kincaid and the like, and making a case that there is value in these metrics for writers and editors. They’re not sufficient, they’re not even necessary, but they are useful – in context, and if their limitations are understood and accepted.
I put forward this opinion in a 5-minute lightning talk at the same SfEP conference where Geoff had earlier expressed his less complimentary view.
The rest of this blog outlines some of the thoughts behind my SfEP talk, and goes on to propose a new simplified metric (‘the Pix’) to use alongside Flesch-Kincaid. Pix is derived from three measures which are routinely used by Yoast SEO but are not in the Flesch family of metrics.
Flesch it out
First a quick reminder about the Flesch measures.
Flesch Reading Ease (FRE) is an index of English readability derived from two inputs and scaled into an index (typical range 30-100). The index scores are accompanied by descriptive levels of readability. A score of 30 is deemed very hard to read (postgraduate level). A score of 100 is deemed easy for a 10-year-old child to read. A score of 60 equates approximately to plain English.
The two inputs are:
- average sentence length (words per sentence)
- average word length (syllables per word)
which are converted into an index using the following formula:
206.835 – 1.015 * (Words/Sentences) – 84.6 * (Syllables/Words).
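As a quick illustration, the formula can be written as a small Python function. This is only a sketch: the function name and the sample counts are made up for the example, and it assumes you have already counted words, sentences and syllables by some other means.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts, using the formula quoted above."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical sample: 100 words, 5 sentences, 150 syllables.
fre_score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(fre_score, 1))  # 59.6
```

With 20 words per sentence and 1.5 syllables per word, the made-up sample lands just under 60, the level the article associates with plain English.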
Flesch-Kincaid (F-K) uses the same two inputs, but scales them differently so that the result reads as a US school grade level, and hence approximately as a reading age, without the need for a translating description.
The F-K formula is:
0.39 * (Words/Sentences) + 11.8 * (Syllables/Words) – 15.59.
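The F-K formula can be sketched in the same way. Again the function name and the sample counts are hypothetical, and the counting of words, sentences and syllables is assumed to happen elsewhere.

```python
def flesch_kincaid(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid score from raw counts, using the formula quoted above."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical sample: 100 words, 5 sentences, 150 syllables.
fk_score = flesch_kincaid(words=100, sentences=5, syllables=150)
print(round(fk_score, 2))  # 9.91
```

Note that both inputs push the score up here, whereas in FRE they push it down: longer sentences and longer words mean a higher F-K score but a lower FRE index.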
The scores associated with the two Flesch metrics can be interpreted as shown in the table below:
| Flesch Reading Ease: Index | Flesch-Kincaid: Reading Age (approx) | Notes |
| --- | --- | --- |
| 100+ | Up to age 10 | Extremely easy to read. |
| 90–100 | Age 11 | Very easy to read. |
| 80–90 | Age 12 | Easy to read: conversational English for consumers. |
| 70–80 | Age 13 | Fairly easy to read. |
| 60–70 | Age 14-15 | Plain English: target for much business writing especially aimed at consumers. |
| 50–60 | Age 16-18 | Fairly difficult to read. |
| 30–50 | Undergraduate: Age 18-21 | Difficult to read. |
| 0–30 | Postgraduate: Age 21+ | Very difficult to read. Best understood by university graduates. |
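For anyone who wants to look up a band programmatically, the table can be turned into a small lookup. This is a sketch under two assumptions of mine: the bands in the table share their boundaries (90 appears in two rows), so I treat each lower bound as inclusive, and I lump any negative score into the hardest band. The names `FRE_BANDS` and `describe_fre` are invented for the example, and the notes are shortened.

```python
# (lower bound of FRE band, reading age, shortened note), highest band first.
FRE_BANDS = [
    (100, "Up to age 10", "Extremely easy to read"),
    (90, "Age 11", "Very easy to read"),
    (80, "Age 12", "Easy to read"),
    (70, "Age 13", "Fairly easy to read"),
    (60, "Age 14-15", "Plain English"),
    (50, "Age 16-18", "Fairly difficult to read"),
    (30, "Undergraduate: Age 18-21", "Difficult to read"),
    (0, "Postgraduate: Age 21+", "Very difficult to read"),
]


def describe_fre(score: float) -> tuple[str, str]:
    """Return (reading age, note) for an FRE score; lower bounds are inclusive."""
    for lower, age, note in FRE_BANDS:
        if score >= lower:
            return age, note
    # Assumption: scores below zero are treated as the hardest band.
    return FRE_BANDS[-1][1], FRE_BANDS[-1][2]


print(describe_fre(65))  # ('Age 14-15', 'Plain English')
```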
Other similar metrics are set out in my earlier blog but not covered here. Flesch is the most widely used, including in Microsoft Word and online platforms such as Yoast SEO, so it is fairly representative of this class of measures.
Prism Clarity Simplicity measure (PCS) (‘Pix’)
Now for the new simplified measure which I call the Prism Clarity Simplicity index (phonetically, ‘Pix’).
What is Pix?
It is a simple way of combining three other readability elements that SEO platforms (such as Yoast SEO) care about, but which are not part of the Flesch family of measures. As writers and editors we too sometimes care about these things, especially if we’re concerned about Google ranking; or even if we simply want to know what level of readability our content is achieving.
A quick disclaimer
As a matter of writing philosophy I don’t necessarily agree with the Yoast constraints implied below.
But I accept them as a reality; and as a valid input to an objective (but stylised) measure of writing simplicity.
Long sentences: Yoast considers 20 words to be a long sentence and 25% to be an acceptable maximum number of ‘long’ sentences (as % of total).
Passive sentences: Yoast considers 10% to be an acceptable maximum number of sentences including a passive verb (as % of total).
Transition words: Yoast considers 30% to be an acceptable minimum number of sentences including a transition word (as % of total), as defined in this Yoast SEO blog (an extract is shown in the table below).
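To make the three thresholds concrete, here is a rough Python sketch. It is not Yoast's implementation. The function names, the naive sentence splitter (split on full stops, question marks and exclamation marks) and my reading of 'long' as 'more than 20 words' are all assumptions for illustration. Detecting passives and transition words is a much harder job, so the check function simply takes those percentages as inputs.

```python
import re

LONG_SENTENCE_WORDS = 20      # assumption: "long" means more than 20 words
MAX_LONG_PCT = 25.0           # thresholds as described above
MAX_PASSIVE_PCT = 10.0
MIN_TRANSITION_PCT = 30.0


def long_sentence_percentage(text: str) -> float:
    """Share of sentences (as a %) with more than LONG_SENTENCE_WORDS words."""
    # Naive splitter: breaks on ., ! and ?; abbreviations will fool it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    long_count = sum(1 for s in sentences if len(s.split()) > LONG_SENTENCE_WORDS)
    return 100.0 * long_count / len(sentences)


def within_thresholds(passive_pct: float, long_pct: float, transition_pct: float) -> bool:
    """True if all three percentages sit on the acceptable side of the limits."""
    return (
        passive_pct <= MAX_PASSIVE_PCT
        and long_pct <= MAX_LONG_PCT
        and transition_pct >= MIN_TRANSITION_PCT
    )


# Hypothetical sample: three short sentences and one 21-word sentence.
sample = "Short one. " + " ".join(["word"] * 21) + ". Another short sentence. Done."
print(long_sentence_percentage(sample))  # 25.0
```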
Putting it all together
So here’s the point: what I do in the new Pix measure is take these elements and combine them in a single formula, as follows:
- Element PS: Passive sentences % multiplied by 100; e.g. 20% * 100 = 20
- Element LS: Long sentences % multiplied by 100; e.g. 40% * 100 = 40
- Element TW: Transition words % multiplied by 100; e.g. 35% * 100 = 35
- Formula = (10-PS) + (25-LS) + (TW-30)
Illustrating using the examples above:
Pix = (10-20) + (25-40) + (35-30) = -10 – 15 + 5 = -20
A lower score (a larger negative number) means less simple writing. A higher score (including a smaller negative number) means simpler writing.
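The Pix formula is simple enough to express in a couple of lines of Python. A minimal sketch, with a function name of my own choosing; the inputs are the three percentages expressed as plain numbers (20 for 20%).

```python
def pix(passive_pct: float, long_pct: float, transition_pct: float) -> float:
    """Pix = (10 - PS) + (25 - LS) + (TW - 30), inputs as percentages."""
    return (10 - passive_pct) + (25 - long_pct) + (transition_pct - 30)


# The worked example: 20% passive, 40% long sentences, 35% transition words.
print(pix(20, 40, 35))  # -20
```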
How to interpret Pix
Let’s look more closely at what’s going on in this formula.
PS: If you use a lot of passive sentences (>10%) you get a negative score on PS. If you use none you get the maximum score of 10.
LS: If you use a lot of long sentences (>25%) you get a negative score on LS. If you use none you get the maximum score of 25.
TW: Conversely if you use a lot of sentences with transition words (>30%) you get a positive score on TW. But if you use none you get the minimum score of -30. The TW factor can offset (or partially offset) the other two.
But what does it all mean?
Basically the more long or passive sentences you use, the lower your Pix score. But you can get some Pix points back, a kind of rebate, if you also use a lot of transition words, which add fluency to your prose even if you habitually use a lot of long or passive sentences.
So the Pix is a single net score, with three elements that can cancel each other out. The score can be positive (= simple), negative (= complex) or zero (something in between).
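To see the cancelling-out in action, it helps to look at the three elements separately. The sketch below uses made-up percentages for an imaginary writer, and the function name is hypothetical. It also prints the best and worst scores the formula can produce when each percentage runs from 0 to 100.

```python
def pix_components(passive_pct: float, long_pct: float, transition_pct: float) -> dict:
    """Break the Pix score into its three elements (inputs as percentages)."""
    return {
        "PS": 10 - passive_pct,      # negative once passives exceed 10%
        "LS": 25 - long_pct,         # negative once long sentences exceed 25%
        "TW": transition_pct - 30,   # positive once transition words exceed 30%
    }


# Hypothetical writer: too many passives and long sentences, lots of transitions.
parts = pix_components(passive_pct=15, long_pct=35, transition_pct=45)
print(parts)                # {'PS': -5, 'LS': -10, 'TW': 15}
print(sum(parts.values()))  # 0

# Extremes of the formula if each percentage can range from 0 to 100.
best = sum(pix_components(0, 0, 100).values())
worst = sum(pix_components(100, 100, 0).values())
print(best, worst)          # 105 -195
```

Here the transition-word rebate of 15 exactly wipes out the 15 points lost on passives and long sentences, leaving a net Pix of zero.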
A meaningless number
Let’s admit straight away that, in its own terms, the Pix is meaningless as an absolute number. It is a bit like Flesch Reading Ease (FRE), which doesn’t make much intuitive sense on its own. In Flesch terms, to get a more intuitive result you’d use a different formula (Flesch-Kincaid), which scales directly to a reading age rather than an abstract index like FRE.
I have not (yet) gone so far as to do this kind of translation for Pix.
Nor have I attempted any kind of descriptive labelling of what different Pix scores represent, as we saw in the Flesch table above. As it stands, the measure only has value as a comparison across different writers: it signifies the relative complexity or simplicity of their writing using the three common measures described above, which are not part of the Flesch family.
The SfEP vs USA Readability Test Match
Funnily enough, that kind of comparison is exactly what I did in my 5-minute lightning talk at the SfEP 2017 annual conference last autumn, following a kind invitation from Robin Black and Lucy Ridout.
Just for fun I set up a readability ‘test match’ between the SfEP and the USA. This followed a model used by Bank of England Chief Economist Andy Haldane which compared BoE output in terms of readability to some prominent US public figures.
So I took published blogs from eight sources: four British writers representing the SfEP, and four from the public domain representing the USA.
The SfEP team was the incomparable Louise Harnby, the relentlessly helpful John Espirian, the editor’s editor Liz Jones, and myself masquerading as A.N.Other. Selected for the USA were the New York Times, Oprah, Trump and Elvis.
I chose random excerpts of prose or lyrics from all eight (acknowledging that Elvis was NOT responsible for his own lyrics; put that down to poetic licence) and put them through both the Flesch-Kincaid reading age measure and my new Pix simplicity measure.
The results of this immense and competitive readability test match are shown below:
Below is a graphical representation of the results: the purple scores on the right hand side of each picture represent the Pix scores denoted in the above table.
Of the SfEP writers, Louise used shorter sentences, fewer passives and a high percentage of transition words, giving her the best Pix score (14) to go alongside a Flesch-Kincaid reading age of around 13, comfortably within the plain English category. John had fewer long sentences, fewer passives and fewer transition words: perhaps a more business-oriented style, straight to the point and staccato, with less transition-word fluency. The most interesting SfEP writer was Liz, who used more long and passive sentences but got some Pix points back from using lots of transition words, reflecting a writing style which is recognisably more literary and complex but also fluent. The less said about A.N.Other the better.
In terms of the US public domain sources, their Pix index scores pretty much mirrored their Flesch scores, supporting intuition. The New York Times was complex and fluent. Elvis was simple and readable. Oprah’s book was somewhere in the middle, around the same levels of simplicity as John and Liz according to the Pix measure. Trump – like A.N.Other on the SfEP side – needs no additional comment.
[Acknowledgment: I am extremely grateful to my three fellow SfEP writers for their permission to use their work and faces in this fun exercise.]
It wouldn’t surprise me if the answer to this question is: nothing!
There is nothing magic about this Pix measure; it simply takes other people’s data (namely Yoast SEO’s – shout out again to them) and combines it in a way that enables simple comparison of different authors’ writing styles in a single measure. It also seems to roughly bear out intuition, if you look at the results of the test match.
Perhaps a more meaningful test would be to look at some literary samples: Shakespeare (converted to prose-equivalent), Tolstoy and Joyce, for example, would make for an interesting comparison which might surprise a few people. Watch out for further blogs in this space where we may take this forward.