Flesh of my Flesch: Readability Matters

This blog explores the concept of readability.

It looks at some common readability metrics – notably Flesch Reading Ease, Flesch-Kincaid and a couple of others – and wonders why as writing and editing professionals we don’t make more use of them, to promote our skills and measure our own performance in an objective way.

The metrics are now widely available in Search Engine Optimisation (SEO) tools and word processing software – including Microsoft Word.

There are whole websites devoted to readability, with free tools to check your own content in lots of different ways.

The tools themselves are objective, intuitive and easy to understand. And they function well for relative comparisons – across multiple authors or over time – even if you can argue about their absolute value.

We already use them at Prism-Clarity to assess our own blogs and try to make them more readable. The argument now is to try using them for other purposes, and to get more people tuned in to them.

Why do we need metrics?

A fair question. We know good writing when we see it. And bad. I was told by a prominent professional editor that only once in 30 years had a publishing client even mentioned readability metrics to her, and then only in passing.

The perception is that if you use such tools to simplify your writing you end up over-simplifying – or just being plain boring. Far better to aim to write like a human, irrespective of tools and metrics. Especially if you are writing for an educated audience. Let good clean writing (and editing) stand up for itself. English is a subtle beast and can’t – shouldn’t – be controlled by algorithms.

These are valid, powerful arguments and I would not contest most of them.

Still, I have an inkling we could make more of these metrics; add them to our armoury; hold them in the background to give us constant information about the work we are producing or processing.

I’m going to look at a few separate but related metrics, define what they are and what they might each be trying to measure.

Then finally I suggest some possible ways they could or should be used, with a stylised illustration from my own experience.

Along the way I include some links to useful websites which explain further, and can help you check your own readability.

What metrics?

There are numerous metrics out there, but all trying to do approximately the same thing: take some elements that are easily measurable – number of words, number of syllables, number of sentences – and convert them into an index of readability that makes intuitive sense.

We cover:

  • Flesch Reading Ease
  • Flesch-Kincaid
  • FOG and SMOG
  • Simple percentage metrics

At this point I should acknowledge some credits, to people who have helped me discover these tools and are far more knowledgeable and expert on their uses and limitations than I will ever be.

First Yoast SEO, the embedded WordPress SEO tool I use, includes Flesch Reading Ease as a standard metric. Yoast/Flesch was my first introduction to readability metrics, and led me on to the related Flesch-Kincaid grade level. More on both of these shortly.

Yoast also routinely uses two simple percentage metrics: (i) ‘long’ sentences as % of total; and (ii) sentences with passive voice as % of total.

These are more controversial, arguably, than Flesch – as they involve a value judgment on the ‘acceptable’ rate of using long or passive sentences – and a judgment on what is ‘long’ in the first place. Still, we look at these too.

Next, the good folks at thewriter.com introduced me to two widely-used alternative metrics: the Robert Gunning ‘FOG’ index and the Harry McLaughlin ‘SMOG’ index, widely known as an acronym for Simple Measure Of Gobbledygook.

Finally, I am indebted to Brian Scott’s readability formulas website for a detailed account of SMOG, alongside explanations of Flesch, FOG and numerous others, in what is a paean to the whole concept of readability.

Flesch Reading Ease

Flesch Reading Ease is widely used as a readability checker. It is embedded in Microsoft Word, and is the readability tool of choice for the popular Yoast SEO engine. Both Flesch Reading Ease and Flesch-Kincaid (covered below) are also used by the US military for checking technical document readability.

The basic idea of Flesch is to measure two easily identifiable elements – the ratio of words to sentences and the ratio of syllables to words – and then scale them into numbers that can be described and understood intuitively.

Higher scores indicate material that is easier to read. Lower numbers indicate material that is more difficult to read.

Flesch Reading Ease formula

206.835 – 1.015 * (Total Words/Total Sentences) – 84.6 * (Total Syllables/Total Words)

The key elements are the ratios in brackets. The other numbers are simply used to translate these ratios into an approximately understandable scale.
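To make the two ratios concrete, here is a minimal Python sketch that extracts them from a piece of text. The function names are made up for illustration, and both the sentence splitting and the syllable count are rough heuristics of my own, not the method used by Word, Yoast or any other tool. Real tools count syllables more carefully, so expect small differences.

```python
import re

def count_sentences(text: str) -> int:
    # Assumption: each run of '.', '!' or '?' ends one sentence.
    parts = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    return max(1, len(parts))

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels,
    # minus one for a probable silent final 'e'. It will miscount some words.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)

def text_ratios(text: str) -> tuple[float, float]:
    # Returns (words per sentence, syllables per word).
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return len(words) / count_sentences(text), syllables / len(words)

print(text_ratios("The cat sat on the mat."))  # (6.0, 1.0)
```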

The scores from this metric can be interpreted as shown in the table below:

Flesch Reading Ease: Definitions

| Score | School Level | Notes |
|---|---|---|
| 100+ | Up to Year 5 | Extremely easy to read. |
| 90–100 | Year 6 | Very easy to read. Easily understood by an average 11-year-old student. |
| 80–90 | Year 7 | Easy to read. Conversational English for consumers. |
| 70–80 | Year 8 | Fairly easy to read. |
| 60–70 | Years 9-10 | Plain English. Easily understood by 13- to 15-year-old students. |
| 50–60 | Years 11-13 | Fairly difficult to read. |
| 30–50 | Undergraduate | Difficult to read. |
| 0–30 | Postgraduate | Very difficult to read. Best understood by university graduates. |
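
If you want to label scores automatically, the bands in the table can be turned into a simple lookup. This is only a sketch: the name describe_score is hypothetical, and because the bands in the table share their boundary values, I have assumed that a score sitting exactly on a boundary belongs to the easier band.

```python
# (lower bound, school level, note), taken from the table above.
# Assumption: a boundary score such as 60 falls in the easier band (60-70).
BANDS = [
    (100, "Up to Year 5", "Extremely easy to read"),
    (90, "Year 6", "Very easy to read"),
    (80, "Year 7", "Easy to read"),
    (70, "Year 8", "Fairly easy to read"),
    (60, "Years 9-10", "Plain English"),
    (50, "Years 11-13", "Fairly difficult to read"),
    (30, "Undergraduate", "Difficult to read"),
]

def describe_score(score: float) -> tuple[str, str]:
    # Walk down the bands and return the first one the score reaches.
    for lower, level, note in BANDS:
        if score >= lower:
            return level, note
    return "Postgraduate", "Very difficult to read"

print(describe_score(62.8))  # ('Years 9-10', 'Plain English')
```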

A simple illustration is artificial but – to give you an idea how this works – let’s do it anyway.

Take ‘the cat sat on the mat’. One sentence, six words, six syllables. Plugged into the Flesch Reading Ease formula, this simple sentence gets a score of 116 (extremely easy to read): 206.835 – 1.015 * (6/1) – 84.6 * (6/6) = 116.145, which rounds to 116.
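
The same arithmetic can be written as a short Python function. This is a sketch for illustration only: the function name is my own, and it expects you to supply the three counts yourself.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Coefficients are those of the Flesch Reading Ease formula given earlier.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# 'the cat sat on the mat': one sentence, six words, six syllables.
score = flesch_reading_ease(words=6, sentences=1, syllables=6)
print(round(score))  # 116
```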

Flesch Reading Ease is useful for comparing the complexity or simplicity of different people’s writing against each other – i.e. as a relative comparison – or for comparing against pre-defined standards where the absolute level of the number doesn’t have to follow intuition too much.

For example, as thewriter.com points out, some countries have legislation specifying that official or consumer documents, such as insurance policies, have to be within a certain range on the Flesch Reading Ease scale. The trend towards greater clarity and comprehensibility for important consumer documents is growing and welcome, and readability metrics are already part of that trend in some countries.

A score of 60 is a good target for plain understandable English on this measure. The present author aims to achieve that score on his blogs – or at least the less technical ones – though in truth only rarely succeeds.

Flesch-Kincaid

Flesch-Kincaid is an extension of the elements and concept introduced in Flesch Reading Ease. Like its cousin it is widely used, including as an embedded tool in Microsoft Word.

It uses the same core elements but scales them in a different way to try to improve the intuitive quality of the result. The numeric scale it converts to is US school grade, which is one number less than UK school year (US grade 6 = UK year 7).

So Flesch-Kincaid does away with the need for translation we saw in Flesch Reading Ease, by returning a metric that is understandable on its own.

Flesch-Kincaid grade level formula

0.39 * (Total Words/Total Sentences) + 11.8 * (Total Syllables/Total Words) – 15.59

If we compare the example we used above for Flesch Reading Ease – ‘the cat sat on the mat’ – we get a Flesch-Kincaid grade level of -1.45. So much for intuition!

In fact this is not unexpected. A one-word, one-syllable sentence (‘Oi!’) scores -3.4 on this measure. Which just demonstrates that for a realistic illustration of this measure you need a more meaningful sample than a single sentence of one-syllable words.
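
Here is the grade level formula as a small Python sketch (the function name is hypothetical), reproducing the two negative results above.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Coefficients are those of the Flesch-Kincaid grade level formula above.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# 'the cat sat on the mat': six words, one sentence, six syllables.
print(round(flesch_kincaid_grade(6, 1, 6), 2))  # -1.45

# 'Oi!': one word, one sentence, one syllable.
print(round(flesch_kincaid_grade(1, 1, 1), 2))  # -3.4
```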

We can build a more meaningful example by doubling the number of words and syllables but keeping them as a single sentence. Let’s try ‘seven felines relaxed under comfy blankets before running away across rainy rooftops’. One sentence, twelve words, 24 syllables. This gives a Flesch-Kincaid grade level of 12.7 (roughly US 12th to 13th grade, or UK year 13 and beyond), since 0.39 * 12 + 11.8 * 2 – 15.59 = 12.69. It also gives a Flesch Reading Ease measure of 25 (very difficult to read). Both seem a bit overdone for a sentence built from everyday two-syllable words.

Perhaps these various scores underline the problem with these metrics. Different metrics might be valuable for different purposes but are not necessarily reliable on their own.

Accepting this limitation, a Flesch-Kincaid score of 7-8 (US 7th to 8th grade, UK year 8-9) corresponds roughly to a Flesch Reading Ease measure of 60. So 7-8 might be our target for plain understandable English using the Flesch-Kincaid measure.

Other Measures

For completeness we now take a quick look at some other metrics.

Gunning FOG

The FOG readability index was developed in 1952 by the American publisher Robert Gunning, who thought newspapers and business documents in general were too complex and ‘foggy’. Like Flesch-Kincaid it returns an American school grade level but using a slightly different method.

The different element with Gunning FOG is that it takes account of the number of words with three or more syllables (with exceptions for proper nouns, verb participles and the like) as a percentage of total words.

Gunning FOG formula

0.4 * {(Total Words/Total Sentences) + (Polysyllable Words/Total Words * 100)}

Using a single sentence as an example is once again not the most meaningful exercise – and not statistically valid – but we will demonstrate for illustrative purposes anyway.

Our revised example sentence above (‘seven felines relaxed under comfy blankets before running away across rainy rooftops’) has no words of three syllables or more so returns a FOG score of 0.4 * (12 + 0) = 4.8. In other words, US fifth grade.

Inserting one three-syllable word (say ‘seventeen’ instead of ‘seven’) returns a score of 8.1, while inserting two longer words (‘seventeen’ as above and ‘comfortable’ instead of ‘comfy’) returns 11.5.
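
The three FOG scores above can be reproduced with a few lines of Python. As before this is a sketch with a made-up function name. It takes the polysyllable count as given, and does not try to apply Gunning’s exceptions for proper nouns and the like.

```python
def gunning_fog(words: int, sentences: int, polysyllables: int) -> float:
    # 0.4 * (average sentence length + percentage of polysyllable words)
    return 0.4 * ((words / sentences) + 100 * (polysyllables / words))

# Our twelve-word sentence with zero, one and two polysyllable words.
for polysyllables in (0, 1, 2):
    print(polysyllables, round(gunning_fog(12, 1, polysyllables), 1))
# prints:
# 0 4.8
# 1 8.1
# 2 11.5
```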

What the folks at thewriter.com say about Gunning FOG is that it is most valuable as a second opinion if your Flesch-Kincaid measure goes haywire, for example if it returns a negative score as in our earlier example.

As with Flesch-Kincaid, a good target for plain understandable English under Gunning FOG might be around 7-8, while a score above 12 would be deemed hard to read.

As an aside, the Bible and Shakespeare are alleged to have a Gunning FOG index of around 6, while Time Magazine and the Wall Street Journal are around 11.

SMOG

Harry McLaughlin’s SMOG readability formula was developed before computers started dominating our lives. It is a simplified measure along similar lines to Gunning FOG, based essentially on the incidence of three-or-more-syllable words in a sample of the text being assessed.

There are two ways of calculating it, one where a sample of at least 30 sentences is available, and one for shorter pieces. We will briefly cover both.

If at least 30 sentences are available the basic idea is to take a sample (for example, ten from the start of the piece, ten from the middle and ten from the end), count the number of words with three or more syllables across the 30 sentences, take the square root of that count, and add it to 3.

McLaughlin SMOG formula (1)

3 + Sqrt (Count of Polysyllable Words in Sample 30 Sentences)

To illustrate this simply, let’s take the enhanced sample sentence we created above (‘seventeen felines relaxed under comfortable blankets before running away across rainy rooftops’).

Let’s assume that our entire sample has this same frequency of three-syllable words, i.e. two per sentence. Thirty sentences then gives us 60 three-syllable words.

On this basis our piece would have a SMOG Index of 3 + Sqrt(60) = 3 + 7.7 = 10.7. In other words, US 11th grade.
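
The 30-sentence version of SMOG is simple enough to check in a couple of lines of Python. The function name here is my own, and the input is the polysyllable count across the 30 sampled sentences.

```python
import math

def smog_index(polysyllables_in_30_sentences: int) -> float:
    # SMOG formula (1): 3 plus the square root of the polysyllable count.
    return 3 + math.sqrt(polysyllables_in_30_sentences)

# Two polysyllable words in each of 30 sentences.
smog = smog_index(2 * 30)
print(round(smog, 2))  # 10.75
print(round(smog))     # 11, i.e. roughly US 11th grade
```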

The other way of calculating SMOG is where you don’t have a 30-sentence sample to look at. In this case you take the average number of three-or-more-syllable words per sentence, multiply that by the number of sentences you are short of 30, and add the result to the total number of three-or-more-syllable words in your sample. In effect this scales the short sample up to 30 sentences.

McLaughlin SMOG formula (2)

Count of Polysyllable Words in Sample + {Average of Polysyllable Words per Sentence * (30 – Number of Sentences in Sample)}

We then apply this result to a pre-defined conversion table as set out in readabilityformulas.com.

In our simple example we have one sentence with two three-syllable words. Plugging this into the alternative version of SMOG we get 2 + (2 * 29) = 60. Using the above conversion table we get a result of US grade 11 for this one-sentence sample.

This matches the earlier SMOG result, as it should: the short-sample method simply assumes that the rest of the piece carries on in the same vein as the sentences we do have. A single sentence is a thin basis for that assumption.

This is very well, but as in the earlier examples we should be wary of drawing too many conclusions from these very limited samples. They are purely to illustrate the concepts behind each metric.

Long Sentence % and Passive Sentence %

We finish this section with a note on the percentage of long sentences and passive sentences.

These are used, as mentioned, as routine metrics in the Yoast SEO tool. They have the advantages of simplicity and intuition, and the disadvantage of being subject to important value judgments.

Accordingly they sometimes generate controversy among people who rely on gaining ‘good’ SEO marks for their website material. Such people don’t agree with the value judgments that have gone into Yoast’s definitions; perhaps because their output has been scored adversely on these measures.

The first value judgment is that long sentences are bad. That a long sentence consists of more than 20 words. And that an acceptable frequency of long sentences is less than 25%.

The second value judgment is that passive sentences are bad. And that an acceptable frequency of passive sentences is less than 10%.

At best these judgments are arbitrary but we take them at face value for these purposes and do not dispute them. The line has to be drawn somewhere. We accept that the Yoast proprietor has a commercial right to draw it where he chooses, and accept where he has drawn it.
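
For the curious, the long sentence check is easy to approximate yourself. The Python sketch below uses the two thresholds just described (more than 20 words, less than 25% of sentences), but the names and the sentence splitting are my own simplifications and not Yoast’s actual code. I have not attempted the passive check, since spotting passive voice needs real grammatical analysis.

```python
import re

LONG_SENTENCE_WORDS = 20  # a 'long' sentence has more than 20 words
MAX_LONG_SHARE = 25.0     # acceptable share of long sentences, in percent

def long_sentence_percent(text: str) -> float:
    # Assumption: each run of '.', '!' or '?' ends one sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    long_count = sum(1 for s in sentences if len(s.split()) > LONG_SENTENCE_WORDS)
    return 100 * long_count / len(sentences)

# Four short sentences followed by one 21-word sentence.
sample = "The cat sat on the mat. " * 4 + " ".join(["word"] * 21) + "."
share = long_sentence_percent(sample)
print(f"{share:.1f}% long sentences; within target: {share < MAX_LONG_SHARE}")
# 20.0% long sentences; within target: True
```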

How Can We Use Them?

So we have all these metrics. But what could they be used for? We have already noted several times in this blog that, in isolation, the measures aren’t sufficient, could be misleading. The scores don’t tell you everything.

They’re useful background, a rule of thumb to give us an idea how readable our work is, in a broad sense. But perhaps little more than that. As thewriter.com points out, not all words with many syllables are difficult words. And not all short sentences are clear.

These subtleties are not captured – or are even contradicted – by the logic behind the readability metrics.

But in situations where you need a very broad benchmark, or where you need material to be universally comprehensible, they could be part of the writing or editing process. In some educational settings, for instance, where you want to make sure the material is at the right level for the audience. The information isn’t sufficient, isn’t even necessary, but is interesting.

For me the most compelling reason to use these metrics is where comparison is important.

Comparison across authors, for instance, where you want some reflection of the relative ease or difficulty of editing to a unified standard. Or comparison across different pieces by the same author, written for different purposes or audiences. Or comparison over time.

A stylised example

Here is a stylised summary of a template I developed for use with a client where I wanted to compare directly the intensity of editing work across different pieces written by different authors.

The editorial brief in this case was to achieve greater simplicity, cut down words, reduce the incidence of long sentences and long words, and obtain more uniformity of style across the different hands.

I used four metrics: word count, Flesch Reading Ease, long sentence % and passive sentence %. The ones I chose reflected the particular brief, but could differ case-by-case.

These numbers aren’t strictly real, but convey the underlying picture in the case I am representing here.

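Stylised Example of Readability Metrics in Practice

Word count (– good: lower is better) and Flesch (+ good: higher is better)

| Author | Word count pre-edit | Word count post-edit | Word count % change | Flesch pre-edit | Flesch post-edit | Flesch % change |
|---|---|---|---|---|---|---|
| 1 | 12000 | 7500 | -38% | 30 | 35 | 17% |
| 2 | 6000 | 6500 | 8% | 45 | 40 | -11% |
| 3 | 7000 | 7000 | 0% | 30 | 32 | 7% |
| 4 | 8500 | 8200 | -4% | 25 | 30 | 20% |
| All | 33500 | 29200 | -13% | 33 | 34 | 8% |

Long % (– good: lower is better) and Passive % (– good: lower is better)

| Author | Long % pre-edit | Long % post-edit | Long % change | Passive % pre-edit | Passive % post-edit | Passive % change |
|---|---|---|---|---|---|---|
| 1 | 46% | 40% | -13% | 20% | 15% | -25% |
| 2 | 27% | 31% | 15% | 17% | 18% | 6% |
| 3 | 32% | 30% | -6% | 17% | 17% | 0% |
| 4 | 32% | 32% | 0% | 25% | 23% | -8% |
| All | 34% | 33% | -3% | 20% | 18% | -7% |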
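
The ‘% change’ columns are plain relative changes between the pre-edit and post-edit figures. As a sketch, this is how the word count column could be reproduced in Python. The function name is hypothetical and the figures are the stylised word counts from the table.

```python
def pct_change(before: float, after: float) -> float:
    # Relative change from the pre-edit to the post-edit figure, in percent.
    return 100 * (after - before) / before

# Stylised (pre-edit, post-edit) word counts per author.
word_counts = {"1": (12000, 7500), "2": (6000, 6500), "3": (7000, 7000), "4": (8500, 8200)}

for author, (pre, post) in word_counts.items():
    print(author, f"{pct_change(pre, post):.1f}%")

total_pre = sum(pre for pre, _ in word_counts.values())
total_post = sum(post for _, post in word_counts.values())
print("All", f"{pct_change(total_pre, total_post):.1f}%")  # All -12.8%
```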

The underlying story behind this edit was as follows:

– The first author was heavily edited: significantly cut back, simplified and streamlined.
– The second author was light on substance and written in a simpler style than the others: I ended up adding more length and passive voice to her sentences, and actually reducing her Flesch score.
– The third author was pretty much OK as she was and needed only a light edit.
– The fourth author had the right length but too many polysyllable words, which contributed to a very low Flesch score to start with, while the rest of his parameters were broadly OK.

I don’t know how valuable the commissioning editor found this analysis, but I found it useful for myself, to characterise a complex edit using objective measures. And I will continue to do it where it seems suitable and useful.

To sum up

To sum up: there is no magic about this; it is only a tool, but in my view a valuable one and perhaps under-used by the editorial industry; perhaps even by writers and bloggers themselves.

And finally, a very quick analysis of this blog:

  • 3,000 words
  • Flesch Reading Ease score 62.8 (beating the target of 60)
  • Passive sentences 7.2% (beating 10% target)
  • Long sentences 12.8% (beating 25% target).

In short – readable. But having read valiantly to the end of a 3,000-word blog, please judge for yourself!

Contact Prism-Clarity for further help or information, including advice on where to get the best advice.