Bridge and Water

A Sound Experiment: Writing By Voice And Ear Alone

This blog is a sound experiment.

It will be written entirely by voice recognition – with keyboard-based editing at the end – and it will be written in less than an hour.

I don’t know how many words it will be but I am aiming for 1000. Another aim is to achieve a plain English standard based on Flesch readability metrics – without trying too hard.

I have two main objectives: (1) to test the efficiency of writing by voice recognition alone; (2) to see if the ‘sound test’ really does work. I am always telling my City, University of London students to read their own work out loud and listen to themselves reading it back. Is the ear really a better judge than the eye? Maybe I will know more by the end of this blog.

I’ll share some of my thoughts on that at the end and I hope you will give me yours too.

The writing factory

I learned to write by talking into a dictaphone. Before word processors were commonplace I worked in the Chief Registrar’s Office (CRO) of the Bank of England, dictating letters to government stockholders about all sorts of things. Dividends, transfers, death (sorry, probate), legal problems, any other kind of problem. We sat in a big office, each with a desk of case files in front of us. If a problem was too complicated or delicate to be resolved with a standard pre-printed letter (a ‘roneo’) it would be sent to the CRO with instructions. Our job was then to write a beautifully clear, accurate, polished, professional letter – reflecting the high standards of the Bank of England – to resolve whatever the more complicated or delicate problem was.

It was a writing factory. We were expected to write 30-70 letters a day, straight into the typing pool and back out to be quickly checked, signed and dispatched. You got into the habit of forming sentences in your head and putting them straight into their final written form, via an accomplished typist.

There was no cognitive blockage, no interference from uncertain fingers, no constant temptation to edit and re-edit. It was the first take every time. Because of that the pressure was on to get it right in the first take.

Microsoft Windows VR

The habit has stayed with me. I recently discovered the Microsoft Windows 10 voice recognition (VR) suite, a free desktop app which enables you to write by just talking to your computer. I used it to finish some of the more difficult chapters in my book manuscript and I’m starting to use it more and more for day-to-day writing.

It’s the same process as writing letters in the CRO those years ago. Just talk. Talking is easy. Easier than writing. We can all talk, some of us perhaps too much.

Unfortunately, if you swallow or garble or even, in my case, just talk normally, Windows 10 VR will often throw something back at you which is completely unrecognisable. And often it will not recognise punctuation – brackets for instance. So the instruction “open brackets” – “(” – comes back on screen as “open brackets” – not particularly helpful. Then again, the other thing I am always telling my students is to edit their pieces to within an inch of their lives: the pieces’ lives, that is, not the students’. So, editing is needed. And a strong editing hand at that. That’s OK. Following my own guidance I will take an hour to edit this piece, having spent an hour writing it by voice. And hopefully I will not miss any crude approximations made by the software – such as the word “police” for the word “voice” just then.

One thing I do know from my classes is that writing by voice is becoming increasingly popular. Many people use their phones and – however much we may laugh at it – the voice recognition software is improving.

Writing in the digital age

When you think about it, conversational style is a suitable motif for writing in the digital age. Writing these days is all about engagement, about impact, about triggering your readers’ interest before they “swipe left”. Writing by speech, on the face of it, should suit those needs very well. Podcasts and videos are increasingly popular modes of content delivery, for the same reasons. They engage the ear.

Steven Pinker in his book The Sense of Style: The Thinking Person’s Guide To Writing In The 21st Century talks about a mode of writing which sounds as if everybody should have heard of it. In fact I think many people haven’t, but it is a fascinating idea nevertheless. The mode of writing is called “classic style”.

The basic idea is that writing – good writing – is all about directing your reader’s gaze to something – an idea, a concept, a concrete thing – which you as the writer have seen but your reader has not yet seen. It is based on an equal, respectful, unpatronising relationship between you and the reader. You are on an equal footing, and your piece is like a conversation between friends.

That conversational aspect of “classic style” struck a chord with me as I rediscovered the art of writing by voice. Conversational is good in the digital age, and if it is good enough for Steven Pinker it is good enough for me.

The sound test

But what about accuracy? Is it really true that by reading something back to yourself you can find errors, structural flaws, misplaced words, incorrect vocabulary and many other problems? I still don’t really know. It depends a bit on your auditory skills. Not everybody hears things as well as they see things: my 23-year-old daughter is a good example, and she has a degree in Archaeology and is studying at RADA, so it hasn’t held her back too much in her fields. In general, though, our ears are well attuned to finding problems. An incorrect pronoun, or an instance of subject-verb disagreement, or the “wrong” preposition: that’s the kind of thing the ear will usually identify.

It isn’t an easy discipline to get used to. In an open plan office it’s almost impossible: you would be drummed out. In one of my corporate jobs I was in the habit of reading back to myself particular kinds of copy, especially more complex copy. It drove my neighbour nuts.

I think the answer is to be selective. Don’t try to do it for everything, but for the things that really matter, try to find the opportunity. In your bedroom, as a kind of grown-up alternative to miming David Bowie songs with a hairbrush. Or in the bathroom, in front of the mirror, your words resonating across the tiles. Perform. How good your message is, how coherent you are and the impact of what you’re saying will rapidly become apparent, and you will give yourself a chance to find the errors. Just try it. It could become almost as much of a habit as talking to your computer.

So how did it go?

Speaking of which, my time is almost up. Microsoft Word is telling me that I am well over the 1000-word target. Up to this point it has taken 52 minutes, so I have precisely 8 minutes to finish off and get to work on the editing.

I am going to publish it raw, with only obvious (“police”/“voice”) robot errors corrected.

It has been a fascinating experience. The Flesch Reading Ease measure is 68, squarely in the plain English category and a lot more “readable” (by that measure) than my usual efforts. The proportion of longer sentences is 28%, and the proportion of sentences with a passive verb is 10%: both are close to or within Yoast SEO guidelines and in line with my usual returns from this kind of blog. And the proportion of sentences with transitional words is 32%, within Yoast SEO “good” guidelines. Overall I would say it has turned out pretty readable according to the metrics.
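For anyone curious where a number like 68 comes from, here is a minimal sketch in Python. It uses the standard Flesch Reading Ease formula: 206.835, minus 1.015 times the average words per sentence, minus 84.6 times the average syllables per word. Everything else is an assumption made for illustration: the function names are invented, the syllable counter is a rough vowel-group heuristic, and the 20-word threshold for a “longer” sentence is a guess at how such tools draw the line. Expect the results to differ somewhat from what Microsoft Word or Yoast SEO report.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (as in "voice"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break on full stops, question and exclamation marks."""
    return [part.strip() for part in re.split(r"[.!?]+", text) if part.strip()]


def words_in(text: str) -> list[str]:
    """Return runs of letters and apostrophes as words."""
    return re.findall(r"[A-Za-z'’]+", text)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula applied to the rough counts above."""
    sentences = split_sentences(text)
    words = words_in(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def long_sentence_share(text: str, max_words: int = 20) -> float:
    """Percentage of sentences with more than max_words words (threshold is an assumption)."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    long_ones = [s for s in sentences if len(words_in(s)) > max_words]
    return 100 * len(long_ones) / len(sentences)


if __name__ == "__main__":
    sample = "Just talk. Talking is easy. Easier than writing."
    print(round(flesch_reading_ease(sample), 1))
    print(long_sentence_share(sample))
```

Higher scores mean easier reading, and short sentences made of short words push the score up, which may be one reason dictated, conversational copy tends to do well on this measure.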

I guess the proof will be how it turns out on the page and screen, whether people read this far, and what they think of it. Maybe the VR experiment will continue to grow. To my ear that sounds a good idea and the ear is a better judge than the eye.