How can you tell if ‘enhancing’ has been effective? That might seem like a very straightforward question. You simply listen to it to see if the ‘enhanced’ version sounds clearer.
That makes sense on the basis of common knowledge. However, from the point of view of phonetic science, it is quite incorrect. Hopefully, if you have been studying Rethink Speech 101: Unlearning, the reason is plain.
What makes speech seem ‘clear’ is not just the nature of the speech signal itself. Indeed, the nature of the speech signal itself plays a surprisingly small role.
Far more important than the signal itself are the expectations and assumptions of the listener.
In fact, the easiest way to ‘enhance’ the clarity of an indistinct speech recording, by a long shot, is to provide a transcript (or even just a hint) to tell the listener what is said.
The problem is, as discussed in detail in the Forensic Transcription module, that works even if the transcript (or hint) is inaccurate.
Another factor that contributes to an indistinct recording sounding ‘clear’ is listening to it repeatedly. Each time you listen, you hear more detail, confirming your perception and making it sound clearer and clearer – to the point where it becomes hard to believe it is not clear to others.
Again, the problem is that this effect is just as strong whether you have an inaccurate interpretation or an accurate one.
That may seem hard to believe on the basis of common knowledge, but it is well established by science. You can listen many times, becoming more and more confident of your perception – and yet be wrong every time.
A better way
The only way to be sure whether ‘enhancing’ has objectively improved the intelligibility of the audio is to do an experiment in which the original and ‘enhanced’ versions are played, without context, to two groups of listeners who are asked to write down what they hear.
If those who listen to the ‘enhanced’ version hear its content more accurately than those who listen to the original, then we can deem the ‘enhancing’ successful. Of course, the problem in the forensic context is that we don’t know what was really said – so we don’t know which listeners (if any) hear accurately. Nevertheless, testing like this can at least help to demonstrate what effect (if any) the ‘enhancing’ has had.
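For readers who want to see how such a comparison might be scored, here is a minimal sketch in Python. It assumes a test recording whose true content is known, which is exactly what is missing in real forensic audio, so it only applies to controlled test material. The function names (`word_accuracy`, `compare_groups`) and the simple in-order word-matching score are illustrative assumptions, not a method taken from the study described here.

```python
from difflib import SequenceMatcher
from statistics import mean

PUNCTUATION = ".,!?;:'\""


def words(text):
    """Lower-case a transcript and split it into words, dropping punctuation."""
    stripped = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in stripped if w]


def word_accuracy(reference, response):
    """Fraction of reference words that the listener reproduced in order.

    Assumption: 'reference' is a known-correct transcript of test audio.
    """
    ref, resp = words(reference), words(response)
    if not ref:
        return 0.0
    matcher = SequenceMatcher(None, ref, resp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def compare_groups(reference, original_responses, enhanced_responses):
    """Mean word accuracy for each listener group (hypothetical helper)."""
    return {
        "original": mean(word_accuracy(reference, r) for r in original_responses),
        "enhanced": mean(word_accuracy(reference, r) for r in enhanced_responses),
    }
```

On this scoring, the ‘enhancing’ would count as successful only if the ‘enhanced’ group scored clearly higher than the original group, and with small groups any difference would still need a proper statistical test before it could be relied on.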
However, nothing like this is ever done for audio used in trials. If it were, we can be sure that very few ‘enhancements’ would be deemed successful.
Here’s a small experiment, based on real audio used in a real case, to back up that claim.
Unlike some other cases, this audio was ‘enhanced’ by a reputable audio engineer, who gave evidence as to exactly what processes had been used to alter the audio. The problem was, no evidence was led as to the effect of the processing on perception.
When this was questioned by a phonetician, the judge ruled that there was no need to hear from an expert as speech perception was a matter of common knowledge. He then listened to both versions of the recording to reach his own conclusion. Yep – more material for an experiment …
Here is an example of a section of audio that was crucial to the trial, given in the original and ‘enhanced’ versions. Without knowing which is ‘enhanced’, do you think one is objectively clearer than the other?
Participants were given a very brief introduction to the concept of ‘enhancing’, and to the distinction between listenability (how easy or pleasant audio is to listen to) and intelligibility (how well listeners can understand the words). They were told they would listen to audio samples treated with two different methods of ‘enhancing’, in order to help scientists determine which was more useful. In fact, they listened to the original and the ‘enhanced’ version.
Longer versions of the examples above, along with several other 1-minute samples, were played in random order. Participants were asked to listen to each version once only, and rate each for listenability and intelligibility.
Then they were randomly assigned to hear one or other version, and asked to listen as many times as they wished, and write down what they heard in the audio. They were then given additional context and background about the case, and asked more questions about what they heard.
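The randomisation in a design like this can be sketched in a few lines of Python. This is only an illustration of the general idea (a fresh random playing order for each participant, then a random split into two listening groups); the function names, the participant labels and the use of a fixed seed are assumptions, not details of how the experiment was actually run.

```python
import random


def presentation_order(samples, rng):
    """Return the samples in a fresh random order for one participant."""
    order = list(samples)
    rng.shuffle(order)
    return order


def assign_versions(participants, rng, versions=("original", "enhanced")):
    """Randomly split participants into near-equal groups, one per version."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: versions[i % len(versions)] for i, p in enumerate(shuffled)}


# A seeded generator makes the allocation reproducible for later checking.
rng = random.Random(2024)
groups = assign_versions(["p1", "p2", "p3", "p4", "p5", "p6"], rng)
```

The version labels are for the experimenter only. In a design like the one described, participants would not be shown them, since they were not told which version was the original.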
For all samples, the original recording was subjectively rated by participants as considerably more ‘listenable’ and somewhat more ‘intelligible’ than the ‘enhanced’ version (i.e. the exact opposite to how the judge evaluated them – and the exact opposite of the intention of the ‘enhancing’).
Despite these differing subjective opinions, when tested for what they could actually hear after this first hearing, none of the participants could write down any words from either recording.
Participants’ comments suggested that in many cases, even those who rated the ‘enhanced’ version as more listenable did so simply on the basis that it was louder than the other, and thus seemed physically easier to hear, not noticing that the techniques used had actually degraded the quality of the speech.
In rating the audio as more intelligible, participants expressed the belief that, though they couldn’t hear it well now, they would be able to make out the words if they listened repeatedly.
The last part of the experiment showed this was not true. Without context, few could make out any words at all in either the original or the ‘enhanced’ version, even after repeated listening, and none heard anything remotely like what the police transcriber thought was said.
With some context about the case, participants could make out a few more words (though again, none heard anything like what the police transcriber thought was said).
There was also some evidence (though the number of participants was too small for full confirmation) that those listening to the original recording were able to hear more of the (few) words that could be confirmed by a phonetics expert, while those listening to the ‘enhanced’ version could still make little sense of the audio – with some commenting it was harder to hear than they had expected on first listening. Recall that participants in this experiment did not know which version was the original – they thought they were listening to two versions ‘enhanced’ by different methods.
What does it all mean?
This experiment adds weight to mounting evidence on two important points:
- ‘enhancing’ generally does not objectively improve the intelligibility of indistinct audio
- knowledge that one version is ‘enhanced’ can make listeners believe it is clearer when it is not objectively easier to interpret.
The latter effect is greater if, as is often the case, they have the transcript in advance, and listen to the ‘enhanced’ version after the original. Repeated listening is known to make audio seem clearer, so this scenario will naturally incline listeners to hear the second version more strongly in line with the transcript – and they may well attribute this apparent improvement in their perception to the ‘enhancing’.