An experiment that lets you experience ‘enhancing’ for yourself

This was a very simple experiment, designed to demonstrate two important points about ‘enhancing’ forensic audio which, though well known in phonetic science, are often misunderstood in the legal system, and beyond.

  1. Techniques that may be effective in improving the quality of overt recordings (those made openly, with objectively known content) do not necessarily transfer well to covert recordings (made secretly, with unknown or disputable content).
  2. Techniques that may appear to make an indistinct recording clear-er (in the sense of ‘less noisy’) do not necessarily make it clear (in the sense of ‘intelligible’).

For these and other reasons, it is essential in forensic contexts to test the effects of ‘enhancing’ objectively, with careful management of ‘priming’ effects.

What did we do in the experiment?

Materials

The experiment started by presenting two short samples of an overt recording with known content. One was the indistinct original. The other was modified via standard ‘enhancing’ techniques of a kind often used on forensic audio, and presented (in good faith and in a low-stakes context) as an example of how these techniques can make a recording clearer.

Method

Participants were asked to listen once only to each recording and see which seemed, on first impression, to be ‘clearer’. Here are the two samples. You might like to jot down which seems clearer to you after listening once each.

Sample 1

Sample 2

The next screen of the experiment presented one of the samples (randomly chosen), asking participants to listen as many times as they liked and transcribe what they thought was said. If you would like to try it, you have both samples above, so choose whichever you prefer. It is a good idea to jot down your transcript so you have an objective record before reading on.

Results in brief

In total, about 60 people did the experiment, with 30 transcribing the original and 30 the enhanced version.

These results have now been officially published as part of: Fraser, H. 2018. “Enhancing” forensic audio: false beliefs and their effect in criminal trials. Australian Journal of Forensic Sciences. The account below includes a few additional details and illustrations.

Which version was found ‘clearer’?

Overall, 37 of the 60 participants (62%) found the enhanced version ‘clearer’, while 23 (38%) thought the original was ‘clearer’.

How clearly did participants actually hear the content?

Overall, no participants interpreted either the original or the enhanced version remotely correctly.

Whichever sample may seem ‘clearer’, neither of them is actually ‘clear’ enough for listeners to determine its content in the absence of knowledge (or assumptions) about what was said.

Why does it matter?

Widespread false belief in the possibilities for ‘enhancing’ to make unintelligible audio ‘clear’ is part of the mistaken ‘common knowledge’ about speech in our society that makes it so easy for indistinct forensic recordings to be interpreted incorrectly in our criminal justice system.

We need to spread the word so that as many people as possible become aware that the fact an indistinct recording may sound ‘clearer’ after processing does not mean it is being heard accurately.

More detailed results and discussion

While the experiment was successful in making the very general demonstrations it aimed at, it is very informal in its design (in particular, we have no details about participants). It would be a mistake to read too much into the detailed results – but they are quite interesting, so here they are for those with time to explore.

The experiment did not explicitly provide any context or background information about the recording. The fact that the experiment appeared on a website about forensic phonetics may have suggested a ‘forensic’ context to some listeners. However, this is not a forensic recording.

The audio can be divided roughly into two utterances. We can look at them in order.

First utterance

22 of the 60 participants (37%) heard the word ‘fish’, which was in fact spoken. This was by far the most accurately heard word in the entire recording, though the surrounding words were transcribed variously, with only one correct response:

  • Molly’s fish
  • Polly fish
  • Hobbies fish
  • Bobbie’s fish
  • I’ll use fish
  • What are these fish?
  • All these fish
  • This is fish.

Interestingly, 15 of the 22 who correctly heard ‘fish’ were listening to the enhanced version (50% of that group of 30), and one of these was the only participant to transcribe the first utterance correctly.

By contrast, only 7 who correctly heard ‘fish’ were listening to the original (23% of that group), with none hearing the full utterance correctly.

However, please read on before leaping to the conclusion that this indicates the ‘enhancing’ had successfully made the speech clearer.

The next most commonly heard phrase in the first utterance was ‘police station’, which was not accurate. This was erroneously transcribed by 6 people, of whom 5 were listening to the ‘enhanced’ version.

10 participants heard a range of other phrases, such as the following:

  • this is special
  • what is this?
  • the copies switched
  • switched on
  • it’s fiction
  • the platitude
  • English
  • restriction.

22 participants found the first utterance uninterpretable. Interestingly, 16 of these were listening to the original version. By contrast, all but 6 listening to the enhanced version provided a transcript.

Again that might seem at first to indicate the ‘enhancing’ had made an improvement – until you recall that only one of the 24 who offered a transcript of the ‘enhanced’ version heard the utterance correctly. All the others were guessing, sometimes wildly.

Arguably, in a forensic context, ‘uninterpretable’ is a more reliable transcript than a wrong guess.
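For readers who want to cross-check the first-utterance counts, here is a minimal Python sketch that tabulates the figures reported above. The split of the ‘other phrases’ responses between the two versions is not stated in the text; the sketch derives it by subtraction, assuming exactly 30 listeners per version, so treat that row as an inference rather than a reported result. The names `reported`, `fill_missing` and `offered` are illustrative only.

```python
GROUP_SIZE = 30  # listeners per version, as reported

# response: (total, original, enhanced); None marks a split the text omits
reported = {
    "fish": (22, 7, 15),
    "police station": (6, 1, 5),      # 1 = 6 - 5, by subtraction
    "other phrases": (10, None, None),
    "uninterpretable": (22, 16, 6),
}

def fill_missing(rows, group_size):
    """Fill the single unstated split so each group sums to group_size."""
    known_orig = sum(o for _, o, _ in rows.values() if o is not None)
    known_enh = sum(e for _, _, e in rows.values() if e is not None)
    return {
        name: (total,
               group_size - known_orig if orig is None else orig,
               group_size - known_enh if enh is None else enh)
        for name, (total, orig, enh) in rows.items()
    }

table = fill_missing(reported, GROUP_SIZE)

for name, (total, orig, enh) in table.items():
    # Every row must agree with its reported total.
    assert orig + enh == total, name
    print(f"{name:16} total {total:2}  "
          f"original {orig:2} ({orig / GROUP_SIZE:.0%})  "
          f"enhanced {enh:2} ({enh / GROUP_SIZE:.0%})")

# Listeners who offered any transcript at all, per version.
offered = {
    "original": GROUP_SIZE - table["uninterpretable"][1],
    "enhanced": GROUP_SIZE - table["uninterpretable"][2],
}
print("offered a transcript:", offered)
```

On these figures the derived ‘other phrases’ row comes out as 6 for the original and 4 for the enhanced version, and 14 listeners to the original offered a transcript compared with 24 listeners to the enhanced version. Given the informal design, these are descriptive counts only.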

Second utterance

Here no one heard anything remotely like the actual words spoken.

Overall, 52 of the 60 (87%) heard phrases including one or two uses of the word ‘time’. In fact the word ‘time’ was not used in the real utterance.

The most common interpretation was ‘I need some time’ or ‘I’ll need some time’ (perhaps reflecting the stressful lives of our participants!). This was heard by 31 of the 60 (52%) participants overall, including 20 (67%) of those listening to the original and 11 (37%) of those listening to the ‘enhanced’ version.

Others heard other phrases involving ‘time’ or words like ‘tone’, ‘tonight’ or ‘tight’ embedded in a wide range of different phrases. A few examples (below) will give an impression of the variety. No major differences between transcripts based on the original or enhanced versions were evident on a superficial analysis (which is really all that is appropriate given the informal nature of this experiment).

Some representative transcripts

  • Ill use fish, so ill need some time, so tonight.
  • Bobby’s fish. I’ll need some time (thyme) so for the thousandth time
  • this is special, something and tights
  • fish something Time’s quality enhancing tone
  • Police station. I’ll need some time, so call the cops on time.
  • Patty’s said sh-. Patty’s in time, like what did house in time.
  • .. the copies switched. I’ll need some time so we’ve got accounting time.
  • It’s fiction but I need some time so ? some time
  • This is fish. I’ll need some time so I’ve got a ‘planting’ time
  • Restriction. At least in time/tone so we’ve got an enhancement tone
  • I need some time so I can go out hunting tonight.
  • I need some time cause I’ve got a concert tonight.

How can these results be explained?

Take a look at the spectrograms below of the two samples.

It is pretty easy to see why the original is ‘unclear’. There is so much noise (shown by vertical grey streaks) it drowns out the patterns of the speech signal (shown by dark horizontal bars).

The ‘enhanced’ version removes some of that noise – but in doing so it leaves unnatural white ‘holes’ at random spots all over the signal. This is typical of ‘enhancing’ techniques used by audio engineers – phoneticians can sometimes (depending on a range of factors including the exact nature of the original audio) carry out more nuanced processing.

Those ‘holes’ distort the speech signal so that even though it is technically less ‘noisy’ it is still pretty hard to understand. Further, they can create ‘artefacts’ that make it easier to ‘hear’ words or sounds other than those actually spoken.
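To see how ‘holes’ of this kind can arise, here is a toy Python sketch of one simple noise-reduction approach, spectral gating: every time-frequency cell below a threshold is set to zero. This is an assumption made for illustration. The processing actually applied to the samples above is not specified here and may well differ. The ‘speech’ is just a tone that alternates between strong and weak stretches, and all names and settings (`FRAME`, `TONE_BIN`, the threshold of twice the estimated noise floor) are hypothetical.

```python
import cmath
import math
import random

FRAME = 32      # samples per analysis frame (illustrative)
TONE_BIN = 4    # frequency bin carrying the stand-in "speech" energy
N_FRAMES = 40

def magnitudes(frame):
    """Naive DFT magnitudes for bins 0 .. N/2 - 1 of one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2)]

def spectrogram(samples):
    """Split into non-overlapping frames and take DFT magnitudes of each."""
    return [magnitudes(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

def gate(spec, threshold):
    """Zero every time-frequency cell whose magnitude is below the threshold."""
    return [[m if m >= threshold else 0.0 for m in row] for row in spec]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def noise(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Stand-in "speech": a tone alternating between strong and weak stretches.
strong = [(f // 5) % 2 == 0 for f in range(N_FRAMES)]
speech = [(2.0 if strong[t // FRAME] else 0.3)
          * math.sin(2 * math.pi * TONE_BIN * t / FRAME)
          for t in range(N_FRAMES * FRAME)]
noisy = [s + n for s, n in zip(speech, noise(len(speech)))]

# Estimate the noise floor from a separate noise-only stretch, then gate.
noise_spec = spectrogram(noise(10 * FRAME))
floor = sum(map(sum, noise_spec)) / (len(noise_spec) * len(noise_spec[0]))
gated = gate(spectrogram(noisy), 2.0 * floor)

cells = [m for row in gated for m in row]
holes = sum(m == 0.0 for m in cells)
weak_lost = sum(1 for f, row in enumerate(gated)
                if not strong[f] and row[TONE_BIN] == 0.0)
stray = sum(1 for row in gated for k, m in enumerate(row)
            if k != TONE_BIN and m > 0.0)

print(f"zeroed cells: {holes} of {len(cells)}")
print(f"weak 'speech' frames wiped out: {weak_lost} of {strong.count(False)}")
print(f"isolated noise cells left behind: {stray}")
```

The gate removes most of the noise, but it cannot tell weak speech from noise, so the quieter stretches of the tone are zeroed along with it. A scattering of isolated noise cells also survives above the threshold. In this toy setting, those two effects correspond to the ‘holes’ and the ‘artefacts’ described above.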

We can conclude by reiterating the importance of objectively testing the effects of any ‘enhancing’ or other processes applied to forensic audio. Unfortunately, such testing is rarely if ever required for audio used as evidence in criminal trials, where recordings ‘enhanced’ by audio engineers are routinely admitted as an equivalent, though somehow better, version of the original.