In ordinary English, ‘enhancing’ means ‘further improving the quality of (something that is already good)’. That doesn’t quite fit its usage in forensics though, where the whole point is that the audio is of poor quality to start with. Let’s look in a bit more detail at what ‘enhancing’ means in forensics.
In relation to audio, it was originally, and is still most commonly and appropriately, used for the set of processes applied by audio engineers to movie sound tracks, music recordings, advertising videos, etc. In these cases, they start with a really good audio recording, but want to make it sound even better, to enhance the enjoyment of listeners.
You are probably familiar with the concept from photography. Professional photographers and graphic designers are employed to take already good photos, and make them look stunning.
Here are a couple of simple examples from a photography website. You have possibly done this kind of thing yourself.
If you have, you’ll know that the key to great photography is starting with a good original. Garbage in, garbage out, as the saying goes.
All of the above applies perfectly to audio. The only difference is that far fewer people are aware of the capabilities and limitations of audio enhancing.
A related but distinct set of processes can be applied to degraded recordings such as old records or broadcast materials, with the aim of restoring them to their originally good quality. This situation is sometimes referred to as ‘enhancing’, though the correct term is ‘restoration’.
Again it is usually performed on music, but sometimes also involves speech. Here are some examples. You’ll notice immediately that the scope for improvement is far less with degraded audio, and results are generally (with some exceptions) far less spectacular.
And again, the analogy with photography is quite useful. If you have ever tried restoring old photos, you’ll know it is quite a delicate operation. You need a range of different techniques that you apply on a ‘try-it-and-see-how-it-looks’ basis. It is really more of an art than a science. You certainly can’t just put seriously degraded photos through a set of automatic filters and expect a great outcome.
All the same points apply to audio.
From civilian to forensic applications
When the legal system started to use covert recordings on a regular basis, the problem arose almost immediately that the audio was often of very poor quality.
From the point of view of common knowledge, this seems like a similar situation to the ‘restoration’ described above, and on that basis law enforcement turned to audio engineering for assistance with covert recordings.
Some engineers were happy to transfer their skills to this new task, and the term ‘enhancing’ transferred with them, becoming the most common way to refer to any process of attempting to improve the intelligibility of forensic speech recordings.
Unfortunately, however, from the point of view of phonetic science, the situation with poor-quality covert recordings is completely different from that of either enhancing or restoration. Failure to recognise that has created some major anomalies in the handling of indistinct covert recordings.
The term ‘enhancing’ is clearly inappropriate for poor quality audio, since it implies starting with a product that is at least fairly good in quality. ‘Restoration’ is not quite right either since it implies that the product was once of good quality but has become degraded through some process.
But neither of these is the main issue separating civilian from forensic scenarios, which differ in at least two crucial ways.
The big differences
First, forensic audio deemed to require ‘enhancing’ is generally of far poorer quality than even the degraded recordings that are the subject of ‘restoration’.
The second difference is more important, however. In a normal enhancing or restoration scenario, the engineers have a clear idea of what the audio is supposed to sound like. The question of what words were said is never an issue. The issue is to make the words sound clear, sharp, beautiful, or compelling.
In forensic contexts, the issue with poor quality audio is precisely that it is not clear what was said. Let’s revert to our photography analogy to see what that means. Suppose you gave this picture to a photography expert to ‘enhance’. It could make quite a difference whether the expert knew or assumed it was a duck or a rabbit (or something else completely).
With forensic audio, the last thing we want is for engineers to enhance covert recordings according to their own idea of what was said, to create a compelling effect on those who listen to the audio.
The courts want the enhancing to make clear what was actually said, so that objective listeners (in particular the jury) can interpret it and evaluate its weight as evidence in the trial. That is very different from enhancing a known sample – and far more difficult, as we demonstrate in this module.
Who does the ‘enhancing’ of forensic audio?
Forensic audio ‘enhancing’ (or ‘restoration’, or ‘processing’) is a vast industry. Try googling to see for yourself the numerous commercial agencies advertising these services. For a small sum, you can even invest in some of the widely advertised software applications for forensic audio processing, and set yourself up in business. If you want some training, you can buy that online too.
Police forces have in-house agencies that deal with covert recordings. They are not so easy to find by googling.
In the vast majority of cases, this kind of work is done by audio engineers. Audio engineering is the science of sound. Speech is a kind of sound, but it is a very special kind of sound that is perceived in very special ways, and it is studied by quite a different discipline, phonetics (a branch of linguistics). Audio engineers typically have no knowledge at all of phonetics. Indeed, many are technicians (often from the music and entertainment industries) with limited scientific training.
Most audio engineers who undertake forensic work are well aware of the limited effectiveness of the ‘enhancing’ techniques they use. They can be classified into several broad groups.
First, there are well-qualified, reputable technicians, who indicate clearly that the processes they use can at best make audio more ‘listenable’, and rarely if ever improve intelligibility. Unfortunately, these distinctions are rather opaque to end-users, who are keen to try anything that might make their audio evidence clearer.
Next, there are poorly qualified technicians, who just do the jobs they are given, using the techniques they have been told to use.
As well, it must be said, there are audio engineers who deliberately exploit the ignorance of their clients, using fabricated before-and-after examples to tout their wares.
In a tiny fraction of cases, ‘enhancing’ is done by fully qualified phonetic scientists, simply using (perhaps unfortunately) the term that has become standard in the law. These typically work on small, one-off samples, using techniques completely different from those used by audio engineers of any level of expertise, and reporting their methods and conclusions in completely different ways.