You know what a cough sounds like. But do you know what a cough looks like?

What Does A Cough Look Like in Pictures?

Humans are fairly good at classification. At a very young age, we begin learning to differentiate between the sound of a door closing, a sibling laughing, a dog barking, and so on. Nobody describes to us what these things sound like, and no one gives us a guidebook on how to tell them apart. Parents don’t “teach” these skills, per se. Instead, we learn by example. And even though every dog’s bark is different, we pick up on the underlying similarities between the bark of a poodle and that of a Great Dane, thereby developing the skill of classification. After about 1,000-2,000 days on earth, humans can distinguish a bark from a non-bark with a very high degree of accuracy.

Until very recently, computers did this poorly. Algorithms were guided by explicit, pre-set rules and were incapable of learning from unstructured examples. But machine learning has changed this. “Deep learning,” built on “neural networks,” is a method loosely inspired by how the human brain itself works. Artificial systems have now caught up with humans in terms of classification abilities, and in many cases they have surpassed us. Whereas a child can only listen to so many barks before she grows bored (or old), computers have several orders of magnitude more tolerance for “learning” by example. And thanks to the combination of large datasets and ever-increasing processing power, computers can learn to distinguish patterns that are undetectable to us mere mortals.

Image classification is the area in which there has been the most progress in recent years. You’ve probably seen it, for example, when Facebook suggests that you tag a specific person in a photo. You’ve also likely contributed to it, for example in Google security checks where you identify photos with certain characteristics: they’re not only checking that you’re human, they’re also using you as free labor (classifying images to “teach” their computers). Image classification can do seemingly trivial tasks, like distinguishing between two faces in an Instagram post. It can also do things like detect tumors invisible to the eye or guide a self-driving car.

A picture of sound

Audio classification is a bit trickier than image classification. But image classification techniques can be applied to sound by converting sound to sight. That is, one can convert an utterance (a word, a cough, a song) to a visual representation, and use that visual representation in machine learning.

The most common way to visualize sound is the spectrogram. It shows pitch (or frequency) on the y-axis and time on the x-axis, and uses color to represent intensity (volume). It’s a fairly simple visualization, but also very unfamiliar to those of us who are used to hearing, not seeing, sound.
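To make the axes concrete, here is a minimal sketch of how a spectrogram can be computed with NumPy. It is not the code behind the images in this post: the `spectrogram` function, its `frame_size` and `hop` parameters, and the synthetic two-tone test signal are all assumptions chosen for illustration. The idea is to slice the sound into short overlapping frames and take the Fourier transform of each one, so that rows become frequency, columns become time, and the values become intensity.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_size=256, hop=128):
    """Return (times, freqs, power_db) for a 1-D signal using a short-time FFT."""
    window = np.hanning(frame_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    power_db = 10 * np.log10(power + 1e-12)  # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_size, d=1 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_size / 2) / sample_rate
    # Transpose so rows are frequency (y-axis) and columns are time (x-axis).
    return times, freqs, power_db.T

# Synthetic stand-in for a recording (an assumption, not real audio):
# half a second at 440 Hz followed by half a second at 1760 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1760 * t))

times, freqs, power_db = spectrogram(tone, sample_rate)
print(power_db.shape)                     # (frequency bins, time frames)
print(freqs[power_db[:, 0].argmax()])     # strongest frequency in the first frame
print(freqs[power_db[:, -1].argmax()])    # strongest frequency in the last frame
```

Plotting `power_db` as an image, with color mapped to its values, would give the picture described above. The frame size is a trade-off: longer frames resolve pitch more finely but blur timing.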

Pretty cool, right? Sound is kind of beautiful to see. But it’s also very information-rich. When you hear a piano, you perceive the melody, the up and down movements of pitch, the silence between the keystrokes, and perhaps the tone of the instrument itself. But you only perceive a small subset of all of the sound data that is generated. In fact, human cognition actively filters out most information. It’s a bandwidth-throttling function, and without it we would simply be overwhelmed by our senses. Perhaps you’ve seen this video of a basketball game, which shows just how good (or bad) our brains are at filtering information.

Okay, so sound is pretty to look at, and it contains a lot of information in its visual form. But what does that have to do with coughing and what does a cough look like?

To get there, we first have to turn sounds into pictures. Let’s have a look.

Keyboard Clickety-Clackety

Here is some clickety-clackety on a keyboard:

You’re probably used to seeing the wave-form of a sound (on the left), and not the spectrogram (on the right). In both, one can clearly distinguish the 4 clicks. But volume alone is not sufficient for high-quality classification. We also need frequency (pitch). Unlike a wave-form plot, which shows only volume over time, a spectrogram shows a third dimension, intensity at each frequency, through color.
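Why is volume enough to pick out the clicks? Here is a small sketch, using made-up data rather than the keyboard recording: four short noise bursts on a quiet background, counted by thresholding the loudness of short frames. The `count_bursts` function, its frame size, and its -20 dB threshold are illustrative assumptions.

```python
import numpy as np

def count_bursts(signal, frame_size=100, threshold_db=-20.0):
    """Count runs of consecutive frames louder than threshold_db,
    measured relative to the loudest frame in the signal."""
    n_frames = len(signal) // frame_size
    frames = signal[: n_frames * frame_size].reshape(n_frames, frame_size)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20 * np.log10(rms / rms.max() + 1e-12)
    loud = level_db > threshold_db
    # A burst starts wherever a loud frame follows a quiet one (or opens the signal).
    starts = loud & ~np.concatenate(([False], loud[:-1]))
    return int(starts.sum())

# Synthetic stand-in for keyboard clicks (an assumption, not real audio):
# one second of faint background noise with four 25 ms bursts.
sample_rate = 8000
rng = np.random.default_rng(0)
signal = 0.001 * rng.standard_normal(sample_rate)
for start in (1000, 3000, 5000, 7000):
    signal[start : start + 200] += rng.standard_normal(200)

n_clicks = count_bursts(signal)
print(n_clicks)  # 4
```

A loudness threshold like this can say that something happened and when, but not what it was. That is the gap the frequency axis fills.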



A Baby’s Squeal

Let’s have a look at another sound, a baby’s squeal:

In terms of decibels only, it looks like this:

But the primary differentiator between a baby’s squeal and an adult’s (do adults squeal?) is not volume, but pitch. Thus, the utility of the spectrogram.

What Does A Cough Look Like

Let’s have a look at what a cough looks like. Let’s check 3 coughs, from 3 different people. You’ll note some similarities across all of them, both in how they sound and in how they look. They start with an explosive increase in volume, and they fade out more slowly than they fade in.

Cough 1 (below) is a prolonged cough with a slight uptick in volume towards the end, with one final expiratory contraction from the diaphragm.


Cough 2 (below) is more archetypical: a steep, abrupt explosion of sound followed by a diminuendo.

Cough 3 (below) is also fairly typical, albeit less pronounced than 2, both in terms of duration and decibel variation.

What’s notable about coughs 2 and 3 is how similar they are in terms of decibel profiles, but how different they are in terms of frequency.
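The same effect is easy to reproduce with synthetic sounds. The sketch below is an illustration under stated assumptions, not an analysis of the coughs above: it builds two decaying tones with an identical loudness envelope but different pitch, then measures each one's overall level in decibels and its dominant frequency. The helper names `level_db` and `dominant_frequency` are made up for this example.

```python
import numpy as np

def level_db(signal):
    """Overall loudness of the signal as an RMS level in decibels."""
    rms = np.sqrt(np.mean(signal ** 2))
    return float(20 * np.log10(rms))

def dominant_frequency(signal, sample_rate):
    """Frequency (in Hz) of the strongest component in the signal's spectrum."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    return float(freqs[np.argmax(magnitude)])

# Two synthetic sounds (assumptions, not real coughs): the same abrupt start
# and gradual fade, but one pitched at 300 Hz and the other at 1500 Hz.
sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate   # half a second
envelope = np.exp(-8 * t)
low = envelope * np.sin(2 * np.pi * 300 * t)
high = envelope * np.sin(2 * np.pi * 1500 * t)

print(level_db(low), level_db(high))             # nearly identical levels
print(dominant_frequency(low, sample_rate))      # close to 300 Hz
print(dominant_frequency(high, sample_rate))     # close to 1500 Hz
```

On a decibel plot these two sounds would be almost indistinguishable. On a spectrogram they would sit in clearly different bands.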


Maybe by now the novelty of seeing sounds as spectrograms has worn off. But that feeling you have, that desire to go do something else rather than keep staring at spectrograms, is one computers don’t get. And that’s why we use computers, and not humans, to do deep learning. They can look at thousands and thousands of images of sounds and detect patterns in them, patterns which we are neither patient nor detail-oriented enough to perceive. Once they’ve seen enough examples, they can “predict” on images (made from sounds) they’ve never seen.

Just as a child can hear a bark from a dog breed they’ve never encountered and still say “that’s a bark”, a computer can be trained to detect a cough and, with time and sufficient examples, perhaps differentiate between different kinds of coughs. There are a lot of practical applications for this, ranging from diagnostics, to medication adherence, to public health surveillance. But it all starts with turning sound into pictures.
