You know what a cough sounds like. But do you know what it looks like?
What Does A Cough Look Like in Pictures?
Humans are fairly good at classification. We begin learning, at a very young age, to differentiate between the sound of a door closing, a sibling laughing, a dog barking, and so on. Nobody describes to us what these things sound like, or gives us a guidebook on how to differentiate between them. Parents don’t “teach” these skills, per se. Rather, we learn by example. And even though every dog’s bark is different, we pick up on the underlying similarities between a poodle and a Great Dane, and thereby develop the skill of classification. After about 1,000-2,000 days on earth, humans can distinguish between a bark and a non-bark with a very high degree of accuracy.
Until very recently, computers were not very good at this. Algorithms were guided by explicit, pre-set rules, incapable of learning from unstructured examples. But machine learning has changed this. “Deep learning” with “neural networks” (the name and the method are inspired by how the human brain itself works) has not only caught up with humans in classification ability – in many cases it has surpassed us. Whereas a child can only listen to so many barks before she grows bored (or old), a computer doesn’t mind the tedious nature of learning by example. And thanks to the combination of large datasets and ever-increasing processing power, computers can learn to distinguish patterns which are undetectable to us mere mortals.
Image classification is the area where progress has been fastest in recent years. You’ve probably seen it, for example, when Facebook suggests that you tag a specific person in a photo. You’ve probably also contributed to it: when a Google security check asks you to identify photos with certain characteristics, it’s not only checking that you’re human, it’s also putting you to work as free labor, classifying images so as to “teach” its computers. Image classification can handle seemingly trivial tasks, like distinguishing between two faces in an Instagram post, but it can also detect tumors invisible to the eye or guide a self-driving car.
Audio is a bit trickier than image classification. But image classification techniques can be leveraged on sound by converting sound to sight. That is, one can convert an utterance (a word, a cough, a song) into a visual representation, and use that visual representation in machine learning.
A picture of sound
The most common way to visualize sound is the spectrogram. It shows pitch (or frequency) on the y-axis and time on the x-axis, and uses color to represent intensity (volume). It’s a fairly simple visualization, but also very unfamiliar to those of us who are used to hearing, not seeing, sound.
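If you want to see this for yourself, here is a minimal sketch of how one might generate a spectrogram in Python with the librosa and matplotlib libraries. The filename cough.wav is a hypothetical stand-in for any short recording, and the mel scale used here is just one common choice of frequency axis.

```python
# A sketch of converting a sound file into a spectrogram image.
# Assumes librosa and matplotlib are installed; "cough.wav" is a
# hypothetical recording standing in for any short audio clip.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load the audio (librosa resamples to 22,050 Hz by default)
y, sr = librosa.load("cough.wav")

# Compute a mel-scaled spectrogram, then convert power to decibels
# so quieter details remain visible alongside loud ones
S = librosa.feature.melspectrogram(y=y, sr=sr)
S_db = librosa.power_to_db(S, ref=np.max)

# Plot: time on the x-axis, frequency on the y-axis, color = intensity
fig, ax = plt.subplots()
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
ax.set_title("What a cough looks like")
fig.savefig("cough_spectrogram.png")
```

The saved image is an ordinary picture, which is the whole trick: once sound looks like this, it can be handed to the same kinds of image classifiers described above.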
Take a look at these examples from the Chrome Music Lab.
Pretty cool, right? Sound is kind of beautiful to see. But it’s also very information-rich. When you hear a piano, you perceive the melody, the up and down movements of pitch, the silence between the keystrokes, and perhaps the tone of the instrument itself. But you only perceive a small subset of all of the sound data that is generated. In fact, human cognition actively filters out most information – it’s a bandwidth-throttling function, and without it we would simply be overwhelmed by our senses. Perhaps you’ve seen this video of a basketball game, which shows just how good (or bad) our brains are at filtering information.