What is Computer Vision (CV) and where can it be applied?
Artificial Intelligence (AI) is revolutionizing everyday life, creating a positive disruption in several industries. AI spans many well-known areas such as natural language processing (NLP) and machine learning (ML), but one that has received far less public exposure is known as "Computer Vision."
Well, outside of an episode of a hit HBO show. Regardless, CV is every bit as groundbreaking as the other emerging technologies.
What is Computer Vision?
Computer Vision (CV) is a field of study that develops techniques allowing computers to see and understand the content of digital images such as photographs and videos. CV goes beyond putting a camera on your computer or laptop. The objective is to help machines view the world the way people and animals do, and that is no small feat. While the camera attached to your laptop can see things, CV is what lets the laptop understand what it is seeing. One example we have all known for years is the ability to turn black lines into information about a product, commonly known as a barcode.
CV is like the part of the human brain that processes what an image means. The camera is only the part that sees the image, like our eyes.
Many popular CV applications involve trying to recognize things in photographs; for example:
Object Classification: What broad category of object is in this photograph? - Grouping items into different categories like animals, people or buildings could be a basic example of this.
Object Identification: Which type of a given object is in this photograph? - If we know the image is classified as an animal, is it a dog or a cat?
Object Verification: Is the object in the photograph? - Is the object we know as a dog or a cat in the image being presented?
Object Detection: Where are the objects in the photograph? - Locating each object in the image, typically by drawing a box around it. (Lower-level techniques such as "edge detection," which traces the outlines within an image, are often used to support this.)
Object Landmark Detection: What are the key points for the object in the photograph? - Identifying the distinctive points that characterize the object, such as telltale shapes, colors and visual markers.
Object Segmentation: What pixels belong to the object in the image? - Pieces of the image can be examined separately for a more accurate analysis.
Object Recognition: What objects are in this photograph and where are they? - Not only detecting that an object is present, but specifically identifying what it is and locating it.
Applications of CV often need only one of these techniques. However, more advanced cases such as driverless cars rely on several different methods to accomplish their goals.
Remember when CV hit Hollywood?
While to a human these tasks might not sound hugely complicated, machines struggle when an image is in a state they might not expect. One of the most popular examples of CV in practice is the app "Not Hotdog," made famous by the hit HBO show Silicon Valley (Season 4, Episode 4). The concept of the app itself sounds trivial, but when you consider the neural networks working behind the scenes, it is nothing short of amazing.
The app determines whether an image is a hotdog. Yes, it sounds ridiculous. However, what about when that hotdog is in different states? For example, in a bun or out of a bun, at different angles, in a jar, or replaced with a banana in a hotdog bun. The machine needs to be smart enough to recognize whether the item is indeed a hotdog. Not Hotdog is amazingly accurate, to the point where it can even tell the difference between a hotdog and a bratwurst. That is how impressive machines have become at image recognition.
Typically, this type of image recognition works by processing all the individual pixels of an image. A machine is trained on millions of images that humans have pre-labeled, which helps it recognize whether future images are hotdogs or not. The AI builds an internal picture of what an image of a hotdog should contain and makes the appropriate decision after comparing every pixel. Once a minimum confidence threshold is met, the machine declares the result.
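The compare-and-threshold idea can be sketched in a few lines of Python. This is a toy illustration, not how Not Hotdog actually works: real systems use trained neural networks rather than raw pixel comparison, and the reference image, pixel values and 0.9 threshold below are all made up for the example.

```python
# Toy sketch of threshold-based image matching (illustrative only --
# real CV systems learn features with neural networks rather than
# comparing raw pixels against a single reference image).
# Here an "image" is just a flat list of grayscale pixel values (0-255).

def pixel_similarity(image_a, image_b):
    """Return a score between 0 and 1; 1.0 means the pixels match exactly."""
    assert len(image_a) == len(image_b)
    total_diff = sum(abs(a - b) for a, b in zip(image_a, image_b))
    max_diff = 255 * len(image_a)  # worst case: every pixel maximally different
    return 1.0 - total_diff / max_diff

def classify(image, reference, threshold=0.9):
    """Declare a match only once the minimum similarity threshold is met."""
    if pixel_similarity(image, reference) >= threshold:
        return "hotdog"
    return "not hotdog"

reference = [200, 120, 40, 35]  # hypothetical "known hotdog" pixels

print(classify([198, 122, 41, 36], reference))  # near-identical -> "hotdog"
print(classify([10, 240, 250, 5], reference))   # very different -> "not hotdog"
```

The threshold is the key design knob: set it too high and slightly unusual hotdogs (different angle, different lighting) are rejected; set it too low and bananas in buns start to pass.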
"Not Hotdog" is just a fun example, but CV is making important inroads across all sectors and having a big impact.
Data, data, data
Although we tend to focus on the end-user technology and results of any such system, the truth is that they are only effective if they have quality data from the start. This goes for virtually any application that operates using AI.
Think about a machine that needs to determine whether an image is of a cat or a dog. The best way to think about CV is the same way you would approach a jigsaw puzzle: you start with all the pieces, and the task is to assemble them in a way that makes sense. However, with a puzzle you at least know what image you are trying to assemble. The machine is instead fed millions of items, such as individual data points or pixels (the puzzle pieces), and trained to put them together to recognize what the object might be.
So, to find a cat, we wouldn't just tell a computer to search for whiskers and pointy ears. Millions of photos labeled as cats or dogs would be uploaded so the model can learn on its own which features make up each of them.
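The learn-from-labeled-examples idea can be sketched with a nearest-centroid classifier, one of the simplest learning methods: average the examples of each label into a "prototype," then assign a new image to whichever prototype it most resembles. The three-value feature vectors below are hand-made stand-ins for real image features, and real CV models learn millions of parameters from millions of photos, not four.

```python
# Minimal sketch of learning from labeled examples: a nearest-centroid
# classifier over tiny, hand-made "feature vectors" (hypothetical data,
# not real image features).

def centroid(vectors):
    """Average a list of feature vectors into one prototype."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labeled_examples):
    """Build one average prototype per label from (label, features) pairs."""
    by_label = {}
    for label, features in labeled_examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(vs) for label, vs in by_label.items()}

def predict(model, features):
    """Pick the label whose prototype is closest (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], features))

# Hypothetical labeled training data (the "millions of photos", in miniature).
training = [
    ("cat", [0.9, 0.2, 0.1]), ("cat", [0.8, 0.3, 0.2]),
    ("dog", [0.2, 0.9, 0.7]), ("dog", [0.1, 0.8, 0.8]),
]
model = train(training)
print(predict(model, [0.85, 0.25, 0.15]))  # resembles the cat examples -> "cat"
```

Notice that nothing in the code mentions whiskers or pointy ears: the notion of "cat-like" emerges entirely from the labeled examples, which is the point the paragraph above makes.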
Major cloud vendors like Microsoft, AWS, IBM, Alibaba Cloud and others are now investing heavily in CV. And we are starting to see a greater adoption across the board as companies begin to realize the ground-breaking potential.
Right now, as the public still comes to terms with trusting machines over human interpretation, CV technology still requires supervision. There are no major use cases where it can completely replace humans. For example, while some cars are driverless, they still have a human at the wheel supervising what happens.
In the future, will some applications completely eliminate the need for human input? Only time will tell.