Researchers at Google have come up with a machine vision technique that could bring high-powered visual recognition to simple desktop and even mobile computers.
IProgrammer said that the system can recognise 100,000 different types of object within a photo in a few minutes.
Google appears to have made improvements to the fairly standard technique of applying convolutional filters to an image to pick out objects of interest.
This is tricky because the technique needs at least one filter per object type. If you are scanning Facebook for cats, you need a filter that finds cats, so the method is limited to a small number of categories – otherwise you need a huge database.
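To see why the number of categories matters, here is a minimal sketch of the brute-force approach, assuming each filter is just a flat list of weights that is multiplied against an image window and summed. The filter names, weights and the `best_filter_brute_force` helper are made up for illustration and do not come from the Google report.

```python
def dot(a, b):
    """Apply a filter to a window: multiply element-wise and sum."""
    return sum(x * y for x, y in zip(a, b))


def best_filter_brute_force(window, filters):
    """Score one window against every filter; work grows with len(filters)."""
    scores = {name: dot(window, weights) for name, weights in filters.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy filters, one per object type.
filters = {
    "cat": [1.0, 0.0, -1.0, 0.5],
    "dog": [-1.0, 1.0, 0.2, 0.0],
    "car": [0.0, -1.0, 1.0, 0.8],
}
window = [0.9, 0.1, -0.8, 0.4]  # a made-up patch of image features

best, scores = best_filter_brute_force(window, filters)
print(best, scores)
```

Every window has to be scored against every filter, so the work grows in step with the number of object types.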
A report, co-authored by Googlers Tom Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan and Jay Yagnik, describes technology that speeds things up by using hashing.
Locality-sensitive hashing is used to look up the results of each step. Instead of applying a mask to the pixels and summing the result, the pixels are hashed and the hash is then used as a lookup key in a table of results.
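The general idea can be sketched with one common locality-sensitive scheme, a random-hyperplane sign hash. This is an assumption chosen for illustration, not necessarily the hash used in the report, and every name here (`make_hyperplanes`, `sign_hash`, `build_table`, `lookup`) is hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Random directions; each one contributes one bit of the hash key."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sign_hash(vector, hyperplanes):
    """Similar vectors tend to fall on the same side of most hyperplanes."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_table(filters, hyperplanes):
    """Hash every filter once, ahead of time, into a lookup table."""
    table = defaultdict(list)
    for name, weights in filters.items():
        table[sign_hash(weights, hyperplanes)].append(name)
    return table


def lookup(window, table, hyperplanes):
    """Hash the window and fetch the filters stored under the same key."""
    return table.get(sign_hash(window, hyperplanes), [])


# Hypothetical toy filters, one per object type.
filters = {
    "cat": [1.0, 0.0, -1.0, 0.5],
    "dog": [-1.0, 1.0, 0.2, 0.0],
    "car": [0.0, -1.0, 1.0, 0.8],
}
planes = make_hyperplanes(num_bits=8, dim=4)
table = build_table(filters, planes)

# A brighter copy of the "cat" pattern hashes to the same key as the filter.
print(lookup([2.0 * w for w in filters["cat"]], table, planes))
```

The cost of a query depends on the length of the hash key rather than on the number of filters in the table, which is where the saving would come from in a scheme like this.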
They also use a rank-ordering method that indicates which filter is likely to be the best match and so worth evaluating further.
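One way to read this step is as a vote followed by a shortlist. The sketch below is an illustration under that assumption, not the report's actual method. It hashes each vector by which position holds the largest value within several small groups of positions, counts how many of those codes each filter shares with the window, and runs the exact multiply-and-sum only on the top few. The `BANDS` layout, the filters and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical groups of positions; each group yields one rank-based code.
BANDS = [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2)]


def rank_code(vector, band):
    """Which slot in the band holds the largest value (ordering only)."""
    return max(range(len(band)), key=lambda k: vector[band[k]])


def build_tables(filters):
    """One lookup table per band, mapping a code to the filters that share it."""
    tables = [defaultdict(list) for _ in BANDS]
    for name, weights in filters.items():
        for table, band in zip(tables, BANDS):
            table[rank_code(weights, band)].append(name)
    return tables


def rank_candidates(window, tables):
    """Each band where a filter's code matches the window's counts as a vote."""
    votes = Counter()
    for table, band in zip(tables, BANDS):
        votes.update(table.get(rank_code(window, band), []))
    return votes.most_common()


def detect(window, filters, tables, top_k=2):
    """Run the exact filter computation only on the top-ranked candidates."""
    shortlist = [name for name, _ in rank_candidates(window, tables)[:top_k]]
    exact = {n: sum(w * f for w, f in zip(window, filters[n])) for n in shortlist}
    if not exact:
        return None, {}
    return max(exact, key=exact.get), exact


# Hypothetical toy filters, one per object type.
filters = {
    "cat": [1.0, 0.0, -1.0, 0.5],
    "dog": [-1.0, 1.0, 0.2, 0.0],
    "car": [0.0, -1.0, 1.0, 0.8],
}
tables = build_tables(filters)
window = [0.9, 0.1, -0.8, 0.4]

ranked = rank_candidates(window, tables)
best, exact = detect(window, filters, tables)
print(ranked)  # [('cat', 6), ('car', 2), ('dog', 1)]
print(best)    # cat
```

In this toy run the "dog" filter never reaches the exact evaluation, because it shares only one code with the window.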
The result of the change to the basic algorithm is a speed-up of roughly 20,000 times. Using 100,000 object detectors, which required over a million filters to be applied to multiple resolution scalings of the target image, the setup recognised an object in less than 20 seconds. The hardware involved was a single multi-core machine with 20GB of RAM.