While the ease of use of the Internet is changing the way we teach, learn, conduct business, govern, and live our everyday lives, it also exposes users to pornography, hate speech, and images and videos depicting violence. Such material is freely available to anyone with Internet access. There is general agreement that much of this material can be inappropriate for children, and even for adults in certain environments such as the workplace.
eVe's object-based visual search technology makes it possible to filter objectionable images and videos automatically, based on their content. The same method can also be extended to audio.
Current State of Filtering
Content filtering has recently attracted considerable attention, and the market is expected to grow by close to 50% per year, reaching $636 million worldwide by 2004 (source: IDC).
Existing filtering technologies are strictly text-based and have enjoyed only limited success in the marketplace. They deny access to information based on keywords or general topic areas, and have therefore drawn much criticism.
Three methods are used to filter sites: keyword blocking, packet filtering, and URL blocking. They can be used individually or in combination.
Keyword blocking scans a site for designated keywords as it is downloaded; if any of the downloaded material contains one of those keywords, the site is blocked. The main problem with this method is that it works only on text and ignores context (e.g. a login page that asks for the "sex" of a shopper, or useful medical and social information). Moreover, much of the content on the web consists of images, which usually contain no text to scan for keywords.
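To make both weaknesses concrete, here is a minimal Python sketch of naive keyword blocking. The function name `keyword_blocked`, the sample keyword list, and the matching rule (case-insensitive whole words) are hypothetical illustrations, not the behavior of any particular commercial product.

```python
import re
from typing import Iterable


def keyword_blocked(page_text: str, keywords: Iterable[str]) -> bool:
    """Hypothetical naive keyword blocker.

    Blocks the page if any designated keyword appears as a whole word,
    ignoring case. It has no notion of context, and a page made only of
    images (no text) is never blocked.
    """
    # Lowercase the page and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return any(keyword.lower() in words for keyword in keywords)


if __name__ == "__main__":
    blocklist = ["sex"]  # hypothetical keyword list
    # False positive: an innocent form field gets the page blocked.
    print(keyword_blocked("Please select your sex: M / F", blocklist))
    # False negative: an image-only page has no text to scan.
    print(keyword_blocked("", blocklist))
```

Run as a script, this prints `True` for the form page and `False` for the empty, image-only page: exactly the context and image problems described above.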
Packet filtering controls access by blocking requests to the specific IP addresses of individual sites. This method is fast and simple, but it does not allow fine-grained control and is defeated by some newer technologies such as IP-independent virtual hosts.
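The lack of fine-grained control is easy to see in a small sketch. The one below uses Python's standard `ipaddress` module; the helper names, the host names (under the reserved `.example` domain), and the addresses (from the reserved documentation ranges) are all hypothetical. It assumes two unrelated sites are hosted on the same IP address, as happens with virtual hosting.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_blocklist(entries: Iterable[str]) -> List[Network]:
    """Parse single addresses or CIDR ranges, e.g. '203.0.113.10' or '198.51.100.0/24'."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def packet_allowed(dest_ip: str, blocklist: List[Network]) -> bool:
    """Allow a request unless its destination address falls in a blocked network."""
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in net for net in blocklist)


if __name__ == "__main__":
    # Hypothetical name-to-address table: two unrelated sites share one IP.
    hosts = {
        "objectionable.example": "203.0.113.10",
        "harmless.example": "203.0.113.10",
        "elsewhere.example": "198.51.100.7",
    }
    blocklist = build_blocklist(["203.0.113.10"])
    for name, ip in hosts.items():
        print(name, packet_allowed(ip, blocklist))
```

Blocking the objectionable site's address also blocks `harmless.example`, because the filter sees only the address, never which site on that host was requested.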
URL blocking controls access by URL, and most commercial software uses it. Either someone decides which sites are appropriate and the system is programmed to allow access only to those sites (inclusion), or the software keeps a list of objectionable sites and allows access to any site that is not on that list (exclusion).
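The two policies can be sketched in a few lines of Python using the standard `urllib.parse.urlsplit`. The function names, the domain lists, and the rule that a listed domain also covers its subdomains are assumptions made for illustration.

```python
from typing import Set
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Extract the lowercased host name from a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def host_matches(host: str, domains: Set[str]) -> bool:
    """True if host is one of the listed domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def allowed_by_inclusion(url: str, allow_list: Set[str]) -> bool:
    # Inclusion: only explicitly approved sites are reachable.
    return host_matches(host_of(url), allow_list)


def allowed_by_exclusion(url: str, deny_list: Set[str]) -> bool:
    # Exclusion: everything is reachable unless it is on the objectionable list.
    return not host_matches(host_of(url), deny_list)


if __name__ == "__main__":
    allow = {"kids.example"}        # hypothetical approved list
    deny = {"known-bad.example"}    # hypothetical objectionable list
    print(allowed_by_inclusion("http://www.kids.example/games", allow))
    print(allowed_by_inclusion("http://encyclopedia.example/", allow))
    print(allowed_by_exclusion("http://new-bad.example/", deny))
```

The second call shows inclusion blocking a legitimate site that simply is not listed; the third shows exclusion letting through a new site that no one has added to the list yet.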
URL blocking methods have serious limitations. Inclusion methods are overly restrictive: they indiscriminately block legitimate sites that contain no proscribed visual content, simply because those sites are not on the approved list. Exclusion methods face the impossible task of keeping up with the thousands of new English-language sites that come online each day, and with sites that move to new IP addresses to avoid being blocked. This does not even begin to address foreign-language sites. The lists quickly become unwieldy and impossible to maintain.
Content-Based Image Filtering (CBIF)
All of these text-based filtering methods rely on some knowledge of the domain and on text scanning to impose restrictions, which makes it very hard to filter visual and audio media. None of the existing methods meets all three objectives of a filtering mechanism: accuracy, scalability, and maintainability. Accurate blocking is hard to scale and maintain, while easily scalable and maintainable systems are less accurate.
Content-based image filtering can resolve many of these problems. It examines the image itself for patterns, detects objectionable material, and blocks the offending site.
Content-Based Image Filtering consists of three steps.
Step 1: Skin Tone Filter (determine whether the image contains large areas of skin-colored pixels)
First, the images are filtered for skin tones. The color of human skin is created by a combination of blood (red) and melanin (yellow, brown), and these combinations restrict the range of hues that skin can have (except for people from the planet Rigel 7). In addition, skin has very little texture. These facts allow us to ignore regions with high-amplitude intensity variations and to design a skin tone filter that separates images before they are analyzed.
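The article does not specify eVe's actual color model or thresholds, so the sketch below substitutes a commonly cited fixed-threshold RGB skin rule from the skin-detection literature (red dominant, not too dark, not grey) together with a simple texture test: a pixel counts only if the luminance range in its 3x3 neighborhood is small. The function names, the 40-level texture limit, and the 30% skin-fraction cutoff are hypothetical choices. Images are plain lists of `(R, G, B)` rows to keep the example dependency-free.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0-255
Image = List[List[Pixel]]         # rows of pixels


def is_skin_color(r: int, g: int, b: int) -> bool:
    """Commonly cited fixed-threshold RGB skin rule (an assumption, not eVe's).

    Encodes the restricted skin hue range: red dominates green and blue,
    and the pixel is neither too dark nor grey.
    """
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def luminance(p: Pixel) -> float:
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_smooth(img: Image, y: int, x: int, max_range: float = 40.0) -> bool:
    """Low-texture test: the luminance range in the 3x3 neighborhood is small."""
    h, w = len(img), len(img[0])
    values = [luminance(img[j][i])
              for j in range(max(0, y - 1), min(h, y + 2))
              for i in range(max(0, x - 1), min(w, x + 2))]
    return max(values) - min(values) <= max_range


def skin_fraction(img: Image) -> float:
    """Fraction of pixels that are both skin-colored and smooth."""
    h, w = len(img), len(img[0])
    skin = sum(1 for y in range(h) for x in range(w)
               if is_skin_color(*img[y][x]) and is_smooth(img, y, x))
    return skin / (h * w)


def needs_analysis(img: Image, min_fraction: float = 0.3) -> bool:
    """True if the image has enough smooth skin to go on to Step 2;
    otherwise it can be accepted into the repository directly."""
    return skin_fraction(img) >= min_fraction
```

Because of the texture test, a skin-colored pattern with strong light and dark alternation (a checkerboard, for example) is not counted as skin, which mirrors the article's point that regions with high-amplitude variations can be ignored.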
Step 2: Analyze (automatically segment the image and compute its visual signature)
Because the images have already been filtered for skin tones, any image with very little skin tone is accepted into the repository without further analysis. The remaining images are automatically segmented, and their visual signatures are computed.
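eVe's segmentation algorithm and signature format are not described here, so the following is only a stand-in under stated assumptions: the skin mask from Step 1 is split into 4-connected regions with a breadth-first flood fill, and each region's "visual signature" is taken to be its area fraction plus a coarse, normalized joint RGB color histogram. The names `segment_regions`, `color_histogram`, and `visual_signature`, the 4-bins-per-channel quantization, and the `min_area` cutoff are all hypothetical.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Region = List[Tuple[int, int]]          # (row, col) coordinates


def segment_regions(mask: List[List[bool]]) -> List[Region]:
    """Group True pixels into 4-connected regions using breadth-first flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions: List[Region] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins**3 cells (sums to 1 if non-empty)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel to a bin index in 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    n = len(pixels)
    return [c / n for c in hist] if n else hist


def visual_signature(img: List[List[Pixel]], mask: List[List[bool]],
                     min_area: int = 4) -> List[Tuple[float, List[float]]]:
    """Hypothetical signature: (area fraction, color histogram) for each
    region of at least min_area pixels, largest region first."""
    total = len(img) * len(img[0])
    regions = [r for r in segment_regions(mask) if len(r) >= min_area]
    regions.sort(key=len, reverse=True)
    return [(len(r) / total, color_histogram([img[y][x] for y, x in r]))
            for r in regions]
```

Dropping tiny regions with `min_area` keeps isolated skin-colored specks from producing signatures of their own; a real system would likely add shape and texture features as well.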
Step 3: Compare (match the new image against a reference set of objectionable images and object regions)
The visual signatures of the potentially objectionable images are then compared with a predetermined reference data set. If the new image matches any image in the reference set with over 70% similarity, it is rejected. If the similarity falls between 40% and 70%, the image is set aside for manual review: an operator looks at it and decides whether to accept or reject it. Images with similarity below 40% are accepted and added to the repository. Note that these threshold values are arbitrary and fully adjustable.
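The three-way decision can be sketched as follows. The 0.70 and 0.40 thresholds come from the text; everything else is an assumption: each signature is simplified to a single normalized histogram, similarity is measured with histogram intersection (a standard measure, not necessarily the one eVe uses), and the image is scored by its best match over the whole reference set. `Decision` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import List, Sequence, Tuple


class Decision(Enum):
    ACCEPT = "accept"
    REVIEW = "set aside for manual review"
    REJECT = "reject"


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms of equal length."""
    return sum(min(x, y) for x, y in zip(a, b))


def classify(signature: Sequence[float],
             reference_set: List[Sequence[float]],
             reject_above: float = 0.70,
             review_from: float = 0.40) -> Tuple[Decision, float]:
    """Compare against every reference signature and apply the three-way rule:
    over 70% reject, 40-70% manual review, below 40% accept."""
    best = max((histogram_intersection(signature, ref) for ref in reference_set),
               default=0.0)
    if best > reject_above:
        return Decision.REJECT, best
    if best >= review_from:
        return Decision.REVIEW, best
    return Decision.ACCEPT, best
```

Returning the best similarity alongside the decision gives the operator some context when an image lands in the manual-review band.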
Summary
eVe can offer an effective image-monitoring solution. The software recognizes regions of pixels that are mostly skin tones, extracts them, and compares them with a reference set of images to filter out pornographic content. Depending on the degree of similarity between the new image and the reference set, one of three decisions is made: the image is accepted, rejected, or set aside for manual review. The similarity thresholds that separate these outcomes can be tuned to specific data sets to minimize manual intervention.
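As one illustration of such tuning (a hypothetical procedure, not a method described by eVe), suppose we have similarity scores for a labeled sample of known-clean and known-objectionable images. Using the Step 3 rule (reject above the upper threshold, review between the thresholds, accept below), the sketch picks the narrowest review band that rejects no labeled clean image and accepts no labeled objectionable one. `tune_thresholds` and `review_fraction` are illustrative names.

```python
from typing import Sequence, Tuple


def tune_thresholds(clean_scores: Sequence[float],
                    objectionable_scores: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical tuning rule returning (review_from, reject_above).

    Rule assumed: reject if s > reject_above, review if
    review_from <= s <= reject_above, accept otherwise. The band chosen is
    the narrowest one with no labeled clean image rejected and no labeled
    objectionable image accepted.
    """
    max_clean = max(clean_scores)
    min_bad = min(objectionable_scores)
    if min_bad > max_clean:
        # Separable sample: put both thresholds midway, so only an exact tie is reviewed.
        mid = (max_clean + min_bad) / 2
        return mid, mid
    # Overlapping scores cannot be decided safely and must go to an operator.
    return min_bad, max_clean


def review_fraction(scores: Sequence[float], review_from: float,
                    reject_above: float) -> float:
    """Share of images that would be set aside for manual review."""
    return sum(review_from <= s <= reject_above for s in scores) / len(scores)
```

The wider the overlap between clean and objectionable scores in the sample, the wider the band and the more manual work; tuning on data that resembles the real traffic is what keeps that overlap small.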
A version of this article is also available as a PDF (446 KB).