Content-Based Image and Video Search Technology Overview
Visual Search allows people to simply ask a computer, "Have you seen anything that resembles this?" regardless of the type of information they are seeking -- whether it is a photograph, an illustration, a video clip, a face, or many other types of visual digital data (including spectrograms of audio information).
Visual Search can be useful for many different applications, including photo and video cataloging & asset management, e-commerce comparison shopping & auctions, trademark policing, medical X-ray & MRI diagnosis, pornography filtering, face recognition, and more. See Business Solutions for more information.
How eVe is functionally unique from other visual search engines
While several other commercial visual search solutions exist, the combination of object segmentation, statistical modeling, and a common API for still images, video, and audio makes eVe unique.
Still Image Search
eVe is the most accurate solution on the market for visually searching photographs, illustrations, and other still imagery. Search results produced by eVe are generally perceived as more "accurate" or "relevant" than those of other visual image search solutions. In other words, the images eVe picks out as similar are very close to what a person would pick out as similar. Accuracy and relevance are what professional and consumer markets demand.
As an example, examine the following visual search results using photographs, comparing eVe to another popular visual search tool. (See a review of non-object-based tools.) The query image was a picture of a leopard selected from a database of more than 250 animal images (lions, birds, giraffes, elephants, monkeys, lizards, tigers, big cats, etc.). Images of animals in the wild were chosen for this search because they are notoriously difficult to search visually. The complexity of the backgrounds, the heavy play of light and shadow, and the multiple elements that can obscure the foreground objects all make the task of finding similar images very difficult.
The Query Image is a photograph of a leopard. The
Object Map shows how eVe interprets the 'objects'
in the image. The results from the search are shown
below.
The top four search results from the non-object-based engine include two "relevant" results and two obviously incorrect results.
eVe's top four search results are all "relevant" images of large cats.
Motion Video Search
While most content-based visual search systems focus on still image search and retrieval, eVe can also be used to search video & film content. The visual search capabilities of eVe offer a cost-effective way to turn video assets into useful, profitable resources for corporate communications, broadcast and entertainment companies, and personal use.
As an example, examine the following visual search result for video content. The query image was a picture of a space shuttle launch, and the video library was a diverse collection of videos ranging from historical to contemporary and from sports to humanities.
The Query Image is a still photograph of a space shuttle launch.
The search is across a large collection of videos and the goal is to find a video sequence or part of a sequence which contains images visually similar to the query image.
In the example 35 mm filmstrip representation of the search result, eVe found three visual matches, indicated by the red arrows.
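Conceptually, this kind of video search reduces to comparing the query image's signature against the signatures of the video's keyframes and reporting the timecodes that fall within a similarity threshold. The short Java sketch below illustrates that idea only; the class and method names are invented for illustration and are not the eVe API.

import java.util.ArrayList;
import java.util.List;

public class KeyframeMatchSketch {

    /** A keyframe reduced to a feature vector plus its position in the video (illustrative type, not the eVe API). */
    record KeyframeSignature(double timecodeSeconds, double[] features) {}

    /** Euclidean distance between two signature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the timecodes of keyframes whose signature falls within the similarity threshold. */
    static List<Double> findMatches(double[] query, List<KeyframeSignature> video, double threshold) {
        List<Double> hits = new ArrayList<>();
        for (KeyframeSignature kf : video) {
            if (distance(query, kf.features()) <= threshold) {
                hits.add(kf.timecodeSeconds());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] query = {0.9, 0.1, 0.4};                                   // signature of the query still
        List<KeyframeSignature> video = List.of(
                new KeyframeSignature(1.0, new double[]{0.2, 0.8, 0.5}),
                new KeyframeSignature(2.0, new double[]{0.85, 0.15, 0.45}), // a visual match
                new KeyframeSignature(3.0, new double[]{0.1, 0.9, 0.2}));
        System.out.println("Matching timecodes: " + findMatches(query, video, 0.2));
    }
}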
How eVe is technically unique from other visual search engines
The eVe Software Developers Kit (SDK) is a programming toolkit for building image analysis, storage, indexing, and visual search applications for the recognition and retrieval of images and video. The components include Java class libraries, sample programs with source code, sample images, and complete reference documentation.
eVe achieves superior visual search results and offers greater flexibility than other search engines by integrating several unique technological capabilities.
Automatic Segmentation
eVe breaks new ground in visual search by bringing automatic segmentation to the commercial world. Segmentation is the process by which an image is divided into regions that correspond approximately to objects, or parts of objects, in the image. Once these object regions are identified, the four basic properties of color, texture, shape, and object shading are extracted and stored in a condensed descriptor called a visual signature. Similarity comparisons are then made against the visual signatures of objects in other images.
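As a rough illustration of the idea (not the eVe SDK's actual classes or algorithms), a visual signature can be thought of as a small per-object descriptor for each of the four properties, and two images can be compared by matching each query object against its closest counterpart:

import java.util.List;

public class VisualSignatureSketch {

    /** Condensed per-object descriptor: one small vector per basic property (illustrative only). */
    record ObjectDescriptor(double[] color, double[] texture, double[] shape, double[] shading) {}

    /** Euclidean distance between two equal-length feature vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    /** Object-to-object distance: a sum over the four basic properties. */
    static double objectDistance(ObjectDescriptor p, ObjectDescriptor q) {
        // Equal weights purely for simplicity; a real system would tune the per-property weights.
        return distance(p.color(), q.color())
             + distance(p.texture(), q.texture())
             + distance(p.shape(), q.shape())
             + distance(p.shading(), q.shading());
    }

    /** Image-to-image distance: each query object is matched to its closest object in the candidate image. */
    static double imageDistance(List<ObjectDescriptor> query, List<ObjectDescriptor> candidate) {
        double total = 0.0;
        for (ObjectDescriptor q : query) {
            double best = Double.MAX_VALUE;
            for (ObjectDescriptor c : candidate) {
                best = Math.min(best, objectDistance(q, c));
            }
            total += best;
        }
        return total / query.size();
    }

    public static void main(String[] args) {
        ObjectDescriptor leopard = new ObjectDescriptor(
                new double[]{0.8, 0.6}, new double[]{0.9}, new double[]{0.4}, new double[]{0.3});
        ObjectDescriptor tiger = new ObjectDescriptor(
                new double[]{0.7, 0.5}, new double[]{0.8}, new double[]{0.5}, new double[]{0.35});
        System.out.println("distance = " + imageDistance(List.of(leopard), List.of(tiger)));
    }
}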
Object Map display and Partial Image Searches
Not only does eVe automatically segment images, but it can also display the object map for the user to review or make selections from. The object map lets the user see what the computer sees, so they can make more informed decisions about how to search for similar images and understand how the computer reached its search results. (See more examples of images and their object maps.)
eVe also allows the user to select which objects in the image are important, so the user can perform Partial Image Searches. With a Partial Image Search you can specify which object within an image you want to search for: for example, in a picture of a man wearing a hat, you can specify the hat as the object to be searched for, rather than the man.
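A minimal sketch of that idea, using invented names rather than the eVe SDK API: the object map exposes the segmented objects, and only the descriptors of the objects the user selects are submitted as the query.

import java.util.List;
import java.util.Set;

public class PartialSearchSketch {

    /** One entry in an object map: a label for display plus its condensed descriptor (illustrative only). */
    record MappedObject(String label, double[] descriptor) {}

    /** Keep only the objects the user selected; the result becomes the partial-image query. */
    static List<double[]> buildPartialQuery(List<MappedObject> objectMap, Set<String> selectedLabels) {
        return objectMap.stream()
                .filter(o -> selectedLabels.contains(o.label()))
                .map(MappedObject::descriptor)
                .toList();
    }

    public static void main(String[] args) {
        List<MappedObject> objectMap = List.of(
                new MappedObject("man", new double[]{0.7, 0.3}),
                new MappedObject("hat", new double[]{0.1, 0.9}),
                new MappedObject("background", new double[]{0.5, 0.5}));

        // Search on the hat only, ignoring the man and the background.
        List<double[]> query = buildPartialQuery(objectMap, Set.of("hat"));
        System.out.println("Query uses " + query.size() + " of " + objectMap.size() + " objects");
    }
}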
Clustered Indexing for fast retrievals
When searching for similar images, other visual search engines compare the visual signatures of images in a linear, one-by-one manner. (See a review of non-object-based tools.) As the number of images increases, the search time increases correspondingly.
eVe significantly reduces the number of comparisons, and thus the search time, by using clustering. The visual signature of a query image is first matched against the representative signatures of the clusters to find the most similar cluster. A comparative search is then conducted only against the images in that cluster. This keeps search times low even as the number of images in the database grows.
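The two-step lookup can be sketched as follows. The structure is illustrative only (the eVe SDK's actual index classes are not shown here), but it captures why far fewer signature comparisons are needed than in a linear scan.

import java.util.List;
import java.util.Map;

public class ClusteredIndexSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    /** Step 1: pick the cluster whose representative signature is closest to the query. */
    static String nearestCluster(double[] query, Map<String, double[]> clusterCentroids) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : clusterCentroids.entrySet()) {
            double d = distance(query, e.getValue());
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }

    /** Step 2: scan only the members of that cluster instead of the whole database. */
    static int nearestImage(double[] query, List<double[]> clusterMembers) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < clusterMembers.size(); i++) {
            double d = distance(query, clusterMembers.get(i));
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> centroids = Map.of(
                "big-cats", new double[]{0.8, 0.2},
                "birds", new double[]{0.1, 0.9});
        Map<String, List<double[]>> members = Map.of(
                "big-cats", List.of(new double[]{0.82, 0.18}, new double[]{0.75, 0.3}),
                "birds", List.of(new double[]{0.12, 0.88}));

        double[] query = {0.79, 0.22};
        String cluster = nearestCluster(query, centroids);    // one comparison per cluster
        int hit = nearestImage(query, members.get(cluster));  // scan only that cluster
        System.out.println("cluster=" + cluster + ", member index=" + hit);
    }
}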
Object Shading/Region Search
In addition to color, shape, and texture, you can also search on object shading/regions. This search attribute works well for images with objects that are not clearly distinct from their backgrounds. Essentially, for every object in the image, an equation is defined that represents the color and shading values throughout that object, approximating a 3D likeness of the object. This aspect of the visual signature can then be compared just as color, shape, and texture are.
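The exact form of that equation is not specified here, but one simple way such a shading descriptor could be realized is a least-squares fit of a planar intensity model over the object's pixels, with the fitted coefficients stored in the signature. The Java sketch below shows only this assumed planar model, not eVe's actual method.

public class ShadingFitSketch {

    /** 3x3 determinant, used to solve the normal equations by Cramer's rule. */
    static double det3(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    /**
     * Least-squares fit of intensity[i] ~ c0 + c1 * xs[i] + c2 * ys[i] over the
     * pixels of one segmented object; returns the coefficients {c0, c1, c2}.
     */
    static double[] fitPlane(double[] xs, double[] ys, double[] intensity) {
        double n = xs.length, sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0, sb = 0, sxb = 0, syb = 0;
        for (int i = 0; i < xs.length; i++) {
            sx += xs[i]; sy += ys[i];
            sxx += xs[i] * xs[i]; syy += ys[i] * ys[i]; sxy += xs[i] * ys[i];
            sb += intensity[i]; sxb += xs[i] * intensity[i]; syb += ys[i] * intensity[i];
        }
        double[][] a = {{n, sx, sy}, {sx, sxx, sxy}, {sy, sxy, syy}};
        double[] rhs = {sb, sxb, syb};
        double d = det3(a);
        double[] c = new double[3];
        for (int col = 0; col < 3; col++) {
            double[][] m = {
                {a[0][0], a[0][1], a[0][2]},
                {a[1][0], a[1][1], a[1][2]},
                {a[2][0], a[2][1], a[2][2]}};
            for (int row = 0; row < 3; row++) m[row][col] = rhs[row];
            c[col] = det3(m) / d;
        }
        return c;
    }

    public static void main(String[] args) {
        // A tiny object whose brightness increases left to right: expect a positive x coefficient.
        double[] xs = {0, 1, 2, 0, 1, 2};
        double[] ys = {0, 0, 0, 1, 1, 1};
        double[] intensity = {10, 20, 30, 12, 22, 32};
        double[] c = fitPlane(xs, ys, intensity);
        System.out.printf("I(x,y) ~ %.1f + %.1f*x + %.1f*y%n", c[0], c[1], c[2]);
    }
}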
Visual Vocabulary & Visual MetaTagging
As eVe analyzes images, it automatically forms clusters of similar images and assigns each image to a cluster. A representative image from each cluster can then be used as a Visual Vocabulary element - a high-level abstraction of all of the elements in the cluster. Visual Vocabulary can make it very easy for a user to visually narrow down a search.
Visual Vocabulary can also be used to facilitate keywording. Traditionally, when you catalog images by attaching keywords, you open each image one by one and determine which keywords are appropriate. For thousands of images, this is a time-consuming and error-prone task. By associating common keywords with all images that are classified under a Visual Vocabulary cluster, you can mass-tag images. This process is known as Visual MetaTagging.
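In code, the mass-tagging step amounts to propagating a keyword set from a cluster to every image assigned to it. The sketch below uses invented names (it is not the eVe SDK API) to show the shape of that operation.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VisualMetaTagSketch {

    /** imageId -> clusterId, as produced by the automatic clustering step (toy data). */
    static final Map<String, String> CLUSTER_OF = Map.of(
            "img001.jpg", "big-cats",
            "img002.jpg", "big-cats",
            "img003.jpg", "birds");

    /** Attach the given keywords to every image that belongs to the cluster. */
    static Map<String, Set<String>> tagCluster(String clusterId, List<String> keywords) {
        Map<String, Set<String>> tags = new HashMap<>();
        for (Map.Entry<String, String> e : CLUSTER_OF.entrySet()) {
            if (e.getValue().equals(clusterId)) {
                tags.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(keywords);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // One operation tags every image in the "big-cats" cluster.
        System.out.println(tagCluster("big-cats", List.of("leopard", "feline", "wildlife")));
    }
}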
eVe Toolkit Description
eVe is a Java-based toolkit that includes high-level and low-level APIs, sample code, and complete documentation. Built-in functions distill an image or video keyframe into a condensed descriptor called a visual signature, which is a vector composed of the calculated values for its four basic properties: color, shape, texture, and object shading/region.
Note that eVe is primarily a tool for building applications and is not a turn-key application itself. The toolkit includes sample programs with source code that demonstrate the sorts of things that can be built with the SDK. Some of these sample programs might be directly useful as tools, and some can serve as starting templates for building more elaborate SDK-based applications.
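For orientation, a hypothetical end-to-end flow for an application built on such a toolkit might look like the self-contained sketch below: ingest assets, reduce each to a visual signature, index the signatures, and query by example. Every class and method name here is invented for illustration; the actual eVe APIs are described in the SDK's reference documentation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SdkFlowSketch {

    /** An indexed asset: an identifier plus its condensed visual signature. */
    record Asset(String id, double[] signature) {}

    /** A toy in-memory "index"; a real application would persist this. */
    static final List<Asset> INDEX = new ArrayList<>();

    /** Stand-in for the analysis step that turns an asset into a signature vector. */
    static double[] analyze(String assetId) {
        // Placeholder: derive a fake 2-D signature from the id so the example runs.
        int h = assetId.hashCode();
        return new double[]{(h & 0xFF) / 255.0, ((h >> 8) & 0xFF) / 255.0};
    }

    static void ingest(String assetId) {
        INDEX.add(new Asset(assetId, analyze(assetId)));
    }

    /** Query by example: rank indexed assets by distance to the example's signature. */
    static List<Asset> queryByExample(String exampleId, int topK) {
        double[] q = analyze(exampleId);
        return INDEX.stream()
                .sorted(Comparator.comparingDouble(a -> distance(q, a.signature())))
                .limit(topK)
                .toList();
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        List.of("leopard.jpg", "giraffe.jpg", "tiger.jpg").forEach(SdkFlowSketch::ingest);
        queryByExample("leopard.jpg", 2).forEach(a -> System.out.println(a.id()));
    }
}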
Technology Features/Benefits Summary
General Features
Object-Based Search (Precise)
Feature: Next-generation visual search recognizes and retrieves images based on the visual characteristics of the objects in the image. The object shading/region attribute lets you search for content in objects that are not clearly distinct from their backgrounds.
Benefit: Search is simpler with visual search: the user can simply ask, "Have you seen anything that resembles this?" Users can set more advanced search criteria and obtain more "accurate" search results.

Object Maps (Search Refinement)
Feature: Object mapping of images provides accurate and highly relevant search retrievals and gives the user an understanding of why certain images were returned and how to refine the query.
Benefit: With the aid of object maps, users can refine searches and develop more effective search queries by exploiting how the computer views the image and evaluates similarity.

Smart Index (Get Results Fast)
Feature: Proprietary data indexing leads to consistently fast searches, even as the number of images in the database grows. eVe supports serialized file indexing to integrate with customers' index structures.
Benefit: Provides order-of-magnitude (up to 10x) retrieval performance improvement over the competition.

Partial Image Search (Flexible & Precise)
Feature: Users can isolate one or more regions within an image and conduct focused searches.
Benefit: Generates search results that correspond more closely to users' cognitive interpretation of image contents and intent.

Visual Vocabulary (Ease-of-Use)
Feature: Ability to create representative search images (a Visual Vocabulary) for an image or visual database. Visual Vocabularies can be customized and refined based on the search behavior of the user.
Benefit: Visual Vocabulary can make it very easy for a user to visually initiate a search. Customized vocabularies can accelerate search, create loyalty, and drive transactions.

Visual Meta-Tagging (Tagging En Masse)
Feature: Systematic and fast automatic addition of keywords to database content (meta-tagging).
Benefit: Users can associate meta tags with multiple images simultaneously, which facilitates keywording consistency & accuracy.

Images, Video, & Audio (same API for many media)
Feature: Any type of data that can be represented visually can be indexed and searched, including photographs, illustrations, animation, video, and audio.
Benefit: One tool with a common set of APIs for multiple media types requires less development cost and time, and multiple media types open up additional sources of revenue.
Developer Features
High Scalability
Feature: Provides multi-threaded support for all services and fully utilizes multiple CPUs.
Benefit: Developers can increase the performance of each service as needed.

Platform Independence*
Feature: 100% Java APIs; runs in any JVM-supported environment, allowing fast and easy porting of eVe to new operating systems.
Benefit: Developers can easily integrate eVe into existing applications on any OS platform, and a foundation in Java means there is a very large market of qualified developers.

Multiple Media Collection Support
Feature: Provides a framework to manage asset growth. Search results can be aggregated from multiple collections, and each collection can manage up to 50,000 assets.
Benefit: Developers can manage large asset databases easily by segregating them into meaningful groups.

Support for any data situation
Feature: Users can search and retrieve matching images on the Web or as part of an intranet or asset management system.
Benefit: Can be used in any enterprise or consumer situation.

Automatic Asset Ingestion
Feature: Provides automatic generation of visual metadata, proxies, and thumbnails.
Benefit: Developers can easily ingest assets with automatic asset analysis and indexing.

Database Independence*
Feature: Supports flat files and SQL-based database management systems.
Benefit: Developers can choose the optimal data storage environment for their application.