Content-Based Image and Video Search Technology Overview
Visual Search allows people to simply ask a computer, "Have you seen anything that resembles this?" regardless of the type of information they are seeking -- whether it is a photograph, an illustration, a video clip, a face, or many other types of visual digital data (including spectrograms of audio information).
Visual Search can be useful for many different applications, including photo and video cataloging & asset management, e-commerce comparison shopping & auctions, trademark policing, medical X-ray & MRI diagnosis, pornography filtering, face recognition, and more. See Business Solutions for more information.
How eVe is functionally unique from other visual search engines
While several other commercial visual search solutions exist, the combination of object segmentation, statistical modeling, and a common API for still images, video, and audio makes eVe unique.
Still Image Search
eVe is the most accurate solution on the market for visually searching photographs, illustrations, and other still imagery. Search results produced by eVe are generally perceived as more "accurate" or "relevant" than those of other visual image search solutions. In other words, the images eVe picks out as similar are very close to what a person would pick out as similar. Accuracy and relevance are what professional and consumer markets demand.
As an example, examine the following visual search results using photographs, comparing eVe to another popular visual search tool. (See a review of non-object-based tools.) The query image was a picture of a leopard selected from a database of more than 250 animal images (lions, birds, giraffes, elephants, monkeys, lizards, tigers, big cats, etc.). Images of animals in the wild were chosen for this search because they are notoriously difficult to search visually. The complexity of the backgrounds, the heavy play of light and shadow, and the multiple elements that can obscure the foreground objects all make the task of finding similar images very difficult.
The Query Image is a photograph of a leopard. The
Object Map shows how eVe interprets the 'objects'
in the image. The results from the search are shown
below.
The top four search results from the non-object-based engine include two "relevant" results and two obviously incorrect results.
eVe's top four search results are all "relevant" images of large cats.
Motion Video Search
While most content-based visual search systems focus on still image search and retrieval, eVe can also be used to search video & film content. The visual search capabilities of eVe offer a cost-effective way to turn video assets into useful, profitable resources for corporate communications, broadcast and entertainment companies, and personal use.
As an example, examine the following visual search result for video content. The query image was a picture of a space shuttle launch, and the video library was a diverse collection of videos ranging from historical to contemporary and from sports to humanities.
The Query Image is a still photograph of a space shuttle launch.
The search is across a large collection of videos and the goal is to find a video sequence or part of a sequence which contains images visually similar to the query image.
In the example 35 mm filmstrip representation of the search result, eVe found three visual matches, indicated by the red arrows.
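Conceptually, this kind of video search reduces to comparing the query image's signature against the signatures of the video's keyframes and reporting the timecodes that fall within a similarity threshold. The short Java sketch below illustrates that idea only; the class and method names are invented for illustration and are not the eVe API.

import java.util.ArrayList;
import java.util.List;

public class KeyframeMatchSketch {

    /** A keyframe reduced to a feature vector plus its position in the video (illustrative type, not the eVe API). */
    record KeyframeSignature(double timecodeSeconds, double[] features) {}

    /** Euclidean distance between two signature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the timecodes of keyframes whose signature falls within the similarity threshold. */
    static List<Double> findMatches(double[] query, List<KeyframeSignature> video, double threshold) {
        List<Double> hits = new ArrayList<>();
        for (KeyframeSignature kf : video) {
            if (distance(query, kf.features()) <= threshold) {
                hits.add(kf.timecodeSeconds());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] query = {0.9, 0.1, 0.4};                                   // signature of the query still
        List<KeyframeSignature> video = List.of(
                new KeyframeSignature(1.0, new double[]{0.2, 0.8, 0.5}),
                new KeyframeSignature(2.0, new double[]{0.85, 0.15, 0.45}), // a visual match
                new KeyframeSignature(3.0, new double[]{0.1, 0.9, 0.2}));
        System.out.println("Matching timecodes: " + findMatches(query, video, 0.2));
    }
}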
How eVe is technically unique from other visual search engines
The eVe Software Developers Kit (SDK) is a programming toolkit for building image analysis, storage, indexing, and visual search applications for the recognition and retrieval of images and video. The components include Java class libraries, sample programs with source code, sample images, and complete reference documentation.
eVe achieves superior visual search results and offers greater flexibility than other search engines by integrating several unique technological capabilities.
Automatic Segmentation
eVe breaks new ground in visual search by bringing automatic segmentation to the commercial world. Segmentation is the process by which an image is divided into regions that correspond approximately to objects, or parts of objects, in the image. Once these object regions are identified, the four basic properties of color, texture, shape, and object shading are extracted and stored in a condensed descriptor called a visual signature. Similarity comparisons are then made against the visual signatures of objects in other images.
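As a rough illustration of the idea (not the eVe SDK's actual classes or algorithms), a visual signature can be thought of as a small per-object descriptor for each of the four properties, and two images can be compared by matching each query object against its closest counterpart:

import java.util.List;

public class VisualSignatureSketch {

    /** Condensed per-object descriptor: one small vector per basic property (illustrative only). */
    record ObjectDescriptor(double[] color, double[] texture, double[] shape, double[] shading) {}

    /** Euclidean distance between two equal-length feature vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    /** Object-to-object distance: a sum over the four basic properties. */
    static double objectDistance(ObjectDescriptor p, ObjectDescriptor q) {
        // Equal weights purely for simplicity; a real system would tune the per-property weights.
        return distance(p.color(), q.color())
             + distance(p.texture(), q.texture())
             + distance(p.shape(), q.shape())
             + distance(p.shading(), q.shading());
    }

    /** Image-to-image distance: each query object is matched to its closest object in the candidate image. */
    static double imageDistance(List<ObjectDescriptor> query, List<ObjectDescriptor> candidate) {
        double total = 0.0;
        for (ObjectDescriptor q : query) {
            double best = Double.MAX_VALUE;
            for (ObjectDescriptor c : candidate) {
                best = Math.min(best, objectDistance(q, c));
            }
            total += best;
        }
        return total / query.size();
    }

    public static void main(String[] args) {
        ObjectDescriptor leopard = new ObjectDescriptor(
                new double[]{0.8, 0.6}, new double[]{0.9}, new double[]{0.4}, new double[]{0.3});
        ObjectDescriptor tiger = new ObjectDescriptor(
                new double[]{0.7, 0.5}, new double[]{0.8}, new double[]{0.5}, new double[]{0.35});
        System.out.println("distance = " + imageDistance(List.of(leopard), List.of(tiger)));
    }
}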
Object Map display and Partial Image Searches
Not only does eVe automatically segment images, but it can also display the object map for the user to review or make selections from. The object map lets the user see what the computer sees, so they can make more informed decisions about how to search for similar images and understand how the computer reached its search results. (See more examples of images and their object maps.)
eVe also allows the user to select which objects in the image are important, so the user can perform Partial Image Searches. With a Partial Image Search you can specify which object within an image you want to search for: for example, in a picture of a man wearing a hat, you can specify the hat as the object to be searched for, rather than the man.
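A minimal sketch of that idea, using invented names rather than the eVe SDK API: the object map exposes the segmented objects, and only the descriptors of the objects the user selects are submitted as the query.

import java.util.List;
import java.util.Set;

public class PartialSearchSketch {

    /** One entry in an object map: a label for display plus its condensed descriptor (illustrative only). */
    record MappedObject(String label, double[] descriptor) {}

    /** Keep only the objects the user selected; the result becomes the partial-image query. */
    static List<double[]> buildPartialQuery(List<MappedObject> objectMap, Set<String> selectedLabels) {
        return objectMap.stream()
                .filter(o -> selectedLabels.contains(o.label()))
                .map(MappedObject::descriptor)
                .toList();
    }

    public static void main(String[] args) {
        List<MappedObject> objectMap = List.of(
                new MappedObject("man", new double[]{0.7, 0.3}),
                new MappedObject("hat", new double[]{0.1, 0.9}),
                new MappedObject("background", new double[]{0.5, 0.5}));

        // Search on the hat only, ignoring the man and the background.
        List<double[]> query = buildPartialQuery(objectMap, Set.of("hat"));
        System.out.println("Query uses " + query.size() + " of " + objectMap.size() + " objects");
    }
}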
Clustered Indexing for fast retrievals
When searching for similar images, other visual search engines compare the visual signatures of images in a linear, one-by-one manner. (See a review of non-object-based tools.) As the number of images increases, the search time increases correspondingly.
eVe significantly reduces the number of comparisons, and thus the search time, by using clustering. The visual signature of a query image is first matched against the representative signatures of the clusters to find the most similar cluster. A comparative search is then conducted only against the images in that cluster. This keeps search times low even as the number of images in the database grows.
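The two-step lookup can be sketched as follows. The structure is illustrative only (the eVe SDK's actual index classes are not shown here), but it captures why far fewer signature comparisons are needed than in a linear scan.

import java.util.List;
import java.util.Map;

public class ClusteredIndexSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    /** Step 1: pick the cluster whose representative signature is closest to the query. */
    static String nearestCluster(double[] query, Map<String, double[]> clusterCentroids) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : clusterCentroids.entrySet()) {
            double d = distance(query, e.getValue());
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }

    /** Step 2: scan only the members of that cluster instead of the whole database. */
    static int nearestImage(double[] query, List<double[]> clusterMembers) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < clusterMembers.size(); i++) {
            double d = distance(query, clusterMembers.get(i));
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> centroids = Map.of(
                "big-cats", new double[]{0.8, 0.2},
                "birds", new double[]{0.1, 0.9});
        Map<String, List<double[]>> members = Map.of(
                "big-cats", List.of(new double[]{0.82, 0.18}, new double[]{0.75, 0.3}),
                "birds", List.of(new double[]{0.12, 0.88}));

        double[] query = {0.79, 0.22};
        String cluster = nearestCluster(query, centroids);    // one comparison per cluster
        int hit = nearestImage(query, members.get(cluster));  // scan only that cluster
        System.out.println("cluster=" + cluster + ", member index=" + hit);
    }
}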
Object Shading/Region Search
In addition to color, shape, and texture, you can also search on object shading/regions. This search attribute works well for images with objects that are not clearly distinct from their backgrounds. Essentially, for every object in the image, an equation is defined that represents the color and shading values throughout that object, approximating a 3D likeness of the object. This aspect of the visual signature can then be compared just as color, shape, and texture are.
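The exact form of that equation is not specified here, but one simple way such a shading descriptor could be realized is a least-squares fit of a planar intensity model over the object's pixels, with the fitted coefficients stored in the signature. The Java sketch below shows only this assumed planar model, not eVe's actual method.

public class ShadingFitSketch {

    /** 3x3 determinant, used to solve the normal equations by Cramer's rule. */
    static double det3(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    /**
     * Least-squares fit of intensity[i] ~ c0 + c1 * xs[i] + c2 * ys[i] over the
     * pixels of one segmented object; returns the coefficients {c0, c1, c2}.
     */
    static double[] fitPlane(double[] xs, double[] ys, double[] intensity) {
        double n = xs.length, sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0, sb = 0, sxb = 0, syb = 0;
        for (int i = 0; i < xs.length; i++) {
            sx += xs[i]; sy += ys[i];
            sxx += xs[i] * xs[i]; syy += ys[i] * ys[i]; sxy += xs[i] * ys[i];
            sb += intensity[i]; sxb += xs[i] * intensity[i]; syb += ys[i] * intensity[i];
        }
        double[][] a = {{n, sx, sy}, {sx, sxx, sxy}, {sy, sxy, syy}};
        double[] rhs = {sb, sxb, syb};
        double d = det3(a);
        double[] c = new double[3];
        for (int col = 0; col < 3; col++) {
            double[][] m = {
                {a[0][0], a[0][1], a[0][2]},
                {a[1][0], a[1][1], a[1][2]},
                {a[2][0], a[2][1], a[2][2]}};
            for (int row = 0; row < 3; row++) m[row][col] = rhs[row];
            c[col] = det3(m) / d;
        }
        return c;
    }

    public static void main(String[] args) {
        // A tiny object whose brightness increases left to right: expect a positive x coefficient.
        double[] xs = {0, 1, 2, 0, 1, 2};
        double[] ys = {0, 0, 0, 1, 1, 1};
        double[] intensity = {10, 20, 30, 12, 22, 32};
        double[] c = fitPlane(xs, ys, intensity);
        System.out.printf("I(x,y) ~ %.1f + %.1f*x + %.1f*y%n", c[0], c[1], c[2]);
    }
}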
Visual Vocabulary & Visual MetaTagging
As eVe analyzes images, it automatically forms clusters of similar images and assigns each image to a cluster. A representative image from each cluster can then be used as a Visual Vocabulary element - a high-level abstraction of all of the elements in the cluster. Visual Vocabulary can make it very easy for a user to visually narrow down a search.
Visual Vocabulary can also be used to facilitate keywording. Traditionally, when you catalog images by attaching keywords, you open each image one by one and determine which keywords are appropriate. For thousands of images, this is a time-consuming and error-prone task. By associating common keywords with all images that are classified under a Visual Vocabulary cluster, you can mass-tag images. This process is known as Visual MetaTagging.
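In code, the mass-tagging step amounts to propagating a keyword set from a cluster to every image assigned to it. The sketch below uses invented names (it is not the eVe SDK API) to show the shape of that operation.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VisualMetaTagSketch {

    /** imageId -> clusterId, as produced by the automatic clustering step (toy data). */
    static final Map<String, String> CLUSTER_OF = Map.of(
            "img001.jpg", "big-cats",
            "img002.jpg", "big-cats",
            "img003.jpg", "birds");

    /** Attach the given keywords to every image that belongs to the cluster. */
    static Map<String, Set<String>> tagCluster(String clusterId, List<String> keywords) {
        Map<String, Set<String>> tags = new HashMap<>();
        for (Map.Entry<String, String> e : CLUSTER_OF.entrySet()) {
            if (e.getValue().equals(clusterId)) {
                tags.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(keywords);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // One operation tags every image in the "big-cats" cluster.
        System.out.println(tagCluster("big-cats", List.of("leopard", "feline", "wildlife")));
    }
}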
eVe Toolkit Description
eVe is a Java-based toolkit that includes high-level and low-level APIs, sample code, and complete documentation. Built-in functions distill an image or video keyframe into a condensed descriptor called a visual signature, which is a vector composed of the calculated values for its four basic properties: color, shape, texture, and object shading/region.
Note that eVe is primarily a tool for building applications and is not a turn-key application itself. The toolkit includes sample programs with source code that demonstrate the sorts of things that can be built with the SDK. Some of these sample programs might be directly useful as tools, and some can serve as starting templates for building more elaborate SDK-based applications.
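For orientation, a hypothetical end-to-end flow for an application built on such a toolkit might look like the self-contained sketch below: ingest assets, reduce each to a visual signature, index the signatures, and query by example. Every class and method name here is invented for illustration; the actual eVe APIs are described in the SDK's reference documentation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SdkFlowSketch {

    /** An indexed asset: an identifier plus its condensed visual signature. */
    record Asset(String id, double[] signature) {}

    /** A toy in-memory "index"; a real application would persist this. */
    static final List<Asset> INDEX = new ArrayList<>();

    /** Stand-in for the analysis step that turns an asset into a signature vector. */
    static double[] analyze(String assetId) {
        // Placeholder: derive a fake 2-D signature from the id so the example runs.
        int h = assetId.hashCode();
        return new double[]{(h & 0xFF) / 255.0, ((h >> 8) & 0xFF) / 255.0};
    }

    static void ingest(String assetId) {
        INDEX.add(new Asset(assetId, analyze(assetId)));
    }

    /** Query by example: rank indexed assets by distance to the example's signature. */
    static List<Asset> queryByExample(String exampleId, int topK) {
        double[] q = analyze(exampleId);
        return INDEX.stream()
                .sorted(Comparator.comparingDouble(a -> distance(q, a.signature())))
                .limit(topK)
                .toList();
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        List.of("leopard.jpg", "giraffe.jpg", "tiger.jpg").forEach(SdkFlowSketch::ingest);
        queryByExample("leopard.jpg", 2).forEach(a -> System.out.println(a.id()));
    }
}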
Technology Features/Benefits Summary
General Features
Object-Based Search (Precise)
Feature: Next-generation visual search recognizes and retrieves images based on the visual characteristics of the objects in the image. The object shading/region attribute lets you search for content in objects that are not clearly distinct from their backgrounds.
Benefit: Search is simpler with visual search: the user can simply ask, "Have you seen anything that resembles this?" Users can set more advanced search criteria and obtain more "accurate" search results.

Object Maps (Search Refinement)
Feature: Object mapping of images provides accurate and highly relevant search retrievals and gives the user an understanding of why certain images were returned and how to refine the query.
Benefit: With the aid of object maps, users can refine searches and develop more effective search queries by exploiting how the computer views the image and evaluates similarity.

Smart Index (Get Results Fast)
Feature: Proprietary data indexing leads to consistently fast searches, even as the number of images in the database grows. eVe supports serialized file indexing to integrate with customers' index structures.
Benefit: Provides order-of-magnitude (up to 10x) retrieval performance improvement over the competition.

Partial Image Search (Flexible & Precise)
Feature: Users can isolate one or more regions within an image and conduct focused searches.
Benefit: Generates search results that correspond more closely to users' cognitive interpretation of image contents and intent.

Visual Vocabulary (Ease-of-Use)
Feature: Ability to create representative search images (a Visual Vocabulary) for an image or visual database. Visual Vocabularies can be customized and refined based on the search behavior of the user.
Benefit: Visual Vocabulary can make it very easy for a user to visually initiate a search. Customized vocabularies can accelerate search, create loyalty, and drive transactions.

Visual Meta-Tagging (Tagging En Masse)
Feature: Systematic and fast automatic addition of keywords to database content (meta-tagging).
Benefit: Users can associate meta tags with multiple images simultaneously, which facilitates keywording consistency & accuracy.

Images, Video, & Audio (same API for many media)
Feature: Any type of data that can be represented visually can be indexed and searched, including photographs, illustrations, animation, video, and audio.
Benefit: One tool with a common set of APIs for multiple media types requires less development cost and time, and multiple media types open up additional sources of revenue.
Developer Features
High Scalability
Feature: Provides multi-threaded support for all services and fully utilizes multiple CPUs.
Benefit: Developers can increase the performance of each service as needed.

Platform Independence*
Feature: 100% Java APIs; runs in any JVM-supported environment, allowing fast and easy porting of eVe to new operating systems.
Benefit: Developers can easily integrate eVe into existing applications on any OS platform, and a foundation in Java means there is a very large market of qualified developers.

Multiple Media Collection Support
Feature: Provides a framework to manage asset growth. Search results can be aggregated from multiple collections, and each collection can manage up to 50,000 assets.
Benefit: Developers can manage large asset databases easily by segregating them into meaningful groups.

Support for any data situation
Feature: Users can search and retrieve matching images on the Web or as part of an intranet or asset management system.
Benefit: Can be used in any enterprise or consumer situation.

Automatic Asset Ingestion
Feature: Provides automatic generation of visual metadata, proxies, and thumbnails.
Benefit: Developers can easily ingest assets with automatic asset analysis and indexing.

Database Independence*
Feature: Supports flat files and SQL-based database management systems.
Benefit: Developers can choose the optimal data storage environment for their application.