Affiliation:
1. Tsinghua University, China
Abstract
With the progress of imaging modalities and the wide use of digital images (including video) in various fields, many potential content producers have emerged, and many image databases have been built. Because images require large amounts of storage space and processing time, how to quickly and efficiently access and manage these databases, which are large in both information content and data volume, has become an urgent problem. Research on a solution to this problem, using content-based image retrieval (CBIR) techniques, was initiated in the last decade (Kato, 1992). An international standard for multimedia content description, MPEG-7, was formed in 2001 (MPEG). With the advantages of comprehensive description of image contents and consistency with human visual perception, research in this direction is considered one of the hottest research topics of the new century (Castelli, 2002; Zhang, 2003; Deb, 2004). Many practical retrieval systems have been developed; a survey of nearly 40 systems can be found in Veltkamp (2000). Most of them rely mainly on low-level image features, such as color, texture, and shape, to represent image contents. However, there is a considerable difference between what users are actually interested in and the image contents described using only these low-level features. In other words, there is a wide gap between image content description based on low-level features and human understanding of that content. As a result, these low-level feature-based systems often produce unsatisfactory query results in practical applications. To cope with this challenge, many approaches have been proposed to represent and describe the content of images at a higher level, one more closely related to human understanding. These approaches fall into three broad categories: synthetic, semantic, and semiotic (Bimbo, 1999; Djeraba, 2002). From the standpoint of understanding, the semantic approach is the natural one.
Human beings often describe image content in terms of objects, which can be defined at different abstraction levels. In this article, objects are considered not only as carriers of semantic information in images, but also as suitable building blocks for further image understanding. The rest of the article is organized as follows. In “Background,” early object-based techniques are briefly reviewed, and current research on object-based techniques is surveyed. In “Main Techniques,” a general paradigm for object-based image retrieval is described, and different object-based techniques are discussed, including techniques for extracting meaningful regions, identifying objects, matching semantics, and conducting feedback. In “Future Trends,” some potential directions for further research are pointed out. In “Conclusion,” several final remarks are presented.