LEARNING STRUCTURED VISUAL DETECTORS FROM USER INPUT AT MULTIPLE LEVELS

Author:

JAIMES ALEJANDRO¹, CHANG SHIH-FU¹

Affiliation:

1. Department of Electrical Engineering, Columbia University, 500 West 120th Street MC 4712 New York, NY 10027, USA

Abstract

In this paper, we propose a new framework for the dynamic construction of structured visual object/scene detectors for content-based retrieval. In the Visual Apprentice, a user defines visual object/scene models via a multiple-level Definition Hierarchy: a scene consists of objects, which consist of object-parts, which consist of perceptual-areas, which consist of regions. The user trains the system by providing example images/videos and labeling components according to the hierarchy she defines (e.g., an image of two people shaking hands contains two faces and a handshake). As the user trains the system, visual features (e.g., color, texture, motion) are extracted from each example provided, for each node of the user-defined hierarchy. Various machine learning algorithms are then applied to the training data, at each node, to learn classifiers. The best classifiers and features are then automatically selected for each node (using cross-validation on the training data). The process yields a Visual Object/Scene Detector (e.g., for a handshake), which consists of a hierarchy of classifiers as defined by the user. The Visual Detector classifies new images/videos by first automatically segmenting them and then applying the classifiers according to the hierarchy: regions are classified first, followed by perceptual-areas, object-parts and objects. We discuss how the concept of Recurrent Visual Semantics can be used to identify domains in which learning techniques such as the one presented can be applied. We then present experimental results using several hierarchies for classifying images and video shots (e.g., baseball video, images that contain handshakes, skies, etc.). These results, which show good performance, demonstrate the feasibility and usefulness of dynamic approaches for constructing structured visual object/scene detectors from user input at multiple levels.
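
The bottom-up classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the `summarize` and `detect` helpers, the toy `skin`/`area` features, and the rule that every child node must be found before a parent is evaluated are all assumptions made here for illustration. In the framework itself, each node's classifier would be learned from labeled examples and selected by cross-validation rather than hand-written.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]               # hypothetical per-region features
Classifier = Callable[[Features], bool]   # stand-in for a learned classifier


@dataclass
class Node:
    """One node of a hypothetical Definition Hierarchy."""
    name: str
    classifier: Classifier
    children: List["Node"] = field(default_factory=list)


def summarize(regions: List[Features]) -> Features:
    """Average each feature over a group of regions and record the group size."""
    keys = regions[0].keys()
    summary = {k: sum(r[k] for r in regions) / len(regions) for k in keys}
    summary["count"] = float(len(regions))
    return summary


def detect(node: Node, regions: List[Features]) -> List[Features]:
    """Return the regions accepted by `node`, or [] if the node is rejected.

    Leaves classify individual regions; inner nodes classify the group of
    regions accepted by their children (regions first, then higher levels).
    """
    if not node.children:
        return [r for r in regions if node.classifier(r)]
    accepted: List[Features] = []
    for child in node.children:
        found = detect(child, regions)
        if not found:
            return []  # assumption: every component must be present
        accepted.extend(found)
    return accepted if node.classifier(summarize(accepted)) else []


# Toy hierarchy with hand-written rules in place of learned classifiers.
face = Node("face", lambda f: f["skin"] > 0.5 and f["area"] > 100)
hand = Node("hand", lambda f: f["skin"] > 0.5 and f["area"] <= 100)
handshake = Node("handshake", lambda f: f["count"] >= 3, [face, hand])

# Regions as they might come out of an automatic segmentation step.
regions = [
    {"skin": 0.9, "area": 150.0},
    {"skin": 0.8, "area": 140.0},
    {"skin": 0.7, "area": 60.0},
    {"skin": 0.1, "area": 500.0},
]
print(len(detect(handshake, regions)))
```

Representing each classifier as a plain callable keeps the sketch independent of any particular learning algorithm, which mirrors the abstract's point that different algorithms and features may be selected at different nodes.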

Publisher

World Scientific Pub Co Pte Lt

Subject

Computer Graphics and Computer-Aided Design, Computer Science Applications, Computer Vision and Pattern Recognition

Cited by 4 articles.

1. Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming; ACM Transactions on Multimedia Computing, Communications, and Applications; 2008-01

2. CLAIRE; ACM Transactions on Information Systems; 2006-07

3. From Partition Trees to Semantic Trees; Multimedia Content Representation, Classification and Security; 2006

4. Ontology for Nature-Scene Image Retrieval; On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE; 2004
