Separating Style and Content with Bilinear Models

Published: 2000-06-01
Volume: 12
Issue: 6
Pages: 1247-1283
ISSN: 0899-7667
Container title: Neural Computation
Language: en
Short container title: Neural Computation

Author:

Tenenbaum Joshua B.¹, Freeman William T.²

Affiliation:

1. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

2. MERL, a Mitsubishi Electric Research Lab, 201 Broadway, Cambridge, MA 02139, U.S.A.

Abstract

Perceptual systems routinely separate “content” from “style,” classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive (Hofstadter, 1985). Existing factor models (Mardia, Kent, & Bibby, 1979; Hinton & Zemel, 1994; Ghahramani, 1995; Bell & Sejnowski, 1995; Hinton, Dayan, Frey, & Neal, 1995; Dayan, Hinton, Neal, & Zemel, 1995; Hinton & Ghahramani, 1997) are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. We present a general framework for learning to solve two-factor tasks using bilinear models, which provide sufficiently expressive representations of factor interactions but can nonetheless be fit to data using efficient algorithms based on the singular value decomposition and expectation-maximization. We report promising results on three different tasks in three different perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
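
The abstract does not give the model equations, so the following is only a minimal sketch of one common bilinear form, the asymmetric model, in which an observation for style s and content c is approximated as y^{sc} ≈ A^s b^c. It assumes one mean observation vector per style/content pair, stacked style-by-style into a single matrix, and fits the factors with a truncated singular value decomposition. The function name `fit_asymmetric_bilinear`, the dimensions, and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def fit_asymmetric_bilinear(Y, n_styles, obs_dim, model_dim):
    """Fit y^{sc} ~ A^s b^c by truncated SVD (illustrative sketch).

    Y is the stacked observation matrix of shape
    (n_styles * obs_dim, n_contents): block s of rows holds the
    obs_dim-dimensional observations of every content in style s.
    Returns A with shape (n_styles, obs_dim, model_dim) and
    B with shape (model_dim, n_contents).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the leading model_dim components; fold singular values into A.
    A_stacked = U[:, :model_dim] * s[:model_dim]
    B = Vt[:model_dim, :]
    A = A_stacked.reshape(n_styles, obs_dim, model_dim)
    return A, B


# Synthetic data generated from a known bilinear model (assumed sizes).
rng = np.random.default_rng(0)
n_styles, n_contents, obs_dim, model_dim = 4, 6, 5, 3
A_true = rng.normal(size=(n_styles, obs_dim, model_dim))
B_true = rng.normal(size=(model_dim, n_contents))
Y = np.einsum("skj,jc->skc", A_true, B_true).reshape(
    n_styles * obs_dim, n_contents
)

A, B = fit_asymmetric_bilinear(Y, n_styles, obs_dim, model_dim)
# Reconstruct every style/content observation from the fitted factors.
Y_hat = np.einsum("skj,jc->skc", A, B).reshape(n_styles * obs_dim, n_contents)
```

Because the synthetic matrix has rank at most `model_dim`, the truncated SVD reproduces it up to floating-point error. The recovered factors generally differ from `A_true` and `B_true` by an invertible linear transform, since only their product is determined by the data.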

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/089976600300015349

References: 25 articles.

1. Encoding of Spatial Location by Posterior Parietal Neurons

2. Statistical Approach to Shape from Shading: Reconstruction of Three-Dimensional Face Surfaces from Single Two-Dimensional Images

3. An Information-Maximization Approach to Blind Separation and Blind Deconvolution

4. Image Representations for Visual Learning

5. The Helmholtz Machine

Cited by 475 articles.

1. A comprehensive review of visual–textual sentiment analysis from social media networks;Journal of Computational Social Science;2024-09-08

2. Neuromorphic visual scene understanding with resonator networks;Nature Machine Intelligence;2024-06-27

3. Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets;Interdisciplinary Sciences: Computational Life Sciences;2024-05-17

4. A Learning-Based Multi-Node Fusion Positioning Method Using Wearable Inertial Sensors;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

5. Improving Realism of Facial Interpolation and Blendshapes with Analytical Partial Differential Equation-Represented Physics;Axioms;2024-03-12