1. Becker, S. (1992). An Information-Theoretic Unsupervised Learning Algorithm for Neural Networks. PhD thesis, University of Toronto.
2. Bub, U., Hunke, M., and Waibel, A. (1995). Knowing Who to Listen to in Speech Recognition: Visually Guided Beamforming. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 848–851.
3. Casey, M., Gardner, W., and Basu, S. (1995). Vision Steered Beam-forming and Transaural Rendering for the Artificial Life Interactive Video Environment (ALIVE). In Proceedings of the 99th Convention of the Audio Engineering Society (AES). Preprint 4052.
4. Checka, N., Wilson, K., Rangarajan, V., and Darrell, T. (2003). A Probabilistic Framework for Multi-modal Multi-person Tracking. In Proceedings of the Workshop on Multi-Object Tracking. http://www.ai.mit.edu/projects/vip/papers/checka-et-al-womot.pdf.
5. Collobert, M., Feraud, R., LeTourneur, G., Bernier, O., Viallet, J. E., Mahieux, Y., and Collobert, D. (1996). LISTEN: A System for Locating and Tracking Individual Speakers. In Proceedings of the Second International Conference on Face and Gesture Recognition, pages 283–288.