Associative relations among words, concepts and percepts are core building blocks of high-level cognition. When we view the world ‘at a glance’, associative relations between objects in a scene, or between an object and its visual background, are extracted rapidly. The extent to which such relational processing requires attentional capacity, however, has been heavily disputed over the years. In the present manuscript I review studies investigating scene-object and object-object associative processing. I then present a series of studies in which I assessed whether spatial attention is necessary for processing various types of visual-semantic relations within a scene. Importantly, in all studies the spatial and temporal aspects of visual attention were tightly controlled to minimize unintentional attention shifts from ‘attended’ to ‘unattended’ regions. On each trial, a pair of stimuli (two objects, two scenes, or a scene and an object) was briefly presented, and participants were asked to detect a pre-defined category of stimuli (e.g., an animal, a nonsense shape). Response times (RTs) in this target detection task were compared between conditions in which visual attention spanned both stimuli in a pair and conditions in which attention was focused on only one of the two stimuli. Findings consistently demonstrated rapid associative processing when stimuli were fully attended, i.e., shorter RTs to associated than to unassociated pairs. Focusing attention on a single stimulus, however, largely impaired this relational processing. The only exception to this pattern was observed with target stimuli, which were prioritized by task demands: such stimuli continued to affect performance even when positioned at an unattended location, indicating that their relations with the attended items were well processed and analyzed. These findings suggest that attention plays a critical role in processing visual-associative relations when these involve stimuli that are irrelevant to one's immediate goals.