Humans rely on the integration of information from multiple sensory modalities to interact successfully with their environment. In the present series of studies, we investigated how the visuomotor system integrates congruent and incongruent visual and tactile inputs during goal-directed action comprehension and execution. Specifically, we investigated whether orienting attention towards either vision or touch enhances the impact of one modality over the other. In Experiment 1, participants were presented with the visual (on a computer monitor) and/or tactile (in the unseen left hand) components of an action, and made button-press responses to categorize it as ‘wide’ or ‘narrow’. Responses were significantly faster when attending to vision than when attending to touch, and faster in fully congruent conditions than in category-congruent and incongruent conditions. Moreover, responses to wide-grasp actions were significantly faster than responses to narrow-grasp actions. Thus, both vision and touch contribute to action comprehension, but visual inputs are generally more influential than tactile inputs. In Experiment 2, participants performed the same task but made reach-to-grasp movements, recorded with a ProReflex motion-capture system. Although actions towards wide objects produced wider peak grasps overall than actions towards narrow objects, in contrast to action comprehension there was no systematic effect of attended modality or of tactile input on action execution. We speculate that action comprehension and execution utilize visual and tactile inputs differentially.