Safety is a crucial issue in collaborative robotics. A robot must perceive its environment precisely to maximize its productivity without compromising the safety of the workers around it. Visual recognition is a popular modality, but it is often limited to a narrow field of view and involves a high computational load. The proposed project aims to combine vision with sound recognition for a collaborative robot. This audio-visual recognition will rely on a microphone array combined with a camera to take advantage of the spatial information provided by both modalities. In addition to enabling omnidirectional attention that extends beyond the restricted field of view of the camera, this approach will make recognition more robust and reduce the computational load. Indeed, it will be possible to estimate the direction of arrival of a sound event and orient the camera so that only the visual region of interest related to the event is analyzed. With this new technology, collaborative robots will be able to better monitor the presence of operators and interact with them safely.
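
To illustrate the idea of locating a sound event and steering the camera toward it, the following is a minimal sketch rather than the project's actual method. It assumes a far-field source, a single pair of microphones with known spacing, and GCC-PHAT as the delay estimator; none of these choices is stated in the abstract. The function names (`gcc_phat_delay`, `azimuth_from_delay`, `pan_to_event`) and the speed-of-sound constant are hypothetical and chosen for illustration only. A single pair resolves angles only between -90 and 90 degrees, so truly omnidirectional attention would require a larger array.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def gcc_phat_delay(sig, ref, fs, max_tau):
    """Estimate the delay (s) of `sig` relative to `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Search only physically possible lags (at least one sample each way).
    max_shift = max(1, min(int(fs * max_tau), n // 2))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs


def azimuth_from_delay(tau, mic_distance):
    """Far-field model: tau = d * sin(theta) / c. Returns theta in degrees.

    Assumed convention: positive angles point toward the `ref` microphone.
    """
    sin_theta = np.clip(tau * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))


def pan_to_event(azimuth_deg, camera_heading_deg, fov_deg):
    """Pan rotation (deg) that centers the event, or 0.0 if already in view."""
    offset = (azimuth_deg - camera_heading_deg + 180.0) % 360.0 - 180.0
    return 0.0 if abs(offset) <= fov_deg / 2.0 else offset
```

In this sketch, the camera moves only when the estimated azimuth falls outside its current field of view, which is one simple way to restrict visual processing to the region associated with the sound event.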