To build a recognition system for any application these days, one's first inclination is to turn to the latest breakthrough from the area of deep learning, which has no doubt been enabled by access to millions of clean, correctly labelled training images from the Internet. But there are many circumstances where such an approach cannot be used as an off-the-shelf component to assemble the system we desire, because even the largest training dataset does not account for all of the artifacts that can be encountered in the environment. As computer vision pushes further into real-world applications, what should a software system that can interpret images from sensors placed in any unrestricted setting actually look like? Can we leverage inputs from multiple sensory modalities to build such systems? How can we use the knowledge acquired from such a generalized setting to create tools for specific applications in neuroscience? In this dissertation, I answer these questions in the following ways.

First, to study the impact of difficult scenarios on automatic recognition, I explore the usefulness of state-of-the-art visual recognition combined with image restoration algorithms as off-the-shelf components, operating on the premise that restored images should be easier to classify. Remarkably, little thought has been given to image restoration and enhancement algorithms as pre-processing methods for visual recognition: the goal of computational photography thus far has simply been to make images look appealing after correction. This research led to the following contributions: 1) a new video benchmark dataset, UG2, representing both ideal conditions and common aerial image artifacts, which is publicly available to facilitate new research; 2) an extensive evaluation of the influence of image aberrations and problematic conditions on common object recognition and one-stage detection models; and 3) an analysis of the impact and suitability of basic and state-of-the-art image and video processing algorithms used in conjunction with common object recognition and detection models.

Second, I investigate how multimodal data (in the form of raw neural activations in response to olfactory signals) can improve recognition. We use zebrafish (Danio rerio), a vertebrate belonging to the class Pisces, for these experiments, as its retina bears a striking resemblance to the human retina. We found that the neural activations elicited by a single modality such as vision are remarkably different from the activations in response to multiple sensory modalities (vision and olfaction), as they come from different distributions. Based on these observations, we create a computational model that can distinguish between activations recorded when stimuli from a single modality, from both modalities, or from neither are present. As zebrafish maintain a high degree of evolutionary proximity to mammals, our model can be extended to humans as well.

Finally, in order to study behavioral changes in fish due to multimodal sensory activation, I create a new, fully automatic, end-to-end tool that requires minimal human supervision. Our tool extracts meaningful information, such as a fish's trajectory, using state-of-the-art deep learning and supervised classification methods to predict behavioral changes when the sensory integration system is instantiated. Additionally, it can be generalized to a range of experiments that involve studying motion-based animal behavior.
To date, this is the only tool that can automatically predict changes in behavior in zebrafish due to odor stimulation.
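To make the shape of such a trajectory-based pipeline concrete, the sketch below shows one possible way to go from per-frame fish positions to a behavior label. It is only an illustrative assumption, not the tool's actual implementation: the detector stand-in, the small set of motion features, and the SVM classifier are all hypothetical placeholders, and the demo data is synthetic.

```python
# Hypothetical sketch (not the dissertation's actual implementation): given per-frame
# fish centroids produced by any detector, build a trajectory, derive simple motion
# features, and train a supervised classifier to separate baseline from post-odor behavior.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def trajectory_features(centroids, fps=30.0):
    """Summarize a (T, 2) array of per-frame (x, y) positions as motion statistics."""
    velocities = np.diff(centroids, axis=0) * fps                 # pixels per second
    speed = np.linalg.norm(velocities, axis=1)
    headings = np.arctan2(velocities[:, 1], velocities[:, 0])
    turn = np.abs(np.diff(headings))                              # crude turning magnitude
    return np.array([speed.mean(), speed.std(), speed.max(), turn.mean()])

# Synthetic trajectories standing in for detector output on real videos.
rng = np.random.default_rng(0)
def fake_trajectory(active):
    steps = rng.normal(scale=3.0 if active else 1.0, size=(300, 2))  # more motion if "active"
    return np.cumsum(steps, axis=0)

X = np.stack([trajectory_features(fake_trajectory(a)) for a in [True] * 50 + [False] * 50])
y = np.array([1] * 50 + [0] * 50)                                 # 1 = post-odor, 0 = baseline

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

In practice the hand-crafted features above would be replaced by whatever representation the deep learning components of the tool extract; the sketch only illustrates the overall detect-track-classify structure described in this section.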