VISTA-Funded Projects

Awarded March 2018

Deep networks for assisted target detection in airborne search and rescue
PI: James Elder

In Canada, finding and rescuing people from downed aircraft or vessels in distress is challenging: our nation's area of responsibility encompasses 18 million km² of land and water. While the military still relies on the naked eye to conduct searches, advanced sensor systems are poised to transform airborne search and rescue (SAR). Canada's new fleet of search and rescue aircraft will come equipped with state-of-the-art electro-optical/infrared (EO/IR) stabilized sensor systems that make it possible to image objects and people several kilometres away. Even with this technology, however, it is still difficult for a human operator on a moving aircraft to reliably detect a target that may subtend only a few pixels of the display and be partially obscured by variable illumination, weather, terrain and ground cover.
Partial automation of target detection would alleviate some of the operator's workload. Prior work on assisted target detection (ATD) systems has shown promise but has still not achieved the level of reliability needed to meet operational requirements. A key limitation has been the lack of a large, realistic and ground-truthed search and rescue dataset. This project will take advantage of the new SAR dataset created by the NRC Flight Research Laboratory (FRL) to train more advanced deep neural network (DNN) detectors that locate SAR targets in airborne IR imagery. Human psychophysical experiments will also be conducted to determine the optimal threshold criterion and perceptual modality for alerting human operators to automatically detected potential targets, in order to maximize overall (human + machine) SAR target detection performance. The resulting human-adapted technology will improve the efficacy of airborne SAR and thus save lives.
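To make the human + machine trade-off concrete, the minimal sketch below (Python, with synthetic detector scores rather than the NRC FRL dataset, and an invented `alert_operating_points` helper) illustrates how an alert threshold on detector confidence could be swept to compare hit and false-alarm rates when deciding when to cue the operator.

```python
# Minimal sketch, not the project's actual system: sweep an alert threshold
# on detector confidence scores and report the hit / false-alarm trade-off
# that the psychophysical experiments would optimize against.
import numpy as np

def alert_operating_points(scores, labels, thresholds):
    """Return (threshold, hit rate, false-alarm rate) for each candidate threshold."""
    points = []
    for t in thresholds:
        alerts = scores >= t
        hit_rate = (alerts & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        fa_rate = (alerts & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        points.append((t, hit_rate, fa_rate))
    return points

# Illustrative use with synthetic scores (no real SAR imagery involved):
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = np.clip(0.4 * labels + rng.normal(0.3, 0.2, size=1000), 0.0, 1.0)
for t, hit, fa in alert_operating_points(scores, labels, [0.3, 0.5, 0.7]):
    print(f"threshold={t:.1f}  hit rate={hit:.2f}  false-alarm rate={fa:.2f}")
```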

Computational, behavioural, and neuroimaging approaches to decoding human scene processing
PI: Shayna Rosenbaum

Visual memory is a key cognitive function that helps us navigate our social and physical environments, from recognizing friends and family to driving home or to work. But how our brain codes the outside visual world into internal visual memory representations remains largely unknown. The difficulty lies in the fact that visual information from the outside environment goes through a series of complex transformations, carried out by dozens of interconnected brain regions, before it becomes a memory. Conventional experimental methods can typically manipulate only a few properties of visual stimuli at a time to understand how brain activity and memory are affected. Such methods are likely inefficient or incapable of dealing with this high degree of complexity, and a new approach is needed. To address this issue, we will take advantage of state-of-the-art multi-layer convolutional neural networks (CNNs), a type of deep-learning computational model inspired by how biological neural networks operate. Specifically, we will first use functional magnetic resonance imaging (fMRI) to measure participants’ brain activity while they are building visual memory representations. Participants’ eye movements will be recorded to reveal which features of scenes are viewed, and for how long. The fMRI, eye-tracking, and memory performance data will be used to train a CNN-based model to predict how likely visual images are to be remembered by human observers and to predict the neural activity that supports visual memory processing. Then, by examining and manipulating the properties within different layers of the trained CNN-based model, we will obtain novel insights into how the brain transforms early visual information into memories. These insights will be used to design future research that targets specific neural computations in individuals with documented memory difficulties to improve memory function.
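As a rough sketch of the modelling direction (an assumed architecture in PyTorch, not the project's actual network), a small CNN can be set up to map an image to a predicted probability that an observer will later remember it, with the idea that fMRI and eye-tracking data could then constrain its intermediate representations.

```python
# Hypothetical memorability predictor: a toy CNN that outputs P(image remembered).
import torch
import torch.nn as nn

class MemorabilityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        z = self.features(x).flatten(1)          # (batch, 32) feature vector
        return torch.sigmoid(self.head(z))       # predicted recall probability

model = MemorabilityCNN()
images = torch.randn(4, 3, 128, 128)             # stand-in for scene images
print(model(images).shape)                       # torch.Size([4, 1])
```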

Enhancing human vision via computer vision
PI: Richard Wildes

Digital camera technologies are increasingly prevalent in the domain of defense and security. A large and successful body of research has focused on the development of computer vision algorithms that can search digital images for targets with pre-defined characteristics. A natural application of these algorithms is the automated surveillance of imagery from electro-optical sensors. In the present work, however, we consider a novel application for these search algorithms: augmenting the natural visual search performance of a human viewer. In particular, we hypothesize that the output of computer vision processing can help guide and focus human visual search on important aspects of the image. This may depend critically, however, on how the computer-vision algorithm outputs are presented to the human viewer. To investigate these issues, we will compare human performance across several different modes of presenting the output of a state-of-the-art computer vision algorithm that detects interesting objects based on their features. In the proposed experiments, subjects make forced-choice detections of human targets that may appear within simulated natural scenes. The goal of the research is to determine whether human visual search performance in the defense and security context can be assisted or enhanced with support from the computer vision algorithm. This work focuses on an important application of computational vision science.
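For instance, under standard signal-detection assumptions, detection sensitivity in the forced-choice task could be summarized as d' and compared between unaided and algorithm-cued conditions; the short sketch below uses made-up hit and false-alarm rates purely for illustration.

```python
# Sketch of a sensitivity (d') comparison; the numbers are placeholders,
# not experimental results.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

print(d_prime(0.80, 0.20))  # hypothetical unaided search
print(d_prime(0.90, 0.15))  # hypothetical search with computer-vision cueing
```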

Video analytics of human behaviour
PI: Richard Wildes

Airline boarding efficiency and security are of great importance in contemporary society. For example, in 2015, 41 million passengers passed through Pearson Airport in Toronto; easing the passage of so many individuals would have a nontrivial positive impact on the passenger experience. The proposed project will develop computer vision technology for monitoring airline passengers throughout their journey as an aid to providing better support. For example, if passenger movement or flow deviates significantly from the usual, airport personnel could be assigned to restore normal operations. More specifically, the project will develop computer vision-based technology to track a passenger through a network of cameras in a model airline passenger terminal, from the time they enter the terminal until they are seated in their assigned location in the aircraft, during their journey, and then on the boarding/deplaning side until they leave the terminal area. Moreover, each passenger's trajectory through the site will be analyzed relative to a collection of model trajectories to classify their behaviour and detect anomalies. Capturing such data will provide key information to improve the passenger experience, e.g., through immediate intervention or through longer-term improvements in the boarding, flight and deplaning operations. This work focuses on a novel and important application of computer vision with ties to human factors and therefore is consistent with the intent of the VISTA program.
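As a simplified illustration of the trajectory-analysis step (assuming 2D floor-plane tracks, with invented helper names, and not the project's actual algorithm), a passenger track can be resampled and compared against a set of model trajectories, with large distances flagged as anomalous.

```python
# Toy anomaly check against model trajectories.
import numpy as np

def resample(traj, n=50):
    """Resample an (x, y) track to n points evenly spaced along its length."""
    traj = np.asarray(traj, dtype=float)
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(traj, axis=0), axis=1))]
    t = np.linspace(0.0, d[-1], n)
    return np.c_[np.interp(t, d, traj[:, 0]), np.interp(t, d, traj[:, 1])]

def is_anomalous(track, model_tracks, tol=5.0):
    """Flag a track whose mean distance to every model trajectory exceeds tol."""
    q = resample(track)
    dists = [np.mean(np.linalg.norm(q - resample(m), axis=1)) for m in model_tracks]
    return min(dists) > tol

corridor = [(0, 0), (10, 0), (10, 5)]                          # hypothetical model route
print(is_anomalous([(0, 0), (10, 0), (10, 5)], [corridor]))    # False: follows the route
print(is_anomalous([(0, 0), (0, 10), (10, 10)], [corridor]))   # True: deviates from it
```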

Awarded October 2017

Depth from Shadows
PI: James Elder

Shadows are everywhere around us, but we rarely notice them. Nevertheless, shadows give us a wealth of information about the relative locations of the objects that cast them and the surfaces on which they fall. Our visual systems pick up this information to help us understand the layout of the scene around us, and then suppress the shadows to help analyze the objects that we do pay attention to. In this project, we will examine the cues that human and machine vision systems can use to detect shadows and the depth information they reveal. We propose a specific application: screening out shadows from video images of traffic, where we want to monitor the number of vehicles and where shadows often lead to counting errors. There is also potential to use the shape and size of a shadow to improve estimates of vehicle dimensions.
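One common family of approaches (a chromaticity-based heuristic, offered here only as a sketch and not as this project's method) labels a pixel as shadow when it darkens a background model while keeping roughly the same hue and saturation.

```python
# Minimal shadow-mask heuristic for HSV frames with channels scaled to [0, 1].
import numpy as np

def shadow_mask(frame_hsv, background_hsv, v_lo=0.4, v_hi=0.95, s_tol=0.15, h_tol=0.1):
    """True where the current frame looks like a shadowed version of the background."""
    h, s, v = frame_hsv[..., 0], frame_hsv[..., 1], frame_hsv[..., 2]
    hb, sb, vb = background_hsv[..., 0], background_hsv[..., 1], background_hsv[..., 2]
    ratio = v / np.maximum(vb, 1e-6)                 # how much the pixel darkened
    hue_diff = np.abs(h - hb)
    hue_diff = np.minimum(hue_diff, 1.0 - hue_diff)  # hue is circular
    return (ratio >= v_lo) & (ratio <= v_hi) & (np.abs(s - sb) <= s_tol) & (hue_diff <= h_tol)
```

Pixels flagged this way would be excluded from the foreground regions used for vehicle counting.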

Using your senses optimally to support safe mobility
PIs: Laurence Harris, Denise Henriques

In order to remain balanced and to navigate, the brain must combine information from multiple sensory sources, notably vision and movement information from the inner ear (vestibular inputs). The brain also has to decide instantly how much importance to place on each input in order to deal with common situations in which sensory inputs may become unstable (e.g., when vision is blurred) or inconsistent (e.g., while walking through a moving train). With age, there is often a decline in our senses, and changes in the way that sensory information is combined, sometimes resulting in errors. In this project we will assess whether older adults can process their self-movement as effectively as younger adults and whether changes in performance can be explained by changes in how they combine multisensory inputs. We will also assess whether performance can be improved by training. Our objectives are to characterize how healthy adults aged 18-85 integrate and adjust the processing of the visual and vestibular inputs important for safe mobility, and to assess whether training can make this integrative process more effective. A motion platform will provide physical motion, and an immersive virtual reality display will provide visual self-motion information. The directions provided by each of these simultaneous motions will be varied. Participants will decide whether the directions agree, and the amount of integration will be calculated. We will attempt to improve performance by providing feedback. This research will advance basic vision science and result in real-world applications associated with promoting safe mobility in older adults. The fundamental knowledge generated will be used to develop a technology-based, multisensory falls screening tool and falls prevention training application that will be deployable using widely available immersive VR systems.
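The standard model of such cue combination is reliability-weighted (maximum-likelihood) integration, in which each estimate is weighted by its inverse variance; the short sketch below works through the arithmetic with hypothetical heading estimates.

```python
def optimal_integration(est_visual, var_visual, est_vestibular, var_vestibular):
    """Reliability-weighted combination: weights are proportional to 1/variance."""
    w_vis = (1.0 / var_visual) / (1.0 / var_visual + 1.0 / var_vestibular)
    combined = w_vis * est_visual + (1.0 - w_vis) * est_vestibular
    combined_var = 1.0 / (1.0 / var_visual + 1.0 / var_vestibular)
    return combined, combined_var

# Hypothetical example: a 10 deg visual estimate (variance 4) and a 20 deg
# vestibular estimate (variance 16) combine to 12 deg with variance 3.2,
# i.e. closer to the more reliable cue and more precise than either alone.
print(optimal_integration(10.0, 4.0, 20.0, 16.0))
```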

Interactive avatar for human-robot interaction
PI: Michael Jenkin

There are very few robotic systems that are fully autonomous. Rather, such systems are intended to interact with and respond to instructions given to them by human operators. As a consequence, human-robot interaction and the technologies that support such interaction have become key research problems in the robotics field. An enabling technology in the development of human-robot interaction systems has been the development of cloud-based technologies for speech understanding and utterance generation. The IAHRI project proposes to exploit advances in cloud-based speech understanding and utterance generation to better understand effective interaction between humans and robots and to develop a general set of software tools that support naïve human interaction with autonomous systems. In particular, it proposes to exploit one of a number of cloud-based speech understanding systems (e.g., Amazon’s Alexa) and to combine the capabilities of such systems with a cloud-based rendering mechanism to support a 3D avatar or puppet that is synchronized to the utterances of the system. The general nature of the approach allows the avatar to be customized in terms of the simulated emotional state of the autonomous system and the state of the humans with whom the system is interacting. The resulting human-robot system will be evaluated in terms of acceptance and utility in an academic setting. Furthermore, through our industrial partners we plan to deploy a human-robot interaction system based on these general tools on both the CloudConstable platform and the VirtualMe platform from Crosswing. These industrial implementations will not only inform the development of the software infrastructure planned in the IAHRI project, but will also provide a unique opportunity to communicate the results of this project to a much wider audience. The CloudConstable platform, for example, will provide the opportunity to integrate the IAHRI system within an IBM Watson AI XPRIZE consumer-oriented solution.
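As a purely illustrative sketch of the synchronization idea (the mouth-shape table, timing, and function names are invented, and no vendor API is used), the text of a generated utterance can be mapped to coarse mouth-shape keyframes that a cloud rendering service could play back alongside the synthesized audio.

```python
# Toy utterance-to-keyframe mapping for avatar lip movement; placeholder only.
MOUTH_SHAPE = dict.fromkeys("aeiou", "open")
MOUTH_SHAPE.update(dict.fromkeys("bmp", "closed"))

def utterance_to_keyframes(text, seconds_per_char=0.06):
    """Return timestamped mouth shapes for a text utterance."""
    frames, t = [], 0.0
    for ch in text.lower():
        frames.append({"time": round(t, 2), "mouth": MOUTH_SHAPE.get(ch, "neutral")})
        t += seconds_per_char
    return frames

print(utterance_to_keyframes("Hello"))
```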

Commercialization readiness for the Mobility Assessment Tool
PI: Lauren Sergio

Computer vision technologies have become widely employed in the biomedical domain, where they are valued for their objectivity, efficiency and precision. The objective of this project is to finalize development of the modified-Tinetti Mobility Assessment Tool (MAT) for assessment of mobility deterioration following natural aging, neurodegeneration, or brain injury. The project will validate the algorithms of our non-invasive, unique, and purely objective computer-vision video tool. This tool was recently advanced from an early laboratory-based beta stage to one with upgraded hardware that accurately tracks humans. The system has now gone from a prototype to an operationally validated prototype able to perform a completely automated modified-Tinetti gait/balance test on adults. In this final phase of the project we will finalize the algorithms of the system by comparing its performance with independent clinician manual assessment and our own functional assessment tools in adults who have no clinical issues, have suffered concussion, or are in early-stage dementia. Validating its functionality, in particular its ability to accurately discriminate healthy from clinically anomalous posture and gait, and finalizing the user interface of this device will allow PhD Associates to advance a patent application and approach investors for commercialization with a market-ready product. PhD Associates will also be able to approach practitioners with an easy, precise, and low-cost tool to estimate fall risk and incapacity from gait and balance (mobility) measurements. This work is important for the advancement of computer vision-assisted medical assessment, which will impact Canadian research in rehabilitation, diagnosis, and functional assessment. Other target markets for the MAT are personal injury law and the insurance industry, both of which would benefit from a simple, quick, computerized assessment of functional mobility following an accident.
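For illustration only (this is not the validated MAT algorithm), simple gait summaries can be derived from tracked keypoints; the toy sketch below estimates step count and cadence from the alternating horizontal separation of the two ankle positions.

```python
# Toy gait summary from per-frame ankle keypoint x-coordinates.
import numpy as np

def gait_summary(left_ankle_x, right_ankle_x, fps=30.0):
    """Estimate steps and cadence from sign changes of the ankle separation."""
    sep = np.asarray(left_ankle_x, float) - np.asarray(right_ankle_x, float)
    steps = int(np.sum(np.diff(np.sign(sep)) != 0))   # each crossing ~ one step
    duration_s = len(sep) / fps
    return {"steps": steps, "cadence_steps_per_min": 60.0 * steps / duration_s}
```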

Bio-markers of non-invasive neuromodulation to early visual cortex
PI: Jennifer Steeves

Sometimes, in order to understand how something works, it needs to be taken apart to see how its components work together. The same reverse-engineering approach can be used to understand the brain and its components. One technique that allows researchers to do this is non-invasive brain stimulation. Non-invasive brain stimulation is a relatively new therapeutic tool for the treatment of clinical depression. It is also a new experimental approach in cognitive neuroscience for understanding brain networks and function. In both cases, the mechanisms of how brain stimulation works are not known. The goal of this research is to understand what is going on in the brain during non-invasive brain stimulation: how is the brain affected at the site of stimulation, and how are the networks to which that site is connected affected? The approach will use brain imaging techniques (magnetic resonance imaging, MRI) to measure the local and global effects of brain stimulation. This research relates to the goals of VISTA by providing fundamental advances to vision science with potential application to health technologies. The findings from this work will inform and guide policy for non-invasive neuromodulation use in clinical and laboratory settings.
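One conventional way to quantify such network-level effects (offered as a sketch under assumed region-by-time fMRI matrices, not the study's analysis plan) is to compare seed-based functional connectivity before and after stimulation.

```python
# Seed-based connectivity change, post minus pre; rows are regions, columns time points.
import numpy as np

def connectivity_change(pre_ts, post_ts, seed_index):
    """Correlate the seed region with every region, before and after stimulation."""
    def seed_corr(ts):
        ts = np.asarray(ts, float)
        return np.array([np.corrcoef(ts[seed_index], ts[i])[0, 1] for i in range(ts.shape[0])])
    return seed_corr(post_ts) - seed_corr(pre_ts)
```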

Validation of novel neurofeedback training engine for improving brain health in aging and neurodevelopmental disorders
PI: W. Dale Stevens

There is a clear gap between modern neuroscience research and its translation into effective products that can be self-administered by the public. Areas that require immediate attention, given heavy caregiver burden and health care costs, are memory impairments in aging and emotional regulation deficits in Autism Spectrum Disorder (ASD). Extensive evidence shows that neurofeedback training (NFT), an established technique for self-regulating brain waves, can have long-lasting effects on brain function in aging and ASD. It is now possible to administer NFT using accessible and inexpensive brain-sensing technology. The challenge is to create highly interactive visual interfaces that engage users to comply with extended NFT, which is critical for inducing neuroplasticity. xSensa Labs has developed mobile NFT for cognitive and emotional impairments in aging and ASD, respectively. Feedback during training is typically provided in visual form, yet there have been no comprehensive studies comparing different visual interfaces and their interactions with training efficacy. In a new partnership with York University, Mitacs Elevate, Unionville Home Society, and potentially VISTA, xSensa seeks to rigorously test its NFT engine and user interface. The overarching aims of the proposal are to replicate previous laboratory findings and to examine which visual environments are best suited for augmenting NFT outcomes. Compelling work shows that greater immersion in nature, even in its most minimal form (e.g., viewing photographs), can improve brain function. This project will compare three dynamic interfaces: an immersive natural landscape based on real footage, an immersive 3D computer-generated artificial environment, and a non-immersive standard interface with objects moving in 2D. The main hypothesis is that in both aging and ASD, real nature will lead to faster acquisition of NFT strategies, with improvements in memory and attention for aging and reductions in anxiety and enhanced mood in ASD. After validation, these innovative products will create large-scale impact on brain health in Canada and globally.
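As a generic illustration of the kind of signal such training relies on (an assumed alpha-band power fraction, not xSensa's proprietary method), a neurofeedback score can be computed from the EEG spectrum.

```python
# Toy neurofeedback signal: fraction of EEG power in a chosen band (e.g. alpha, 8-12 Hz).
import numpy as np

def band_power_fraction(eeg, fs, lo, hi):
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].sum() / psd.sum()

# Illustrative use with a synthetic 10 Hz rhythm plus noise:
fs = 256
t = np.arange(0, 4, 1.0 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)
print(band_power_fraction(eeg, fs, 8, 12))   # most power falls in the alpha band
```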