From Restoring Human Vision to Enhancing Computer Vision
Author: Liu, Yang
Year: 2020
Degree: Dissertation (Ph.D.)
Advisor: Meister, Markus
Committee Members: Perona, Pietro; Siapas, Athanassios G.; Yue, Yisong; Meister, Markus
Option: Computation and Neural Systems
DOI: 10.7907/sq58-z682
Abstract
The central theme of this work is enabling vision, which includes two subtopics: restoring vision for blind humans, and enhancing computer vision models in visual recognition. Chapter 1 first provides a gentle introduction to relevant high level principles of human visual computations and summarizes two fundamental questions that vision answers: "what" and "where." Chapters 2, 3, and 4 contain three published projects that are anchored by those two fundamental questions.
Chapter 2 introduces a cognitive assistant to restore visual function for blind humans by focusing on an interface powered by audio augmented reality. The assistant communicates the "what" and "where" aspects of visual scenes by a combination of natural language and spatialized sound. We experimentally demonstrated that the assistant enables many aspects of visual functions for naive blind users.
Chapters 3 and 4 develop data augmentation methods to address the data inefficiency problem in neural network based computer visual recognition models. In Chapter 3, a 3D-simulation based data augmentation method is developed for improving the generalization of visual classification models for rare classes. In Chapter 4, a fast and efficient data augmentation method is developed for the newly formulated panoptic segmentation task. The method improves performance of state-of-the-art panoptic segmentation models and generalizes across dataset domains, sizes, model architectures, and backbones.
Files
- PhD_Thesis_v3.0.pdf (application/pdf)