From Restoring Human Vision to Enhancing Computer Vision

Author: Liu, Yang

Year: 2020

Degree: Dissertation (Ph.D.)

Advisor: Meister, Markus

Committee Members: Perona, Pietro; Siapas, Athanassios G.; Yue, Yisong; Meister, Markus

Option: Computation and Neural Systems

DOI: 10.7907/sq58-z682

Abstract

The central theme of this work is enabling vision, which includes two subtopics: restoring vision for blind humans, and enhancing computer vision models in visual recognition. Chapter 1 first provides a gentle introduction to relevant high level principles of human visual computations and summarizes two fundamental questions that vision answers: "what" and "where." Chapters 2, 3, and 4 contain three published projects that are anchored by those two fundamental questions.

Chapter 2 introduces a cognitive assistant to restore visual function for blind humans by focusing on an interface powered by audio augmented reality. The assistant communicates the "what" and "where" aspects of visual scenes by a combination of natural language and spatialized sound. We experimentally demonstrated that the assistant enables many aspects of visual functions for naive blind users.

Chapters 3 and 4 develop data augmentation methods to address the data inefficiency problem in neural network based computer visual recognition models. In Chapter 3, a 3D-simulation based data augmentation method is developed for improving the generalization of visual classification models for rare classes. In Chapter 4, a fast and efficient data augmentation method is developed for the newly formulated panoptic segmentation task. The method improves performance of state-of-the-art panoptic segmentation models and generalizes across dataset domains, sizes, model architectures, and backbones.

Files