CaltechTHESIS
A Caltech Library Service

Beyond Text: The Rudiments of Next Generation Foundation Models

Citation

Talukder, Sabera (2026) Beyond Text: The Rudiments of Next Generation Foundation Models. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/s12m-5692. https://resolver.caltech.edu/CaltechTHESIS:12152025-232943789

Abstract

This thesis builds non-text and multimodal foundation models that overcome the difficulties of non-text data. For images and time series (e.g., audio and video), these difficulties are data heterogeneity, data continuity, and large memory requirements. To overcome them, we must build models that are information dense, generalizable, and multimodal. By the end of this thesis, we will have empirically demonstrated that the recipe for performant non-text and multimodal foundation models is to create discrete, information-dense representations, train models on large-scale data in the most generalizable manner possible, and fuse data modalities early in the modeling stack.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: foundation model, machine learning, artificial intelligence
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Neurobiology
Thesis Availability: Not set
Research Advisor(s):
  • Yue, Yisong (advisor)
  • Gkioxari, Georgia (co-advisor)
Thesis Committee:
  • Perona, Pietro (chair)
  • Yue, Yisong
  • Gkioxari, Georgia
  • Wierman, Adam C.
Defense Date: 11 December 2025
Non-Caltech Author Email: sabera.j.talukder (AT) gmail.com
Funders:
Funding Agency | Grant Number
Chen Institute of Neuroscience at Caltech | 25550075
NSF Graduate Research Fellowship | DGE-1144469
PIMCO Fellows Program | UNSPECIFIED
Record Number: CaltechTHESIS:12152025-232943789
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:12152025-232943789
DOI: 10.7907/s12m-5692
Related URLs:
URL | URL Type | Description
https://arxiv.org/pdf/2406.03044 | arXiv | Article adapted for chapter 6
https://arxiv.org/pdf/2402.16412 | arXiv | Article adapted for chapter 2
https://arxiv.org/pdf/2402.18546 | arXiv | Article adapted for chapter 5
https://arxiv.org/pdf/2206.08094 | arXiv | Article adapted for chapter 4
https://arxiv.org/pdf/2011.02712 | arXiv | Article adapted for chapter 7
https://arxiv.org/pdf/2011.07191 | arXiv | Article adapted for chapter 8
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 17802
Collection: CaltechTHESIS
Deposited By: Sabera Talukder
Deposited On: 17 Dec 2025 17:59
Last Modified: 17 Dec 2025 17:59

Full text not available from this repository.
