CaltechTHESIS
  A Caltech Library Service

Beyond Text: The Rudiments of Next Generation Foundation Models

Citation

Talukder, Sabera (2026) Beyond Text: The Rudiments of Next Generation Foundation Models. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/s12m-5692. https://resolver.caltech.edu/CaltechTHESIS:12152025-232943789

Abstract

This thesis builds non-text and multimodal foundation models that overcome the difficulties of non-text data. For images and time series (e.g. audio and video), these difficulties are data heterogeneity, data continuity, and large memory requirements. To overcome them, we must build models that are information dense, generalizable, and multimodal. By the end of this thesis we will empirically demonstrate that the recipe for performant non-text and multimodal foundation models is: create discrete, information-dense representations; train models with large-scale data in the most generalizable manner possible; and fuse data modalities early in the modeling stack.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: foundation model, machine learning, artificial intelligence
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Neurobiology
Thesis Availability: Not set
Research Advisor(s):
  • Yue, Yisong (advisor)
  • Gkioxari, Georgia (co-advisor)
Thesis Committee:
  • Perona, Pietro (chair)
  • Yue, Yisong
  • Gkioxari, Georgia
  • Wierman, Adam C.
Defense Date: 11 December 2025
Non-Caltech Author Email: sabera.j.talukder (AT) gmail.com
Funders:
  • Chen Institute of Neuroscience at Caltech (grant number: 25550075)
  • NSF Graduate Research Fellowship (grant number: DGE-1144469)
  • PIMCO Fellows Program (grant number: UNSPECIFIED)
Record Number: CaltechTHESIS:12152025-232943789
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:12152025-232943789
DOI: 10.7907/s12m-5692
Related URLs:
  • https://arxiv.org/pdf/2406.03044 (arXiv): Article adapted for chapter 6
  • https://arxiv.org/pdf/2402.16412 (arXiv): Article adapted for chapter 2
  • https://arxiv.org/pdf/2402.18546 (arXiv): Article adapted for chapter 5
  • https://arxiv.org/pdf/2206.08094 (arXiv): Article adapted for chapter 4
  • https://arxiv.org/pdf/2011.02712 (arXiv): Article adapted for chapter 7
  • https://arxiv.org/pdf/2011.07191 (arXiv): Article adapted for chapter 8
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 17802
Collection: CaltechTHESIS
Deposited By: Sabera Talukder
Deposited On: 17 Dec 2025 17:59
Last Modified: 17 Dec 2025 17:59

Full text not available from this repository.
