Abstract: Analysts and scientists are increasingly interested in automatically analyzing the semantic contents of unstructured, non-tabular data (videos, images, text, and audio). To extract these semantic contents, analysts have turned to machine learning (ML) methods, which can be used in unstructured data analytics systems. Unfortunately, these ML methods require expertise to deploy and can be incredibly expensive to execute.
To address these issues, I have built AIDB, a database that allows users to query unstructured data via SQL. In AIDB, a database administrator specifies mappings between virtual columns that are generated via ML models. The application user can then query the tables in AIDB as they would in any other SQL database. I have also developed new optimizations that accelerate these ML-based queries through approximation and new query optimization techniques, which can provide speedups of up to 300x at 95% accuracy.
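
To make the idea of an ML-generated virtual column concrete, the following is a minimal sketch and not AIDB's actual interface. It assumes SQLite as the backing engine, a toy keyword classifier (`toy_sentiment`) standing in for a real ML model, and a hypothetical `reviews` table; the names `reviews_ml`, `VIRTUAL_TABLE`, and `query` are likewise invented for illustration. The administrator-side mapping is expressed as a common table expression that is prepended to the user's SQL, so the user can filter on `sentiment` as if it were stored.

```python
import sqlite3


def toy_sentiment(text):
    """Hypothetical stand-in for an expensive ML model (assumption)."""
    positive_words = {"love", "great", "good"}
    return "positive" if positive_words & set(text.lower().split()) else "negative"


# Hypothetical administrator-side mapping: the virtual column `sentiment`
# is defined as the output of the model applied to the `body` column.
VIRTUAL_TABLE = (
    "WITH reviews_ml AS ("
    "SELECT id, body, sentiment(body) AS sentiment FROM reviews) "
)


def connect():
    """Create an in-memory database with a small table of raw text."""
    conn = sqlite3.connect(":memory:")
    # Register the model as a one-argument SQL function.
    conn.create_function("sentiment", 1, toy_sentiment)
    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO reviews (body) VALUES (?)",
        [("I love this camera",), ("Battery died in a day",), ("Great value",)],
    )
    return conn


def query(conn, user_sql):
    """Run user SQL against the table that exposes the virtual column."""
    return conn.execute(VIRTUAL_TABLE + user_sql).fetchall()


conn = connect()
# The application user writes ordinary SQL over the virtual column.
positive_ids = query(
    conn, "SELECT id FROM reviews_ml WHERE sentiment = 'positive' ORDER BY id"
)
print(positive_ids)
```

In this sketch the model runs on every row the query touches, which is exactly the cost that the approximation and query optimization techniques mentioned above are meant to reduce; none of those optimizations are shown here.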