introspection ft. harsehaj -> harsehaj.substack.com

breaking down a vector database

backbone of ai models #️⃣

Harsehaj :)
February 17, 2025

welcome to introspection ft. harsehaj! ⭐️ i’m harsehaj, and always up to something in social good x tech.

this publication is a place for me to reflect on productivity, health and tech, and drop unique opportunities in the space right to your inbox daily. if you’re new here, sign up to tune in!💌

scroll to the end for my daily roundup on unique opportunities!

onto today’s topic: breaking down a vector database #️⃣

i’m making more of an intentional effort to learn ai/ml on a deeper level, so what better way to document my understanding and interpretation than to share it?

i dove into solidifying what a vector database is in my brain today.

a vector database is essentially the backbone of a lot of ai applications. it’s where vectors, the numerical representations of an object’s features or meaning learned from patterns in data, are stored and referenced. 📚️

a lot of ai products boil down to finding something similar: music recommendations on spotify, image search, chatbots (finding similar past conversations and retrieving responses). so for results to keep getting better over time, the database has to keep growing too, continually adding new vectors so there are closer and closer matches to find.

okay pause — how do these vectors capture the meaning or features of an object?

well, that’s too much computation for humans, so we also leave that to ai, but it’s still worth understanding at a high-level. ai models, especially deep learning models like neural networks, learn to extract meaningful features from data. they do this by analyzing patterns and relationships within the data, which involves a whole lot of math. 🤓

for example, with textual data: a language model would convert words/sentences into vectors by capturing their meaning and context. “happy” and “joyful” would have similar vectors because they mean similar things. additionally, “king” and “queen” would have vectors that reflect their relationship (gender difference but similar roles).
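
to make the “king” and “queen” idea concrete, here’s a tiny python sketch. big caveat: these vectors are hand-made toys, not output from a real language model, and the three dimensions (royalty, masculine, feminine) are purely my assumption for illustration. real vectors have hundreds of dimensions that don’t come with neat labels.

```python
# hand-made toy vectors, NOT real embeddings.
# hypothetical dimensions: [royalty, masculine, feminine]
toy_vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def subtract(a, b):
    """element-wise a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """element-wise a + b."""
    return [x + y for x, y in zip(a, b)]

# the step from king to queen is the same step as from man to woman:
# only the gender dimensions change, the royalty dimension stays put.
king_to_queen = subtract(toy_vectors["queen"], toy_vectors["king"])
man_to_woman = subtract(toy_vectors["woman"], toy_vectors["man"])

# the classic analogy: king - man + woman lands on queen (in this toy setup)
analogy = add(subtract(toy_vectors["king"], toy_vectors["man"]), toy_vectors["woman"])

print("king -> queen:", king_to_queen)
print("man -> woman: ", man_to_woman)
print("king - man + woman:", analogy)
```

in this toy setup the relationship between two words shows up as a direction in the space, which is the intuition behind vectors “reflecting” a relationship.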

so now that the vectors have been established, how exactly is similarity captured?

vectors are stored in a multi-dimensional space where similar objects are close together and different objects are far apart. a cosine similarity or euclidean distance formula measures how close two vectors are. as an example, “dog” and “puppy” would have similar vectors and thus, would also be close in this space, whereas “dog” and “car” wouldn’t.
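
here’s a minimal sketch of those two measurements in plain python. the 3-number vectors for “dog”, “puppy” and “car” are made up by me for illustration, not taken from any real model. cosine similarity looks at the angle between two vectors (closer to 1 means more similar), while euclidean distance is the straight-line distance between them (closer to 0 means more similar).

```python
import math

def cosine_similarity(a, b):
    """cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# made-up toy vectors, purely for illustration
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.9, 0.15]
car = [0.1, 0.2, 0.95]

print("cosine dog/puppy:", cosine_similarity(dog, puppy))
print("cosine dog/car:  ", cosine_similarity(dog, car))
print("distance dog/puppy:", euclidean_distance(dog, puppy))
print("distance dog/car:  ", euclidean_distance(dog, car))
```

with these toy numbers, dog/puppy gets the higher cosine similarity and the smaller distance, so both measures agree that they’re the closer pair.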

once we have these meaningful vectors, a vector database can quickly find the closest matches. this is much more useful in ai applications than a regular database, which is mostly built for exact matches.
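
to show what “find the closest matches” means, here’s a tiny in-memory sketch. `TinyVectorStore` is a hypothetical class i made up, not any real library’s api, and the vectors are toy values. it compares the query against every stored vector (brute force), which is fine for four items. real vector databases usually rely on approximate nearest-neighbor indexes so they don’t have to scan everything.

```python
import math

def cosine_similarity(a, b):
    """cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """hypothetical brute-force store, for illustration only."""

    def __init__(self):
        self.items = []  # list of (label, vector) pairs

    def add(self, label, vector):
        self.items.append((label, vector))

    def search(self, query, k=2):
        """return the labels of the k stored vectors most similar to query."""
        scored = [(cosine_similarity(query, vec), label) for label, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        return [label for _, label in scored[:k]]

store = TinyVectorStore()
store.add("dog", [0.9, 0.8, 0.1])       # made-up toy vectors
store.add("puppy", [0.85, 0.9, 0.15])
store.add("car", [0.1, 0.2, 0.95])
store.add("truck", [0.15, 0.1, 0.9])

# the query doesn't exactly equal anything stored, but we still get close matches
results = store.search([0.8, 0.9, 0.2], k=2)
print(results)
```

an exact-match lookup would return nothing for this query, since no stored vector is identical to it. the similarity search still surfaces the two animal entries, which is the “librarian who gets what you mean” behaviour.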

think of a vector database as a super smart librarian who understands what you’re looking for, even if you don’t ask perfectly. 😁

daily opportunity + resource drops 🔍️
