$ miric.dev


What Are Vector Embeddings?

A visual, from-scratch guide to the numbers behind semantic search, RAG, and recommendations

01

Points in Space

A GPS coordinate is two numbers — latitude and longitude — and those two numbers are enough to pinpoint any location on Earth’s surface. A vector embedding works the same way, except instead of two dimensions you get hundreds or thousands, and instead of geographic location the numbers capture meaning.

The key insight: the math doesn’t change as you add dimensions. A point in 2D space is a list of two numbers. A point in 768-dimensional space is a list of 768 numbers. The operations — distance, angle, nearest neighbor — all work identically.
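This dimension-independence is easy to check in code. A quick Python sketch (the 768-dimensional point here is random toy data, not a real embedding):

```python
import math
import random

# A 2D point and a 768-dimensional point are both just lists of numbers.
point_2d = [3.0, 4.0]
origin_2d = [0.0, 0.0]

point_768d = [random.uniform(-1, 1) for _ in range(768)]
origin_768d = [0.0] * 768

# The same Euclidean distance formula applies, whatever the length.
def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance(point_2d, origin_2d))      # 5.0, the familiar 3-4-5 triangle
print(distance(point_768d, origin_768d))  # works identically in 768D
```

Nothing about `distance` knows or cares how many dimensions it is given; that is the sense in which the math doesn't change.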

[Figure: a point at (3, 4) on the xy-plane, written as the vector [3, 4]. Two numbers pinpoint a location on a flat plane — like GPS coordinates.]

When an embedding model processes a word, sentence, or image, it outputs exactly this kind of list — a fixed-length array of floating-point numbers. The model has learned, through training on vast amounts of data, which position in this high-dimensional space best represents the concept behind the input.

02

What Each Dimension Means

To build intuition, imagine dimension #0 measures “animal-ness.” A cat scores +0.82 (strongly animal), a car scores −0.90 (strongly not animal), and love scores +0.03 (barely relevant). Every number has two parts: the sign tells you which direction, and the magnitude tells you how strongly. (In real models the dimensions are entangled — each one encodes a mixture of features rather than a single clean concept — but the sign-and-magnitude mechanics are the same.)

[Figure: anatomy of the value +0.82. Sign gives direction: + means the concept has this feature, − means it lacks it. Magnitude gives strength: 0.82 is an 82% association. Dimension #0, "animal-ness", across concepts: cat +0.82, car −0.90, love +0.03.]
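The sign-and-magnitude reading can be spelled out in a few lines of Python, using the illustrative scores from this section (not real model outputs):

```python
# Illustrative dimension #0 ("animal-ness") scores from this section.
animalness = {"cat": 0.82, "car": -0.90, "love": 0.03}

def describe(concept, value):
    direction = "has" if value >= 0 else "lacks"
    return f"{concept} {direction} the feature (strength {abs(value):.2f})"

for concept, value in animalness.items():
    print(describe(concept, value))
# cat has the feature (strength 0.82)
# car lacks the feature (strength 0.90)
# love has the feature (strength 0.03)
```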

Now scale that to all 32 dimensions at once. Each row below is a concept, each column is a dimension, and the color tells you the value — green for positive, red for negative, brighter for stronger. Notice how “cat” and “dog” share similar color patterns while “car” looks completely different.

[Figure: a heatmap of all 32 dimensions (columns) for five concepts (rows: cat, dog, car, love, king), colored from negative (red) through weak to strong positive (green).]

03

Similar Things Cluster Together

If two concepts have similar embeddings — their lists of numbers point in roughly the same direction — they end up close together in the vector space. The standard measure is cosine similarity: it compares the angle between two vectors, ignoring their length. A score of 1.0 means identical direction, 0 means orthogonal (no directional relationship), and −1 means opposite.
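Cosine similarity itself is only a few lines of Python; the toy vectors below are chosen to hit the three landmark scores:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction, different length: length is ignored.
print(cosine_similarity([1, 2], [2, 4]))    # ≈ 1.0
# Orthogonal: no directional relationship.
print(cosine_similarity([1, 0], [0, 1]))    # 0.0
# Opposite direction.
print(cosine_similarity([1, 2], [-1, -2]))  # ≈ -1.0
```

Note that `[1, 2]` and `[2, 4]` score 1.0 even though one is twice as long — only the angle counts.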

In the constellation below, thicker lines mean higher similarity. Cat and dog are close because they share many semantic features. Car is far from both because it lives in a different region of the space entirely.

[Figure: similarity constellation of cat, dog, car, love, and king. Pairwise cosine similarities: cat–dog 0.84, cat–love 0.40, dog–love 0.40, car–king 0.24, dog–king 0.19, love–king 0.18.]

This is the core property that makes embeddings useful in practice. Vector search finds the nearest neighbors to a query embedding — the documents, products, or images whose meaning is closest to what the user asked for. RAG (Retrieval-Augmented Generation) uses the same principle to fetch relevant context before an LLM generates a response. Recommendation engines surface items whose embeddings are close to what the user has already engaged with.
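At its core, vector search is "score the query against every stored vector and keep the top k". A brute-force sketch with made-up 3-dimensional vectors (production systems use approximate-nearest-neighbor indexes such as HNSW instead of a full scan):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "document" embeddings; real ones have hundreds of dimensions.
store = {
    "cat": [0.8, 0.6, 0.1],
    "dog": [0.7, 0.7, 0.2],
    "car": [-0.9, 0.1, 0.8],
}

def search(query_vec, k=2):
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# A query pointing in a cat-like direction returns the animals first.
print(search([0.82, 0.58, 0.12]))  # cat first, then dog
```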

04

How Embeddings Are Learned

Embedding models don’t have a hard-coded list of dimension meanings. Instead, they learn useful positions through training. A text embedding model reads billions of text examples and learns to assign positions such that words appearing in similar contexts end up nearby. The technique was popularised by Mikolov et al. (2013): “king” and “queen” share context (royalty, leadership, ceremony) and land close together; “king” and “sandwich” do not.

The same principle extends beyond text. Image embedding models learn from labelled or unlabelled images, placing visually similar images close together. Multi-modal models like CLIP learn from image-text pairs and map both modalities into the same space, so a text query can retrieve a relevant photograph.

The practical takeaway: you don’t need to train your own model. Pre-trained embedding models from OpenAI, Cohere, Hugging Face, and others produce high-quality vectors out of the box. Your job is to generate embeddings for your data, store them in a vector database (like MongoDB Atlas Vector Search), and query by similarity at read time.
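A minimal sketch of that pipeline, with a toy bag-of-words embedder standing in for a real pre-trained model (the `toy_embed` function and tiny in-memory "store" are illustrative assumptions, not how production systems are built):

```python
import math

VOCAB = ["cat", "dog", "pet", "engine", "wheel", "road"]

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Embed: turn each document into a vector at write time.
documents = ["cat and dog are each a pet", "engine wheel road"]
index = [(doc, toy_embed(doc)) for doc in documents]

# 2. Store: here just a list; in production, a vector database.
# 3. Search: embed the query the same way, rank by similarity.
def search(query):
    q = toy_embed(query)
    return max(index, key=lambda item: cosine_similarity(q, item[1]))[0]

print(search("my pet dog"))  # retrieves the animal document, not the car one
```

Swapping `toy_embed` for a real model and the list for a vector database changes the quality and the scale, but not the shape of the pipeline.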

That pipeline — embed, store, search — is the foundation of most modern AI applications that need to understand meaning rather than match keywords.