$ miric.dev


Euclidean, Cosine, or Dot Product?

A visual guide to the three distance metrics behind vector search

01

Three Ways to Measure Closeness

You have two vectors. How similar are they? That seemingly simple question has three different answers depending on what you mean by similar. Euclidean distance measures the straight-line gap between their endpoints. Cosine similarity measures the angle between them, ignoring length entirely. And the dot product blends both direction and magnitude into a single number.

Each metric answers a different question. Euclidean asks “how far apart are these points?” Cosine asks “do these vectors point the same way?” Dot product asks “how much do these vectors agree, weighted by how strong each signal is?”

Euclidean Distance


d = √Σ(Aᵢ − Bᵢ)²

Straight-line distance between endpoints

Range: 0 → ∞

Cosine Similarity


cos(θ) = (A · B) / (‖A‖ × ‖B‖)

Angle between vectors, ignoring magnitude

Range: −1 → 1

Dot Product


A · B = Σ(Aᵢ × Bᵢ)

Direction + magnitude combined

Range: −∞ → ∞
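The three formulas above fit in a few lines of plain Python. A minimal sketch (the function names are mine, not from any particular library):

```python
import math

def euclidean(a, b):
    # d = sqrt(sum((A_i - B_i)^2)): straight-line gap between endpoints
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    # A . B = sum(A_i * B_i): direction and magnitude combined
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # cos(theta) = (A . B) / (|A| * |B|): angle only, magnitude stripped out
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Note that doubling one vector doubles the dot product and changes the Euclidean distance, but leaves cosine unchanged.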

In practice, the choice of metric can completely change your search results. Two documents about the same topic might be “far apart” by Euclidean distance (because one is longer) yet “identical” by cosine similarity (because they point in the same direction). Understanding why is the key to choosing the right metric.

02

The Same Vectors, Different Answers

Drag the vector endpoints below and watch all three metrics update in real time. Start with the presets to build intuition, then explore freely. Pay special attention to what happens when two vectors point in the same direction but have different lengths — that’s where the metrics diverge most dramatically.

A = (2.0, 1.5)   B = (5.0, 3.75)

Euclidean Distance

3.75

Moderate distance

Cosine Similarity

1.0000

Identical direction

Dot Product

15.63

Strong agreement

Try the “Same direction, different length” preset. Cosine similarity reads 1.0000 — perfect agreement — while Euclidean distance shows a gap of nearly 5 units. This is the core trade-off: Euclidean cares about magnitude; cosine strips it away.

Now try “Orthogonal.” Cosine drops to zero (no directional relationship), but the dot product also hits zero because it factors in the angle. Opposite vectors push both cosine and dot product into negative territory — the strongest possible disagreement.
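These presets are easy to check numerically. A self-contained sketch (the first pair of coordinates is chosen so that B is an exact multiple of A):

```python
import math

def metrics(a, b):
    # Returns (euclidean distance, cosine similarity, dot product)
    dot = sum(x * y for x, y in zip(a, b))
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    cos = dot / (math.sqrt(sum(x * x for x in a)) *
                 math.sqrt(sum(y * y for y in b)))
    return dist, cos, dot

# Same direction, different length: B = 2.5 * A
print(metrics((2.0, 1.5), (5.0, 3.75)))   # distance 3.75, cosine 1.0, dot 15.625

# Orthogonal: cosine AND dot product both hit zero
print(metrics((1.0, 0.0), (0.0, 1.0)))    # distance ~1.41, cosine 0.0, dot 0.0

# Opposite: both go negative
print(metrics((1.0, 0.0), (-1.0, 0.0)))   # distance 2.0, cosine -1.0, dot -1.0
```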

03

Why Cosine Wins for Text

Imagine two documents about cats. One is a short tweet; the other is a 10-page essay. Both point in roughly the same direction in embedding space (they’re about the same topic), but the longer document has a much larger magnitude because it contains more information.

Euclidean distance sees the gap between their endpoints and declares them far apart. Cosine similarity measures the angle between them and says same direction. Which answer is more useful for search? Almost always cosine — because direction encodes meaning and magnitude is noise.

Euclidean Distance


“Far apart” — distance = 7.12

Euclidean measures the gap between endpoints. Longer document = bigger vector = larger distance.

Cosine Similarity

θ ≈ 0°

“Same direction” — similarity = 1.00

Cosine measures the angle between vectors. Same topic, different length? Cosine doesn’t care.

This is why cosine similarity is the default metric for text search in almost every vector database. It strips out the magnitude dimension, focusing purely on semantic direction. When a user searches “how to feed a cat,” you want to match any relevant document regardless of its length.
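The cat-document story can be simulated with toy numbers. In the sketch below, the three-dimensional "embeddings" are invented for illustration; the long essay is modeled as the same direction at four times the magnitude:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

short_doc = [0.3, 0.8, 0.5]            # a tweet about cats
long_doc = [x * 4 for x in short_doc]  # a long essay: same topic, 4x magnitude

print(euclidean(short_doc, long_doc))  # ~2.97: "far apart"
print(cosine(short_doc, long_doc))     # ~1.0:  "same direction"
```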

04

The Normalization Trick

Here’s a powerful shortcut: if you pre-normalize every vector to unit length (magnitude = 1), then the dot product becomes cosine similarity. The formula simplifies because ‖A‖ and ‖B‖ are both 1, leaving just cos(θ). And dot product is cheaper to compute — it’s just a sum of products, no square roots needed.

A · B = ‖A‖ × ‖B‖ × cos(θ)
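With ‖A‖ = ‖B‖ = 1, the identity collapses to A · B = cos(θ), which a quick sketch confirms (helper names are mine):

```python
import math

def normalize(v):
    # Scale v to unit length so that ||v|| = 1
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [2.0, 1.5, 0.5]
b = [1.0, 3.0, 2.0]

# On unit vectors, A . B = ||A|| * ||B|| * cos(theta) reduces to
# A . B = cos(theta): no norms, no square roots at query time.
assert abs(dot(normalize(a), normalize(b)) - cosine(a, b)) < 1e-12
```

The practical upshot: normalize once at write time, then every query is a plain sum of products.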

Most modern embedding models already output normalized vectors. OpenAI’s text-embedding-3-small, Cohere’s embed-english-v3.0, Voyage, and Nomic all produce unit-length embeddings by default. If your vectors are normalized, use dotProduct as your distance metric — you get cosine semantics at dot product speed.

In MongoDB Atlas Vector Search, you can set "similarity": "dotProduct" in your index definition. This is the fastest option when your embeddings are pre-normalized — and for most modern models, they are.
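A matching index definition might look like the following sketch; the field path embedding and the dimension count 1536 are assumptions that depend on your schema and embedding model:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "dotProduct"
    }
  ]
}
```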

05

Choosing the Right Metric

The right metric depends on two things: what kind of data you’re working with, and whether your vectors are normalized. For text embeddings from modern models, the answer is almost always dot product (because the vectors are pre-normalized). For image features or custom models that produce unnormalized vectors, cosine similarity is the safer default.

Scenario                          Euclidean   Cosine   Dot Product
Text search (unnormalized)                      ✓
Image feature matching                          ✓
Recommendations (unnormalized)                               ✓
Pre-normalized vectors                                       ✓
Mixed-magnitude data                  ✓

Rule of thumb

Use the same distance metric your embedding model was trained with.

If your model outputs normalized vectors (OpenAI, Cohere, Voyage, Nomic), use dotProduct — it’s mathematically equivalent to cosine on unit vectors, but faster to compute.

Model documentation typically specifies whether the output is normalized and which similarity function was used during training. Mismatching the metric can degrade search quality even when the embeddings themselves are excellent.

MongoDB Atlas Vector Search supports all three: cosine, euclidean, and dotProduct. You choose the metric when you define the index — and you can always reindex if your model or requirements change.