Achieving High-Precision Search Similarity & Vector-Based Ranking at Scale

How Alkane Live Engineered a Hybrid Search System for Relevance, Speed & Semantic Accuracy

Author: Guillaume Barbat, Founder & R&D – Noélabs / Alkane Live

Abstract

This white paper describes how Alkane Live implemented a high-performance hybrid search system combining trigram-based similarity and vector-powered ranking on top of PostgreSQL and Django. By leveraging GIN indexes with pg_trgm and PostgreSQL full-text search via SearchVector and SearchRank, we significantly improved relevance, robustness to typos, and latency for product search across our live commerce ecosystem. The approach is aligned with state-of-the-art work in information retrieval and neural ranking models, including research published on arXiv.

1. Context: Search in a Live Commerce Environment

Alkane Live operates in a fast-moving marketplace with:

  • Thousands of product variants across TCG, sneakers, collectibles, and more.
  • Highly dynamic availability: items move from “upcoming” to “live auction” to “sold” in minutes.
  • User queries that are often noisy, multi-lingual, and full of approximate names or typos.

Classic LIKE-based search and naive full-text search quickly showed their limits:

  • Poor tolerance to spelling mistakes and shorthand terms.
  • Low semantic relevance (exact tokens but wrong intent).
  • Unacceptable response times when scaling to thousands of SKUs.

We needed a system capable of:

  • Fuzzy matching (“charizard ex” vs “charizrd ex”).
  • Multi-field ranking (product title, variant title, description).
  • Stable performance under concurrent traffic from viewers and streamers.

2. Design Overview: A Hybrid Search Stack

Our solution is a hybrid search architecture that combines:

  1. Trigram similarity with GIN indexes for fuzzy text matching and typo tolerance.
  2. PostgreSQL full-text search (via SearchVector + SearchRank) for semantic ranking across multiple fields.
  3. Business-aware ordering layered on top (freshness, activity, availability).

This hybrid composition gives us:

  • High recall (we rarely “miss” the right product).
  • Better precision (the most relevant variants appear first).
  • Predictable performance under load.

3. Indexing Strategy: GIN + Trigram

We use PostgreSQL GIN indexes with the gin_trgm_ops operator class on key text fields that influence search:

from django.contrib.postgres.indexes import GinIndex

class Meta:
    indexes = [
        GinIndex(
            name="streamer_pv_title_trgm_gin_idx",
            fields=["title"],
            opclasses=["gin_trgm_ops"],
        ),
        GinIndex(
            name="streamer_pv_desc_trgm_gin_idx",
            fields=["description"],
            opclasses=["gin_trgm_ops"],
        ),
    ]

With the pg_trgm extension enabled, PostgreSQL computes trigram signatures of strings, allowing us to:

  • Measure similarity between user queries and product titles/descriptions.
  • Handle approximate matches and minor spelling errors without custom logic.
  • Use GIN indexes to keep lookups fast, even on large tables.
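Enabling the extension itself can be handled directly in a Django migration via the built-in TrigramExtension operation. The sketch below assumes a hypothetical app and migration dependency; substitute your own:

```python
from django.contrib.postgres.operations import TrigramExtension
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        # Placeholder: replace with your app's latest migration.
        ("catalog", "0001_initial"),
    ]

    operations = [
        # Runs CREATE EXTENSION IF NOT EXISTS pg_trgm; on the database.
        TrigramExtension(),
    ]
```

This keeps the extension requirement versioned alongside the schema instead of being a manual one-off on each environment.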

Conceptually, this aligns with classical approximate string matching techniques and practical work on n-gram–based indexing frequently discussed in information retrieval literature.
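To make the mechanism concrete, pg_trgm's similarity can be approximated in a few lines of plain Python: lowercase each word, pad it, extract three-character windows, and compare the trigram sets. This is a simplified sketch for intuition only; PostgreSQL's actual implementation differs in details:

```python
def trigrams(text):
    # pg_trgm-style trigrams: lowercase each word, pad it with
    # two leading spaces and one trailing space, then slide a
    # three-character window over it.
    grams = set()
    for word in text.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    # Shared trigrams over total distinct trigrams (Jaccard ratio),
    # which is also how pg_trgm's similarity() is defined.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0
```

With this definition, similarity("charizard ex", "charizrd ex") stays well above a typical threshold despite the typo, while an unrelated query like "pikachu" scores near zero.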

4. Query Pipeline: Combining Similarity & Vector Ranking

On the Django side, we built our query pipeline using SearchVector, SearchQuery, SearchRank and TrigramSimilarity:

from django.contrib.postgres.search import (
    SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
)
from django.db.models import Q

product_qs = (
    ProductVariant.objects.filter(is_active=True)
    # Avoid N+1 queries when serializing results.
    .select_related('product', 'product__user', 'user')
    .annotate(
        # Trigram score against the parent product title, boosted 8x.
        similarity=TrigramSimilarity('product__title', query) * 8,
        # Full-text rank across product title, variant title and description.
        search_rank=SearchRank(
            SearchVector('product__title', 'title', 'description'),
            SearchQuery(query, search_type='websearch')
        )
    )
    # Keep rows that clear at least one relevance threshold.
    .filter(
        Q(similarity__gt=0.15) |
        Q(search_rank__gt=0.1)
    )
    .order_by('-search_rank', '-similarity', '-updated_at')[:15]
)

print(f'Product variants found: {product_qs.count()}')

Key aspects of this design:

  • Hybrid scoring: we compute both a similarity score (trigram) and a search_rank score (full-text vector).
  • Weighted similarity: we multiply trigram similarity by 8 to emphasize product-level matching: TrigramSimilarity('product__title', query) * 8.
  • Web search mode: using SearchQuery(query, search_type='websearch') gives us a more natural query interpretation (handling AND/OR, quotes, etc.).
  • Thresholding: we keep only results that pass at least one relevance threshold (similarity__gt=0.15 or search_rank__gt=0.1).
  • Ordering logic: we sort by search_rank, then similarity, then updated_at to promote fresher items when relevance is comparable.
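The thresholding and ordering rules above can be illustrated on plain tuples. The titles, scores, and dates below are made-up values for demonstration, not production data:

```python
# (title, search_rank, similarity, updated_at) -- hypothetical scored rows.
rows = [
    ("Charizard EX #105", 0.30, 0.55, "2024-05-01"),
    ("Charizard VMAX",    0.30, 0.20, "2024-06-10"),
    ("Pikachu promo",     0.05, 0.10, "2024-06-12"),  # fails both thresholds
]

# Keep rows passing at least one relevance threshold,
# mirroring Q(similarity__gt=0.15) | Q(search_rank__gt=0.1).
kept = [r for r in rows if r[2] > 0.15 or r[1] > 0.1]

# Sort by search_rank, then similarity, then recency, all descending,
# mirroring order_by('-search_rank', '-similarity', '-updated_at').
ranked = sorted(kept, key=lambda r: (r[1], r[2], r[3]), reverse=True)
```

Here the two Charizard variants tie on full-text rank, so the trigram score breaks the tie, and the low-relevance row is filtered out before ordering even applies.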

This effectively turns PostgreSQL into a lightweight ranking engine, while keeping the implementation fully compatible with Django ORM and avoiding external search infrastructure for this layer.

5. Results & Impact on Alkane Live

After deploying this hybrid search strategy in production on Alkane Live, we monitored both system-level metrics and user behavior. Over several weeks, we observed:

  • Search relevance: a significant drop in “no result” scenarios and user re-queries.
  • Typos & robustness: users who typed imperfect or partial card names still reached the intended product variants.
  • UX & conversion: an uplift in click-through rate from search results to product pages, and a higher rate of bids or adds-to-cart originating from search.
  • Latency: median search latency stabilized below our target threshold thanks to GIN indexing and trimmed result sets ([:15]).

While exact numbers are internal, the combination of trigram similarity + search vectors clearly outperformed both our initial naive implementation and pure full-text approaches.

6. Relation to Academic Research

Our implementation sits at the intersection of classical full-text search and more recent work on neural ranking models and semantic similarity. While we intentionally started with a pragmatic PostgreSQL-based stack, the conceptual architecture is compatible with more advanced vector and neural ranking extensions.

Published arXiv work on ranking, vector representations, and semantic retrieval explores closely related problems.

These works provide the theoretical and empirical foundation for moving from lexical matching to representation learning and vector-based ranking. Our current implementation is a deliberately infrastructure-light, production-oriented step in that direction, and serves as a bridge toward deeper neural retrieval integration in future iterations of Alkane Live.

7. Roadmap: From Hybrid Search to Full Semantic Retrieval

The hybrid PostgreSQL approach has proven robust and effective as a first layer. Our roadmap builds on this foundation in three directions:

  1. Richer feature fields: progressively integrating additional fields into the search vectors (category, tags, language, collection metadata) to capture more context.
  2. Embedding-based retrieval: introducing dense vector embeddings (e.g. from transformer models) for product titles and descriptions and combining them with the current score as an additional relevance signal.
  3. Personalized ranking: using user-level behavioral signals to adapt score weighting (e.g. boosting categories or sets that a user is statistically more likely to engage with).
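As a conceptual sketch of the second direction, a dense embedding signal could be blended with the existing lexical score. The blending function, the alpha weight, and the toy vectors below are illustrative assumptions, not production values:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def blended_score(lexical_score, query_emb, doc_emb, alpha=0.7):
    # Weighted blend of the PostgreSQL lexical score and a
    # semantic similarity signal from dense embeddings.
    return alpha * lexical_score + (1 - alpha) * cosine(query_emb, doc_emb)
```

In practice the embeddings would come from a transformer model, and alpha would be tuned offline against click or conversion data.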

In all cases, the guiding principle remains the same: combine relevance, performance and interpretability in a way that respects production constraints and the expectations of our community.

8. Conclusion

By carefully combining trigram-based similarity, PostgreSQL full-text vectors, and business-aware ranking, Alkane Live has built a search experience that is not only fast and robust, but also genuinely helpful to users navigating a dense and evolving catalog.

This white paper, authored by Guillaume Barbat, documents the current iteration of our search engine and positions it in the broader context of modern information retrieval research. It is both a snapshot of what powers Alkane Live today and a foundation for the next generation of semantic and personalized search capabilities within our platform.

You may also like

Unlocking real-time recommendations: A Noelabs deep dive in…
By Guillaume.B

A deep dive by Noelabs into the “Monolith: Real Time Recommendation System With Collisionless Embedding Table” white paper, and how its ideas helped …

GIN indices and advanced search vector
By Guillaume.B

White Paper: Enhancing Search Similarity with Vector Embeddings and n-gram Indices
