
Semantic models in AI-based search

Artificial intelligence (AI) has become a constant presence in people’s professional and personal lives, increasingly woven into daily tasks through hundreds of thousands of AI-based applications.

Natural language processing (NLP), natural language understanding (NLU), and natural language generation (NLG) programs are AI systems built to understand how humans talk, write, communicate, and interact. These systems have enabled the creation of semantic models that can learn to perform specific tasks, like predicting the end of the sentence you’re typing. Thanks to NLP, NLU, and NLG, the shoppers’ search experience has been revolutionized: advanced vector-based search uses algorithms that understand the semantic meaning and context of search queries and documents.
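To illustrate how vector-based retrieval works in general (a minimal sketch with made-up toy embeddings, not Empathy Platform’s actual implementation), documents can be ranked by the cosine similarity between their embedding vectors and the query’s embedding vector:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings; real models produce hundreds of dimensions.
query_vec = [0.9, 0.1, 0.3]
doc_vecs = {
    "running shoes": [0.8, 0.2, 0.4],
    "garden hose": [0.1, 0.9, 0.2],
}

# Rank documents by semantic closeness to the query embedding.
ranked = sorted(
    doc_vecs,
    key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
    reverse=True,
)
print(ranked[0])  # "running shoes" — semantically closest to the query
```

In a real system the embeddings come from the trained semantic model and the nearest-neighbor search is done with a vector index rather than a linear scan.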

However, building these systems requires considerable time and resources, as a large, well-labeled dataset must be secured for each specific task. That’s why the starting point for semantic search is semantic foundation models: AI large language models (LLMs) trained on a vast corpus of unlabeled data, often using self-supervised learning, that can power a wide variety of downstream tasks.


Wondering how these foundation models are fine-tuned and labeled? Keep on reading the section Fine-tuning semantic models.

Foundation models in Empathy Platform

As in the software world, foundation models fall into two types depending on their source:

  • Closed-source foundation models: normally accessed through end-to-end applications built on these models or through API integrations.
  • Open-source foundation models: normally hosted on model hubs; applications or APIs can be created on top of them.

Semantic models in semantic search

Empathy Platform leverages open-source foundation models to create semantic search experiences, mainly for data privacy and integrity reasons: it establishes privacy and consent controls that reinforce customers’ trust and safeguard brands’ reputation.

Fine-tuning semantic models

Semantic models can be extended into any domain through the next step: tuning. They are trained with proprietary tuning data, that is, well-labeled, domain-specific information that fine-tunes the model to perform specific tasks.

At Empathy Platform, open-source foundation models are trained with query-click and query-product combinations to create semantic associations grounded in consent integrity and in anonymous, session-based customer interactions. The resulting domain-based model therefore ensures data privacy and integrity.
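As a rough illustration of how query-click combinations can yield semantic associations (a hypothetical sketch; the log format and field names are invented for the example), queries whose sessions clicked on the same product can be paired as fine-tuning candidates:

```python
from collections import defaultdict

# Hypothetical anonymized, session-based interaction log: each entry pairs
# a search query with the product clicked in that session.
click_log = [
    {"session": "s1", "query": "trainers", "clicked_product": "p-101"},
    {"session": "s2", "query": "running shoes", "clicked_product": "p-101"},
    {"session": "s3", "query": "trainers", "clicked_product": "p-205"},
]

# Group queries by clicked product: queries that led to clicks on the same
# product are semantically related and become candidate training pairs.
product_to_queries = defaultdict(set)
for event in click_log:
    product_to_queries[event["clicked_product"]].add(event["query"])

training_pairs = [
    (q1, q2)
    for queries in product_to_queries.values()
    for q1 in sorted(queries)
    for q2 in sorted(queries)
    if q1 < q2
]
print(training_pairs)  # [('running shoes', 'trainers')]
```

Note that the log carries only session identifiers, not personal identifiers, consistent with the anonymous, session-based interactions described above.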

Protecting privacy and integrity

Creating a foundation model requires a huge amount of training data, which raises consent integrity and privacy problems: it is not possible to ensure and track consent and privacy integrity when working with pre-trained models. As these models hold and process the underlying information, the individuals who performed the actions that feed the models must have given their consent (via banners, pop-ups, and roll-downs).

Empathy is establishing privacy controls as a firewall against legal and reputational risks for brands. If there is no data subject consent, there is no data integrity. Therefore, Empathy Platform strives to verify data integrity, user confidentiality, and consent, even though the origin of the data the models were trained with cannot be controlled.

To avoid the impact of dirty datasets inherited from the foundation models used, single-domain content integrity is first ensured by tuning the models to the specific use case. Beyond fine-tuning the model training, the weighting of the foundation model can be minimized, reducing the impact of potentially non-integral data while strategies to validate the integrity of these sources are developed. Then, an ePrivacy stress test is executed to guarantee and safeguard the reputation of retailers and brands, leveraging AI opportunities based on confidentiality that do not compromise trust.

Leveraging semantic models with Semantics API

Based on NLP foundation models fine-tuned with the customer’s proprietary domain datasets, the Empathy Platform Semantics API surfaces semantic similarities between queries, and thus between products in the merchandiser’s product catalogue. The Semantics API is leveraged to create semantic-based search experiences that complement keyword search by yielding faster and more relevant results.
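To illustrate the idea of query-to-query semantic similarity (a toy sketch with invented two-dimensional embeddings, not the actual Semantics API), related queries can be retrieved by comparing the embedding vectors the fine-tuned model produces:

```python
# Toy query embeddings standing in for the fine-tuned model's output;
# real embeddings have hundreds of dimensions.
query_embeddings = {
    "sneakers": [0.9, 0.1],
    "trainers": [0.85, 0.2],
    "blender": [0.1, 0.95],
}

def dot(a, b):
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

def related_queries(query, top_k=1):
    """Return the top_k queries most similar to the given one."""
    target = query_embeddings[query]
    others = [q for q in query_embeddings if q != query]
    return sorted(
        others, key=lambda q: dot(query_embeddings[q], target), reverse=True
    )[:top_k]

print(related_queries("sneakers"))  # ['trainers']
```

A service built on such similarities can then expand or substitute a shopper’s query with semantically related ones before retrieving products.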

Semantic search overcomes the most frustrating shopper search experiences, such as zero or partial results, misspellings, or too few results.


Check out real use cases to get more insights about how Empathy Platform helps you manage these situations.

Semantic models are also used to improve search effectiveness and relevance by combining the strengths of keyword-based and semantic-based indexing. The product catalogue can thus be enhanced with attribute enrichment at index time, which helps avoid issues that could arise from applying semantics at search time, while also improving performance at query time.
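A minimal sketch of index-time attribute enrichment (the catalogue fields and the model-derived term map are hypothetical): terms derived offline from a semantic model are appended to each product’s keyword attributes before indexing, so no semantic computation is needed per query:

```python
# Hypothetical term map derived offline from a semantic model;
# enrichment happens once at index time, not at query time.
semantic_attributes = {
    "p-101": ["sneakers", "trainers", "running shoes"],
}

catalogue = [
    {"id": "p-101", "name": "Road Runner X", "keywords": ["shoes"]},
]

def enrich(product):
    """Append model-derived terms to the keyword field before indexing."""
    extra = semantic_attributes.get(product["id"], [])
    enriched = dict(product)
    enriched["keywords"] = sorted(set(product["keywords"]) | set(extra))
    return enriched

indexed = [enrich(p) for p in catalogue]
print(indexed[0]["keywords"])
```

After enrichment, a plain keyword query like "sneakers" matches the product through the ordinary inverted index, with no semantic inference at search time.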

Combining the strengths of keyword and semantic search allows the development of a hybrid search solution that effectively addresses long-tail scenarios and enhances the relevance of search results.
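One common way to combine the two signals (a simplified sketch, not Empathy Platform’s actual scoring formula) is a weighted blend of a keyword-match score and a semantic similarity score, which lets a long-tail query with no exact keyword match still surface semantically relevant products:

```python
def hybrid_score(keyword_score, semantic_score, alpha=0.5):
    """Blend a normalized keyword score with a semantic similarity score;
    alpha controls the balance between the two signals."""
    return alpha * keyword_score + (1 - alpha) * semantic_score

# A long-tail query may have no exact keyword match (keyword score 0.0)
# but a strong semantic match, so the right product still ranks first.
results = {
    "p-101": hybrid_score(0.0, 0.9),  # 0.45
    "p-205": hybrid_score(0.4, 0.2),  # 0.30
}
best = max(results, key=results.get)
print(best)  # p-101
```

Tuning alpha shifts the balance: higher values favor exact keyword matches, lower values favor semantic closeness.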


Read more about how Semantic and keyword search as a unified index can enhance your shoppers' search experience.