Fine-tuning Mistral for an enhanced content search experience (part III)
The infrastructure perspective
Legacy article
This post reflects an earlier stage of Empathy Platform development. Some of the tools, integrations, or approaches described here are no longer in use in our current stack.
Since then, our focus has evolved towards self-hosted, private, and sustainable AI infrastructure, where compute is treated as part of the product itself. All AI compute is currently run on Empathy’s own GPU environment, hosted in our net-zero energy, bioclimatic private cloud in Asturias.
Rather than updating this article to fit our current approach, we’ve chosen to preserve it as a record of our R&D history and innovation journey.
For our current approach and latest developments, explore our main blog section.
In our ongoing series about fine-tuning Mistral for an enhanced developer portal search experience, we’ve explored the general overview and the backend perspective in our previous posts. In this third part of the series, we’ll dive into the infrastructure perspective, highlighting how we ensured flexible, secure, and efficient deployment of the Mistral model. The fourth post will cover the UX perspective.
Bridging backend and frontend with robust infrastructure
To achieve a seamless and intuitive search experience, it’s essential to have a solid infrastructure that supports both the backend data processing and the frontend user interaction. Our infrastructure strategy focused on providing a flexible and scalable environment that could adapt to the evolving landscape of AI technologies.
So, let’s take a look at some key considerations in infrastructure design:
Avoiding vendor lock-in
One of our main objectives was to avoid vendor lock-in, ensuring that our architecture could evolve without being tied to a specific provider. We achieved this by:
- Choosing open standards that allow our infrastructure to remain adaptable and not depend on any single vendor.
- Implementing AI gateways to manage interactions between clients and multiple AI model providers, offering a unified API for smooth integration.
Leveraging AI gateways
We chose to implement an open-source AI gateway called LiteLLM. This gateway serves as a proxy between the client and various AI model providers, offering a unified API compatible with OpenAI standards. This setup provides several benefits:
- Flexibility in model selection: we can easily switch between different models and providers behind the scenes without disrupting the user experience.
- Unified API: developers interact with a single API, which simplifies integration and reduces complexity.
- Risk mitigation: by not depending on a single vendor, we reduce exposure to service availability issues and pricing changes.
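To make the unified API concrete, here is a minimal sketch of the OpenAI-compatible request shape a gateway like LiteLLM accepts. The model names are illustrative placeholders, not our actual deployment values: the point is that swapping providers behind the gateway only changes the `model` field, while the rest of the request and the client code stay the same.

```python
def build_chat_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the gateway."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

# The same payload shape works whether the gateway routes to a vendor
# foundation model or to a self-hosted fine-tuned Mistral
# (both model names below are hypothetical):
vendor_request = build_chat_request("bedrock-base", "How do I query the search API?")
custom_request = build_chat_request("mistral-finetuned", "How do I query the search API?")
```

Because only the `model` field differs between the two requests, the gateway can reroute traffic, for example during a provider outage or a model upgrade, without any client-side code changes.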
Current infrastructure setup
Our current infrastructure setup integrates both self-hosted and vendor-provided solutions. Let’s take a look at how the pieces fit together:
Fig. Infrastructure to facilitate the deployment of the Mistral model
- Base models on AWS Bedrock: We use AWS Bedrock because it seamlessly integrates with our existing AWS infrastructure. This provides a reliable and scalable environment for running foundation models.
- Self-hosted customized model: The fine-tuned Mistral model is deployed inside a container on our Kubernetes architecture. This gives us full control over the model and ensures it meets our specific requirements. However, we’ve kept the option open to integrate other model providers as needed.
- AI gateway integration: The LiteLLM gateway acts as a router, providing a single, OpenAI-compatible API for accessing both the foundation models and the self-hosted customized model, thus enhancing the developer experience.
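As an illustration of how this routing can be expressed, here is a sketch of a LiteLLM proxy configuration with one Bedrock-hosted foundation model and one self-hosted model. The model identifiers, AWS region, and in-cluster service URL are hypothetical placeholders, not our production values.

```yaml
# Illustrative LiteLLM proxy config: two backends behind one API.
# All identifiers and URLs below are placeholders.
model_list:
  # Foundation model served by AWS Bedrock
  - model_name: bedrock-base
    litellm_params:
      model: bedrock/mistral.mistral-7b-instruct-v0:2
      aws_region_name: eu-west-1

  # Fine-tuned Mistral served from a container on our Kubernetes cluster,
  # exposed through an OpenAI-compatible inference server
  - model_name: mistral-finetuned
    litellm_params:
      model: openai/mistral-finetuned
      api_base: http://mistral-inference.ai.svc.cluster.local:8000/v1
      api_key: "placeholder"
```

Clients then address both models through the same gateway endpoint, selecting one purely by `model_name`, which is what lets us swap or add providers without touching client code.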
Future-proofing our infrastructure
As the landscape of AI technology evolves, we remain committed to exploring new ways to enhance our infrastructure. For instance, we’re awaiting approval to import custom models into AWS Bedrock, giving us even more flexibility and options for future development.
Summing up infrastructure advancements
The infrastructure perspective facilitates the integration of backend data processing and frontend user interaction. By focusing on flexibility, avoiding vendor lock-in, and leveraging AI gateways, we’ve created a robust infrastructure that supports our enhanced developer portal search experience, as well as other LLM-based initiatives.
Building search, differently
What started as early experimentation has evolved into a more integrated way of building AI search. Today, our work is centered around Empathy.AI, the space where we design and develop AI search systems grounded in self-hosted, private, and sustainable infrastructure.
By treating compute as part of the product itself, we gain greater control, efficiency, and long-term scalability.
Want to explore how this approach shapes what we build today? Discover more about Empathy.AI or dive into our latest articles.