Fine-tuning Mistral for an enhanced content search experience (part III)

In our ongoing series about fine-tuning Mistral for an enhanced developer portal search experience, we’ve explored the general overview and the backend perspective in our previous posts. In this third part of the series, we’ll dive into the infrastructure perspective, highlighting how we ensured flexible, secure, and efficient deployment of the Mistral model.

Bridging backend and frontend with robust infrastructure

To achieve a seamless and intuitive search experience, it’s essential to have a solid infrastructure that supports both the backend data processing and the frontend user interaction. Our infrastructure strategy focused on providing a flexible and scalable environment that could adapt to the evolving landscape of AI technologies.

So, let’s take a look at some key considerations in infrastructure design:

Avoiding vendor lock-in

One of our main objectives was to avoid vendor lock-in, ensuring that our architecture could evolve without being tied to a specific provider. We achieved this by:

Choosing open standards that allow our infrastructure to remain adaptable and not depend on any single vendor.
Implementing AI gateways to manage interactions between clients and multiple AI model providers, offering a unified API for smooth integration.

Leveraging AI gateways

We chose to implement an open-source AI gateway called LiteLLM (opens new window). This gateway serves as a proxy between the client and various AI model providers, offering a unified API compatible with OpenAI standards. This setup provides several benefits:

Flexibility in model selection so we can easily switch between different models and providers behind the scenes without disrupting the user experience.
Unified API for developers to interact with a single API, simplifying integration and reducing complexity.
Risk mitigation so we can reduce risks associated with potential service availability or pricing changes by not being dependent on a single vendor.

Current infrastructure setup

Our current infrastructure setup integrates both self-hosted and vendor-provided solutions. Let’s take a look at how we develop our current infrastructure:

Fig. Infrastructure to facilitate the deployment of the Mistral model

Base models on AWS Bedrock: We use AWS Bedrock (opens new window) because it seamlessly integrates with our existing AWS infrastructure. This provides a reliable and scalable environment for running foundation models.
Self-hosted customized model: The fine-tuned Mistral model is deployed inside a container on our Kubernetes architecture. This allows us to control the model and ensure it meets explicitly our requirements. However, we’ve kept the option open to integrate other model providers as needed.
AI gateway integration: The LiteLLM gateway acts as a router, providing a single and OpenAI compatible API for accessing both the foundation and self-hosted customized models, enhancing thus the developer experience.

Future-proofing our infrastructure

As the landscape of AI technology evolves, we remain committed to exploring new ways to enhance our infrastructure. For instance, we’re awaiting approval to import custom models into AWS Bedrock, giving us even more flexibility and options for future development.

Summing up infrastructure advancements

The infrastructure perspective facilitates the integration of backend data processing and frontend user interaction. By focusing on flexibility, avoiding vendor lock-in, and leveraging AI gateways, we’ve created a robust infrastructure that supports our enhanced developer portal search experience, as well as other initiatives revolving around LLMs.

Try it out!

Love AI. Love privacy. Type away! on our search bar above and try out our new HolonSearch Privacy-First Generative Content Search right now!

Keep reading!

This infrastructure perspective is part of our broader effort to innovate and improve user search interactions. Read more about our journey from the UX perspectives.