Integrate Custom LLMs from SageMaker with Strands Agents

Organizations are increasingly deploying custom Large Language Models (LLMs) on Amazon SageMaker AI real-time endpoints, utilizing powerful serving frameworks like SGLang, vLLM, or TorchServe. This approach offers enhanced control, optimized costs, and compliance alignment. However, a common hurdle arises when integrating these custom LLMs with Strands agents: a mismatch in response formats.

What It Does

Strands agents are designed to expect model responses in the Bedrock Messages API format. In contrast, custom LLMs hosted on SageMaker AI endpoints, especially when served with frameworks like SGLang, typically return OpenAI-compatible formats. This incompatibility leads to integration errors, such as a TypeError: 'NoneType' object is not subscriptable, because the Strands Agents' default SageMakerAIModel expects a specific structure that isn't provided.
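To make the mismatch concrete, the two payloads below are illustrative (the field values are invented): the first is the OpenAI-compatible shape an SGLang-served endpoint typically returns, the second is the Bedrock Messages shape that Strands agents expect.

```python
# OpenAI-compatible response, as returned by an SGLang-served endpoint:
openai_response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

# Bedrock Messages API shape that Strands agents expect:
bedrock_response = {
    "output": {
        "message": {"role": "assistant", "content": [{"text": "Hello!"}]}
    },
    "stopReason": "end_turn",
    "usage": {"inputTokens": 12, "outputTokens": 3, "totalTokens": 15},
}
```

Note the structural differences: the assistant text moves from a plain string under `choices[0].message.content` to a list of content blocks under `output.message.content`, and the stop reason and token-usage keys are renamed.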

The solution involves implementing custom model parsers. These parsers extend the SageMakerAIModel class, acting as a translator that converts your custom model server's OpenAI-compatible response into the Bedrock Messages API format that Strands agents understand. This enables seamless communication between your custom SageMaker-hosted LLM and your Strands agents.
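As a sketch of the translation such a parser performs, the hypothetical function below maps an OpenAI-compatible chat completion onto the Bedrock Messages shape. In the real solution this logic would live inside your SageMakerAIModel subclass; the function name and stop-reason mapping here are illustrative assumptions, not the library's actual API.

```python
# Map OpenAI-style finish reasons onto Bedrock stop reasons
# (illustrative subset; extend as your model requires).
STOP_REASON_MAP = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}

def openai_to_bedrock(response: dict) -> dict:
    """Translate an OpenAI-compatible chat completion into the
    Bedrock Messages shape that Strands agents expect."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage", {})
    return {
        "output": {
            "message": {
                "role": message["role"],
                # Bedrock wraps text in a list of content blocks.
                "content": [{"text": message.get("content") or ""}],
            }
        },
        "stopReason": STOP_REASON_MAP.get(choice.get("finish_reason"), "end_turn"),
        "usage": {
            "inputTokens": usage.get("prompt_tokens", 0),
            "outputTokens": usage.get("completion_tokens", 0),
            "totalTokens": usage.get("total_tokens", 0),
        },
    }
```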

For instance, this approach can be used to deploy Llama 3.1 with SGLang on SageMaker, and then integrate it effectively with Strands agents via a custom parser.

Why It Matters

This capability is crucial for organizations that need the flexibility and control of hosting custom LLMs on SageMaker AI. By addressing the response format challenge, you can:

  • Leverage Preferred Frameworks: Continue using serving frameworks like SGLang, vLLM, or TorchServe that offer specific performance or cost benefits.
  • Maintain Control: Gain greater control over your model deployments, data, and infrastructure, aligning with unique compliance requirements.
  • Unlock Agent Potential: Fully utilize Strands agents with a broader range of custom LLMs, enhancing conversational AI applications and workflows without being restricted by API formats.

While the Amazon Bedrock Mantle distributed inference engine has supported OpenAI messaging formats since December 2025, the flexibility of SageMaker AI means customers can host various foundation models, some requiring esoteric prompt and response formats that necessitate custom parsing for optimal integration.

How to Get Started

Implementing this solution involves a few key steps. First, you'll deploy your LLM on SageMaker. A powerful tool for this is awslabs/ml-container-creator, an AWS Labs open-source Yeoman generator. It automates the creation of SageMaker BYOC (Bring Your Own Container) deployment projects, generating essential artifacts like Dockerfiles, CodeBuild configurations, and deployment scripts for your LLM serving containers. You can find more details and contribute at the ml-container-creator repository on GitHub.
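The deployment step can be sketched with the SageMaker Python SDK. The image URI, role ARN, endpoint name, and instance type below are placeholders; in practice you would adapt them to the artifacts that your container project produces.

```python
def deploy_byoc_endpoint(image_uri: str, role_arn: str, endpoint_name: str):
    """Create a SageMaker real-time endpoint from a custom LLM serving image.

    All arguments are placeholders supplied by the caller; this is a sketch,
    not a drop-in script.
    """
    # Deferred import: requires the `sagemaker` SDK and AWS credentials.
    from sagemaker.model import Model

    model = Model(image_uri=image_uri, role=role_arn)
    # Real-time inference on a GPU instance; size this to your model.
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
        endpoint_name=endpoint_name,
    )
```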

Once your model is deployed and returning responses, you'll then focus on building your custom parser. This involves creating a class that extends SageMakerAIModel and implements the logic to transform your model's output into the Bedrock Messages API format. The overall implementation typically consists of three layers: the Model Deployment Layer (your LLM on SageMaker), the Parser Layer (your custom LlamaModelProvider or similar), and the Agent Layer (your Strands Agent instance using the custom provider).
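The three layers can be sketched end to end. The `SageMakerAIModel` below is a local stub standing in for the real Strands base class, and the `invoke` hook is a hypothetical method name; consult the Strands Agents documentation for the actual interface.

```python
class SageMakerAIModel:
    """Stub standing in for the real Strands base class."""
    def invoke(self, messages):
        raise NotImplementedError

class LlamaModelProvider(SageMakerAIModel):
    """Parser Layer: translates endpoint output for the agent."""
    def __init__(self, call_endpoint):
        # Model Deployment Layer: a callable that invokes the SageMaker
        # endpoint and returns its raw OpenAI-compatible JSON.
        self.call_endpoint = call_endpoint

    def invoke(self, messages):
        raw = self.call_endpoint(messages)
        msg = raw["choices"][0]["message"]
        # Translate to the Bedrock Messages shape the Agent Layer expects.
        return {
            "output": {
                "message": {
                    "role": msg["role"],
                    "content": [{"text": msg["content"]}],
                }
            },
            "stopReason": "end_turn",
        }

# Agent Layer: a Strands Agent would be constructed with this provider as
# its model; here a canned response stands in for a live endpoint.
fake_endpoint = lambda _msgs: {
    "choices": [{"message": {"role": "assistant", "content": "Hi there"}}]
}
provider = LlamaModelProvider(fake_endpoint)
reply = provider.invoke([{"role": "user", "content": "hello"}])
```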

Read more: see Building Custom Model Provider for Strands Agents for a detailed walkthrough and code examples showing how to integrate your custom LLMs with Strands agents today.