Overview

Bringing Enterprise-Ready AI to Cost-Efficient Compute

Small language models (SLMs) are revolutionizing AI today. Purpose-built for specific tasks, data, and requirements, they have far fewer parameters than large models, making them fast, efficient, and cost-effective. They deliver comparable performance to larger models while significantly reducing hardware and operational costs.


Arcee AI specializes in SLMs optimized for cost-effective inference, making them ideal for enterprise workflows, edge applications, and agentic AI systems. To maximize efficiency and scalability, Arcee AI runs its models on Arm-based CPUs, leveraging Arm's combination of performance, cost efficiency, and scale.

Impact

No need for expensive GPU instances.


Up to 4x acceleration using quantized models and Arm Kleidi.


Enables multiple AI agents to work in parallel.

“We are at the tipping point where we need to run SLMs to deliver the best ROI for enterprise use cases. That means running on CPU platforms. Our obvious choice today is to use Arm platforms in the cloud and outside of the cloud.”
Julien Simon, Chief Evangelist at Arcee AI
Technologies Used

Unlocking up to 4x Performance Improvements With Arm Optimizations

Arcee AI benchmarked its 10-billion-parameter Virtuoso Lite model and measured a 3-4x speedup from moving 16-bit weights to 4-bit quantization on Arm CPUs, leveraging Arm KleidiAI technology.

This delivers significant cost-performance advantages, reducing cloud expenses while maintaining model quality. Rather than relying on expensive and increasingly scarce GPUs, Arcee’s models run efficiently on Arm-based cloud instances, including those from AWS, Google Cloud, and Microsoft Azure, as well as edge devices and data center hardware.
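In practice, one common way to run a 4-bit quantized model on an Arm CPU is llama.cpp, which can be built with KleidiAI-accelerated kernels for Arm. The sketch below assumes a GGUF export of the model; the file names and the prompt are illustrative placeholders, not artifacts published by Arcee AI.

```shell
# Convert 16-bit GGUF weights to 4-bit (Q4_0), one of the quantization
# formats that llama.cpp accelerates with Arm-optimized kernels.
# File names are hypothetical placeholders.
llama-quantize virtuoso-lite-f16.gguf virtuoso-lite-q4_0.gguf Q4_0

# Run inference on the quantized model; llama.cpp detects Arm CPU
# features (NEON, dotprod, i8mm) at runtime and selects fast kernels.
llama-cli -m virtuoso-lite-q4_0.gguf \
  -p "Summarize this support ticket in one sentence:" -n 128
```

The 4-bit model occupies roughly a quarter of the memory of the 16-bit original, which is what makes dense CPU inference practical on standard Arm cloud instances.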


Enterprises Using Arm for Agentic AI

As enterprises increasingly demand scalable, cost-efficient AI, Arcee AI is at the forefront of this transformation. Imagine a future where AI is not a single, large model but a system of multiple specialized SLMs working together. This approach powers agentic AI workflows, allowing businesses to deploy 10, 20, or even 30 models in parallel for tasks such as customer support automation, fraud detection, and real-time decision-making.

By running distributed SLMs on Arm CPUs, businesses can process large-scale workloads in parallel—maximizing efficiency, scalability, and cost savings. Arcee AI's industry-leading models, combined with Arm-based CPUs, enable enterprises to deploy high-performance SLMs today and position Arm as the agentic AI platform of choice going forward.

Explore Similar Stories

Stability AI

Transforming On-Device Audio AI

Stability AI partnered with Arm and used Arm KleidiAI to transform on-device audio creation, reducing response times from minutes to seconds on Arm mobile CPUs.

Meta AI Technologies

Seamless AI Development

Open-source frameworks and models from Meta pave the way for revolutionizing the future of AI innovation at scale on Arm.

Arm for GitHub Copilot Extension

Enabling Cloud Development on Arm

Turn a complex process into an intuitive, AI-guided development experience.

Discover More Success Stories