Seamless Performance Acceleration for AI Developers Everywhere 

To scale the AI opportunity, developers need access to the fastest methods of AI deployment, together with optimal performance for the most demanding next-generation workloads. Arm Kleidi, inspired by the Greek word for ‘key,’ aims to provide the key to effortless AI acceleration on Arm for every developer, every model, and every workload. Kleidi encompasses a broad program of software and community engagement for accelerating AI.

The First Deliverable of Arm Kleidi: Arm Kleidi Libraries

The Arm Kleidi program begins with our new KleidiAI and KleidiCV performance libraries – lightweight kernels designed for rapid and easy integration into the most popular AI frameworks.

Arm Kleidi Libraries are a lightweight suite of highly performant open-source Arm routines that make it easy for any machine learning (ML) and computer vision (CV) framework to achieve optimum performance and leverage the latest AI and CV features of Arm CPUs, initially Arm Cortex-A designs. The libraries are designed for easy adoption into C or C++ ML and AI frameworks and deliver significant acceleration for the models that run on them.

Arm Kleidi helps ensure that frameworks and compilers take care of AI acceleration so that developers don’t have to.

Features and Benefits of Arm Kleidi Libraries

Flexible Kernel Assortment 

Arm Kleidi provides a flexible assortment of kernels for enhancing AI on frameworks. It offers broad scope for multifaceted AI advancement on Arm, from enabling more capability or accuracy for AI, to achieving accelerations or reducing memory overhead.

Concise, Efficient, and Light

The new KleidiAI and KleidiCV performance libraries are incredibly lightweight and concise. They perform no memory allocation, have no other library dependencies, and are delivered as source rather than as a binary release. This makes them quick and straightforward to adopt and integrate into existing framework codebases.
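
To make this concrete, here is a minimal sketch of the integration pattern such a design implies: the framework asks how much packed storage a kernel needs, allocates it itself, and the routine runs entirely on caller-owned memory. The function names and packing layout below are invented for illustration and are not the real KleidiAI API.

// Hypothetical micro-kernel interface, for illustration only. It is NOT the
// actual KleidiAI API; it only mirrors the pattern described above: the
// library allocates nothing, and the caller owns every buffer.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// The library reports the packed-buffer size a kernel needs; it never mallocs.
size_t kernel_get_packed_size(size_t rows, size_t cols) {
  return rows * cols * sizeof(float);  // stand-in packing layout
}

// Packs an operand into caller-provided storage (here: a plain copy).
void kernel_pack(const float* src, size_t rows, size_t cols, void* packed) {
  std::copy(src, src + rows * cols, static_cast<float*>(packed));
}

// Runs entirely on memory the caller handed in (naive reference matmul).
void kernel_run(const void* packed_lhs, const void* packed_rhs, float* dst,
                size_t m, size_t n, size_t k) {
  const float* a = static_cast<const float*>(packed_lhs);
  const float* b = static_cast<const float*>(packed_rhs);
  for (size_t i = 0; i < m; ++i)
    for (size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (size_t p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      dst[i * n + j] = acc;
    }
}

int main() {
  const size_t m = 2, n = 2, k = 2;
  std::vector<float> lhs{1, 2, 3, 4}, rhs{5, 6, 7, 8}, dst(m * n);
  // The framework owns every allocation, so the kernel slots into an
  // existing memory planner without hidden mallocs or extra dependencies.
  std::vector<uint8_t> packed_lhs(kernel_get_packed_size(m, k));
  std::vector<uint8_t> packed_rhs(kernel_get_packed_size(k, n));
  kernel_pack(lhs.data(), m, k, packed_lhs.data());
  kernel_pack(rhs.data(), k, n, packed_rhs.data());
  kernel_run(packed_lhs.data(), packed_rhs.data(), dst.data(), m, n, k);
  std::printf("%g %g %g %g\n", dst[0], dst[1], dst[2], dst[3]);
  return 0;
}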

Unleashing Mass-Market AI Performance

As Kleidi helps optimize AI at the framework level, each optimization can benefit hundreds of workloads across billions of Arm-based devices. Application developers simply run models on Kleidi-optimized frameworks to achieve top performance by default.

Enabling Generative AI at Scale

Kleidi helps maximize the ease and speed with which the most demanding AI inference workloads can be deployed on Arm. The KleidiAI library helps bring best-in-class performance to the exploding market of generative AI and large language models (LLMs), deployed from cloud datacenters to constrained devices at the edge.

Optimizing AI for Everyone - Everywhere

Kleidi aims to enable easy optimization from cloud to edge across the full range of Arm Neoverse and Arm Cortex-A CPUs. The performance libraries leverage specific technologies for enhancing AI functions in the Arm architecture, such as Arm Neon, the Arm Scalable Vector Extension (SVE), and the Arm Scalable Matrix Extension (SME).
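
As an illustration of how software can discover which of these architecture features a device supports, the sketch below reads the Linux hardware-capability bits on AArch64. It is not Kleidi code, and which HWCAP flags are defined depends on the kernel headers on the build machine.

// Sketch: detecting Arm architecture features on Linux/AArch64 via hwcaps.
// Illustrative only; not how the Kleidi libraries are structured internally.
#include <cstdio>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

int main() {
#if defined(__aarch64__) && defined(__linux__)
  unsigned long hwcap = getauxval(AT_HWCAP);
  unsigned long hwcap2 = getauxval(AT_HWCAP2);
  std::printf("Neon/ASIMD : %s\n", (hwcap & HWCAP_ASIMD) ? "yes" : "no");
#ifdef HWCAP_SVE
  std::printf("SVE        : %s\n", (hwcap & HWCAP_SVE) ? "yes" : "no");
#endif
#ifdef HWCAP2_SVE2
  std::printf("SVE2       : %s\n", (hwcap2 & HWCAP2_SVE2) ? "yes" : "no");
#endif
#ifdef HWCAP2_I8MM
  std::printf("Int8 matmul: %s\n", (hwcap2 & HWCAP2_I8MM) ? "yes" : "no");
#endif
#ifdef HWCAP2_SME
  std::printf("SME        : %s\n", (hwcap2 & HWCAP2_SME) ? "yes" : "no");
#endif
#else
  std::printf("Not an AArch64 Linux target; feature query skipped.\n");
#endif
  return 0;
}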

Low Overhead for Developers

The vision for the Arm Kleidi libraries is to integrate them directly into key AI frameworks, including MediaPipe (via XNNPACK), llama.cpp, PyTorch (via ExecuTorch), and TensorFlow Lite (via XNNPACK).

Once integrated, developers automatically benefit from the performance enhancements of the Kleidi-optimized frameworks without any additional effort on their part.
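
For example, a developer using TensorFlow Lite’s C++ API through the XNNPACK delegate writes only standard framework code, as in the sketch below. Whether Kleidi-optimized kernels are used underneath depends on the framework build and the target CPU; the model path is a placeholder.

// Ordinary TensorFlow Lite C++ inference through the XNNPACK delegate.
// Nothing Kleidi-specific appears here: if the framework build integrates
// KleidiAI via XNNPACK, the accelerated kernels are picked up automatically.
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

int main() {
  // "model.tflite" is a placeholder path.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;

  // Attach the XNNPACK delegate; the framework, not the application,
  // decides which CPU kernels back each operator.
  TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
  TfLiteDelegate* delegate = TfLiteXNNPackDelegateCreate(&options);
  interpreter->ModifyGraphWithDelegate(delegate);

  interpreter->AllocateTensors();
  // ... fill input tensors, then run inference as usual ...
  interpreter->Invoke();

  // The delegate must outlive the interpreter, so release it last.
  interpreter.reset();
  TfLiteXNNPackDelegateDelete(delegate);
  return 0;
}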

Get Started with Arm Kleidi

Access the available software within our growing suite.

KleidiAI on GitLab

Performance library for all AI frameworks

KleidiCV on GitLab

Performance library for computer vision frameworks

Use Cases

Advancing Inference Everywhere on Arm CPU

Generative AI

KleidiAI is enabling optimal performance for some of the world’s most advanced language models on Arm Cortex-A CPUs. Through framework optimizations, the KleidiAI library has already demonstrated performance uplifts of up to 190 percent for Llama, Meta’s advanced open-source LLM, and for Phi, Microsoft’s highly capable small language model (SLM).

Computer Vision

Alongside emerging AI use cases, Arm Kleidi also benefits traditional computer vision use cases. An example of this is OpenCV, the world’s largest computer vision library containing over 2,500 algorithms and supporting hundreds of thousands of developers.

Across a variety of image-processing operations run through the KleidiCV integration, OpenCV measured a typical performance uplift of 75 percent.
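
The snippet below shows the kind of standard OpenCV calls involved; it contains nothing KleidiCV-specific, and whether the accelerated paths are taken depends on how OpenCV was built for the target Arm platform. The file names are placeholders.

// Standard OpenCV image processing; no KleidiCV-specific calls are needed.
// If OpenCV is built with an Arm acceleration backend such as KleidiCV,
// operations like these can transparently use the optimized routines.
#include <opencv2/opencv.hpp>

int main() {
  cv::Mat img = cv::imread("input.png", cv::IMREAD_COLOR);  // placeholder file
  if (img.empty()) return 1;

  cv::Mat gray, blurred, small;
  cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);         // color conversion
  cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);  // smoothing
  cv::resize(blurred, small, cv::Size(), 0.5, 0.5);    // downscaling

  cv::imwrite("output.png", small);
  return 0;
}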

AI in Gaming

Unity Sentis empowers game developers to create innovative, AI-driven gameplay experiences on all Unity Engine-supported devices. After using quantization to reduce AI model size, Unity Sentis then used KleidiAI to improve model speed across Arm Cortex-A CPUs. By integrating KleidiAI, Unity Sentis achieved a 660 percent performance uplift for AI on its platform.
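
For context, the sketch below shows a generic symmetric 8-bit weight quantization of the sort frameworks commonly apply before handing work to integer kernels. It is illustrative only and is not Unity Sentis or KleidiAI code.

// Generic symmetric int8 weight quantization: a 4x size reduction vs float32.
// Illustrative only; not Unity Sentis or KleidiAI code.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Quantize float weights to int8 with a single per-tensor scale.
std::vector<int8_t> quantize(const std::vector<float>& w, float& scale) {
  float max_abs = 0.0f;
  for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
  scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
  std::vector<int8_t> q(w.size());
  for (size_t i = 0; i < w.size(); ++i)
    q[i] = static_cast<int8_t>(std::lround(w[i] / scale));
  return q;
}

int main() {
  std::vector<float> weights{0.10f, -0.75f, 0.42f, 1.20f};
  float scale = 0.0f;
  std::vector<int8_t> q = quantize(weights, scale);
  // Dequantized value = q[i] * scale; the integer form is what fast
  // int8 matrix-multiply kernels consume.
  for (size_t i = 0; i < q.size(); ++i)
    std::printf("w=%+.2f  q=%+4d  back=%+.3f\n",
                weights[i], static_cast<int>(q[i]), q[i] * scale);
  return 0;
}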

Talk with an Expert

If you have any questions about Kleidi, talk to an Arm expert.

Contact Us