Accelerate Your GenAI, AI and ML Workloads on Arm CPUs
These educational materials are aimed at cloud application developers of all levels, from beginner to advanced; the third topic, Accelerate GenAI, AI, and ML, targets developers of AI tools and AI frameworks, and AI independent software vendors (ISVs). The resources cover coding best practices, optimized AI libraries and tools, and techniques for optimizing AI and ML workloads on Arm CPUs.
Build AI/ML Apps
Learn how to optimize ML inference and training performance on AWS, along with best practices for ML inference using PyTorch 2.0, and more.
Best Practices to Optimize ML Performance on AWS Graviton
- Improve ML performance on AWS Graviton: A series of blogs covering how to improve performance and reduce costs for ML inference, neural network (NN) training, and more.
- Migrating to Arm – 1.8x faster Deep Learning Inference workloads in AWS Graviton3: A case study comparing ML inference performance on AWS Graviton3 with x86, showing 1.8x faster deep learning inference workloads.
Optimizing Inference Performance with PyTorch 2.0
- An example tutorial showing how to achieve the best inference performance with bfloat16 kernels and the right back-end selection.
Docker Images for TensorFlow and PyTorch on Arm
- Learn how to build and use Docker images for TensorFlow and PyTorch for Arm.
Build GenAI Apps
Learn the capabilities of Arm Neoverse CPUs running LLMs and SLMs, and accelerate Hugging Face (HF) models on Arm.
LLM Performance on Arm Neoverse
- Learn about the capabilities of Arm Neoverse V1-based AWS Graviton3 CPUs for running LLMs, and their key advantages over other CPU-based server platforms.
- Step into the world of Generative AI with this LLM chatbot learning path. Discover how you can run an LLM chatbot on Arm-based servers using llama.cpp.
- An overview of small language models (SLMs), which require fewer resources and are easier to customize and control than LLMs, making them a more efficient and sustainable option.
Accelerate HF Models using Arm Neoverse
- Learn about the key features in Arm Neoverse CPUs for ML, with a Sentiment Analysis use case.
Accelerate and Deploy NLP Models from HF
- Learn how to accelerate Natural Language Processing (NLP) models from Hugging Face on Arm-based servers.
- A getting-started guide to running an NLP model from Hugging Face using PyTorch on Arm-based servers.
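A hedged sketch of that getting-started flow is below. The checkpoint name is an assumption chosen for illustration (any PyTorch text-classification model on Hugging Face works the same way), and running it requires the `transformers` package plus network access to download the model:

```python
# Assumption: `transformers` with a PyTorch backend is installed, and the
# model checkpoint named below is reachable; it is an illustrative choice,
# not one prescribed by the guide.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("Arm-based servers make ML inference cost-effective.")[0]
print(result["label"], round(result["score"], 3))
```

The same script runs unchanged on x86 and Arm; on Graviton, the PyTorch backend dispatches to Arm-optimized kernels automatically.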
Accelerate GenAI, AI, and ML
Accelerate your AI/ML framework, tools, and cloud services with open-source Arm libraries and optimized Arm SIMD code.
Accelerating PyTorch 2.0 Inference with AWS
- A collaboration between AWS, Arm, and Meta to optimize PyTorch 2.0 inference performance on Arm-based processors, improving performance by up to 3.5x compared to the previous PyTorch release, and more.
- The Arm Compute Library (ACL) is a fully featured open-source library containing a collection of low-level ML functions optimized for Arm Neoverse and other Arm architectures.
- The open-source Arm Kleidi libraries are lighter-weight performance libraries (compared to ACL) for accelerating AI and ML workloads and frameworks.
- The Arm KleidiCV library is designed for image processing and integrates into any CV framework to enable best performance for CV workloads on Arm.
- Optimize your AI/ML workloads with Arm SIMD code, written either in assembly or with Arm intrinsics in C/C++, to unlock significant performance gains.
Join the Arm Developer Program
Join the Arm Developer Program to build your future on Arm. Get fresh insights directly from Arm experts,
connect with like-minded peers for advice, or build on your expertise and become an Arm Ambassador.
Community Support
Zach Lasiuk
Zach helps software developers do their best work on Arm, specializing in cloud migration and GenAI applications. He is an XTC judge in Deep Tech and AI Ethics.