Cortex-A9 Processor

The ARM® Cortex®-A9 processor is a popular, power-efficient, high-performance choice for low-power or thermally constrained, cost-sensitive devices.

It is currently shipping in increasing volumes in smartphones, digital TVs, and consumer and enterprise applications. Cortex-A9 is available as a single-processor solution offering an overall performance enhancement of well above 50% compared to ARM Cortex-A8 solutions. Cortex-A9 MPCore offers up to four processors, delivering performance on demand for lightweight workloads as well as at peak load. Its configurability and flexibility allow Cortex-A9 to scale across a wide variety of markets and applications.

Cortex-A9 is available in either synthesizable or hard-macro implementations. ARM Physical IP is available to support a synthesizable flow optimized for lowest power or highest performance, as well as a choice of hard macros that reduce risk and shorten time-to-market. Enhanced ARM Graphics IP such as Mali-T624, together with ARM System IP such as the CoreLink NIC-400/301 network interconnect and the CoreLink DMC-342 dynamic memory controller, enables rapid system design. ARM Development Studio 5 (DS-5™) tools and enhanced CoreSight Debug & Trace IP such as CoreSight SoC-400 and the CoreSight Design Kit for Cortex-A9 (DK-A9) allow software development to begin immediately, backed by a broad software ecosystem.



The ARM Cortex-A9 processor delivers exceptional capabilities while consuming less power than high-performance computing platforms, including:

  • Scalable up to four coherent cores with advanced MPCore™ technology
  • Increased power efficiency: higher performance at lower power consumption
  • Increased peak performance for the most demanding applications
  • Single-core implementations targeted at low power for cost-sensitive devices
  • Optional NEON™ media and/or floating point processing engine

Cortex-A9 is a high-performance ARM processor implementing the full richness of the widely supported ARMv7 architecture. Designed around a high-efficiency, dual-issue superscalar, out-of-order, speculative, dynamic-length pipeline (8 to 11 stages), Cortex-A9 delivers exceptional levels of performance and power efficiency with the functionality required for leading-edge products across a broad range of consumer, networking, enterprise and mobile applications.

The Cortex-A9 micro-architecture supports 16, 32 or 64KB four-way associative L1 caches, with up to 8MB of L2 cache through the optional L2 cache controller. The scalable multicore and single-processor solutions provide the broadest flexibility, and each is suited to a variety of applications and markets.
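As a rough illustration of what the L1 options above mean for cache geometry, the sketch below computes the number of sets for each configurable size. The 32-byte line length is an assumption that does not appear on this page, and the function name `l1_sets` is purely illustrative.

```python
# Sketch: sets in a set-associative cache = size / (ways * line length).
# ASSUMPTION: 32-byte cache lines; this page does not state the line length.
LINE_BYTES = 32
WAYS = 4  # "four-way associative", from the text above


def l1_sets(size_kb: int, ways: int = WAYS, line_bytes: int = LINE_BYTES) -> int:
    """Return the number of sets for a cache of size_kb kilobytes."""
    size_bytes = size_kb * 1024
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size must be a multiple of ways * line length")
    return size_bytes // (ways * line_bytes)


# The three L1 sizes listed on this page.
for size_kb in (16, 32, 64):
    print(f"{size_kb} KB, {WAYS}-way: {l1_sets(size_kb)} sets")
```

Under that assumption, doubling the cache size doubles the number of sets while the associativity stays fixed at four ways.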

Download the Cortex-A9 whitepaper


Cortex-A9 applications range from mobile handsets through to high-performance consumer and enterprise products, for example:

  • Mainstream Smartphones
  • Tablets
  • Set-top boxes
  • Home Media Players
  • Auto Infotainment
  • Residential Gateways
  • First generation ARM low-power servers

Cortex-A9 Hard-macro Implementations for TSMC 40G and 40LP processes

In addition to the single and multicore soft processors, the Cortex-A9 product portfolio also includes two popular dual-core hard-macro implementations for the TSMC 40G and 40LP processes.

These hard macro implementations provide the fastest time to market for SoC designers through a ready-to-use, best-in-class solution. They also significantly reduce implementation costs. The macros use ARM Physical IP POP and advanced, aggressive low-power implementation techniques to deliver highly optimized and assured power, performance and area (PPA) within the power envelope of compact, high-density and thermally constrained environments. The integrated design gives SoC designers time to focus on their key capabilities.

Dual Cortex-A9 TSMC 40G Hard Macro (Power Optimized): In many thermally constrained applications such as set-top boxes, DTVs, printers and other feature-rich consumer and high-density enterprise applications, energy efficiency is of paramount importance. The Cortex-A9 power-optimized hard macro implementation delivers its peak performance of 4000 DMIPS while consuming less than 250mW per CPU when selected from typical silicon.

Dual Cortex-A9 TSMC 40LP Hard Macro (Performance Optimized): This implementation is optimized to obtain the best performance on the low-power TSMC 40LP process. Ideally suited to SoCs that operate within a tight power budget yet need a high level of performance and functionality, this hard macro is capable of delivering 1GHz+ performance. The implementation builds upon the inherent low-power capabilities of the TSMC 40LP process through a strong design focus on reducing leakage and dynamic power while maintaining high performance levels.

The hard macro implementations include ARM AMBA-compliant high-performance system components to maximize data traffic speed and minimize power consumption and silicon area. Each Cortex-A9 hard macro implementation also includes the CoreSight™ Program Trace Macrocell (PTM), which provides full visibility into the processor's instruction flow, enabling the software community to develop code for optimal performance. Also included within the macro is the ARM high-performance L2 cache controller, supporting configurations between 128KB and 8MB of L2 cache memory.

Architecture: ARMv7-A Cortex
Dhrystone Performance: 2.50 DMIPS/MHz per core
Multicore: 1-4 cores; single core version also available
ISA Support
Memory Management: Memory Management Unit
Debug and Trace: CoreSight™ DK-A9 (available separately); CoreSight™ SoC-400 (available separately)
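The Dhrystone figure above can be combined with the hard macro numbers quoted earlier as a back-of-the-envelope check. The sketch below assumes that the 4000 DMIPS figure is the total across both cores of the dual-core macro and that 2.50 DMIPS/MHz applies to each core. Neither assumption is stated explicitly on this page, and the function names are illustrative.

```python
DMIPS_PER_MHZ_PER_CORE = 2.50  # from the specification list above


def total_dmips(clock_mhz: float, cores: int) -> float:
    """Aggregate Dhrystone throughput, assuming it scales linearly with cores."""
    return DMIPS_PER_MHZ_PER_CORE * clock_mhz * cores


def implied_clock_mhz(dmips: float, cores: int) -> float:
    """Clock frequency implied by an aggregate DMIPS figure."""
    return dmips / (DMIPS_PER_MHZ_PER_CORE * cores)


# ASSUMPTION: 4000 DMIPS is the dual-core total for the power-optimized macro.
print(implied_clock_mhz(4000, cores=2))  # 800.0

# A dual-core macro clocked at 1 GHz (1000 MHz), as quoted for the 40LP macro.
print(total_dmips(1000, cores=2))  # 5000.0
```

Linear scaling with core count is an idealization; real workloads rarely scale perfectly across cores.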

Cortex-A9 Features
Thumb-2 Technology

Delivers the peak performance of traditional ARM code while also providing up to a 30% reduction in memory required to store instructions.

Cortex-A9 NEON Media Processing Engine (MPE)
The Cortex-A9 MPE, used with either of the Cortex-A9 processors, provides an engine that offers both the performance and functionality of the Cortex-A9 Floating-Point Unit plus an implementation of the NEON Advanced SIMD instruction set for further acceleration of media and signal processing functions. The MPE extends the Cortex-A9 processor's floating-point unit (FPU) to provide a quad-MAC and additional 64-bit and 128-bit register set supporting a rich set of SIMD operations over 8, 16 and 32-bit integer and 32-bit Floating-Point data quantities every cycle.
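To make the register and element widths above concrete, the sketch below counts how many elements fit in one 64-bit or 128-bit register and models a single four-lane multiply-accumulate step. It is a plain-Python model for illustration only, not NEON code, and the names `lanes` and `quad_mac` are hypothetical.

```python
# Plain-Python model of SIMD lane arithmetic; this is not NEON code.
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of element_bits that fit in one register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


def quad_mac(acc, a, b):
    """Lane-wise acc + a * b over four values, modelling one quad-MAC step."""
    if not (len(acc) == len(a) == len(b) == 4):
        raise ValueError("quad_mac expects exactly four lanes")
    return [acc_i + a_i * b_i for acc_i, a_i, b_i in zip(acc, a, b)]


# Register and element widths named in the text above.
for register_bits in (64, 128):
    for element_bits in (8, 16, 32):
        count = lanes(register_bits, element_bits)
        print(f"{register_bits}-bit register, {element_bits}-bit elements: {count} lanes")
```

A 128-bit register holds four 32-bit values, which is why four multiply-accumulates can be expressed as one operation.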
Cortex-A9 Floating-Point Unit (FPU)
When implemented along with the Cortex-A9, the FPU provides high-performance single- and double-precision floating-point instructions compatible with the ARM VFPv3 architecture, which is software compatible with previous generations of ARM floating-point coprocessors.
Optimized Level 1 Caches
Performance- and power-optimized L1 caches use minimal access latency techniques to maximize performance and minimize power consumption. They also provide the option of cache coherence for enhanced inter-processor communication, or for support of a rich SMP-capable OS that simplifies multicore software development.
Optional Level 2 Cache Controller
Provides low-latency, high-bandwidth access to up to 8 MB of cached memory in high-frequency designs, or in designs that need to reduce the power consumption associated with off-chip memory access.

Advanced Multicore Technologies
Snoop Control Unit

The SCU is the central intelligence in the ARM multicore technology and is responsible for managing the interconnect, arbitration, communication, cache-to-cache and system memory transfers, cache coherence and other capabilities for all multicore technology enabled processors. The Cortex-A9 MPCore processor can also expose these capabilities to other system accelerators and non-cached, DMA-driven mastering peripherals, increasing performance and reducing system-wide power consumption by sharing access to the processor's cache hierarchy. This system coherence also reduces the software complexity otherwise involved in maintaining software coherence within each OS driver.

Accelerator Coherency Port
This AMBA® 3 AXI™ compatible slave interface on the SCU provides an interconnect point for a range of system masters that, for reasons of overall system performance, power consumption or software simplification, are better interfaced directly with the Cortex-A9 MPCore processor. The interface acts as a standard AMBA 3 AXI slave and supports all standard read and write transactions without any additional coherence requirements placed on attached components. However, any read transaction to a coherent region of memory will interact with the SCU to test whether the required information is already stored within the processor L1 caches. If it is, the data is returned directly to the requesting component. If the read misses in the L1 caches, there is also the opportunity to hit in the L2 cache before the request is forwarded to main memory. For write transactions to any coherent memory region, the SCU will enforce coherence before forwarding the write to the memory system. The transaction may also optionally allocate into the L2 cache, removing the power and performance impact of writing directly through to off-chip memory.
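The lookup order described for the Accelerator Coherency Port can be sketched as a toy model. Everything here is an illustrative assumption rather than the actual SCU behavior: the class and method names are hypothetical, caches are modelled as dictionaries, and coherence on writes is modelled as invalidating any L1 copy, a mechanism this page does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoherentSystem:
    """Toy model: address -> data maps for L1 caches, L2 cache and main memory."""
    l1: dict = field(default_factory=dict)
    l2: dict = field(default_factory=dict)
    memory: dict = field(default_factory=dict)

    def acp_read(self, addr):
        """Return (data, source) for a coherent read arriving on the ACP."""
        if addr in self.l1:          # data already held in a processor L1 cache
            return self.l1[addr], "L1"
        if addr in self.l2:          # L1 miss, opportunity to hit in L2
            return self.l2[addr], "L2"
        return self.memory.get(addr), "memory"  # forwarded to main memory

    def acp_write(self, addr, data, allocate_l2=False):
        """Perform a coherent write and return where the data was placed."""
        # ASSUMPTION: coherence is enforced by invalidating any stale L1 copy.
        self.l1.pop(addr, None)
        if allocate_l2 or addr in self.l2:
            self.l2[addr] = data     # stays on chip, no off-chip write here
            return "L2"
        self.memory[addr] = data
        return "memory"


system = ToyCoherentSystem(l1={0x100: "a"}, l2={0x200: "b"}, memory={0x300: "c"})
for address in (0x100, 0x200, 0x300):
    print(hex(address), system.acp_read(address))
```

The model leaves out write-back of L2 contents, transaction ordering and non-coherent regions; it only shows the order in which the levels are consulted.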
Generic Interrupt Controller
Implementing the standardized and architected interrupt controller, the GIC provides a rich and flexible approach to inter-processor communication and to the routing and prioritization of system interrupts. It supports up to 224 independent interrupts, and each interrupt can be distributed across CPUs under software control. Interrupts are hardware prioritized and can be routed between the operating system and the TrustZone software management layer.
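The prioritization and per-CPU distribution described for the GIC can be sketched in software as follows. This is a simplified model, not the GIC programming interface: the `Interrupt` type and both functions are hypothetical, the convention that a lower priority value wins is an assumption not stated on this page, and breaking ties by lowest interrupt ID is a modelling choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interrupt:
    irq_id: int
    priority: int            # ASSUMPTION: lower value means higher priority
    target_cpus: frozenset   # CPUs this interrupt may be delivered to
    secure: bool = False     # True: handled by the TrustZone software layer


def highest_priority_pending(pending, cpu):
    """Pick the pending interrupt that the given CPU should take next."""
    candidates = [irq for irq in pending if cpu in irq.target_cpus]
    if not candidates:
        return None
    # Modelling choice: ties on priority are broken by lowest interrupt ID.
    return min(candidates, key=lambda irq: (irq.priority, irq.irq_id))


def handler_for(irq):
    """Route an interrupt to the software layer that should service it."""
    return "TrustZone software" if irq.secure else "operating system"


pending = [
    Interrupt(irq_id=40, priority=0x80, target_cpus=frozenset({0, 1})),
    Interrupt(irq_id=41, priority=0x20, target_cpus=frozenset({1})),
    Interrupt(irq_id=42, priority=0x40, target_cpus=frozenset({0}), secure=True),
]

for cpu in (0, 1):
    chosen = highest_priority_pending(pending, cpu)
    print(f"CPU {cpu}: interrupt {chosen.irq_id} -> {handler_for(chosen)}")
```

In hardware, the target list and priority of each interrupt would be set through controller registers; here they are plain fields so the selection logic is easy to follow.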
TrustZone® Technology
Ensures reliable implementation of security applications ranging from digital rights management to electronic payment. Broad support from technology and industry partners.
Jazelle RCT and DBX Technology
Provides up to a 3x reduction in code size for just-in-time (JIT) and ahead-of-time compilation of bytecode languages, while also supporting direct bytecode execution of Java instructions for acceleration in traditional virtual machines.

The Cortex-A9 processor is commonly integrated with other IP blocks as the centerpiece of many next-generation devices.

System IP

System IP components are essential for building complex systems-on-chip. By using System IP components, developers can significantly reduce development and validation cycles, saving cost and reducing time to market.

Description                                       AMBA Bus  System IP Component
Advanced AMBA 3 Interconnect IP                   AXI       NIC-400/301
DMA Controller                                    AXI       DMA-330
Level 2 Cache Controller                          AXI       L2C-310
Dynamic Memory Controller                         AXI       DMC-340
DDR2/DDR3/DDR3L/LPDDR2 Dynamic Memory Controller  AXI       DMC-400
Static Memory Controller                          AXI       SMC-35x
TrustZone Address Space Controller                AXI       TZC-380
CoreSight™ Design Kit                             Multiple  CDK-A9
CoreSight™ SoC                                    Multiple  CoreSight SoC-400
System MMU (using shadow page tables)             AXI       MMU-400

Media Processors
The Mali™ family of products combine to provide the complete graphics stack for all embedded graphics needs, enabling device manufacturers and content developers to deliver the highest quality, cutting edge graphics solutions across the broadest range of consumer devices.
Mali-400 GPU
The world's first OpenGL ES 2.0 conformant multi-core GPU provides 2D and 3D acceleration with performance scalable up to 1080p resolution.
Mali-T624 GPU
The ARM Mali™-T624 GPU is the second generation of the Midgard family of GPUs, which bring together market-leading graphics and GPU Compute functionality. In addition to an increase in performance, it extends API support to include the recently announced Khronos® OpenGL® ES 3.0 and OpenCL™ Full Profile, a first for mobile devices.

Physical IP
ARM® Physical IP Platforms deliver process optimized IP, for best-in-class implementations of the Cortex-A9 processor at 40nm and below.
Standard Cell Logic Libraries
Available in a variety of different architectures, ARM Standard Cell Libraries support a wide performance range for all types of SoC designs. Designers can choose between different libraries and optimize their designs for speed, power and/or area.
Memory Compilers and Registers
A broad array of silicon-proven SRAM, Register File and ROM memory compilers for all types of SoC designs, ranging from performance-critical to cost-sensitive and low-power applications.
Interface Libraries
A broad portfolio of silicon-proven Interface IP designed to meet varying system architectures and standards. General Purpose I/O, Specialty I/O, High Speed DDR and Serial Interfaces are optimized to deliver high data throughput with low pin counts.

Development Tools for Cortex-A9

ARM DS-5 Development Studio supports all ARM processors and IP, including the Cortex-A9, providing a comprehensive set of software tools to create, debug and optimize systems.

It incorporates DS-5 Debugger, whose powerful and intuitive graphical environment enables fast debugging of bare-metal, Linux and Android native applications. DS-5 Debugger provides pre-defined configurations for Fixed Virtual Platforms (built on ARM Fast Models technology) and ARM Versatile Express boards, enabling early software development before silicon availability.

Included in DS-5 is a Cortex-A9 Fixed Virtual Platform, for bare-metal, kernel and user-space software development ahead of silicon availability.

In addition, Streamline performance analyzer simplifies the identification of hot spots in software and load balancing between cores and clusters with a brilliantly intuitive graphical display.

ARM Compiler 5, which is included in DS-5, has specific optimizations for the Cortex-A9 processor, enabling code generation from the earliest stages of a project.
