

Cortex-A9 Processor

The ARM Cortex™-A9 processor provides unprecedented levels of performance and power efficiency, making it an ideal solution for designs that require high performance in low-power or thermally constrained, cost-sensitive devices.

The processor is available as either a single core or a configurable multicore processor, in both synthesizable and hard-macro implementations. It can scale across a wide variety of applications while enabling a consistent software investment across multiple markets.

 

 


The ARM Cortex-A9 processor delivers exceptional capabilities for less power than high-performance compute platforms consume, including:
  • Unrivalled performance, with 2 GHz typical operation in the TSMC 40G hard-macro implementation
  • Low-power single-core implementations targeted at cost-sensitive devices
  • Scalability up to four coherent cores with advanced MPCore technology
  • An optional NEON™ media and/or floating-point processing engine

Applications

The Cortex-A9 processors provide a scalable solution across a wide range of market applications, from mobile handsets through to high-performance consumer and enterprise products, all of which share common requirements:

  • Increased power efficiency, with higher performance for lower power consumption;
  • Increased peak performance for the most demanding applications;
  • The ability to share software and tool investments across multiple devices.

Introducing the Cortex-A9

The Cortex-A9 processors are the highest-performance ARM processors implementing the full richness of the widely supported ARMv7 architecture. Designed around a high-efficiency, dynamic-length, multi-issue superscalar, out-of-order, speculating 8-stage pipeline, the Cortex-A9 processors deliver unprecedented levels of performance and power efficiency, with the functionality required for leading-edge products across a broad range of consumer, networking, enterprise and mobile applications.

The Cortex-A9 microarchitecture is delivered either as a scalable multicore processor, the Cortex-A9 MPCore™ multicore processor, or as a more traditional processor, the Cortex-A9 single core processor. Both support 16, 32 or 64KB four-way associative L1 caches, with up to 8MB of L2 cache through the optional L2 cache controller. Each provides broad flexibility and is suited to specific applications and markets.
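
As a rough illustration of what the configurable L1 sizes imply for cache geometry, the sketch below computes the number of sets for each four-way configuration. The 32-byte line length is an assumption that this page does not state, and the function name `cache_sets` is hypothetical.

```python
# Sketch: number of sets in a four-way set-associative cache.
# ASSUMPTION: 32-byte cache lines (not stated in the text above).
LINE_BYTES = 32
WAYS = 4  # four-way associativity, as stated above


def cache_sets(size_kb: int, ways: int = WAYS, line_bytes: int = LINE_BYTES) -> int:
    """Return the number of sets for a cache of size_kb kilobytes."""
    total_bytes = size_kb * 1024
    bytes_per_set = ways * line_bytes
    if total_bytes % bytes_per_set != 0:
        raise ValueError("cache size must be a multiple of ways * line size")
    return total_bytes // bytes_per_set


if __name__ == "__main__":
    for size_kb in (16, 32, 64):  # the L1 sizes listed above
        print(f"{size_kb}KB L1: {cache_sets(size_kb)} sets")
```

Under that assumption, doubling the cache size doubles the number of sets while the associativity stays fixed at four ways.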


 

The Cortex-A9 MPCore Multicore Processor

 

The Cortex-A9 MPCore multicore processor integrates the proven and highly successful ARM MPCore technology, along with further enhancements that simplify and broaden the adoption of multicore solutions. It extends peak performance to unprecedented levels while also supporting design flexibility and new features that further reduce and control power consumption at the processor and system level. Targeted implementations of the Cortex-A9 MPCore processor can also offer mobile devices increased peak performance over today’s solutions, using the design flexibility and advanced power management techniques of ARM MPCore technology to stay within the thermal constraints and tight power budgets of mobile devices. With its scalable peak performance, this processor can exceed the performance of today’s comparable high-performance embedded devices, and it brings a consistent software investment across an extended breadth of markets.

 

Cortex-A9 Single Core Processor

The Cortex-A9 processor provides unprecedented levels of performance and power efficiency, making it an ideal solution for any design requiring high performance in a low-power, cost-sensitive, single-processor device. Using a convenient synthesizable flow and IP deliverables, the Cortex-A9 processor provides an ideal upgrade path for existing ARM11™ processor-based designs that require higher performance and greater power efficiency within a similar silicon cost and power budget, while maintaining a compatible software environment. The Cortex-A9 single core processor provides dual low-latency Harvard 64-bit AMBA® 3 AXI™ master interfaces for independent instruction and data transactions. These interfaces are capable of sustaining four double-word writes every five processor cycles when copying data across a cached region of memory.
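
To put the "four double-word writes every five processor cycles" figure into bandwidth terms, here is a minimal sketch. It assumes a double word is 64 bits (8 bytes), matching the 64-bit AXI interface, and it uses 830 MHz purely as an illustrative clock, taken from the single core trial implementation quoted on this page. The function name is hypothetical.

```python
# Sketch: sustained copy-write bandwidth implied by
# "four double-word writes every five processor cycles".
# ASSUMPTION: one double word = 64 bits = 8 bytes.
DOUBLE_WORD_BYTES = 8
WRITES_PER_BURST = 4
CYCLES_PER_BURST = 5


def sustained_write_bytes_per_second(clock_hz: int) -> int:
    """Bytes written per second at the quoted sustained rate."""
    bytes_per_burst = WRITES_PER_BURST * DOUBLE_WORD_BYTES  # 32 bytes
    return bytes_per_burst * clock_hz // CYCLES_PER_BURST


if __name__ == "__main__":
    clock_hz = 830_000_000  # illustrative clock only
    rate = sustained_write_bytes_per_second(clock_hz)
    print(f"{rate / 1e9:.3f} GB/s at {clock_hz / 1e6:.0f} MHz")
```

The quoted rate works out to 6.4 bytes per cycle, so the absolute bandwidth scales linearly with whatever clock frequency a given implementation achieves.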

 

Cortex-A9 Hard-macro Implementations for TSMC 40G

 

In addition to the single and multicore soft macros, a popular dual-core configuration is also available as a hard-macro implementation for the TSMC 40G/GL process, to minimize the time, risks and costs associated with bringing a high-performance Cortex-A9 processor implementation to market. Built with optimized ARM Physical IP and advanced implementation techniques, this hard macro is available as either a power-optimized or a performance-optimized implementation.

Speed Optimized: The speed-optimized hard macro implementation provides system designers with an industry standard ARM processor incorporating aggressive low-power techniques to further extend ARM’s performance leadership into high-margin consumer and enterprise devices within the power envelope necessary for compact, high-density and thermally constrained environments. This hard macro implementation operates in excess of 2GHz when selected from typical silicon and represents an ideal solution for high-margin performance-oriented applications.

Power Optimized: In many thermally constrained applications such as set-top boxes, DTVs, printers and other feature-rich consumer and high-density enterprise applications, energy efficiency is of paramount importance. The Cortex-A9 power-optimized hard macro implementation delivers its peak performance of 4000 DMIPS while consuming less than 250mW per CPU when selected from typical silicon.

The hard macro implementations include ARM AMBA-compliant high-performance system components to maximize data traffic speed and minimize power consumption and silicon area. Each Cortex-A9 hard macro implementation also includes the CoreSight™ Program Trace Macrocell (PTM), which provides full visibility into the processor’s instruction flow, enabling the software community to develop code for optimal performance. Also included within the macro is the ARM high-performance L2 cache controller, supporting configurations between 128KB and 8MB of L2 cache memory.

 


ARM Cortex-A9 Performance, Power & Area

| | Cortex-A9 Single Core Soft Macro Trial Implementation | Cortex-A9 Dual Core Hard Macro Implementation (Performance Optimized) | Cortex-A9 Dual Core Hard Macro Implementation (Power Optimized) |
| --- | --- | --- | --- |
| Process | TSMC 65G | TSMC 40G | TSMC 40G |
| Optimization method | Performance Optimized | Performance Optimized | Power Optimized |
| Standard Cell Library | ARM SC12 | ARM SC12 + High Performance Kit | ARM SC12 + High Performance Kit |
| Performance (Total DMIPS) | 2,075 DMIPS | 10,000 DMIPS | 4,000 DMIPS |
| Frequency | 830 MHz | 2000 MHz (typical) | 800 MHz (wc/ss) |
| Energy Efficiency (DMIPS/mW) | 5.2 | 5.26 | 8.0 |
| Total power at target frequency | 0.4 W | 1.9 W | 0.5 W |
| Silicon Area | 1.5 mm² (excludes caches) | 6.7 mm² (including L1 parity and all DFT/DFM) | 4.6 mm² (including all DFT/DFM) |

 

 

 

Core area, frequency range and power consumption are heavily dependent on process, libraries and optimizations.

The numbers quoted above are illustrative of synthesized cores using general purpose process technologies and standard cell libraries and RAMs.

Area for the single core processor does not include the NEON™ or Floating-Point Units. Benchmarks were measured with a 64-entry TLB, a 32KB I-Cache and a 32KB D-Cache.

Area for the dual-core multiprocessor includes the SCU, GIC and supporting logic. Both cores included NEON engine support, a 128-entry TLB, a 32KB I-Cache and a 32KB D-Cache.
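
The performance and efficiency figures quoted for the three implementations can be cross-checked with simple arithmetic. The sketch below assumes that total DMIPS is the per-core rating of 2.50 DMIPS/MHz multiplied by frequency and core count, and that energy efficiency is total DMIPS divided by total power in mW. The function and variable names are hypothetical.

```python
# Sketch: cross-check of the quoted DMIPS and DMIPS/mW figures.
# ASSUMPTION: total DMIPS = DMIPS/MHz per core * MHz * cores,
# using the 2.50 DMIPS/MHz per-core rating quoted on this page.
DMIPS_PER_MHZ_PER_CORE = 2.5


def total_dmips(freq_mhz: int, cores: int) -> float:
    """Aggregate Dhrystone performance across all cores."""
    return DMIPS_PER_MHZ_PER_CORE * freq_mhz * cores


def dmips_per_mw(dmips: float, power_mw: int) -> float:
    """Energy efficiency in DMIPS per milliwatt."""
    return dmips / power_mw


# (label, frequency in MHz, cores, total power in mW) from the quoted figures
IMPLEMENTATIONS = [
    ("Single core, TSMC 65G", 830, 1, 400),
    ("Dual core, TSMC 40G, performance optimized", 2000, 2, 1900),
    ("Dual core, TSMC 40G, power optimized", 800, 2, 500),
]

if __name__ == "__main__":
    for label, mhz, cores, mw in IMPLEMENTATIONS:
        dmips = total_dmips(mhz, cores)
        print(f"{label}: {dmips:.0f} DMIPS, {dmips_per_mw(dmips, mw):.2f} DMIPS/mW")
```

Under these assumptions the computed values agree with the quoted figures to within rounding.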

 


Cortex-A9

| Feature | Cortex-A9 |
| --- | --- |
| Architecture | ARMv7-A Cortex |
| Dhrystone Performance | 2.50 DMIPS/MHz per core |
| Multicore | 1-4 cores (single core version also available) |
| ISA Support | |
| Memory Management | Memory Management Unit |
| Debug and Trace | CoreSight™ DK-A9 (available separately) |

 

 Cortex-A9 Key Features

TrustZone® Technology: Ensures reliable implementation of security applications ranging from digital rights management to electronic payment, with broad support from technology and industry Partners.
Thumb-2 Technology: Delivers the peak performance of traditional ARM code while also providing up to a 30% reduction in the memory required to store instructions.
Jazelle RCT and DBX Technology: Provides up to a 3x reduction in code size for just-in-time (JIT) and ahead-of-time compilation of bytecode languages, while also supporting direct bytecode execution of Java instructions for acceleration in traditional virtual machines.
Optimized Level 1 Caches: Performance- and power-optimized L1 caches combine minimal access latency techniques to maximize performance and minimize power consumption. They also provide the option of cache coherence for enhanced inter-processor communication, or for support of a rich SMP-capable OS that simplifies multicore software development.
Optional Level 2 Cache Controller: Provides low-latency, high-bandwidth access to up to 8MB of cached memory in high-frequency designs, or in designs that need to reduce the power consumption associated with off-chip memory access.

Advanced Multicore Technologies

Snoop Control Unit: The SCU is the central intelligence in the ARM multicore technology and is responsible for managing the interconnect, arbitration, communication, cache-to-cache and system memory transfers, cache coherence and other capabilities for all multicore technology enabled processors. For the first time, the Cortex-A9 MPCore processor also exposes these capabilities to other system accelerators and non-cached, DMA-driven mastering peripherals, increasing performance and reducing system-wide power consumption by sharing access to the processor’s cache hierarchy. This system coherence also reduces the software complexity otherwise involved in maintaining software coherence within each OS driver.
Accelerator Coherence Port: This AMBA® 3 AXI™ compatible slave interface on the SCU provides an interconnect point for a range of system masters that, for reasons of overall system performance, power consumption or software simplification, are better interfaced directly with the Cortex-A9 MPCore processor. The interface acts as a standard AMBA 3 AXI slave and supports all standard read and write transactions without placing any additional coherence requirements on attached components. However, any read transaction to a coherent region of memory interacts with the SCU to test whether the required information is already stored within the processor L1 caches. If it is, it is returned directly to the requesting component. If the read misses in the L1 caches, it still has the opportunity to hit in the L2 cache before finally being forwarded to main memory. For write transactions to any coherent memory region, the SCU enforces coherence before the write is forwarded to the memory system. The transaction may also optionally allocate into the L2 cache, removing the power and performance impact of writing directly through to off-chip memory.
Generic Interrupt Controller: Implementing the standardized and architected interrupt controller, the GIC provides a rich and flexible approach to inter-processor communication and to the routing and prioritisation of system interrupts. It supports up to 224 independent interrupts. Under software control, each interrupt can be distributed across CPUs, prioritised in hardware, and routed between the operating system and the TrustZone software management layer. This routing flexibility, together with support for virtualization of interrupts into the operating system, provides one of the key features required to enhance the capabilities of a solution that uses a paravirtualization manager.
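
The lookup order described for coherent reads through the Accelerator Coherence Port (L1 caches first, then L2, then main memory) can be pictured with a toy software model. This is only an illustrative sketch: the dictionaries standing in for caches and the function name `acp_coherent_read` are hypothetical and say nothing about how the hardware is implemented.

```python
# Toy model of the coherent read order described for the ACP:
# check the processors' L1 caches, then L2, then main memory.
# ASSUMPTION: caches and memory are modelled as plain dicts
# mapping address -> value; this is not a hardware description.
from typing import Dict, List, Tuple


def acp_coherent_read(
    addr: int,
    l1_caches: List[Dict[int, int]],
    l2_cache: Dict[int, int],
    main_memory: Dict[int, int],
) -> Tuple[int, str]:
    """Return (value, source) for a coherent read of addr."""
    # The SCU tests whether the data is already held in an L1 cache.
    for core_id, l1 in enumerate(l1_caches):
        if addr in l1:
            return l1[addr], f"L1 of core {core_id}"
    # On an L1 miss there is still the opportunity to hit in L2.
    if addr in l2_cache:
        return l2_cache[addr], "L2"
    # Otherwise the read is forwarded to main memory.
    return main_memory[addr], "main memory"


if __name__ == "__main__":
    l1 = [{0x1000: 11}, {0x2000: 22}]
    l2 = {0x3000: 33}
    mem = {0x1000: 0, 0x2000: 0, 0x3000: 0, 0x4000: 44}
    for address in (0x1000, 0x2000, 0x3000, 0x4000):
        print(hex(address), acp_coherent_read(address, l1, l2, mem))
```

The model returns the cached copy in preference to the main memory copy, which reflects why an attached accelerator does not need its own coherence logic.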

Advanced Optional Technologies

Cortex-A9 NEON Media Processing Engine (MPE): The Cortex-A9 MPE can be used with either of the Cortex-A9 processors. It offers both the performance and functionality of the Cortex-A9 Floating-Point Unit and an implementation of the NEON Advanced SIMD instruction set for further acceleration of media and signal processing functions. The MPE extends the Cortex-A9 processor’s floating-point unit (FPU) to provide a quad-MAC and an additional 64-bit and 128-bit register set, supporting a rich set of SIMD operations over 8, 16 and 32-bit integer and 32-bit floating-point data quantities every cycle.
Cortex-A9 Floating-Point Unit (FPU): When implemented along with either of the Cortex-A9 processors, the FPU provides high-performance single and double precision floating-point instructions compatible with the ARM VFPv3 architecture, which is software compatible with previous generations of ARM floating-point coprocessor.
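
To make the SIMD register widths mentioned for the NEON MPE concrete, the short sketch below counts how many data elements of each listed size fit in a 64-bit or 128-bit register. It is plain arithmetic rather than a model of the hardware, and the function name `simd_lanes` is hypothetical.

```python
# Sketch: how many elements of a given width fit in a SIMD register.
# Register widths (64, 128) and element widths (8, 16, 32) are the
# ones listed in the text above; the helper name is hypothetical.


def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements processed per register-wide operation."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


if __name__ == "__main__":
    for register_bits in (64, 128):
        for element_bits in (8, 16, 32):
            lanes = simd_lanes(register_bits, element_bits)
            print(f"{register_bits}-bit register, {element_bits}-bit elements: {lanes} lanes")
```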

 


As the centrepiece of many next-generation devices, the Cortex-A9 processor is commonly integrated with many other IP blocks.

System IP

System IP components are essential for building complex systems-on-chip. By using System IP components, developers can significantly reduce development and validation cycles, saving cost and reducing time to market.

| Description | AMBA Bus | System IP Components |
| --- | --- | --- |
| Advanced AMBA 3 Interconnect IP | AXI | NIC-301, PL301 |
| DMA Controller | AXI | DMA-330, PL330 |
| Level 2 Cache Controller | AXI | L2C-310, PL310 |
| Dynamic Memory Controller | AXI | DMC-340, PL340 |
| DDR2 Dynamic Memory Controller | AXI | DMC-342 |
| Static Memory Controller | AXI | SMC-35x, PL35x |
| TrustZone Address Space Controller | AXI | PL380 |
| CoreSight™ Design Kit | ATB | CDK-11 |

Media Processors
The Mali™ family of products combines to provide the complete graphics stack for all embedded graphics needs, enabling device manufacturers and content developers to deliver the highest quality, cutting-edge graphics solutions across the broadest range of consumer devices.
Mali-400 GPU: The world's first OpenGL ES 2.0 conformant multi-core GPU, providing 2D and 3D acceleration with performance scalable up to 1080p resolution.
Mali-200 GPU: A high-performance graphics processor providing advanced 2D and 3D acceleration. Supports OpenGL ES 2.0.

 

Physical IP

ARM® Physical IP Platforms deliver process-optimized IP for best-in-class implementations of the Cortex-A9 processor at 40nm and below.
Standard Cell Logic Libraries: Available in a variety of different architectures, ARM Standard Cell Libraries support a wide performance range for all types of SoC designs. Designers can choose between different libraries and optimize their designs for speed, power and/or area.
Memory Compilers and Registers: A broad array of silicon-proven SRAM, Register File and ROM memory compilers for all types of SoC designs, ranging from performance-critical to cost-sensitive and low-power applications.
Interface Libraries: A broad portfolio of silicon-proven Interface IP designed to meet varying system architectures and standards. General Purpose I/O, Specialty I/O, High Speed DDR and Serial Interfaces are optimized to deliver high data throughput with low pin counts.

 

Tools Support

All ARM processors are supported by the ARM Development Studio 5 (DS-5™) tool suite, as well as a wide range of third party tools, operating system and EDA vendors. ARM DS-5 software development tools are unique in their ability to provide solutions that take full advantage of the complete ARM technology portfolio.

 

