Cortex-A53 Processor

The ARM® Cortex®-A53 processor is an extremely power-efficient ARMv8 processor capable of supporting 32-bit and 64-bit code seamlessly. It makes use of a highly efficient 8-stage in-order pipeline balanced with advanced fetch and data access techniques for performance. It fits in a power and area footprint suitable for entry-level smartphones while also delivering high aggregate performance in scalable enterprise systems through high core density.

It delivers significantly higher performance than the highly successful Cortex-A7, and is capable of deployment as a standalone applications processor or paired with the Cortex-A57 processor in a big.LITTLE configuration for optimum performance, scalability and energy efficiency.

 


The Cortex-A53 delivers 64-bit capability and significantly increases performance over the Cortex-A7, in a footprint suited for low-cost applications like entry-level smartphones. It is smaller and lower power than the Cortex-A9 processor yet delivers more performance on many of the key benchmarks. It is highly scalable, from a single multi-core CPU cluster, to a dual-cluster big.LITTLE CPU subsystem in combination with the Cortex-A57, to a multi-cluster enterprise system connected through AMBA 5 CHI coherent interconnect technology. Full ARMv8 support in a small and highly configurable package makes the Cortex-A53 attractive to a broad range of mobile, consumer, general purpose, and enterprise applications.

The Cortex-A53 processor:

  • Delivers the compute power of today’s high-end smartphone in the lowest power and area footprint, enabling all-day battery life for typical device uses;
  • Runs legacy ARM 32-bit applications efficiently;
  • Features cache coherent interoperability with ARM Mali™ family graphics processing units (GPUs) for GPU compute applications;
  • Connects seamlessly to AMBA interconnect for 16-core and 32-core configurations, delivering the most aggregate performance per Watt to enterprise applications that reach high performance by maximizing core count in a thermally constrained rack;
  • Offers optional reliability and scalability features for high-performance enterprise applications.

The Cortex-A53 processor delivers significantly more performance than its predecessors at a higher level of power efficiency, effectively taking the performance of the LITTLE core above that of the Cortex-A9 processor, which defines many popular high-end and mainstream mobile platforms. The Cortex-A53 is able to deliver significantly more performance than the current low-cost solution for entry-level mobile devices, and edges out Cortex-A9 at the same frequency. The performance graph below shows measured results running various Android™ benchmarks. 

[Figure: Cortex-A53 normalized performance]

The graph below shows the relative performance of the high efficiency product line within the Cortex-A family, compared with the Cortex-A9. The performance measurements below are based on SPECint2000, so the memory system, the integer pipeline, and even the floating point pipeline contribute to the delivered performance. The graph shows the performance of each CPU at 1 GHz as well as the expected performance at observed frequencies in production devices and anticipated frequencies for the Cortex-A53.

[Figure: Cortex-A53 relative performance]


Cortex-A53 MPCore
Architecture: ARMv8-A
Multicore:
  • 1-4X SMP within a single processor cluster
  • Multiple coherent SMP processor clusters through AMBA® 4 technology
ISA Support:
  • AArch32 for full backward compatibility with ARMv7
  • AArch64 for 64-bit support and new architectural features
  • TrustZone® security technology
  • NEON™ Advanced SIMD
  • DSP & SIMD extensions
  • VFPv4 Floating point
  • Hardware virtualization support
Debug & Trace: CoreSight™ DK-A53


Cortex-A53 Architectural Features
Feature | Benefits | AArch32 | AArch64
ARMv8 architecture | 64-bit and 32-bit execution states for scalable high performance | Yes | Yes
Hardware-accelerated cryptography | 3x-10x better software encryption performance. Useful for small-granule decrypt/encrypt operations that are too small to offload efficiently to a hardware accelerator (e.g. HTTPS) | Yes | Yes
NEON technology | Can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis. Also useful in accelerating floating point code with SIMD execution | Yes | Yes
Floating point unit | Hardware support for floating point operations in half-, single- and double-precision arithmetic, now with IEEE 754-2008 enhancements | Yes | Yes
Load-Acquire, Store-Release instructions | Designed for the C++11, C11 and Java memory models. Improves performance of thread-safe code by eliminating explicit memory barrier instructions | Yes | Yes
Large physical address reach | Enables the processor to access beyond 4GB of physical memory | Yes | Yes
TrustZone® technology | Ensures reliable implementation of security applications ranging from digital rights management to electronic payment | Yes | Yes
Hardware virtualization | Enables multiple software environments and their applications to simultaneously access the system capabilities | Yes | Yes
Automatic event signalling | For power-efficient, high-performance spinlocks | Yes | Yes
Double-precision floating point SIMD | Allows SIMD vectorisation to be applied to a much wider set of algorithms (e.g. scientific / High Performance Computing (HPC) and supercomputer) | No | Yes
64-bit virtual address reach | Enables virtual memory beyond the 4GB limit of 32-bit addressing. Important for modern desktop and server software using memory-mapped file I/O and sparse addressing | No | Yes
Larger register files | 31 x 64-bit general purpose registers: increases performance, reduces stack use. Fewer stack spills, enabling more aggressive compilers. SIMD usable for more applications, e.g. HPC | No | Yes
Efficient 64-bit immediate generation | Less need for literal pools | No | Yes
Large PC-relative addressing range (+/-4GB) | Efficient data addressing within shared libraries and position-independent executables | No | Yes
Tagged pointers | Useful for dynamically typed languages such as JavaScript, and for garbage collection | No | Yes
64KB pages | Reduce TLB miss rates and depth of page walks | No | Yes
New exception model | Reduces OS and hypervisor software complexity | No | Yes
Enhanced cache management | User-space cache operations improve dynamic code generation efficiency; Data Cache Zero for fast clear | No | Yes
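
As an illustration of the tagged pointers entry above, the following is a minimal Python sketch of how a language runtime might pack a small type tag into the unused top byte of a 64-bit pointer. It assumes the AArch64 top-byte-ignore behaviour, where bits [63:56] of a virtual address can be excluded from address translation. The function names and the example tag value are hypothetical and are not taken from any ARM library.

```python
# Sketch only: models pointer tagging with plain integers.
# Assumption: the tag lives in bits [63:56] and the address in bits [55:0].

ADDRESS_BITS = 56                        # low bits that carry the address
TAG_SHIFT = ADDRESS_BITS                 # tag occupies bits [63:56]
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # selects bits [55:0]
TAG_MASK = 0xFF                          # 8-bit tag


def tag_pointer(address: int, tag: int) -> int:
    """Return a 64-bit value with `tag` stored in the top byte of `address`."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 8 bits")
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address must fit in 56 bits")
    return (tag << TAG_SHIFT) | address


def pointer_tag(pointer: int) -> int:
    """Extract the 8-bit tag from a tagged pointer."""
    return (pointer >> TAG_SHIFT) & TAG_MASK


def pointer_address(pointer: int) -> int:
    """Recover the address by discarding the top byte."""
    return pointer & ADDRESS_MASK


if __name__ == "__main__":
    TAG_STRING = 0x2A  # hypothetical tag meaning "this object is a string"
    tagged = tag_pointer(0x7F0012345678, TAG_STRING)
    print(hex(tagged), hex(pointer_tag(tagged)), hex(pointer_address(tagged)))
```

In software that runs without hardware support, `pointer_address` would have to be applied before every memory access. With tagged pointers, the runtime can dereference the tagged value directly, which is why the feature suits dynamically typed languages and garbage collectors.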


Cortex-A53 Microarchitectural Features
Feature | Benefits
In-order pipeline | Lower power consumption. Performance improvements are sought elsewhere in the design, e.g. the memory system and issue capability.
Increased dual-issue capability | Increased peak instruction throughput via duplication of execution resources, and dual instruction decoders.
Power-optimized L2 cache | Efficiency-optimized L2 cache design delivers lower latency and balances performance with efficiency.
512-entry main TLB | Improved performance on code with complex memory access patterns, e.g. web browsing. Larger main TLB than Cortex-A7 and Cortex-A9.
Small, fast uTLBs | 10-entry uTLB with an extremely short miss penalty to reload from the main TLB allows excellent performance in a small area and power footprint.
Advanced branch predictor | 4Kbit conditional predictor and 256-entry indirect predictor increase branch hit rate.
64B cache lines | Fully aligned with Cortex-A57 microarchitecture to simplify cache management software in big.LITTLE systems. 64B line sizes are a good tradeoff for modern memory access patterns.
Non-blocking I-fetch with multi-line pre-fetch | Increased instruction throughput across more types of benchmarks, from control code to processing-intensive loops.
Dual identical ALU pipelines | Increased opportunity to dual-issue instructions, at a small additional area.
64-bit store path | Balances store bandwidth with dynamic power consumption, focused on a highly efficient design tradeoff.
Multi-stream pre-fetcher | Greater data flow into the main datapath increases overall performance on a wide range of code.
Increased D-side throughput | 3-outstanding load miss capability (per-core, excluding prefetches); 8-outstanding transactions (per-core)
Extensive power-saving features | Hierarchical clock gating, power domains, advanced retention modes.
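
To put the TLB sizes above in perspective, the following short Python sketch estimates TLB reach, the amount of memory that can be mapped without a TLB miss. It takes the 512-entry main TLB and 10-entry uTLB figures from the table and assumes that every entry maps exactly one page. The 4KB page size is an assumption used for comparison with the 64KB pages mentioned among the AArch64 features. The helper name is hypothetical.

```python
# Sketch only: TLB reach = number of entries x page size,
# assuming each entry maps exactly one page of the given size.

KIB = 1024

MAIN_TLB_ENTRIES = 512   # from the microarchitectural feature table
MICRO_TLB_ENTRIES = 10   # from the microarchitectural feature table


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Bytes of address space covered when all entries hold distinct pages."""
    return entries * page_size_bytes


if __name__ == "__main__":
    for name, entries in (("main TLB", MAIN_TLB_ENTRIES), ("uTLB", MICRO_TLB_ENTRIES)):
        for page_kib in (4, 64):  # 4KB is an assumed baseline page size
            reach_kib = tlb_reach_bytes(entries, page_kib * KIB) // KIB
            print(f"{name}, {page_kib}KB pages: {reach_kib} KiB reach")
```

Under these assumptions the main TLB covers 2 MiB with 4KB pages and 32 MiB with 64KB pages, which illustrates why larger pages reduce TLB miss rates.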


Cortex-A53 Advanced Multicore Features
The processor also utilizes the widely established ARM MPCore multicore technology, enabling performance scalability and control over power consumption to exceed the performance of today's comparable high-performance devices while remaining within tight mobile power constraints. Multicore processing provides the ability for any of the four component processors, within a cluster, to shut down when not in use, for instance when the device is in standby mode, to save power. When higher performance is required, every processor is in use to meet the demand while still sharing the workload to keep power consumption as low as possible.
Snoop Control Unit: The SCU is responsible for managing the interconnect, arbitration, communication, cache-to-cache and system memory transfers, cache coherence and other capabilities for the processor. The Cortex-A53 MPCore processor also exposes these capabilities to other system accelerators and non-cached DMA-driven peripherals to increase performance and reduce system-wide power consumption. This system coherence also reduces the software complexity involved in maintaining software coherence within each OS driver.
Accelerator Coherence Port: This AMBA 4 AXI™ compatible slave interface on the SCU provides an interconnect point for masters that are better interfaced directly with the Cortex-A53 processor. This interface supports all standard read and write transactions without additional coherence requirements. However, any read transaction to a coherent region of memory will interact with the SCU to test whether the information is already stored in the L1 caches. The SCU will enforce write coherence before the write is forwarded to the memory system and may allocate into the L2 cache, removing the power and performance impact of writing directly to off-chip memory.
Generic Interrupt Controller: Implementing the standardized and architected interrupt controller, the GIC provides a rich and flexible approach to inter-processor communication and the routing and prioritization of system interrupts. Under software control, each interrupt can be distributed across CPUs, hardware prioritized, and routed between the operating system and the TrustZone software management layer. This routing flexibility, together with the support for virtualization of interrupts into the operating system, provides one of the key features required to enhance the capabilities of a solution utilizing a hypervisor.
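
The interrupt distribution, prioritization and routing described for the GIC can be pictured with a small toy model in Python. This is a sketch under stated assumptions and not a model of the real GIC programming interface: the class, field and function names are hypothetical, the interrupt IDs and priorities are invented for illustration, and breaking ties by interrupt ID is a choice made for this sketch. It does follow the GIC convention that a lower priority value means a higher priority.

```python
# Toy model only: selects the next interrupt for one CPU and one world
# (operating system or TrustZone secure software). Not the real GIC interface.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Interrupt:
    intid: int                   # interrupt ID (hypothetical values below)
    priority: int                # lower value = higher priority (GIC convention)
    target_cpus: FrozenSet[int]  # CPUs the interrupt may be delivered to
    secure: bool                 # True: TrustZone secure software, False: OS


def next_interrupt(pending: Iterable[Interrupt], cpu: int,
                   secure_world: bool) -> Optional[Interrupt]:
    """Pick the highest-priority pending interrupt routed to `cpu` and world."""
    candidates = [
        irq for irq in pending
        if cpu in irq.target_cpus and irq.secure == secure_world
    ]
    if not candidates:
        return None
    # Ties are broken by the lower interrupt ID (an assumption of this sketch).
    return min(candidates, key=lambda irq: (irq.priority, irq.intid))


if __name__ == "__main__":
    pending = [
        Interrupt(intid=34, priority=0xA0, target_cpus=frozenset({0, 1}), secure=False),
        Interrupt(intid=40, priority=0x20, target_cpus=frozenset({1}), secure=False),
        Interrupt(intid=29, priority=0x10, target_cpus=frozenset({0}), secure=True),
    ]
    for cpu in (0, 1):
        for world in (False, True):
            print(cpu, "secure" if world else "os", next_interrupt(pending, cpu, world))
```

The model keeps the three ideas from the description separate: `target_cpus` stands for distribution across CPUs, `priority` for hardware prioritization, and `secure` for routing between the operating system and the TrustZone layer.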

The Cortex-A53 MPCore processor is supported by a broad range of ARM technology, including System IP, Physical IP, and development tools. A broad range of SoC and software design solutions, tools and services from the ARM Connected Community™ complement this technology. This provides ARM Partners with a smooth path through the development, verification and production of full-function, compelling devices while significantly reducing time-to-market.

System IP

The ARM CoreLink™ interconnect and memory controller IP addresses the critical challenge of efficiently moving and storing data between up to 16 Cortex-A50 MPCore processors, high performance media processors and dynamic memories to optimize the system performance and power consumption of the SoC. The CoreLink system IP enables SoC designers to maximize the utilization of system memory bandwidth and reduce static and dynamic latencies. ARM CoreSight technology provides complete on-chip debug and correlated, real-time trace visibility for all cores of the Cortex-A53 MPCore processor, reducing risk and speeding development of high quality multiprocessing software. The new AMBA® 4 Cache Coherent Network (CCN) provides optimum system bandwidth and latency. The CCN provides AMBA 4 AXI™ Coherency Extensions (ACE) compliant ports for full coherency between multiple Cortex-A53 MPCore processors, better utilizing caches and simplifying software development. This feature is essential for high bandwidth applications including gaming, servers and networking that require clusters of coherent single and multicore processors. Combined with the ARM CoreLink network interconnect and memory controller IP, the CCN increases system performance and power efficiency.


Physical IP

ARM Physical IP Platforms deliver process-optimized IP for best-in-class implementations of the Cortex-A53 processor at 40nm and below. A set of high performance POP™ IP containing advanced ARM Physical IP for 28nm technologies supports the Cortex-A53, enabling rapid development of leadership physical implementations. ARM is also working early to assure a roadmap to 20nm optimizations. POP IP supports the ARM strategy of offering specifically targeted Physical IP to enable Partners to achieve tuned implementations of ARM cores. ARM is uniquely able to design the optimization packs in parallel with the Cortex-A53 MPCore processor architecture, enabling the processor and physical IP combination to deliver workstation class performance in a mobile power envelope while facilitating rapid time-to-market.


Tools Support

The ARM Development Studio 5 (DS-5™) for ARMv8 fully supports all ARMv8 processors as well as a wide range of third party tools, operating systems and EDA flows. ARM DS-5 software development tools are unique in their ability to provide solutions that take full advantage of the complete ARM technology portfolio. DS-5 provides a complete range of software tools to create, debug and optimize systems based on the Cortex-A53 MPCore processor. It incorporates the DS-5 Debugger, whose powerful and intuitive graphical environment enables fast debugging of bare metal, Linux and Android native applications. In addition, its new ARM Streamline™ Performance Analyzer simplifies the identification of hot spots in software and load balancing between cores. The ARM Compiler, which already includes specific optimizations for the Cortex-A15 MPCore processor, and an ARM Versatile™ Reference Virtual Platform built on ARM Fast Models technology enable early software development before silicon availability.


 

 

Graphics Processors

The Mali™ family of products combine to provide the complete graphics stack for all embedded graphics needs, enabling device manufacturers and content developers to deliver the highest quality, cutting edge graphics solutions across the broadest range of consumer devices.


 

Support

ARM training courses and Active Assist on-site system-design advisory services enable licensees to efficiently integrate the Cortex-A53 MPCore processor into their designs, realizing maximum system performance with the lowest risk and fastest time-to-market.

 

