
big.LITTLE Processing

ARM® big.LITTLE™ processing is a power-optimization technology that combines high-performance ARM CPUs with the most efficient ARM CPUs to deliver peak performance capacity, higher sustained performance, and increased parallel processing performance at significantly lower average power. The latest big.LITTLE software and platforms can save up to 75% of CPU energy in low to moderate performance scenarios and can increase performance by 40% in highly threaded workloads. The underlying big.LITTLE software automatically moves workloads to the appropriate CPU based on performance needs. It does so within microseconds, quickly enough that the move is seamless to the user. ARM big.LITTLE technology enables mobile SoCs to be designed for new levels of peak performance within the same all-day battery life users expect.

This page contains more detailed information and links to further resources on the technology. The Think big.LITTLE microsite contains information on current partner implementations and a registration page for future updates on the technology.

 


Background

The performance demanded by smartphones and tablets is increasing at a much faster rate than the capacity of batteries or the semiconductor process node power savings. At the same time, users are demanding longer battery life within roughly the same form factor. This conflicting set of demands requires innovations in mobile SoC design beyond what process technology and traditional power management techniques can deliver. big.LITTLE is one of many power management technologies employed to save power in mobile SoCs – it works in tandem with Dynamic Voltage and Frequency Scaling (DVFS), clock gating, core power gating, retention modes, and thermal management to deliver a full set of power control for the SoC.

big.LITTLE technology takes advantage of the fact that the usage pattern for smartphones and tablets is dynamic: periods of high-intensity processing, such as initial web page rendering and game physics calculation, alternate with typically longer periods of low-intensity processing, such as scrolling or reading a web page, waiting for user input in a game, and lighter-weight tasks like texting, e-mail and audio. The graph below shows the CPU utilization at various power states in a big.LITTLE SoC, with all the relevant power management techniques in operation. It shows the big CPU cores being used in burst mode, that is, for short durations at peak frequency, while the majority of runtime is handled by LITTLE cores at moderate operating frequencies.

Figure: big cores and LITTLE cores diagram

Innovative power-saving techniques are required to sustain the pace of innovation in mobile, so that performance can keep increasing within the same power footprint. Many mobile use cases behave like the one shown in the graph above, presenting a very strong opportunity for big.LITTLE technology to save power while also delivering peak performance in modern mobile devices.

big.LITTLE Processing – How does it work?

The high-performance and high-efficiency CPU clusters are connected through a cache coherent interconnect fabric such as the ARM CoreLink™ CCI-400. The processors look like one multicore CPU to the operating system. User space software on a big.LITTLE SoC is identical to the software that would run on a standard SMP processor. So how does the work get scheduled to the right processor?

ARM has developed a kernel space patch set that gives the operating system awareness of the big and LITTLE cores, and the ability to schedule individual threads of execution on the appropriate processor based on dynamic run-time behavior. The software also keeps track of load history for each thread that runs, and uses the history to anticipate the performance needs of a thread the next time it runs. This software is called Global Task Scheduling. It is in operation in the use case graph above and in the measured results below, where big.LITTLE technology delivers energy savings as high as 75% for the same or higher delivered performance.
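To make the idea of a per-thread load history more concrete, here is a minimal sketch in Python. It is not the Global Task Scheduling code: the class name, the function name and the decay constant are all assumptions chosen for illustration. It only shows how a decayed running average can remember recent demand and serve as a guess for the next run.

```python
from dataclasses import dataclass

# Hypothetical decay factor: the weight of a past sample halves every
# 32 sample periods. Chosen for illustration, not taken from the patch set.
DECAY = 0.5 ** (1 / 32)


@dataclass
class ThreadLoad:
    """Decayed running average of how busy a thread was (0.0 to 1.0)."""
    tracked: float = 0.0

    def update(self, busy_fraction: float) -> float:
        # Blend the newest period into the history; older periods fade out.
        self.tracked = self.tracked * DECAY + busy_fraction * (1 - DECAY)
        return self.tracked


def predicted_demand(history: ThreadLoad) -> float:
    """Use the remembered load as the guess for the thread's next run."""
    return history.tracked


if __name__ == "__main__":
    load = ThreadLoad()
    for _ in range(64):   # a burst: fully busy for 64 periods
        load.update(1.0)
    print(f"after burst: {predicted_demand(load):.2f}")
    for _ in range(64):   # then idle for 64 periods
        load.update(0.0)
    print(f"after idle:  {predicted_demand(load):.2f}")
```

With this decay factor, a thread that has been busy for a while still carries a high predicted demand shortly after it goes idle, and that prediction fades as the idle period lengthens.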

Figure: CPU versus SoC power saving

Future Implementations

The latest CPUs from ARM, including the ARMv8 architecture-based Cortex®-A53 and Cortex-A57 processors, fully support big.LITTLE and can be deployed using similar software. The processors are connected using the same CoreLink CCI-400 cache coherent interconnect in big.LITTLE SoCs. The Global Task Scheduling software is written to allow a migration to ARMv8 architecture support, with updated ARMv8 architecture patch sets coming in time for SoC launches with the Cortex-A50 series processors.


Hardware Requirements

For a big.LITTLE processor to be invisible to software, the CPU subsystem must be fully cache coherent, and the big and LITTLE processors must be fully architecturally compatible - they must run all the same instructions and support the same extensions such as virtualization, large physical addressing, etc. ARM Cortex-A series processors are designed to meet these requirements in the recommended pairings below:

                        1st Generation: ARMv7        2nd Generation: ARMv8
                        (32-bit, 40-bit physical)    (32-bit/64-bit)
High-performance CPU    Cortex-A15                   Cortex-A57
High-efficiency CPU     Cortex-A7                    Cortex-A53

In each of the combinations above, the high performance CPU cluster and the high efficiency cluster can each have a maximum of four cores. For many smartphone applications, one or two high performance cores are sufficient to handle the performance requirements. For higher end smartphone and tablet use cases, the software may be able to take greater advantage of four big and four LITTLE cores working together. In the Global Task Scheduling software, all of the processors in the big.LITTLE CPU subsystem can be operated at any time, providing the maximum opportunity for performance enhancement and power savings.
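The pairing and core-count rules above can be expressed as a small check. The sketch below is illustrative only: the recommended pairings and the four-core limit come from this page, while the function name, the constant names and the assumption that each cluster has at least one core are hypothetical.

```python
from typing import List

# Pairings and the four-core-per-cluster limit are taken from the text above.
RECOMMENDED_PAIRINGS = {
    "ARMv7": ("Cortex-A15", "Cortex-A7"),
    "ARMv8": ("Cortex-A57", "Cortex-A53"),
}
MAX_CORES_PER_CLUSTER = 4


def check_config(big: str, n_big: int, little: str, n_little: int) -> List[str]:
    """Return a list of problems; an empty list means the config fits the rules."""
    problems = []
    if (big, little) not in RECOMMENDED_PAIRINGS.values():
        problems.append(f"{big} + {little} is not a recommended pairing")
    for name, count in ((big, n_big), (little, n_little)):
        # Assumption: a cluster needs at least one core to be present at all.
        if not 1 <= count <= MAX_CORES_PER_CLUSTER:
            problems.append(
                f"{name} cluster has {count} cores; "
                f"expected 1 to {MAX_CORES_PER_CLUSTER}"
            )
    return problems


if __name__ == "__main__":
    print(check_config("Cortex-A15", 2, "Cortex-A7", 4))   # fits the rules
    print(check_config("Cortex-A57", 6, "Cortex-A7", 4))   # two problems
```

A two-big, four-LITTLE ARMv7 configuration passes, while mixing generations or exceeding four cores in a cluster is reported as a problem.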

The big.LITTLE CPU system is built with the cache coherent interconnect, the global interrupt distributor, and other system IP components shown below.

Figure: big.LITTLE CPU system components


Software

big.LITTLE software automatically handles the allocation of threads of execution to the appropriate CPU. There are three different modes of operation for the software, all based on the same hardware description. The focus for ARM and silicon partners in future big.LITTLE systems is on the Global Task Scheduling model of software, where the operating system is directly aware of the high performance and high efficiency CPUs in the system. In that mode of operation, the software dynamically allocates each thread based on the performance required. The software uses a load history to remember previous performance demands, and a dynamic load tracker to adapt to runtime performance that may differ from the load history. The software reacts quickly to changes in load, and can move work to the big or LITTLE CPU cluster in less time than a DVFS state transition or SMP load balancing action – those activities happen invisibly in the background on every mobile device today.
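The allocation decision described above can be sketched as a comparison of a thread's tracked load against two thresholds. This is a simplified illustration, not the actual scheduler: the threshold values, the 0.0 to 1.0 load scale and all names are assumptions.

```python
from enum import Enum


class Cluster(Enum):
    BIG = "big"
    LITTLE = "LITTLE"


# Hypothetical thresholds on a 0.0-1.0 tracked-load scale. The gap between
# them acts as hysteresis, so a thread whose load hovers near one threshold
# does not bounce between clusters.
UP_THRESHOLD = 0.6
DOWN_THRESHOLD = 0.3


def place(current: Cluster, tracked_load: float) -> Cluster:
    """Pick the cluster for a thread given where it runs now and its load."""
    if current is Cluster.LITTLE and tracked_load > UP_THRESHOLD:
        return Cluster.BIG       # demand outgrew the efficient cores
    if current is Cluster.BIG and tracked_load < DOWN_THRESHOLD:
        return Cluster.LITTLE    # demand dropped; move back to save energy
    return current               # in between: stay put


if __name__ == "__main__":
    cluster = Cluster.LITTLE
    for load in (0.2, 0.7, 0.5, 0.25):
        cluster = place(cluster, load)
        print(f"load={load:.2f} -> {cluster.value}")
```

In this sketch a thread at load 0.5 stays wherever it already runs, which is the point of keeping separate up and down thresholds rather than a single cut-off.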

The mechanics of the software operation are described in detail in white papers from ARM and ARM partners. The Global Task Scheduling software is available as open source in the following repository: http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git


Related Technology

CoreLink Cache Coherent Interconnect (CCI-400)

The ARM CoreLink™ CCI-400 Cache Coherent Interconnect provides full cache coherency between two clusters of multi-core CPUs, such as the ARM Cortex-A15 and Cortex-A7 processors, enabling big.LITTLE.

The CoreLink CCI-400 enables system coherency in heterogeneous multicore and multi-cluster CPU/GPU systems, such as those required for the networking and high-performance computation markets, by enabling each processor in the system to access the other processor caches. This reduces the need to access off-chip memory, saving time and energy, which is a key enabler in systems based on ARM big.LITTLE™ processing.

CoreLink Cache Coherent Network (CCN-504)

The ARM CoreLink CCN-504 Cache Coherent Network offers scaling to 16 processor cores to give system architects an optimal solution for enterprise applications including servers and network infrastructure.

CoreLink CCN-504 can deliver up to one Terabit of usable system bandwidth per second. It will enable designers to provide high-performance, cache coherent interconnect for ‘many-core’ enterprise solutions built using the ARM Cortex-A15 MPCore processor and the latest ARM Cortex-A50 series processors with 64-bit support.

ARM Development Studio 5 (DS-5)

The ARM Development Studio 5 (DS-5™) toolchain is a suite of professional software development tools for ARM processors, and it extends its world-leading capabilities to big.LITTLE performance analysis and debug.

The DS-5™ toolchain enables engineers to develop robust and highly optimized embedded software for ARM application processors. It comprises tools such as the best-in-class ARM C/C++ Compiler, a powerful Linux/Android™/RTOS-aware debugger, the ARM Streamline™ system-wide performance analyzer, and real-time system model simulators.

ARM Fast Models  

ARM Fast Models provide the necessary models for constructing virtual platforms of ARM big.LITTLE processing-based systems, along with templates of popular configurations. They support customization of model content, configuration of items such as the memory map and interrupt map, and export of the platform to SystemC/TLM environments.

Fast Models are available for the Cortex-A15 and Cortex-A7 processors and the CoreLink CCI-400.

  

