Advantages of Non-Volatile Memory (NVM) for Edge AI

Ongoing innovations in semiconductor technology, algorithms, and data science are making it possible for a growing number of edge devices to incorporate some level of AI inference capability. Today we see this in computer vision applications such as object recognition, facial recognition, and image classification on products ranging from phones and laptops to security cameras. In industrial systems, inference makes it possible to predict equipment failures and allows robots to operate autonomously. For IoT and smart home products, AI inference makes it possible to monitor and respond to various sensor inputs in real time.

The lowest-cost processing solutions that support AI inference today are off-the-shelf single-chip microcontrollers used for IoT systems. Such chips combine general-purpose CPU, SRAM, and IO functions with non-volatile memory (NVM). However, these chips run AI algorithms in software on the CPU, which delivers only modest performance and is practical only for basic inference. Scaling a single-chip solution to deliver high-performance inference is a challenge for designers.

Demanding inference algorithms that require multiple teraflops of performance must use dedicated AI acceleration hardware. To achieve the required performance while keeping energy consumption to a minimum, that hardware must be manufactured in leading-edge processes. Indeed, there are many systems-on-chip (SoCs) on the market today with dedicated AI acceleration hardware, developed in advanced process geometries, that are quite efficient.

However, these are typically two-chip solutions, with AI computing engines implemented on advanced processes (typically 22nm or less) and NVM devices implemented in older process technologies. This is because embedded flash does not scale well below 40nm: at 28nm its cost becomes prohibitive for most applications, so embedded flash is effectively unavailable at the nodes these engines use. This means it is not possible to combine flash and a high-performance inference engine in a single SoC.

For applications where cost is secondary to performance, such a two-chip solution may be viable (think products like autonomous cars that require massive AI models stored in solid state drives (SSD) and operate from DRAMs). However, for low-power edge AI products, the cost of a two-chip solution can be prohibitive. A two-chip solution also requires constant fetching from external memory, which increases latency and power consumption. Additionally, there are potential security risks with a two-chip solution because there is more potential to hack the system by reading or modifying the NVM via external buses.

The “holy grail” for low-cost, low-power systems is a single chip (SoC or MCU) that combines accelerators, NVM, SRAM, and IO. From a resource perspective, most small, low-power IoT and other edge AI applications do not require a two-chip solution: their AI models can be small enough to fit into the SoC’s internal NVM. The obstacle is flash’s limited scalability.
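Whether a model "fits" is simple arithmetic: parameter count times bits per weight, plus firmware, compared against the on-chip NVM capacity. The sketch below illustrates that check. The function names and all of the numbers (250,000 parameters, 1 MiB of NVM, 256 KiB of firmware) are hypothetical values chosen for illustration, not figures from any particular device.

```python
def model_footprint_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at the given precision."""
    total_bits = num_params * bits_per_weight
    return (total_bits + 7) // 8  # round up to whole bytes


def fits_on_chip(num_params: int, bits_per_weight: int,
                 nvm_bytes: int, firmware_bytes: int = 0) -> bool:
    """True if weights plus firmware fit in the on-chip NVM."""
    needed = model_footprint_bytes(num_params, bits_per_weight) + firmware_bytes
    return needed <= nvm_bytes


# Hypothetical example values (assumptions, not measured data).
PARAMS = 250_000
NVM_BYTES = 1 * 1024 * 1024      # 1 MiB of on-chip NVM
FIRMWARE_BYTES = 256 * 1024      # 256 KiB of CPU firmware

print(fits_on_chip(PARAMS, 8, NVM_BYTES, FIRMWARE_BYTES))   # 8-bit weights
print(fits_on_chip(PARAMS, 32, NVM_BYTES, FIRMWARE_BYTES))  # 32-bit weights
```

With these assumed numbers, the 8-bit model fits alongside the firmware while the 32-bit version does not, which is why weight precision matters as much as parameter count when sizing on-chip memory.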

A single-chip solution will not only save cost. With high bandwidth between memory and execution units, and no need to move weights across chip boundaries, it can also achieve higher performance at lower power. And since the AI models in these applications are relatively small and not frequently updated, on-chip NVM can be used not only for the traditional NVM function of code storage but also to hold AI weights and CPU firmware.

Today, AI weights and CPU firmware are read from on-chip SRAM. This approach has several disadvantages. First, storing weights in SRAM requires a larger SRAM array than would otherwise be necessary. This increases cost because SRAM is inherently expensive per bit, and it also increases the total die size, which raises cost further. Additionally, since SRAM is a volatile memory technology, code must be loaded from external flash memory upon boot, so the system cannot be instant-on.
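The boot penalty can be estimated to first order as image size divided by the external flash read bandwidth. The sketch below shows that calculation. The function name and the numbers (a 1 MiB image and a 20 MB/s external flash link) are assumptions picked for illustration only, and the model ignores command overhead, decompression, and integrity checks.

```python
def boot_load_time_ms(image_bytes: int, flash_bytes_per_s: float) -> float:
    """First-order time to copy an image from external flash into SRAM."""
    if flash_bytes_per_s <= 0:
        raise ValueError("flash bandwidth must be positive")
    return 1000.0 * image_bytes / flash_bytes_per_s


# Hypothetical values (assumptions, not measurements).
IMAGE_BYTES = 1 * 1024 * 1024        # weights + firmware to load at boot
FLASH_BYTES_PER_S = 20_000_000       # assumed external flash read rate

delay = boot_load_time_ms(IMAGE_BYTES, FLASH_BYTES_PER_S)
print(f"{delay:.1f} ms copy delay before inference can start")
```

When weights and firmware already reside in on-chip NVM, this copy step is not needed, which is the instant-on benefit described above.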

Achieving a single chip solution with ReRAM

Resistive RAM (ReRAM or RRAM) is an innovative NVM technology that enables the vision of a low-cost, low-power single-chip solution for edge AI inference. ReRAM can scale to advanced process nodes along with the rest of the chip, so it can be integrated on the same die as advanced logic such as AI engines.

ReRAM can be used to replace the large on-chip SRAM for storing AI weights and CPU firmware. Because the technology is non-volatile, there is no need to wait at boot time to load an AI model from an external NVM. It is also denser than SRAM, which makes it less expensive per bit, so more memory can be integrated on-chip to support larger neural networks for the same die size and cost. On-chip SRAM will still be required for data storage, but the array will be smaller and the overall solution more cost-effective.

While on-chip SRAM will still be required for data storage, replacing the large on-chip SRAM with ReRAM for storing AI weights and CPU firmware provides a smaller and more cost-effective solution. (Image: Weebit Nano)

With ReRAM, designers can implement advanced AI in a single IC while saving die size and cost.

Looking Ahead: Future AI Architectures

Looking further ahead, ReRAM will also be a building block for the future of edge AI: neuromorphic computing (also known as in-memory analog processing). In this paradigm, computation and memory reside in the same place, so there is never any need to move weights. The neural network matrix becomes an array of ReRAM cells, and the synaptic weights become the conductances of the NVM cells, which perform the multiplication operations.
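The multiplication comes from basic circuit laws: applying a voltage V across a cell with conductance G produces a current I = G × V (Ohm's law), and the currents of all cells sharing a column wire add up (Kirchhoff's current law). Each column therefore outputs one multiply-accumulate result. The sketch below simulates an idealized crossbar in plain Python. The function name and the conductance and voltage values are hypothetical, and the model ignores wire resistance, device non-linearity, and noise, all of which a real array would have to contend with.

```python
def crossbar_mvm(conductances, voltages):
    """Column currents of an ideal crossbar: I[j] = sum_i G[i][j] * V[i].

    conductances: rows x cols matrix in siemens (one value per cell)
    voltages:     one input voltage per row, in volts
    """
    rows = len(conductances)
    cols = len(conductances[0]) if rows else 0
    if len(voltages) != rows:
        raise ValueError("need exactly one input voltage per row")
    currents = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            # Ohm's law per cell; summation models the shared column wire.
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical 2x2 array: weights stored as conductances (illustrative values).
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]  # input activations encoded as row voltages

print(crossbar_mvm(G, V))  # column currents in amperes
```

Because conductance cannot be negative, signed weights need an additional encoding; a common approach, assumed here rather than taken from the article, is to represent each weight as the difference between two cells.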

Future systems will mimic human brain behavior for fast real-time processing of massive amounts of data. (Image: Weebit Nano)

Because ReRAM cells have physical and functional similarities to synapses in the human brain, it will be possible to simulate human brain behavior with ReRAM for fast real-time processing on large amounts of data. Such a solution would be orders of magnitude more energy-efficient than today’s neural network simulations on conventional processors. Weebit is working with numerous academic and business partners to advance this field.

Weebit Nano Eran Briman

Eran Briman is the VP of Marketing and Business Development for Weebit Nano, deeply involved in the company’s ecosystem of partners and customers across various domains. He has nearly 30 years of experience in the semiconductor IP sector, including marketing, business development, engineering, and engineering management roles. Eran holds a B.Sc. in Electrical Engineering from Tel Aviv University and an MBA from Northwestern University’s Kellogg Business School.
