I came across an IBM Research post recently (https://research.ibm.com/blog/why-von-neumann-architecture-is-impeding-the-power-of-ai-computing). It turns out one of the biggest things holding back AI computing isn't some exotic new problem; it's a design choice from 1945. The culprit? The von Neumann architecture that has powered nearly every computer ever built.
The Problem: A Traffic Jam Inside Your Computer
Picture a brilliant chef (the processor) and a massive pantry (memory) separated by a narrow hallway. For each step of a recipe, the chef must walk down the hallway, grab a single ingredient, and walk back. This round trip is repeated thousands of times. That’s essentially what happens when AI models run on traditional computers.
In AI computing, the main energy drain isn’t the calculations—it’s moving data between memory and processor. The actual mathematical operations are relatively simple matrix multiplications. It’s the constant shuttling of billions of model weights that creates the bottleneck.
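To make that imbalance concrete, here is a minimal back-of-the-envelope sketch in Python. The per-operation energy figures and the helper function are assumptions for illustration, not measurements of any particular chip; the point is only that a matrix-vector multiply does roughly two floating-point operations per weight, so when every weight has to be fetched from off-chip memory, the movement dwarfs the math.

```python
# Back-of-the-envelope sketch: a matrix-vector multiply does ~2 floating-point
# operations per weight, but every weight must be fetched from memory, so the
# energy bill is dominated by data movement. The per-operation energies below
# are assumed, order-of-magnitude illustrations, not measurements of any chip.

PJ_PER_FLOP = 4          # assumed: one on-chip floating-point op, in picojoules
PJ_PER_DRAM_BYTE = 150   # assumed: one byte fetched from off-chip DRAM, in picojoules

def energy_breakdown(rows: int, cols: int, bytes_per_weight: int = 4):
    """Rough energy estimate for one matrix-vector multiply y = W @ x."""
    flops = 2 * rows * cols                        # one multiply + one add per weight
    bytes_moved = rows * cols * bytes_per_weight   # every weight streamed from DRAM
    return flops * PJ_PER_FLOP, bytes_moved * PJ_PER_DRAM_BYTE

compute_pj, movement_pj = energy_breakdown(rows=4096, cols=4096)
print(f"arithmetic:    {compute_pj / 1e6:.0f} microjoules")
print(f"data movement: {movement_pj / 1e6:.0f} microjoules")
print(f"moving the weights costs ~{movement_pj / compute_pj:.0f}x the math itself")
```

The exact numbers vary by memory technology and precision, but the lopsided ratio is why so many of the architectures below focus on not moving the weights at all.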
The 1945 Decision and Its Modern Consequences
The von Neumann architecture was revolutionary—separate processing and memory units connected by a bus. This brought enormous flexibility: you could design components independently, upgrade separately, and configure systems for different needs. This flexibility made perfect sense for general-purpose computing with diverse applications.
But AI workloads are different: repetitive operations over largely static data (the model weights). A decade ago this wasn't a big issue, because the arithmetic itself was expensive enough to be the bottleneck. Since then, processors have gotten much faster while memory and interconnect bandwidth improved far more slowly. Now we have powerful processors sitting idle, waiting for data.
The scale makes this acute for AI. Large language models need billions of parameters loaded from memory, often spread across multiple GPUs, which makes the data-transfer problem worse. Training can take months and consume an enormous amount of energy, but inference must stream those same weights for every response, so the bottleneck touches every single AI interaction.
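A rough calculation shows why. Assuming a 70-billion-parameter model with 16-bit weights and roughly 3 TB/s of HBM-class memory bandwidth (both assumed, illustrative figures), simply streaming the weights once per generated token puts a hard floor under latency, no matter how fast the arithmetic units are:

```python
# Sketch of a memory-bandwidth floor on decoding speed. Model size, weight
# precision, and bandwidth are assumed, illustrative figures.

PARAMS = 70e9            # assumed: a 70-billion-parameter model
BYTES_PER_PARAM = 2      # 16-bit weights
BANDWIDTH = 3e12         # assumed: ~3 TB/s of HBM-class memory bandwidth

bytes_per_token = PARAMS * BYTES_PER_PARAM   # every weight is read once per token
latency = bytes_per_token / BANDWIDTH        # ignores the arithmetic entirely

print(f"weights streamed per token: {bytes_per_token / 1e9:.0f} GB")
print(f"latency floor: {latency * 1e3:.0f} ms/token (~{1 / latency:.0f} tokens/s), "
      "even with infinitely fast arithmetic")
```

Batching, caching, and quantization all shift these numbers in practice, but the shape of the limit stays the same: bandwidth, not raw compute.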
Breaking Free: Industry-Wide Solutions
To overcome this bottleneck, companies and researchers are developing a diverse new generation of computer architectures.
Near-Memory Computing: Market leader NVIDIA is evolving the GPU. Their Grace Hopper Superchip tightly integrates a CPU and GPU with a high-speed connection, allowing them to share memory at extremely high bandwidth. This reduces the data-transfer penalty without a complete architectural overhaul.
Wafer-Scale Integration: Startup Cerebras has built the largest chip in the world. Their Wafer-Scale Engine is a single, massive processor with its memory distributed just microns away from the cores that need it. By keeping everything on one piece of silicon, they largely sidestep the off-chip processor-to-memory bottleneck.
Custom Cloud Silicon: Major cloud providers are designing their own chips. Amazon’s Trainium and Microsoft’s Maia are custom-built AI accelerators optimized specifically for the workloads running in their massive data centers, giving them a performance and efficiency edge.
In-Memory and Neuromorphic Computing: Companies like IBM, whose research post inspired this piece, are working on in-memory computing, where calculations happen inside the memory itself (a small simulation after this list of approaches sketches the idea). Others, like Intel with its Loihi chip, are pursuing neuromorphic designs that mimic the structure of the human brain, where memory and computation are fundamentally intertwined.
Photonic Computing: Perhaps the most futuristic approach comes from companies like Lightmatter, which are building processors that compute with light instead of electrons. This promises calculations at the speed of light with near-zero energy cost for data movement, a radical departure from traditional designs.
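Here is the idealized in-memory computing picture promised above, as a small NumPy simulation. It is a conceptual sketch only: real analog crossbars contend with device variability, limited precision, and ADC overheads, and the 1% noise figure below is an arbitrary assumption. The thing to notice is that the weight matrix never moves; only the small input and output vectors cross the memory boundary.

```python
# Conceptual (idealized) sketch of analog in-memory computing: weights are stored
# as conductances in a crossbar, the input vector is applied as voltages, and the
# summed currents on the output lines ARE the matrix-vector product (Ohm's law
# does the multiplies, Kirchhoff's current law does the adds). The 1% device
# noise below is an arbitrary assumption.
import numpy as np

rng = np.random.default_rng(seed=0)
weights = rng.normal(size=(256, 128))   # held in place as crossbar conductances
x = rng.normal(size=128)                # applied to the array as input voltages

digital_result = weights @ x            # what a conventional processor would compute

noisy_weights = weights * (1 + 0.01 * rng.normal(size=weights.shape))
analog_result = noisy_weights @ x       # read out as summed currents on each line

error = np.linalg.norm(analog_result - digital_result) / np.linalg.norm(digital_result)
print(f"relative error of the in-memory result: {error:.3%}")
```

The small residual error is the trade these designs make: slightly imprecise analog arithmetic in exchange for skipping the expensive trip to memory entirely.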
Why Von Neumann Isn’t Going Away
Von Neumann architecture isn’t disappearing. For general-purpose computing, graphics processing, and high-precision operations, its flexibility remains unmatched. IBM researcher Geoffrey Burr compares it to “an all-purpose deli” that can switch between different orders, while specialized AI computing is like making “5,000 tuna sandwiches for one order.”
The future likely involves using the right tool for the right job—combining von Neumann and specialized processors. Even specialized AI chips include conventional hardware for high-precision operations. The goal isn’t replacement but creating specialized co-processors for AI workloads.
The End of an Assumption
For eighty years, the separation of memory and processing has been a fundamental assumption of computing. Now, the demands of AI are forcing us to question it. This moment is a powerful reminder that in technology, no solution is forever. The most groundbreaking ideas are often just waiting for the right problem to challenge them.