Compute-efficient LLMs that can handle heavy tasks on small PCs within 3 to 5 years

AI · Energy · May 15, 2029 · May 15, 2026 · Dmytro Bavykin

Main link: Attention Is All You Need (research.google)

Do you remember when computer games were far less hungry for storage? Assets were compressed and loaded into RAM at startup.

Nowadays it is nothing special for a game to take up 60 GB or more, and that is without counting updates or add-ons. This happened partly because SSDs became cheaper, at least until the recent price spike that hit SSDs, RAM, and GPUs alike.

The headlines all carried the same message: demand for hardware is enormous. But when the time came to pay, it became obvious that no one could afford such a deal. That is where things may take a different direction.

Imagine you can afford only a small commercial space to open your coffee house. What are your options?

You can take out a loan to get a bigger place, or you can try to make the most of the square meters your budget allows: less cluttered furniture, a more pragmatic staff. In my opinion, the same approach should soon be applied to LLMs. Current demand already dictates a much bigger need for AI compute, and that is before we even talk about AGI.

Compute comes with energy consumption and cooling. Better performance can be achieved not only by adding compute power, but also by improving the efficiency of current algorithms.

Fewer data centers to build, fewer cooling systems to maintain, fewer GPUs to buy. Investors will also get fed up with bloated CapEx at some point. Something similar already happened in 2017, when Google released the research paper that introduced the Transformer. That work became the starting point for the modern LLMs we have today.

My prediction: existing LLMs will help improve existing LLMs, reducing running costs and hardware requirements and making it possible to run LLMs on smaller devices.
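To give a rough sense of why efficiency matters for running LLMs on small PCs, here is a back-of-the-envelope sketch of how much memory the weights of a model need at different numeric precisions. It assumes a hypothetical 7-billion-parameter model and uses lower-precision weights (quantization) purely as one illustrative efficiency lever, not as the specific mechanism this prediction relies on. The function name `weight_memory_gb` is made up for illustration, and the estimate deliberately ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to store model weights, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and framework overhead (simplifying assumption).
    """
    bytes_per_param = bits_per_param / 8
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model stored at several precisions.
    params = 7e9
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, the same hypothetical model drops from about 28 GB of weights at 32 bits to about 3.5 GB at 4 bits, which would fit in the RAM of many consumer PCs.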

Reviewable on May 15, 2029
