Sparse models and cheap SRAM for language models • The Register

As compelling as the leading large-scale language models may be, the fact remains that only the largest companies have the resources to actually deploy and train them at meaningful scale.

For enterprises eager to use AI for a competitive advantage, a cheaper, pared-down alternative may be a better fit, especially if it can be tuned to particular industries or domains.

That’s where an emerging set of AI startups is hoping to carve out a niche: by building sparse, tailored models that, while maybe not as powerful as GPT-3, are good enough for enterprise use cases and run on hardware that ditches expensive high-bandwidth memory (HBM) for commodity DDR.

German AI startup Aleph Alpha is one such example. Founded in 2019, the Heidelberg-based company offers Luminous, a natural-language model that boasts many of the same headline-grabbing features as OpenAI’s GPT-3: copywriting, classification, summarization, and translation, to name a few.

The model startup has teamed up with Graphcore to explore and develop sparse language models on the British chipmaker’s hardware.

“Graphcore’s IPUs present an opportunity to evaluate the advanced technological approaches such as conditional sparsity,” Aleph Alpha CEO Jonas Andrulis said in a statement. “These architectures will undoubtedly play a role in Aleph Alpha’s future research.”

Graphcore’s big bet on sparsity

Conditionally sparse models — sometimes called mixture of experts or routed models — only process data against the applicable parameters, something that can significantly reduce the compute resources needed to run them.

For example, if a language model was trained on all the languages on the internet and is then asked a question in Russian, it wouldn’t make sense to run that data through the entire model; it only needs the parameters related to the Russian language, explained Graphcore CTO Simon Knowles in an interview with The Register.

“It’s completely obvious. This is how your brain works, and it’s also how an AI ought to work,” he said. “I’ve said this many times, but if an AI can do many things, it doesn’t need to access all of its knowledge to do one thing.”
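The routing idea Knowles describes can be sketched in a few lines of Python. This is a toy illustration of top-k mixture-of-experts routing, not Graphcore's or Aleph Alpha's implementation: the four "experts" are hypothetical scalar functions standing in for blocks of parameters, and the gate scores are made-up numbers that a learned router would normally produce from the input.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the chosen experts so they sum to 1.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_scores, k=1):
    """Run x through only the k selected experts and mix their outputs."""
    routes = route_top_k(gate_scores, k)
    output = 0.0
    for index, weight in routes:
        output += weight * experts[index](x)
    return output, [index for index, _ in routes]

# Hypothetical experts: each stands in for a block of parameters
# (for instance, one per language).
experts = [
    lambda x: 2.0 * x,   # expert 0
    lambda x: x + 10.0,  # expert 1
    lambda x: -x,        # expert 2
    lambda x: x * x,     # expert 3
]

# Hypothetical gate scores; a real router computes these from the input.
gate_scores = [0.1, 3.0, 0.2, 1.5]

result, used = moe_forward(4.0, experts, gate_scores, k=2)
print("experts used:", used)
print("fraction of experts evaluated:", len(used) / len(experts))
```

Only the selected experts are ever called, so the compute per input scales with k rather than with the total number of experts, while all of the experts' parameters still have to be stored somewhere.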

Knowles, whose company builds accelerators tailored for these kinds of models, unsurprisingly believes they’re the future of AI. “I’d be surprised if, by next year, anyone is building dense-language models,” he added.

HBM-2 pricey? Cache in on DDR instead

Sparse language models aren’t without their challenges. One of the most pressing, according to Knowles, has to do with the memory. The HBM used in high-end GPUs to achieve the necessary bandwidth and capacities required by these models is expensive and attached to an even more expensive accelerator.

This isn’t an issue for dense-language models where you might need all of that compute and memory, but it poses a problem for sparse models, which favor memory over compute, he explained.

Interconnect tech, like Nvidia’s NVLink, can be used to pool memory across multiple GPUs, but if the model doesn’t require all that compute, the GPUs could be left sitting idle. “It’s a really expensive way to buy memory,” Knowles said.

Graphcore’s accelerators attempt to sidestep this challenge by borrowing a technique as old as computing itself: caching. Each IPU features a relatively large SRAM cache — 1GB — to satiate the bandwidth requirements of these models, while raw capacity is achieved using large pools of inexpensive DDR4 memory.

“The more SRAM you’ve got, the less DRAM bandwidth you need, and this is what allows us to not use HBM,” Knowles said.
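A back-of-the-envelope model makes the trade-off concrete. The sketch below assumes a deliberately simple cache: every access that hits in SRAM costs no DRAM traffic, and every miss is served entirely from DRAM. The demand figure and hit rates are hypothetical values chosen for illustration, not Graphcore measurements.

```python
def dram_bandwidth_needed(demand_gb_per_s, sram_hit_rate):
    """Bandwidth that must still be served from DRAM, in GB/s.

    Assumes SRAM hits generate no DRAM traffic and misses are served
    entirely from DRAM (no prefetching or write-back is modeled).
    """
    if not 0.0 <= sram_hit_rate <= 1.0:
        raise ValueError("sram_hit_rate must be between 0 and 1")
    return demand_gb_per_s * (1.0 - sram_hit_rate)

demand = 1000.0  # hypothetical bandwidth the compute units ask for, GB/s
for hit_rate in (0.0, 0.5, 0.9, 0.99):
    needed = dram_bandwidth_needed(demand, hit_rate)
    print(f"hit rate {hit_rate:.2f} -> DRAM must supply {needed:.1f} GB/s")
```

Under these assumptions, a larger SRAM that raises the hit rate shrinks the bandwidth the backing memory has to deliver, which is the reasoning behind swapping HBM for slower, cheaper DDR.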

Decoupling memory from the accelerator makes it far less expensive for enterprises to support larger AI models: the cost is that of a few commodity DDR modules.

In addition to supporting cheaper memory, Knowles claims the company’s IPUs also have an architectural advantage over GPUs, at least when it comes to sparse models.

Instead of running on a small number of large matrix multipliers — like you find in a tensor processing unit — Graphcore’s chips feature a large number of smaller matrix math units that can address the memory independently.

This provides greater granularity for sparse models, where “you need the freedom to fetch relevant subsets, and the smaller the unit you’re obliged to fetch, the more freedom you have,” he explained.
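The effect of fetch granularity can be illustrated with a toy calculation. The sketch below assumes parameters are stored as fixed-size rows that can only be fetched in aligned blocks; the row indices, row size, and block sizes are hypothetical and do not describe any particular chip's memory system.

```python
def bytes_fetched(needed_rows, block_rows, row_bytes):
    """Bytes moved when rows can only be fetched in aligned blocks.

    needed_rows: indices of the parameter rows the input actually uses.
    block_rows:  smallest number of consecutive rows one fetch can return.
    row_bytes:   size of one row in bytes.
    """
    blocks_touched = {row // block_rows for row in needed_rows}
    return len(blocks_touched) * block_rows * row_bytes

needed = [3, 130, 257, 900]  # hypothetical rows a sparse model wants
row_bytes = 1024             # hypothetical row size
useful = len(needed) * row_bytes

for block_rows in (1, 8, 64, 256):
    moved = bytes_fetched(needed, block_rows, row_bytes)
    print(f"block of {block_rows:>3} rows: moved {moved} bytes, "
          f"{useful / moved:.1%} useful")
```

With scattered accesses like these, the share of fetched bytes that is actually useful falls as the minimum fetch unit grows, which is the freedom Knowles is referring to.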

The verdict is still out

Put together, Knowles argues, this approach enables Graphcore’s IPUs to train large AI/ML models with hundreds of billions or even trillions of parameters at substantially lower cost than GPUs.

However, the enterprise AI market is still in its infancy, and Graphcore faces stiff competition in this space from larger, more established rivals.

So while development of ultra-sparse, cut-rate language models is unlikely to abate anytime soon, it remains to be seen whether it’ll be Graphcore’s IPUs or someone else’s accelerator that ends up powering enterprise AI workloads. ®
