• @TropicalDingdong@lemmy.world
    8 hours ago

    288 GB HBM4 memory

    jfc…

    Looking at the specs… fucking hell, these things probably cost over 100k.

    I wonder if we’ll see a generational performance leap with LLMs scaling to this much memory.

    • @AliasAKA@lemmy.world
      7 hours ago

      Current models are speculated to be 700 billion parameters plus. At 32-bit precision (single precision; half float is 16-bit), that’s 2.8 TB of RAM per model, or about 10 of these units. There are ways to lower it, but if you’re trying to run full precision (say, for training), you’d use over 2x this, maybe something like 4x depending on how you store gradients and optimizer updates, and I’d reckon full precision there means 32-bit. It’s possible they actually train at 32-bit, I suppose, but I’d be kind of surprised.
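      A rough sanity check of that arithmetic, as a sketch only: it assumes 1 GB = 10^9 bytes, that the full 288 GB per unit is usable, and an Adam-style optimizer keeping fp32 weights, fp32 gradients and two fp32 moment estimates (about 16 bytes per parameter, which is where "maybe 4x" comes from). Activations and KV cache are ignored.

```python
import math

BYTES_FP32 = 4          # single precision
UNIT_CAPACITY_GB = 288  # HBM4 capacity quoted at the top of the thread

def weights_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the parameters, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

def units_needed(total_gb: float, capacity_gb: float = UNIT_CAPACITY_GB) -> int:
    """Whole units required; ignores activations, KV cache and overheads."""
    return math.ceil(total_gb / capacity_gb)

params = 700e9  # speculative parameter count from the comment
inference_gb = weights_gb(params, BYTES_FP32)     # fp32 weights only
training_gb = weights_gb(params, 4 * BYTES_FP32)  # weights + grads + 2 Adam moments (assumed)

print(f"fp32 weights:        {inference_gb:,.0f} GB -> {units_needed(inference_gb)} units")
print(f"Adam training state: {training_gb:,.0f} GB -> {units_needed(training_gb)} units")
```

      With these assumptions that comes out to 2,800 GB (10 units) for the weights alone and 11,200 GB (39 units) for the training state.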

      Edit: Also, they don’t release parameter counts anymore, but some folks think newer models are around 1.5 trillion parameters. So figure around 2-3x the number above for newer models. The only real strategy for these guys is bigger. I think it’s dumb, and the returns are diminishing rapidly, but you’ve got to sell the investors. If reciting nearly whole works verbatim is easy now, it’s going to be exact if they keep going. They’ll approach parameter counts big enough to just straight up store things verbatim in their weights.
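      Scaling the same estimate to the rumored 1.5T parameters (about 2.1x the 700B figure), and showing how much the "ways to lower it" buy you, here is a sketch assuming weights only and the usual byte widths for fp32, bf16, fp8 and 4-bit integer storage. Which formats any given lab actually uses is not known from this thread.

```python
import math

UNIT_CAPACITY_GB = 288  # per-unit HBM4 capacity quoted at the top of the thread

# Bytes per parameter for common weight formats (weights only, no overheads).
PRECISIONS = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def units_for(params: float, bytes_per_param: float) -> int:
    """Whole units needed to hold the weights at a given precision."""
    return math.ceil(params * bytes_per_param / 1e9 / UNIT_CAPACITY_GB)

for name, nbytes in PRECISIONS.items():
    print(f"1.5T params @ {name}: {units_for(1.5e12, nbytes)} units")
```

      Under these assumptions that is 21 units at fp32, 11 at bf16, 6 at fp8 and 3 at 4-bit.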

      • in_my_honest_opinion
        5 hours ago

        Sure, but giant-context models are still more prone to hallucination and to reinforcing confidence loops, where they keep spitting out the same wrong result in a different way.

    • in_my_honest_opinion
      5 hours ago

      Fundamentally, no: linear progress requires exponential resources. The article below is about AGI, but transformer-based models will not benefit from just more grunt. We’re at the software stage of the problem now. But that doesn’t sign fat checks, so the big companies are incentivized to print money by developing more hardware.

      https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
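      To make "linear progress requires exponential resources" concrete, here is a toy model, assuming a hypothetical score that rises by a fixed number of points for every 10x increase in compute. The constants are invented for illustration and are not taken from the linked article.

```python
import math

# Toy model (invented constants): score is linear in log10(compute),
# gaining `gain_per_decade` points each time compute grows 10x.
def score(compute: float, gain_per_decade: float = 5.0, base: float = 50.0) -> float:
    return base + gain_per_decade * math.log10(compute)

def compute_for_score(target: float, gain_per_decade: float = 5.0, base: float = 50.0) -> float:
    """Invert score(): relative compute needed to reach a target score."""
    return 10 ** ((target - base) / gain_per_decade)

# Each equal step in score costs another 10x in compute.
for target in (55, 60, 65, 70):
    print(f"score {target}: {compute_for_score(target):,.0f}x compute")
```

      Going from 55 to 70 is three equal steps of 5 points, but a 1,000x increase in compute.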

      Also, the industry is running out of training data.

      https://arxiv.org/html/2602.21462v1
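      For a sense of how data demand grows with model size, here is a rough sketch using the ~20 training tokens per parameter rule of thumb from the Chinchilla scaling work (Hoffmann et al., 2022). It is a compute-optimal heuristic, not a hard requirement, and the parameter counts are the speculative ones from upthread.

```python
TOKENS_PER_PARAM = 20  # rough compute-optimal ratio (Chinchilla rule of thumb)

def optimal_tokens(params: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Training tokens suggested by the tokens-per-parameter heuristic."""
    return params * ratio

for params in (700e9, 1.5e12):
    print(f"{params / 1e9:,.0f}B params -> ~{optimal_tokens(params) / 1e12:.0f}T training tokens")
```

      That works out to roughly 14T tokens for 700B parameters and 30T for 1.5T parameters.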

      What we need are more efficient models and better harnessing. Or a different approach: reinforcement learning applied to RNNs that use transformers has been showing promise.

      • @TropicalDingdong@lemmy.world
        5 hours ago

        Yeah, I’ve read that before. I don’t necessarily agree with their framework. And even working within their framework, this article is about a challenge to their third bullet.

        I’m just not quite ready to rule out the idea that if you can scale single models above a certain boundary, you’ll get fundamentally different/novel behavior. This is consistent with other networked systems, and somewhat consistent with the original performance leaps we saw (the ones I think really matter are from 2019-2023; it’s really plateaued since then and is mostly engineering tinkering at the edges). It genuinely could be that 8 of these in a MoE configuration, with a single model maxing out each one, would show a very different level of performance. We just don’t know, because we can’t test that with the current generation of hardware.
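        Roughly how big that hypothetical 8-unit MoE could get: a sketch assuming bf16 weights (2 bytes per parameter), one expert filling each unit's full 288 GB, and top-2 routing per token. The routing choice is an assumption, and shared attention layers, activations and KV cache are ignored, so treat it as a loose upper bound.

```python
UNIT_CAPACITY_GB = 288  # HBM4 per unit, from the top of the thread
BYTES_PER_PARAM = 2     # assumption: bf16 weights, nothing else stored on the unit
NUM_UNITS = 8           # one expert per unit, per the comment
TOP_K = 2               # assumption: each token is routed to 2 of the 8 experts

params_per_expert = UNIT_CAPACITY_GB * 1e9 / BYTES_PER_PARAM
total_params = NUM_UNITS * params_per_expert
active_params = TOP_K * params_per_expert  # parameters touched per token

print(f"per expert:          {params_per_expert / 1e9:.0f}B")
print(f"total:               {total_params / 1e12:.3f}T")
print(f"active/token (top-{TOP_K}): {active_params / 1e9:.0f}B")
```

        Under these assumptions each expert tops out around 144B parameters, about 1.15T in total, with roughly 288B active per token.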

        It’s possible there really is something “just around the corner”; possible, but unlikely.

        > What we need are more efficient models, and better harnessing. Or a different approach, reinforcement learning applied to RNNs that use transformers has been showing promise.

        Could be. I’m not sure tinkering at the edges is going to get us anywhere, and I think I would agree with just… the energy-density argument coming out of the Dettmers blog. Relative to intelligent systems, the power-to-compute performance (if you want to frame it like that) is trash. You just can’t get there with computation systems like the ones we all currently use.

    • @boonhet@sopuli.xyz
      7 hours ago

      LLMs can already use way more, I believe; they don’t really run them on a single one of these things.

      The HBM4 would likely be great for speed, though.
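      Why memory bandwidth matters for speed: in batch-1 decoding, each generated token has to stream all the active weights from memory once, so bandwidth divided by bytes per token gives a rough ceiling on tokens per second. This is a sketch; the 10 TB/s figure is a hypothetical placeholder, not a quoted spec for these units, and the 100B bf16 model is just an example that fits in 288 GB.

```python
def max_decode_tokens_per_s(bandwidth_bytes_per_s: float,
                            active_params: float,
                            bytes_per_param: float) -> float:
    """Upper bound for batch-1 decoding: every token reads all active weights once.
    Ignores KV-cache reads, compute limits and any caching."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token

HYPOTHETICAL_BANDWIDTH = 10e12  # placeholder 10 TB/s, NOT a published figure

# Example: a 100B-parameter model stored in bf16 (2 bytes/param, ~200 GB).
print(f"{max_decode_tokens_per_s(HYPOTHETICAL_BANDWIDTH, 100e9, 2):.0f} tokens/s ceiling")
```

      The ceiling scales linearly with bandwidth, which is why faster memory helps even when capacity alone doesn't change what fits.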

    • @panda_abyss@lemmy.ca
      6 hours ago

      Yeah, they’re going to cost as much as a house.

      I think we’ll see much larger active portions of larger MoEs, and larger context windows, which would be useful.
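      Larger context windows eat memory mostly through the KV cache, which stores a key and a value vector per layer, per KV head, per token. Here is a sketch with a hypothetical dense-transformer shape (80 layers, 8 KV heads, head dimension 128, bf16), chosen only for illustration and not describing any particular model.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Keys + values (the factor 2) for every layer, KV head and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

# Hypothetical model shape, for illustration only.
CONFIG = dict(layers=80, kv_heads=8, head_dim=128)

for tokens in (8_192, 131_072, 1_000_000):
    gb = kv_cache_bytes(tokens, **CONFIG) / 1e9
    print(f"{tokens:>9,} tokens: {gb:7.1f} GB of KV cache per sequence")
```

      With this shape the cache costs about 0.33 MB per token, so a million-token context needs roughly 328 GB per sequence before the weights are even counted.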

      The non-LLM models I run would benefit a lot from this, but I don’t know if I’ll ever be able to justify how much they’ll cost.