Impact of PCIe 5.0 Bandwidth on GPU Content Creation and LLM Performance (pugetsystems.com)
43 points by zdw 4 days ago | 20 comments
  • threeducks 2 days ago

    Testing PCIe bandwidth with a model that fits entirely into VRAM (quantized Phi-3 Mini at 2.39 GB on RTX 5090 with 32GB VRAM) is stupid because there won't be any memory transfer over PCIe beyond the initial model load. They should have tested a large MoE model like Qwen3-235B-A22B-GGUF, where the difference will be huge.
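
    For scale, here is a back-of-envelope sketch of why a streamed MoE would expose the link (the active parameter count comes from the model name; the quantization and bandwidth figures are my own assumptions, not from the article):

```python
def tokens_per_sec_ceiling(active_params_b, bytes_per_param, link_gb_s):
    # If the active experts must cross PCIe on every token, the link
    # bandwidth caps generation speed no matter how fast the GPU is.
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return link_gb_s * 1e9 / bytes_per_token

# Qwen3-235B-A22B: ~22B active params per token; assume 4-bit (~0.5 bytes/param).
print(tokens_per_sec_ceiling(22, 0.5, 31.5))  # PCIe 4.0 x16 (~31.5 GB/s): ~2.9 tok/s
print(tokens_per_sec_ceiling(22, 0.5, 63.0))  # PCIe 5.0 x16 (~63 GB/s):   ~5.7 tok/s
```

    In practice a good runtime keeps hot experts resident in VRAM, so the real gap would be smaller, but it would still scale with link bandwidth rather than vanish.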

    • sitkack a day ago |parent

      Yeah, I would have expected more out of Puget Systems, they know better.

  • dweekly 2 days ago

    Fascinating and (to me) unintuitive result that these cards don't seem bandwidth constrained at PCIe 4.0 x16; the update to PCIe 5.0 x16 doesn't seem to have any measurable impact on performance, allowing a PCIe 5.0 x8 configuration that "saves" some of your lanes at seemingly no penalty.

    I wonder if we will have to wait another generation of cards (and apps) to make full use of a PCIe 5.0 x16 connection?

    • pella 2 days ago |parent

      If VRAM is low, data traffic between system memory and GPU VRAM increases greatly, and the PCIe link can become the bottleneck, especially if the GPU runs at x8. In extreme cases, performance may drop to roughly one eighth (24.6 FPS -> 3.3 FPS).

        "Dragon Age: The Veilguard" - 2560 x 1440:
        - RTX 5060 Ti 8 GB on PCIe 5.0: 24.6 FPS
        - RTX 5060 Ti 8 GB on PCIe 4.0:  3.3 FPS   !!!!
      
      https://www.computerbase.de/artikel/grafikkarten/nvidia-gefo...
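
      If the drop were purely bandwidth-limited, though, halving the link should at most halve a streaming-bound frame rate. A rough sketch (the per-frame traffic figure is invented for illustration):

```python
def fps_ceiling(link_gb_s, streamed_gb_per_frame):
    # If the GPU stalls each frame waiting on assets streamed over PCIe,
    # the link bandwidth caps the achievable frame rate.
    return link_gb_s / streamed_gb_per_frame

# The RTX 5060 Ti is an x8 card: PCIe 4.0 x8 ~15.8 GB/s, PCIe 5.0 x8 ~31.5 GB/s.
# Assume ~1 GB of over-budget assets crosses the link per frame (made up):
print(fps_ceiling(15.8, 1.0), fps_ceiling(31.5, 1.0))
```

      Raw throughput only explains a ~2x gap between the two links at the same lane count, while ComputerBase measured ~7.5x (24.6 vs 3.3 FPS), so something beyond bandwidth (stalls, latency, scheduling) is apparently in play.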
      • crote 2 days ago |parent

        The irony is that it's the budget cards which have fewer PCIe links. That RTX 5060 Ti 8GB only has 8 lanes, but could really benefit from having 16 lanes. An RTX 5090 with 32GB of VRAM? It has 16 lanes, but would do just fine with 8...

      • Dylan16807 2 days ago |parent

        That has to be a bug.

        • Quarrel 2 days ago |parent

          More likely, just lazy optimisation.

          • keyringlight 2 days ago |parent

            From earlier in the article, this testing is with ultra settings, and for a modern game it's expected that developers are creating content that shows off well at 4K and easily pushes past what 8 GB of VRAM can contain. The PCIe 5 system is still affected, it's just affected less. There are probably tricks they can use to stream detail levels in and out of VRAM depending on how objects are displayed on screen (like the early DirectStorage + sampler feedback demos), but that's more work to do effectively and would still put pressure on PCIe and storage being fast enough to keep up.

            I'd say it comes down to players picking appropriate settings for their system, and it being good if the developer can provide information on the consequences of different settings (showing VRAM usage), or a warning if you're below their required/recommended system.

    • ksec 2 days ago |parent

      That has been the case somewhat for many years. It takes at least two generations for GPU bandwidth usage to catch up, and maybe we are finally arriving at a plateau with PCIe 5.0 x8 and x16.

      So a next-gen GPU could work with PCIe 6.0 x8 instead. We are looking at PCIe 7.0 devices hopefully by 2029. Plenty of headroom, and possibly cost reduction as well.
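
      The x8 trade-off follows directly from the doubling per generation; a quick sketch (bandwidth figures are approximate effective throughput, not raw GT/s):

```python
def pcie_gb_s(gen, lanes):
    # Approximate usable bandwidth: ~0.985 GB/s per lane at gen 3,
    # doubling with each generation (gen 4 ~1.97 GB/s/lane, gen 5 ~3.94).
    return 0.985 * 2 ** (gen - 3) * lanes

# PCIe 6.0 x8 matches PCIe 5.0 x16, so halving the lanes costs nothing:
print(pcie_gb_s(6, 8), pcie_gb_s(5, 16))  # both ~63 GB/s
```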

      • FloatArtifact 2 days ago |parent

        Can you comment on why you think there's a cost reduction?

        • ksec 2 days ago |parent

          Purely in terms of BOM, both on the GPU and packaging: saving die space, PCB area, fewer lanes, less testing, etc. However, the actual cost of implementing PCIe 6 and 7 will also be higher, so the two will likely cancel out.

    • zh3 2 days ago |parent

      I have noticed that for local AI at least, there is little difference between an old system (i3770/Asus P8Z77WS from 2012, which has 4 PCIe 2.0 x16 slots supporting 16/16, 16/8/8 or 8/8/8/8) running a pair of RTX 3090s and up-to-date motherboards (with an AMD 9900X). Interesting to see a benchmark that bears this out - though for anyone who switches models a lot, I'd assume a new system would be much quicker at that (from bandwidth improvements with newer CPUs, NVMe vs SATA, and DDR5 vs DDR3).
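
      As a rough sanity check on that last point (all sizes and bandwidth figures below are my own ballpark assumptions, not from the benchmark), load time is bounded by the slower of storage and the PCIe link:

```python
def load_seconds(model_gb, storage_gb_s, link_gb_s):
    # Sequential model load: bounded by the slower of the disk read
    # and the host-to-GPU PCIe transfer.
    return model_gb / min(storage_gb_s, link_gb_s)

# Hypothetical 20 GB model file:
old = load_seconds(20, 0.55, 8.0)   # SATA SSD ~0.55 GB/s, PCIe 2.0 x16 ~8 GB/s
new = load_seconds(20, 7.0, 31.5)   # Gen4 NVMe ~7 GB/s, PCIe 4.0 x16 ~31.5 GB/s
print(f"old: {old:.1f}s  new: {new:.1f}s")
```

      On these assumptions storage, not the PCIe link, is the bottleneck in both cases, which is why model switching improves so much on a new box even when inference speed doesn't.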

    • 2 days ago |parent
      [deleted]
    • simooooo 2 days ago |parent

      If you can use bifurcation, I guess there's scope to fully utilise it

  • sitkack 2 days ago

    Really too bad that they ran only one test, and it was a small model.

  • Calwestjobs 2 days ago

    I can't see the graphs

    reason: "DataTables warning: table id=table_5 - Ajax error. For more information about this error, please see http://datatables.net/tn/7"

    • zh3 2 days ago |parent

      Click it enough times and the error goes away (however, it looks like the tables further down the page are then broken).

    • ksec 2 days ago |parent

      Confirmed in all three browsers: Chrome, Firefox, and Safari.

    • jmrm 2 days ago |parent

      Same thing happened to me

  • bewd55 2 days ago

    [dead]