Three types of LLM workloads and how to serve them (modal.com)
47 points by charles_irl 11 hours ago | 2 comments
  • ZsoltT 39 minutes ago

    > we recommend using SGLang with excess tensor parallelism and EAGLE-3 speculative decoding on live edge Hopper/Blackwell GPUs accessed via low-overhead, prefix-aware HTTP proxies

    lord

  • rippeltippel 5 hours ago

    > Gallia est omnis divisor in partes tres.

    OCD-driven fix: The correct Latin quote is "Gallia est omnis divisa in partes tres".