Open (Apache 2.0) TTS model for streaming conversational audio in realtime (github.com)
47 points by SweetSoftPillow 4 days ago | 3 comments
  • ks2048 5 hours ago

    > Our work was heavily inspired by KyutaiTTS and Sesame

    I wish they’d describe, in technical detail, how this differs from the other TTS models they were “inspired by”.

    There are so many projects like this that, unless there are more technical details, I will just have to assume they are vibe-coded clones built to get some publicity.

    • echelon 2 hours ago

      Sesame is an impressive real-time conversational audio-to-audio model you can talk to on their website [1]. But it's closed source. They released some components, but nothing you could use to duplicate their work.

      Sesame is what this team (and lots of other teams) wants to build. I know another team trying to build a real-time local NSFW girlfriend you can talk to. They're convinced they can reach $100M ARR quickly if they crack it and make it customizable.

      KyutaiTTS provides a lot of the ingredients for this work, but afaik it isn't conditioned for audio-to-audio and doesn't include any of the streaming components.

      [1] https://app.sesame.com/

  • woodson 4 hours ago

    Looks very similar to Kyutai’s models, given that it uses the same neural audio codec (Mimi) and Depformer module etc.