Tosijs-schema is a super lightweight schema-first LLM-native JSON schema library (npmjs.com)
37 points by podperson 7 hours ago | 24 comments
  • kevmo314 2 hours ago

    > For large arrays (>97 items) and large dictionaries

    How did we end up in a world where 97 items is considered large?

    • vages 9 minutes ago | parent

      Mind your off-by-1s: 97 items is not large, 98 is.

  • podperson 7 hours ago

    I wrote this library this weekend after realizing that Zod was really not designed for the use-cases I want JSON schemas for: 1) defining response formats for LLMs and 2) as a single source of truth for data structures.

    • taveras 2 hours ago | parent

      Happy to see more tools in the data schema space.

      Will you support Standard Schema (https://standardschema.dev)? How does this compare to typebox (https://github.com/sinclairzx81/typebox)?

    • 7thpower 7 hours ago | parent

      What led you to that conclusion?

      • dsabanin 6 hours ago | parent

        Zod's validation errors are awful, the JSON schema it generates for LLMs is ugly and often confusing, the type structures Zod creates are often unintelligible, and there's not even a good way to pretty print a schema when you're debugging. Things are even worse if you're stuck with zod/v3.

        • sesm 5 hours ago | parent

          What's wrong with Zod validation errors?

        • light_hue_1 4 hours ago | parent

          None of this makes a lot of sense. Validation errors are largely irrelevant for LLMs and they can understand them just fine. The type structure looks good for LLMs. You can definitely pretty print a schema at runtime.

          This all seems pretty uninformed.

      • nerdponx 7 hours ago | parent

        And what makes this different? What makes it LLM-native?

        • podperson 5 hours ago | parent

          It generates schemas that are strict by default, while Zod requires you to set everything manually.

          This is actually discussed in the linked article (the README file).

          • halayli 5 hours ago | parent

            That's not true based on the Zod docs. https://zod.dev/api?id=objects

            Most of the claims you're making against Zod are inaccurate. The README feels like false claims by AI.

            • podperson 3 hours ago | parent

              It seems to be true to me. And aside from the API stuff (because I am far from an expert user of Zod) all of this has been carefully verified.
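For readers following the "strict by default" disagreement, here is a minimal sketch of what a strict object schema could look like in plain JSON Schema terms. It assumes "strict" means every property is listed in `required` and `additionalProperties` is `false`; the thread does not define the term. The schema is hand-written, not the output of tosijs-schema or Zod, and `isStrictObjectSchema` is a hypothetical helper introduced only for illustration.

```typescript
// Hand-written JSON Schema for illustration; not generated by any library.
const userSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
  },
  required: ["name", "age"],
  additionalProperties: false,
} as const;

// Hypothetical helper (an assumption, not a library API): returns true when
// an object schema marks every declared property as required and forbids
// properties that are not declared.
function isStrictObjectSchema(schema: {
  properties: Record<string, unknown>;
  required?: readonly string[];
  additionalProperties?: unknown;
}): boolean {
  const declared = Object.keys(schema.properties);
  const required = new Set(schema.required ?? []);
  return (
    schema.additionalProperties === false &&
    declared.every((key) => required.has(key))
  );
}

// A looser schema: "age" is optional and extra keys are allowed.
const looseSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
  },
  required: ["name"],
} as const;

console.log(isStrictObjectSchema(userSchema));
console.log(isStrictObjectSchema(looseSchema));
```

Whether a given library emits the first or the second shape by default is exactly what is being disputed here, so checking the generated JSON directly is likely more reliable than either claim.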

      • podperson 5 hours ago | parent

        1. Zod’s documentation, such as it is. 2. Code examples.

  • bbminner 5 hours ago

    While LLMs accept JSON schemas for constrained decoding, they might not respect all of the constraints.

  • yunohn 4 hours ago

    > It checks a fixed sample of items (roughly 1%) regardless of size

    > This provides O(1) performance

    Wouldn’t 1% of N still imply O(N) performance?

    • podperson 3 hours ago | parent

      N is increasing. O(1) means constant (actually capped). We never check more than 100 items.

      • SkiFire13 2 hours ago | parent

        Then it's not 1%, because if you have 100k items and you check at most 100 you have checked at most 0.1% of items.
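The arithmetic behind this exchange can be sketched as follows. This is a minimal illustration that assumes only what the thread states (a hard cap of 100 checked items); `itemsChecked` and `fractionChecked` are hypothetical names and do not reflect the library's actual code.

```typescript
// Hypothetical helper: number of items a capped sampler would inspect.
// The default cap of 100 comes from the thread ("We never check more than
// 100 items"); it is an assumption about behavior, not the library's code.
function itemsChecked(n: number, cap: number = 100): number {
  return Math.min(n, cap);
}

// Fraction of the collection that is actually inspected.
function fractionChecked(n: number, cap: number = 100): number {
  return n === 0 ? 0 : itemsChecked(n, cap) / n;
}

for (const n of [50, 100, 10000, 100000]) {
  const pct = (fractionChecked(n) * 100).toFixed(1);
  console.log(`${n} items -> ${itemsChecked(n)} checked (${pct}%)`);
}
```

With a fixed cap the work is bounded, which is what makes it O(1), but the checked fraction shrinks as N grows: it equals 1% only at N = 10,000 and drops to 0.1% at N = 100,000. A true 1% sample would grow with N and therefore be O(N).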

  • _heimdall 6 hours ago

    Had you considered using something like XML as the transport format rather than JSON? If the UX is similar to zod it wouldn't matter what the underlying data format is, and XML is meant to support schemas unlike JSON.

    • podperson 5 hours ago | parent

      JSON Schema is a schema built on JSON and it’s already being used. Using XML would mean converting the XML into JSON schema to define the response from the LLM.

      That said, JSON is “language neutral” but also super convenient for JavaScript developers and typically more convenient for most people than XML.

      • _heimdall 2 hours ago | parent

        Maybe I missed a detail here, sorry if that's the case!

        Why would we need to convert XML, which already supports schemas and is well understood by LLMs, back to JSON schema?

        • verdverm an hour ago | parent

          Because most of the world uses JSON and has rich tooling for JSONSchemas. Notably, many LLM providers allow JSONSchemas to be part of the request when trying to get structured output.

      • yeasku 4 hours ago | parent

        LLMs are not people.

        We want a format for LLMs or for people?

        • drowsspa 3 hours ago | parent

          As a person myself, I very much prefer JSON

        • podperson 4 hours ago | parent

          JSON schema is very human readable.