Howdy! I've just written a blog post about this, and I figured I would share it here: https://smoores.dev/post/announcing_smoores_epub/. As I've been working on Storyteller[1], I've been developing a library for working with EPUB files, since that's a large amount of the work that Storyteller does. After a friend asked for advice on creating EPUB books in Node.js, I decided to publish Storyteller's EPUB library as a standalone NPM package. I really love the EPUB spec, and I think the Node.js developer community deserves an actively maintained library for working with it!
I feel stupid. I read “automatically syncing ebooks and audiobooks” and thought Storyteller was a file synchronization service (like SyncThing) that for some reason only supported certain file types.
Maybe “syncing ebooks with audiobooks” would be clearer? Also entirely possible this is just a me problem, not a general one.
Really cool project!
This is like... comically hard to express succinctly haha. We've been having several conversations about how to explain it quickly to new folks. I like the phrase that the Readium folks use, "guided narration", but I don't know how useful that is for folks that aren't already familiar with what Storyteller does
“Aligning” seems a perfectly good word to me. It’s the technical term that gets used, and it’s not particularly overloaded so people aren’t going to expect it to mean something else, and it’s easy to demonstrate even pictorially.
(I would also use the term “synchronising” rather than “syncing”, because “syncing” has kinda developed a meaning of its own, even though it came from the same place. And use the word “with”, and try to use the word “text”. “Synchronising ebook text with audiobooks” is way better than “syncing ebooks and audiobooks”.)
Maybe you can use examples of the 'why'?
- Switch between reading and listening whenever you want
- Follow the text while listening to audiobooks
- Enjoy audiobooks while reading along with the text
Or even simpler:
- Read or listen - switch anytime
- See the words as you listen
- Listen while you read along
I thought the exact same thing, especially since one of the main selling points is around self-hosting which I thought was for the file syncing.
As someone who has built custom EPUB authoring scripts in Node.js for work and also quite likes the EPUB spec: cool! I expect I'll still keep using said scripts for work, but I have some side project ideas which would involve actually reading EPUB files too - I'll probably give this a spin for those.
Thanks! Looking forward to your feedback!
BTW, I've had this idea in my mind for a while, after slowly working through my e-library infrastructure.
Do you think it might be a good idea to set up a site to share the aligned overlays from Storyteller? This way, people won't have to waste CPU/GPU time re-aligning the same files over and over again.
It should be OK from a copyright perspective, as it won't be distributing any copyrighted material, only the media overlay information.
That’s a really interesting idea! The more I think about it, the more I like it.
A challenge I foresee is that the media overlays are only reusable if you have the exact same input EPUB file, and have processed it with Storyteller to mark up the sentence boundaries. EPUBs have unique identifiers, though, so maybe this would be fine! We’d need to add a new processing flow to Storyteller, but it should be doable.
Feel free to hit me up in the Storyteller chat if you want to discuss more! Thanks for sharing this idea!
It would be cool to do this with Project Gutenberg and LibriVox files, since they're all public domain works anyway.
The entire Great Books of Western Civilization are on both, and I know I'd make more progress on reading it if I could hand off between reading and listening more easily!
You could require that the input files have the same sha256 hash, that would presumably be more robust than trusting an ID from the file itself
Yeah I was toying around with that, too… but folks often mess around with metadata in tools like Calibre and Audiobookshelf in ways that wouldn’t have an impact on Storyteller’s sync, but would change their hash. On the other hand, I don’t know how various publishers handle EPUB dc:identifiers and that may not be robust enough, either. We could try doing something like hashing only the contents of spine items (including their file names, since that’s how media overlays refer to content)
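A minimal sketch of the "hash only the spine items" idea, in TypeScript for Node.js. The `SpineItem` shape and the `spineFingerprint` name are assumptions made for illustration, not part of Storyteller or any published library; the only real API used is `createHash` from `node:crypto`. The point is that metadata edits elsewhere in the EPUB never enter the hash, while a renamed or changed spine file does.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: one entry per spine item, in reading order.
interface SpineItem {
  href: string; // path of the file inside the EPUB archive
  content: Uint8Array; // raw bytes of that file
}

// Hash the file name and contents of every spine item, in order.
function spineFingerprint(items: SpineItem[]): string {
  const hash = createHash("sha256");
  for (const item of items) {
    const nameBytes = Buffer.from(item.href, "utf8");
    // Length-prefix each field so that ("ab", "c") and ("a", "bc")
    // cannot produce the same byte stream.
    hash.update(`${nameBytes.length}:`);
    hash.update(nameBytes);
    hash.update(`${item.content.length}:`);
    hash.update(item.content);
  }
  return hash.digest("hex");
}
```

Two copies of a book whose spine files are byte-identical would get the same fingerprint even if one of them had its title or cover changed in Calibre, since those live outside the spine items.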
I was going to suggest to use the same approach as the old CD tagging systems. Count the number of words in each chapter to create a "book fingerprint".
It's highly likely to be globally unique, and it can also help with the missing forewords/afterwords/bonus content sections.
In addition, you can also add fuzzy matching for the title.
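A rough TypeScript sketch of the word-count fingerprint described above. All names here (`countWords`, `wordCounts`, `containsRun`) are hypothetical, and it assumes each chapter has already been reduced to plain text. `containsRun` is one possible way to tolerate missing forewords or afterwords: it checks whether one book's chapter counts appear as a contiguous run inside another's.

```typescript
// Count whitespace-separated words in one chapter's plain text.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// The "book fingerprint": one word count per chapter, in reading order.
function wordCounts(chapters: string[]): number[] {
  return chapters.map(countWords);
}

// True if `shorter` appears as a contiguous run inside `longer`,
// e.g. when one edition has extra front or back matter.
function containsRun(longer: number[], shorter: number[]): boolean {
  if (shorter.length === 0 || shorter.length > longer.length) return false;
  for (let start = 0; start + shorter.length <= longer.length; start++) {
    if (shorter.every((count, i) => longer[start + i] === count)) return true;
  }
  return false;
}
```

Exact equality of counts is strict; a real matcher would probably want some tolerance, since different text extraction can shift a count by a word or two.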
I think the thing we need to account for is different publications of the same book, which would need different overlays if they have different chapter filepaths, etc. (I think the number of words per chapter would capture this.)
Extracting a library from a real world project is one of my favorite parts of software.
I'm sure the march of LLMs will continue eating into this pie, and that's a good thing (most of it is a distraction from the real work), but I love polishing a library on my laptop in a cafe. It's like working on a painting or something.
It was, actually, very enjoyable! When we pulled React ProseMirror[1] out of the NYT text editor, it was a pretty laborious process that we had to carefully plan and execute for months, and we still ended up with an internal fork for a while.
By contrast, this was mostly just moving a file around and then writing documentation and cleaning up the public API. I rather enjoy thinking about and modeling library APIs in general, so I actually had a lot of fun with it!
[1]: https://www.npmjs.com/package/@nytimes/react-prosemirror
You mentioned Readium in one of your comments. I’m curious why you didn’t use the Readium spec and corresponding ecosystem. I’m coming in cold and have only glanced at your blog post, but I see you’re using RN. There is an RN package that leverages Readium under the hood for ebook reading (I know because I’m the author), though I don’t know if it actually has the API you need.
EDIT: storyteller is super interesting. Dug around a bit in the code and see that you do seem to be using readium for some things, so I must just be missing some nuance.
Readium is fantastic, and Storyteller uses it basically whenever possible. But Readium is exclusively for reading EPUB contents, and doesn’t have any support for modifying or creating them, which is the primary purpose of this library!
Cool! I totally could have used this earlier this year... can't remember what for...
Interesting choice to publish from the storyteller "monorepo." Is that because it evolved in situ, and you've no impetus to incur the overhead of extraction?
Hahaha, well if it comes to you, the library will still be there for you :)
Right, this was actually just a file within the Storyteller web package to start. It was fairly well defined, and so pretty easy to pull out into another package in the monorepo, but Storyteller is the primary consumer at the moment, and I want to be able to develop them in sync. Plus, it provides a great test bed for development of the library!
edit: I forgot to mention that the eventual goal is to (hopefully!) publish this package as @storyteller/epub, along with any other packages that end up split out of Storyteller. That will probably include at least a @storyteller/synchronize and a @storyteller/cli.
Unfortunately, someone seems to have snagged the @storyteller org on NPM several years ago and left it to languish without really using it, so I'm waiting to see whether GitHub will consider this squatting and transfer the org to me.
I've also tried reaching out to the developer that owns the org, but they don't seem to have been active on GitHub or NPM for the past 5 years or so, and my only real strategy for reaching out to them was to open an issue on one of their other GitHub projects!
Storyteller seems pretty cool in general. Can it be used to host books for other people?
Thanks! Absolutely. You can invite users to your Storyteller server and give them whatever permissions are appropriate (e.g. you can choose whether they can only download books, or can also manage uploading and syncing books and/or managing users). It has SMTP support for emailing invites, or it can just generate invite links for you to share yourself.
More info here: https://smoores.gitlab.io/storyteller/docs/administering#inv...
Oh my! This looks very neat, and I’ve been working on something similar to Storyteller (i think): https://github.com/project-kiosk/kiosk
I don’t get around to working on it right now, but maybe there’s something useful there for you.
Something I've been wondering: why do ebooks take so long to render? My kindle seems good at it, but opening an ebook in calibre/fbreader/etc can take minutes or even fail in some readers depending on the ebook.
I would guess there are multiple potential pitfalls here. Firstly, not all ebook formats are created equal -- Storyteller only operates on EPUB files, because EPUB is an open source format and it supports Media Overlays (read-aloud) natively. I can only really speak to that format, but there are others (MOBI, PDF, etc).
An EPUB is just a ZIP archive of XML and XHTML files (plus other assets, like images). Partly, I suspect, because of the dearth of actively maintained open source projects in the space, and partly because of the nature of tech in the book publishing industry, EPUB generation software used by authors and publishers often messes up this spec, which means that EPUB readers sometimes need to have fairly complex fallback logic for trying to figure out how to render a book. Also, because EPUBs are ZIP archives, some readers may either unzip the entire book into memory or "explode" it into an unzipped directory on disk, both of which may result in some slowness, especially if the book has lots of large resources. The newest Brandon Sanderson novel, for example, is ~300MB _zipped_.
Additionally, and perhaps more importantly, EPUBs (and I believe MOBIs as well) represent content as XHTML and CSS, which means that readers very often need to use a browser or webview to actually render the book. Precisely how they deliver this content into the webview can have a huge impact on performance; most browsers don't love to be told to format entire novels' worth of content into text columns, for example.
Additionally the XHTML content can just be a single large file instead of one file per chapter/section. Paginating and rendering the large single file is going to be more effort than the same on a smaller file. This is all on top of the pitfalls and variability you mention.
Yup, great point. Especially if you've used some tool to convert from another file, like a PDF, into an EPUB, you can easily end up with the entire book in a single XHTML file, which, again, can be pretty heavy for a browser to parse and format! I also have no idea whether Calibre et al actually use native web views, or have their own renderers, which are almost certainly less performant than native web views!
> Additionally the XHTML content can just be a single large file instead of one file per chapter/section.
Terry Pratchett books are notorious for that. Some EPUB authoring tools artificially introduce breaks, but you can't rely on them.
I used Storyteller to align Sanderson's most recent novel with its audio and the result is 1.7GB. That's... painful. It crashed the reader on a reMarkable 2 tablet.
I'm now actually working on a Calibre-Web change to strip the audio and media overlay from the books it serves via OPDS.
Then I'll need to tackle cross-device progress sync. This turned out to be surprisingly tricky.
You can’t do much better than that; that’s the size of the audiobook! For what it’s worth, I also used Storyteller on Wind and Truth, and got it down to 1.2GB by using the OPUS codec with a 32 kb/s bitrate.
Yeah. My current workaround is to create KEPUBs (Kobo-optimized epubs), but that creates an issue with cross-format reading progress sync. This is an interesting task in itself, though.
So I'm trying to design a progress sync protocol. My current idea is to just use several words from the text itself to unambiguously pinpoint the position within a section (chapter).
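To make the "several words from the text" idea concrete, here is a small TypeScript sketch. It is not an existing protocol; `tokenize`, `makeAnchor` and `resolveAnchor` are hypothetical names, and it assumes each device can extract the same plain text for a chapter. The anchor is the shortest run of words starting at the reading position that occurs exactly once in the chapter, so another device can find the same spot without relying on file offsets or markup.

```typescript
// Split a chapter's plain text into words.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// All start indexes where `phrase` occurs in `words`.
function occurrences(words: string[], phrase: string[]): number[] {
  const hits: number[] = [];
  for (let i = 0; i + phrase.length <= words.length; i++) {
    if (phrase.every((w, j) => words[i + j] === w)) hits.push(i);
  }
  return hits;
}

// Shortest phrase starting at `position` that is unique in the chapter,
// or null if no such phrase exists (e.g. repeated text at the chapter end).
function makeAnchor(words: string[], position: number): string[] | null {
  for (let len = 1; position + len <= words.length; len++) {
    const phrase = words.slice(position, position + len);
    if (occurrences(words, phrase).length === 1) return phrase;
  }
  return null;
}

// Word index the anchor points to, or -1 if it is missing or ambiguous.
function resolveAnchor(words: string[], anchor: string[]): number {
  if (anchor.length === 0) return -1;
  const hits = occurrences(words, anchor);
  const [only] = hits;
  return hits.length === 1 && only !== undefined ? only : -1;
}
```

Because the anchor depends only on the words, it should in principle survive conversion between formats such as EPUB and KEPUB, as long as the text itself is unchanged. This brute-force search is quadratic in chapter length, which is fine for a sketch but may need an index for very long single-file books.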
Is the idea that you have some devices that you want to download just the text to, but have it sync with your other devices? I think we could support that natively, honestly! Storyteller already has the input files, and it uses a text-based position system that doesn’t require the audio to exist. If you’re already doing work on this, maybe we could add it to Storyteller?
Ooh, that's an interesting idea. I only have one device where I would ever want to switch to listening to my book, but a couple others where I would like to read it.
FWIW, I wrote an EPUB (well, it was called OEBPS at the time) reader that rendered pretty much all of the format ~21 years ago (including all of XHTML and CSS) and it had very decent performance. I seem to recall that someone tried it on the One Laptop Per Child XO and it was... well, slow, but it worked.
So it's possible :)
Thank you so much! That's incredibly enlightening!
Of course! I'm hoping to have a web reader with Media Overlay support built in to Storyteller available in the next few months, along with some much needed library management tooling, so maybe that will be useful for you! I'll try to make it snappy :)
I find Koreader (linux version) leaps and bounds faster than the calibre reader.
I'm going to be annoying and simply laugh at the fact it's a 2k line, single file in source. It's TypeScript! You HAVE to compile it to ship it as an npm package (_technically_ you don't but). Split it up into smaller modules with `import` statements!
Quite seriously: why? What actual advantage do you get? People tend to split things up into separate files far too eagerly. A single two-thousand-line file is nothing. Especially when most of that is in one class, and needs to be, so you can’t really split things up, or at the very least there would be runtime overhead to it.
Thanks, yeah I agree. The vast majority of the code in this library lives in a single class definition. Is it possible to move the implementations into separate files? Totally. Would that make the codebase more legible? I think at the moment, I would argue no, it would actually hurt legibility. If the class needs to grow dramatically, then maybe we’ll need a different approach, but I think this is actually the right thing to do for now!
Genius. Thanks.
sometimes life just delivers. was looking for this 2 days ago
Outstanding. Let me know if you run into any issues!