SimEdw's Blog

dd if=/dev/me of=/simedw.com

Books are malleable

June 30, 2025

What if books weren’t finished?

What if, like code, they were something you could fork: tweak the voice, rewrite awkward phrasing, or even change the perspective entirely?

That’s how I’ve come to see books: not as static artifacts, but as source material, something to compile differently depending on how I want to read (or, more often, listen to) them.

In this post, I’ll show three ways I’ve reshaped books using LLMs:

  1. Changing from third- to first-person
  2. Translating from Chinese to English
  3. Generating an audiobook

Why modify books?

I strongly prefer audiobooks, and most of the time I rely on iOS text-to-speech, either because the books I’m into aren’t popular enough to warrant a proper audiobook release, or because they’re in a language I don’t (yet) speak.

But text-to-speech leaves a lot to be desired, especially when a book is written in certain styles.

Modifying books isn’t a new idea, either; that's essentially what translations, study guides, and summaries are. The difference now is that it’s finally tractable for a single person to do on a whim.

Example 1: Changing from Third to First Person

I recently read a Korean novel translated into English, but iOS text-to-speech struggled with the main character's name, Hwi. The quick fix would’ve been to search and replace it with something mundane like Simon. But when all you have is a hammer...

I decided to rewrite the book from third person to first person.

Original

He greeted Hwi with a bright face. “Right, come sit down.” Hwi pointed to the chair and started pouring a cup of tea. “What brought you here in the middle of the night?” “Well, I came here because I wanted to see the face of the captain. We haven’t seen each other in the last few days, right?” “…I had work.”

Modified

Imugi entered, his face radiating warmth. "Haha, Master--no, Captain." "Sit down," I gestured to a chair, pouring him a cup of tea. "What brings you here at this hour?" "Well, I missed seeing your face, Captain. We haven't crossed paths in days, have we?" "I had matters to attend to."

This flows much better with text-to-speech. Obviously, the model is taking some liberties.

However, nothing impeded my understanding of the story or the narrative flow.

One chapter was missing the main character, and a different character became the "I". This was a bit confusing for the first few sentences, as there was no transition between the chapters.
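The prompt used for the perspective change isn't shown here, so the following is only a minimal sketch of one way to do it. It assumes a generic `complete(prompt) -> reply` function wrapping whatever LLM client you use; `rewrite_first_person` and `POV_PROMPT` are hypothetical names, and the prompt wording is illustrative. Naming the narrator explicitly for each chunk is one possible way to keep a side character from quietly becoming the "I", though that is untested.

```python
from typing import Callable

# Hypothetical prompt: illustrative wording, not the one used for the book above.
POV_PROMPT = (
    "Rewrite the following passage from third person into first person, "
    "narrated by {narrator}. Keep all events and dialogue, refer to "
    "{narrator} as \"I\", and change nothing else.\n\n{passage}"
)


def rewrite_first_person(
    passage: str,
    narrator: str,
    complete: Callable[[str], str],
) -> str:
    """Ask a model (via `complete`) to retell `passage` from `narrator`'s view.

    `complete` is any function that takes a prompt string and returns the
    model's reply text, e.g. a thin wrapper around an LLM client.
    """
    prompt = POV_PROMPT.format(narrator=narrator, passage=passage)
    return complete(prompt).strip()
```

Keeping `complete` as a parameter means the rewrite step doesn't depend on any particular provider and can be tested with a fake function.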

Example 2: Translation from Chinese to English

Sometimes it's not about style, but access. What do you do when your book isn't in your favorite language?

I got hooked on a fan translation of Divine Diary, a Chinese sci-fi/fantasy novel. The main character keeps getting reincarnated into different worlds, and in each life he tries to learn as much as possible (computer science, martial arts, bioengineering, etc.) to make the next life better.

Unfortunately, only a quarter of it had been translated at that point. Since I couldn’t possibly wait another year or so, I decided to automatically translate the whole book.

This was when GPT-4 was the state of the art, and I noticed that the model had a hard time translating and writing good prose at the same time. So I split the work into two serial calls: first translate, then rewrite the result into something pleasant to read. I ended up calling this second step the editor.

The editor was a godsend, turning tell-don't-show into show-don't-tell and trimming excessive "he said"/"she said" tags. I also tuned it to several different authors' styles. For example, J.K. Rowling's style made the descriptions feel more teen-friendly, but in the end my favorite and go-to became H.P. Lovecraft.

With stronger models like GPT-4.5 this step might be less important.
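A minimal sketch of the translate-then-edit pipeline, assuming a generic `complete(prompt) -> reply` function wrapping whichever model you use. The prompt wording and the function name `translate_then_edit` are hypothetical illustrations, not the original prompts; the default style simply reflects the Lovecraft preference described above.

```python
from typing import Callable

# Hypothetical prompts: the two steps are described above, their wording is not.
TRANSLATE_PROMPT = (
    "Translate the following Chinese text into English. Be faithful to the "
    "meaning; do not worry about style.\n\n{text}"
)
EDIT_PROMPT = (
    "You are an editor. Rewrite this rough English translation into polished "
    "prose in the style of {style}. Prefer showing over telling and trim "
    "redundant dialogue tags such as \"he said\". Keep every event and line "
    "of dialogue.\n\n{text}"
)


def translate_then_edit(
    chinese: str,
    complete: Callable[[str], str],
    style: str = "H.P. Lovecraft",
) -> str:
    """Two serial model calls: a literal translation, then an editing pass."""
    draft = complete(TRANSLATE_PROMPT.format(text=chinese))  # step 1: translate
    return complete(EDIT_PROMPT.format(style=style, text=draft))  # step 2: edit
```

Splitting the calls lets each prompt focus on one job; swapping `style` is all it takes to try a different "editor".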

Chinese passage
“那你们可听好了,我问的都是很简单的问题,你们知道答案的。”见三人聚精会神的听着,苏浩清了清嗓子道:“请问,什么人生病了,不能看医师?” 何青青眼珠子一转,立刻自信答道:“这个简单,这个我知道。是坏人!坏人生病了医师不给看病。” 谁知苏浩摇摇头道:“不对,但是距离正确答案已经不远了。” “不对?那是什么?” “这你要自己想才行。” 何青青转头问鼻涕涛和小胖:“你们来答。” 小胖支支吾吾,半天说不上一个字,鼻涕涛只顾着吸鼻涕就完了。
English Translation passage
Then He Qingqing wrinkled her nose and nervously said, "Okay, go ahead and ask!" "Listen carefully, I'm asking very simple questions that you know the answers to." Seeing the three of them listening attentively, Su Hao cleared his throat and said, "Please tell me, who is sick and cannot see a doctor?" He Qingqing's eyes lit up, and she confidently answered, "This is easy, I know this. It's a bad person! Bad people get sick, but doctors won't treat them." Little did she know, Su Hao shook his head and said, "No, but you're close to the correct answer." "Not right? Then what is it?" "You have to figure it out for yourself." He Qingqing turned to Lu Hongtao and Xiaopang and asked, "What about you two, do you know?" Xiaopang stuttered and couldn't say a word, while Lu Hongtao was too busy wiping his runny nose.
English passage after editing
"Pay heed, my question is elementary, and the answer lies within your knowledge." Su Hao cleared his throat as the trio listened intently, "Tell me, who falls ill but cannot seek medical aid?" He Qingqing's eyes sparkled with confidence as she declared, "An easy one. The answer is a malefactor! They fall ill, but physicians refuse them treatment." Little did she know, Su Hao shook his head, "Incorrect, but you're not far off." "Then what's the right answer?" "You must discern it yourself." He Qingqing swiveled towards Lu Hongtao and Xiaopang, her voice laced with urgency, "What say you both? Any ideas?" Xiaopang, tongue-tied, was unable to utter a word, whereas Lu Hongtao found himself preoccupied with a relentless runny nose.

Example 3: Generating an Audiobook

Listening to Apple's text-to-speech can feel a bit tedious: unlike professional voice actors, it doesn’t change accents for different characters or adjust pitch depending on context.

Why not create my own audiobook?

The first step was rewriting the book to look more like a screenplay, with clear demarcations for each character and their lines, which made downstream parsing easier.

[su hao;confident]
Pay heed, my question is elementary, and the answer lies within your knowledge. Tell me, who falls ill but cannot seek medical aid?

[he qingqing;confident]
An easy one. The answer is a malefactor! They fall ill, but physicians refuse them treatment.

[narrator]
Little did she know, Su Hao shook his head,

[su hao;calmly]
Incorrect, but you're not far off.

[he qingqing;curious]
Then what's the right answer?

[su hao;teasing]
You must discern it yourself.

[he qingqing;urgent]
What say you both? Any ideas?

[narrator]
Xiaopang, tongue-tied, was unable to utter a word, whereas Lu Hongtao found himself preoccupied with a relentless runny nose.
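Because the screenplay is itself produced by a model, a cheap format check before sending anything to a paid text-to-speech API can catch stray text, malformed tags, or tags with no lines. A minimal sketch, assuming the `[character;mode]` format shown above with one tag per line; `find_format_problems` is a hypothetical helper, not part of the original pipeline.

```python
import re

# Matches a whole tag line such as "[su hao;confident]" or "[narrator]".
TAG = re.compile(r"^\[(?P<character>[^\];]+)(?:;(?P<mode>[^\];]+))?\]$")


def find_format_problems(screenplay: str) -> list[str]:
    """Return human-readable problems; an empty list means the format looks OK.

    Checks that text only appears after a tag and that every tag has text.
    """
    problems = []
    open_tag = None   # (line number, tag text) of the most recent tag
    has_text = False  # whether that tag has received any text yet
    for lineno, raw in enumerate(screenplay.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines between segments are fine
        if TAG.match(line):
            if open_tag and not has_text:
                problems.append(f"line {open_tag[0]}: tag {open_tag[1]} has no text")
            open_tag, has_text = (lineno, line), False
        elif line.startswith("["):
            problems.append(f"line {lineno}: malformed tag {line}")
        elif open_tag is None:
            problems.append(f"line {lineno}: text before any tag")
        else:
            has_text = True
    if open_tag and not has_text:
        problems.append(f"line {open_tag[0]}: tag {open_tag[1]} has no text")
    return problems
```

Flagging empty tags is useful because a regex-based parser may quietly merge two consecutive tags into one segment.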

I parsed that format, assigned voices per character (with minor characters sharing a single voice), and fed it all into ElevenLabs.

Technical notes

The code for audio generation is embarrassingly simple:

import os
import re

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

def text_to_speech(text, output_file):
    # Step 1: split the input into segments, each with a character, an optional mode and its text
    pattern = r"\[(?P<character>[^\];]+)(?:;(?P<mode>[^\];]+))?\](?:\n(?P<text>[\s\S]+?))?(?=\n\[[^\];]+(?:;[^\];]+)?\]|\Z)"

    segments = []
    for m in re.finditer(pattern, text):
        # A tag may have no text after it: collapse whitespace and skip empty segments
        spoken = " ".join((m.group("text") or "").split())
        if not spoken:
            continue
        segments.append({
            "character": m.group("character").strip().lower(),
            # The mode (e.g. "confident") is parsed but not sent to ElevenLabs
            "mode": m.group("mode").strip() if m.group("mode") else "none",
            "text": spoken,
        })

    # Step 2: map character names to voices; unlisted (minor) characters share a default voice
    character_mapping = {
        "narrator": "i7vPmJ2yNcoEVAdpHcQa",
        "su hao": "X1tufN2s4pZ5Z7j8p23n",
        "yashan": "wYZKCl8dDOPBnFzf6U1i",
    }
    default_voice = "vBKc2FfBKJfcZNyEt1n6"

    audios = []
    for part in segments:
        # convert() streams the audio back as chunks of bytes
        audio_generator = client.text_to_speech.convert(
            text=part["text"],
            voice_id=character_mapping.get(part["character"], default_voice),
            model_id="eleven_multilingual_v2",
        )
        audios.append(b"".join(audio_generator))

    # Step 3: merge the audio segments (MP3, the default output, can usually be concatenated byte-wise)
    # TODO: post-processing to adjust pitch / volume
    with open(output_file, "wb") as f:
        f.write(b"".join(audios))

It worked fairly well, but a word of warning: ElevenLabs gets expensive fast. I never fully converted a whole book; after around 20 hours, I hit my $500 limit.

Back then, ElevenLabs also couldn’t take cues like “whispered” or “eagerly” into account, but newer releases may be able to do more.

But each snippet was processed independently, so there was no consistency in volume, pacing, or expressiveness, which made it fairly obvious that there wasn’t a human behind the scenes.
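One partial fix for the volume inconsistency (it does nothing for pacing or expressiveness) is to normalize every segment to the same loudness before merging, which is one way to fill the "adjust volume" TODO in the audio code. A minimal sketch using RMS level as a rough loudness measure, assuming each segment has already been decoded into a list of float samples in [-1.0, 1.0]; decoding the MP3 bytes and re-encoding the result are left to an audio library of your choice, and the function names and `target_rms` default are illustrative.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a segment (0.0 for an empty segment)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_segments(
    segments: list[list[float]], target_rms: float = 0.1
) -> list[list[float]]:
    """Scale each segment so its RMS level matches `target_rms`.

    Samples are assumed to be floats in [-1.0, 1.0]; scaled samples are
    clipped back into that range so loud peaks don't overflow.
    """
    out = []
    for seg in segments:
        level = rms(seg)
        gain = target_rms / level if level > 0 else 1.0  # leave silence alone
        out.append([max(-1.0, min(1.0, s * gain)) for s in seg])
    return out
```

RMS is a crude proxy for perceived loudness; a perceptual measure such as LUFS would match segments more closely, at the cost of more code.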

I also recently tested OpenAI's new text-to-speech, thinking it would be better at taking context into account, but the results were fairly uninspiring.

In general, with larger models the token issue is less about fitting the book into the prompt than about the token limit. In my experiments, I usually split the book either per chapter (if small enough) or into even smaller chunks and processed them in parallel. This leads to some discontinuity if not prompted very carefully.
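A minimal sketch of that chunk-and-parallelize approach, assuming chapters arrive as plain strings with paragraphs separated by blank lines and that `process` wraps one model call per chunk. The 8,000-character default is an arbitrary placeholder, character count is only a rough proxy for tokens, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_into_chunks(chapter: str, max_chars: int = 8000) -> list[str]:
    """Split a chapter at paragraph breaks so each chunk is at most max_chars.

    A chapter that fits stays whole; a single paragraph longer than
    max_chars is kept intact rather than cut mid-sentence.
    """
    chunks, current = [], ""
    for para in chapter.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def process_book(
    chapters: list[str],
    process: Callable[[str], str],
    max_chars: int = 8000,
    workers: int = 8,
) -> str:
    """Run `process` (e.g. one model call) on every chunk in parallel."""
    chunks = [c for ch in chapters for c in split_into_chunks(ch, max_chars)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the book stays in sequence
        return "\n\n".join(pool.map(process, chunks))
```

Threads suit this workload because each call mostly waits on the network. Splitting only at paragraph boundaries also limits, though it does not remove, the discontinuity between chunks.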

It’s worth running the whole book through a model first to extract important character names and other useful context. Just be very careful not to spoil the book for yourself.
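One way to feed that first pass back into every chunk is to have the model produce a character list once, then prepend it to each chunk's prompt so names stay consistent across parallel calls. A minimal sketch under those assumptions: the prompt text, the `Name: description` reply format, and all function names are hypothetical, and `complete` is any function that sends a prompt to a model and returns its reply. The prompt asks the model to leave out plot events, though that is no guarantee against spoilers.

```python
from typing import Callable

# Hypothetical prompt for the first, whole-book pass.
GLOSSARY_PROMPT = (
    "List the recurring characters in this text, one per line, formatted as "
    "'Name: short description'. Do not mention plot events.\n\n{text}"
)


def parse_glossary(reply: str) -> dict[str, str]:
    """Turn 'Name: description' lines into a dict, skipping any other lines."""
    glossary = {}
    for line in reply.splitlines():
        name, sep, desc = line.partition(":")
        name = name.strip().lstrip("-*").strip()  # tolerate bulleted replies
        if sep and name:
            glossary[name] = desc.strip()
    return glossary


def build_glossary(book: str, complete: Callable[[str], str]) -> dict[str, str]:
    """First pass: ask the model once for the book's character list."""
    return parse_glossary(complete(GLOSSARY_PROMPT.format(text=book)))


def with_context(chunk_prompt: str, glossary: dict[str, str]) -> str:
    """Prefix a per-chunk prompt with the shared character list."""
    if not glossary:
        return chunk_prompt
    names = "\n".join(f"- {name}: {desc}" for name, desc in glossary.items())
    return f"Characters in this book (keep names consistent):\n{names}\n\n{chunk_prompt}"
```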

I'm not a lawyer, but these modifications likely fall under fair use as long as the adapted versions are strictly for personal use and never redistributed. This may also vary by country.

Conclusion

As models improve, the line between reading and co-authoring will blur. We can shape books in real time to fit our needs, making them more accessible.

Of course, some books are better left untouched. Their prose is the experience, a myriad of seemingly insignificant details building toward the most vivid of scenes. But ultimately, the choice should live with the reader, not the author.

I also really like this blog post by Amelia Wattenberger, which explores the idea of reading through a fisheye lens, zooming in and out of documents fluidly.