Come look over my shoulder while I explore how, and whether, LLMs are good writing tools. Here’s a wee version of the LLM comparison exercise I did with my team. We’ll make it a two-fer so you can see how the “good writing” skill works in practice; well, we’ll see how that actually goes.
One of the more useful things you can do with an LLM is hold up a few ideas side by side and apply lenses to them. I know this history pretty well, so I asked a series of LLMs: why is Wisconsin’s cultural identity and cohesion stronger than Indiana’s, from a historical and business perspective?
Here are the answers in one doc, for comparison.
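If you want to run the same exercise yourself, the workflow is just one prompt fanned out to several models, with the answers collected into a single document. Here’s a minimal sketch; `ask()` is a hypothetical stand-in for each vendor’s real SDK call, and the model names are labels, not API identifiers:

```python
# Sketch of the side-by-side comparison workflow. ask() is a hypothetical
# placeholder for each vendor's real SDK call; model names are illustrative.

PROMPT = (
    "Why is Wisconsin's cultural identity and cohesion stronger than "
    "Indiana's, from a historical and business perspective?"
)

# Three of the models from the exercise; swap in whichever you're comparing.
MODELS = ["Claude", "Gemini", "Copilot"]

def ask(model: str, prompt: str) -> str:
    """Placeholder: replace with the actual vendor SDK call per model."""
    return f"[{model}'s answer to: {prompt[:40]}...]"

def build_comparison_doc(models: list[str], prompt: str) -> str:
    """Collect every model's answer under its own heading in one doc."""
    sections = [f"## {m}\n\n{ask(m, prompt)}" for m in models]
    return f"# Prompt\n\n{prompt}\n\n" + "\n\n".join(sections)

doc = build_comparison_doc(MODELS, PROMPT)
print(doc)
```

The point of the single document is the side-by-side reading: same prompt, same order, so the differences you see are the models’, not the setup’s.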
Each LLM will give us more or less the same story, different flavor. Within the industry, the differences across the models reflect “model personality.” Asking “why” instead of “whether” will probably drive the answer to favor Wisconsin. Using multiple lenses (two states, historical + business, identity + cohesion) forces the LLM to cross-reference across more of its training data, which tends to produce a more comprehensive answer.
For all the chatter about consciousness and whatever, remember that an LLM is, functionally, an enormous series of if/then/elses applied to human language and semantics, so being able to talk about language and communication, to get meta with the tool and with how you think through language, helps a lot when using one. This is maybe the one thing I like about experimenting so hard with these tools: I’m thinking about the technical side of writing and enjoying it quite a lot.
Functionally: all of them acknowledge hard historical truths within the subject matter and don’t shy away from critical perspectives, which is good. Gemini and Copilot both include in-line links, which let you judge the output’s authority in the moment as a reader. I liked Copilot’s more than I expected here. Claude’s answers are more lyrical and provide more context, yet they include no links, so they don’t encourage checking against outside sources. And you can see that even with the good-writing skill calling out hard bans on certain structures, Claude plows right through them.
Model personality: Claude favors sociological answers to Copilot’s economic ones. Claude is also highly intellectual and narrative by comparison, and that narrative style can mask nuance by sinking relative context into the storytelling. Gemini simplifies, boosts, and cheerleads where the others don’t, and really leans on Wisconsin’s reputation as a drinking-and-Packers state when there are stronger structural arguments in play. Copilot is tricky: it looks as authoritative as a briefing, which also makes it easily “extractable” for the user, but every citation still needs checking unless this is one of those “good enough” tasks.
As a writer, something I find annoying across the whole spread is the semantic reveal. LLMs are semantic machines, and that machinery persistently shows through in ways that sound weird to the human ear. All of them go out of their way to describe things as “structural,” “connective” (as in “connective tissue”), “load-bearing,” and “legible.”
Finally, I included a second tab where I asked Claude for an analysis across the four outputs; it suggests that my framing of the question is itself kind of problematic. A strong prompt can still be a bad approach.
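That second tab is the same trick turned on itself: bundle the outputs into one prompt and ask a single model to critique them, framing included. A sketch under the same assumptions, with the hypothetical `ask()` again standing in for a real SDK call:

```python
# Sketch of the cross-analysis pass: feed every model's answer back to one
# model and ask it to critique both the answers and the question's framing.
# ask() is a hypothetical stand-in for a real vendor SDK call.

def ask(model: str, prompt: str) -> str:
    """Placeholder: replace with the actual vendor SDK call."""
    return f"[{model}'s analysis of {prompt.count('###')} answers]"

def cross_analyze(outputs: dict[str, str], judge: str = "Claude") -> str:
    """Bundle the answers under headings and ask one model to compare them."""
    bundle = "\n\n".join(f"### {m}\n{text}" for m, text in outputs.items())
    prompt = (
        "Here are several answers to the same question. Compare them, and "
        "critique the framing of the question itself:\n\n" + bundle
    )
    return ask(judge, prompt)

report = cross_analyze({"Claude": "...", "Gemini": "...", "Copilot": "..."})
print(report)
```

Asking the judge to critique the framing, not just the answers, is what surfaced the “strong prompt, bad approach” problem in my case.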
There are a lot of possible takeaways here, but I’d rather set aside the question of which tool is “good” or “bad” or “better” and think more about the patterns across the tools and their implications.