One of the more useful things you can do with an LLM is hold up a few ideas side by side and apply lenses to them. I know this history pretty well, so I asked a series of LLMs: why is Wisconsin’s cultural identity and cohesion stronger than Indiana’s, from a historical and business perspective?
Each LLM will give us more or less the same story, different flavor. Within the industry, the differences across the models reflect “model personality.” Asking “why” instead of “whether” builds the premise into the question, so the answers will probably favor Wisconsin. Using multiple lenses (two states, historical + business, identity + cohesion) forces the LLM to cross-reference more of its training data, which tends to produce a more comprehensive answer.
Functionally: all of them acknowledge hard historical truths within the subject matter and don’t shy away from critical perspectives, which is good. Gemini and Copilot both include in-line links, which lets you judge the output’s authority in the moment as a reader. I liked Copilot’s more than I expected here. Claude’s answers are more lyrical and provide more context, yet they don’t encourage checking against outside sources, because there are no links in the output. And you can see that even with the good writing skill calling out hard bans on certain structures, Claude plows right through them.
Model personality: Claude favors sociological answers to Copilot’s economic answers. Claude is also highly intellectual and narrative by comparison, and that narrative style can mask nuance by sinking relative context within the storytelling. Gemini simplifies, boosts and cheerleads where the others don’t, and really goes hard on Wisconsin’s reputation as a drinking and Packers state when there are stronger structural arguments in play. Copilot is tricky because it looks authoritative, like a briefing, which also makes it easily “extractable” for the user, but every one of its citations requires authentication to check, which matters unless this is one of those “good enough” tasks.
As a writer, something I find annoying across the whole spread is the semantic reveal. LLMs are semantic machines, and that machinery persistently reveals itself in ways that are weird to the human ear. All of them go out of their way to describe things as “structural,” “connective” (as in “connective tissue”), “load-bearing” and “legible.”
Finally, I included a second tab where I asked Claude for analysis across the four outputs, where it suggests that my framing of the question is altogether kind of problematic. It shows how a strong prompt is sometimes also a bad approach.
There are a lot of possible takeaways here, but I’d rather set aside the question of which tool is “good” or “bad” or “better” and think more about the patterns across the tools and their implications.
A note on Ezra Klein: I’ve been looking across the NYT’s breathless reporting on the AI industry and seeing very few women represented in the commentary. Klein’s tech coverage tends to center a fairly narrow circuit of sources: founders, researchers, and policy thinkers who are overwhelmingly male, concentrated in a few institutions, and generally assert the inevitability of disruptive AI as a matter of course. Meanwhile, Anthropic and others are anticipating that the career ladders where women are concentrated will be hit hardest, with some claiming professional women’s jobs will dry up or disappear entirely.
I looked over his podcast guests for the last year and noticed that his tech-focused guests are almost all white and almost all men. A rough count of eight podcasts (nine if you stretch) with accompanying commentary produced only two women, both economists. Klein’s brand is the guy who does the reading, the big-ideas guy. So it’s worth asking what he’s reading and whose ideas he considers big. While he probably wouldn’t disagree with Gebru or Crawford or Noble on the substance, his current body of commentary is pulled toward the perspectives of people who have the most to gain from the technology’s expansion and the least exposure to its costs.
One of the tricky things about consumer AI tools like Claude and Gemini is that the experience varies widely depending on the person using it, and it’s not always clear why. I have spent a lot of time learning the tools so I can advise around them in my work, and this variance of experience has become a frustrating part of the deal.
I manage a team of writers and creatives at work, and we are expected to have familiarity with the tools, despite complex and sometimes hostile feelings about the political and environmental implications around this sector. That’s quite a pickle, organizationally, managerially. Borrowing from Haraway, I thought okay, what if we take these tools seriously as a team of writers and creatives and put our professional standards up against them?
Among other exercises, I did a couple of comparisons on my team that are helpful for creating discussion around the “plausibility” question. People dismiss LLM outputs as merely plausible answers, rather than accurate or factual ones. And that’s correct, they are, and that’s the design. In many cases, plausibility is fine. Take Wikipedia, for example, which we understand to be a pretty good source, a plausible source, unless you’re writing a formal paper requiring original sources.
I digress – ultimately we needed to understand together that LLMs are not a WYSIWYG tool and talk through the implications.
I asked everyone to run the same paper through their LLM of choice, prompting it for a plain language summary. We then copied and pasted the summaries into a shared doc and compared and contrasted them for discussion. We had several takeaways, including that the outputs were all similar in spirit but sometimes varied wildly in style and approach.
Knowing that algorithms are responsive and not static, we ran the exercise again later in the day and pasted those outputs into the shared doc, comparing the difference between AM and PM. Again, the outputs were similar in spirit but varied in style and approach. Some changed dramatically. One team member whose morning summary had been jokey and conversational received a much more staid and serious version in the afternoon.
At the time, I asked Claude to explain the variance: “Even with the same prompt and source material, LLMs don’t produce identical outputs each time. This is by design — there’s a degree of randomness (called “temperature”) in how the model selects words, which means each run produces a slightly different path through the text.”
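If you want to see the mechanic Claude is describing, here’s a toy sketch of temperature-based sampling in Python. The vocabulary and scores are made up for illustration; nothing here comes from Claude or any other real model:

```python
import numpy as np

rng = np.random.default_rng()

# A made-up four-word vocabulary with made-up next-word scores (logits).
vocab = ["summary", "overview", "recap", "digest"]
logits = np.array([2.0, 1.5, 1.0, 0.2])

def sample_next_word(logits, temperature):
    """Scale scores by temperature, convert to probabilities, then sample.

    Low temperature sharpens the distribution (nearly deterministic picks);
    high temperature flattens it (more varied picks).
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

for t in (0.2, 1.0):
    picks = [sample_next_word(logits, t) for _ in range(10)]
    print(f"temperature={t}: {picks}")
```

Run it a few times and the low-temperature line barely budges while the high-temperature line wanders, which maps loosely onto the AM/PM variance we saw in the shared doc.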
Anyway, this got our gears turning on how (and whether) to approach LLMs as a team and as individuals and led to good group discussion. (It’s important to create space for criticism and critical approaches here.) It also gave us more confidence as a team in responding to this new layer of complexity in our work, and in helping our professional contacts and peers think about how to approach the tools and when and whether to use them. There will be tasks where AI-based tools are “good enough,” and tasks where they are not.
The swirl of mystery and speculation around this sector has people up in arms, and it’s useful to have approaches that give people firsthand experience and let them see how that experience plays out for others. The god trick of the singular interface turns out to be a bear to navigate in the workplace, where our work is foundational, prosocial and specific.
Thinking more about arXiv and public-facing publishing: Back in the day, my mentor was among the first on campus to successfully argue that public-facing publishing should count toward tenure, as service and outreach at minimum. She was on the edge of digital studies, and by the time her work was formally peer-reviewed it risked being irrelevant. At the time, in the early 00s, blogging was the primary mechanism to get the job done, and peer review happened online in real time through interaction with other scholars. This also foreshadowed the “influencer” model we know today: scholars who cultivated interest and influence through these more informal channels saw their formal work gain traction in formal venues.
Some notes on arXiv: arXiv is a preprint server hosted and maintained by Cornell University, where researchers post their work to establish priority and get it in front of other researchers quickly. This kind of server is common in emerging fields, to help build a body of shared knowledge among researchers, and fast.
With all the speculation around AI futures, arXiv papers are suddenly hot content on Twitter/X, paired with aggressive forecasting and commentary. This is causing a lot of ruckus among the ranks. arXiv papers are a-okay to reference as sources of current AI research, but I’d flag a couple of things: they’re preprints that haven’t yet been through formal peer review, and they’re technical documents written for other researchers, not press releases or news coverage, so the claims in them are often much narrower and more specific than how they get interpreted and reported.
A smart take on nonconsensual deepfakes, considering them not just as a social issue but as a cybersecurity issue, as they’re often connected to broader financial and social harassment campaigns.
Cottom draws on Acemoglu and Restrepo’s concept of “so-so” technologies and traces a pattern from MOOCs to DOGE, where each iteration promises transformation but delivers incremental improvements at best and labor displacement at worst. The real danger, she argues, is that AI’s most compelling use case in the current political environment is threatening and demoralizing workers and justifying cuts, not revolutionizing how work gets done.
Cottom has been one of the writers I keep returning to because she is not dismissing the technology or retreating into reactionary nostalgia. She looks past the product announcements to the political economy underneath them. Who benefits from the hype cycle? What happens to the institutional infrastructure (education, research, public expertise) that AI claims to augment and simultaneously threatens to starve? She’s also very active on Instagram (and promoting a new documentary), tracing the news around AI and higher ed in real time.
Some content notes: while I haven’t quite figured out how much I want to broadcast this space, I have put a couple of feedback mechanisms into place, including an email loop as well as turning on comments for those logged into micro.blog, Bluesky and Mastodon. If you’re reading along, say hello.
LLMs have a default house writing style with identifiable patterns: sentence fragments for emphasis, “not X, but Y” constructions, lots of hard contrast, atmospheric openings, heavy use of em dashes, and heavy use of marketing language. This reflects the semantic construction of an LLM. Custom instructions can override these defaults. A custom skill is a set of instructions within your account that modifies how the model generates text. When you paste instructions into your profile settings, Claude reads them at the start of every conversation and adjusts its output accordingly.
I began using Claude daily for light writing tasks about six months ago, and over that time I started cataloging the patterns I was consistently editing out, including the terrible “not X, but Y” construction that showed up in nearly every response, and persistent em dashes used as all-purpose connectors where other punctuation would be more appropriate.
I went through several iterations of bullying Claude into submission, narrowing the scope each time, before arriving at this version, which focuses specifically on writing mechanics and hard prohibitions.
You’ll need a paid Claude plan (Pro, Max, Team, or Enterprise). Free-tier accounts don’t have access to custom skills.
• Within the app, navigate to Customize > Skills and Create new skills
• Select add a new skill and Write skill instructions
• Paste the copy from this file into the skill, making note of the name and description boxes. Feel free to tinker.
• Save your changes.
Note: The instructions in the linked file are Claude’s work, not mine. They came out of months of conversation, where Claude would analyze my style notes, and the file evolved from there. They read a little strangely because of that process. If I’d written them from scratch, they’d sound different. But looking at the file you can see what Claude responds to and how it works.
Claude will apply these instructions to every new conversation going forward. Existing conversations won’t pick up the change, so start a fresh chat to test it. If and when Claude struggles to apply the skill, call it out specifically in the prompt, such as, “Revise this for length using the good writing skill.”
The skill specifies constraints in a few categories and the instructions are plain text. As you go, you can also ask Claude to analyze previous conversations for suggested additions to the skill, which Claude will produce and implement within the chat. Each rule operates independently, so removing one doesn’t affect the others.
Claude processes custom instructions at the start of every conversation, before it generates any output. The instructions function as constraints on the model’s default behavior. The model doesn’t always follow every instruction perfectly and the results vary by task. You will still need to edit.
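If you’d rather poke at the same mechanic outside the app, a system prompt occupies roughly the position a skill does: instructions loaded before any output is generated. Here’s a minimal sketch using Anthropic’s Python SDK; the rules and the model name below are placeholders I made up for illustration, not the contents of my skill file:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical writing rules standing in for skill instructions.
WRITING_RULES = """\
Never use the "not X, but Y" construction.
Never use an em dash; choose a comma, colon, or period instead.
No atmospheric openings and no marketing language.
"""

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model your plan offers
    max_tokens=1024,
    system=WRITING_RULES,  # applied before generation, like a skill in the app
    messages=[
        {"role": "user", "content": "Give me a plain language summary of the attached paper."}
    ],
)

print(message.content[0].text)
```

As in the app, these constraints shape the defaults rather than guarantee compliance.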
This new report from Anthropic is depressing at best, as it tries to measure which employment sectors carry the most exposure to AI’s expansion into the economy. In short, the tech is likely to impact two groups the hardest: educated professional women, and young workers for whom the career ladder will never materialize. In a right-side-up world, this would change the political dynamics of any policy response considerably. In this one, I don’t know.
Anthropic’s positioning here is curious, very god tricky. They are claiming the mantle of responsibility and transparency while predicting an inevitable end nobody wants, which they’re also selling as a service.
I still think much of the forecasting is oversold – the tech performs well in optimized environments, and last-mile issues are a perennial concern in any engineering venture because the practical world is non-optimal. Time will tell, and there are big incentives in play. But the hunger and animus around the forecast feel bad.
I have pretty strong feelings against the use of AI around military operations, based on my hands-on experience with the tools. When the tech’s creators say the tech isn’t ripe for warfare, that’s strong feedback. And I agree, with the risks and implications around trust, marketing and hallucinations, it’s not even really ripe for the consumer marketplace, much less drone warfare. Whether or not artificial intelligence tech should be used for war is, of course, at the root of Haraway’s thesis, which we like to noodle with around these parts.
You might call this a taste test: Obsessed with the story about the McDonald’s CEO and how his LinkedIn-style videos selling the McD’s franchise have escaped containment, leading to one of the funnier CEO/product marketing dynamics in recent history. Burger King’s CEO swooped in, holding widely marketed listening sessions with customers to demonstrate his love for the Whopper, in contrast with the deeply weird McD’s videos. Must read: Internet long-hauler Katie Notopoulos on how direct-to-public marketing works when the public is more familiar with the product than the org’s leaders.
My teenager thinks fast food is cool and subversive (cue the sound of one hundred moms groaning) and she and her friends regularly talk shop. They are Taco Bell fans and think burger stops are kind of gauche. Not gauche: Baja Blast.