A gift link from The Atlantic: a reporter is given $10k to spend on emerging betting markets and reports on what he found. Spoiler: it’s dark!

Lean In girlbosses out, burned out babes in.

Interesting read: NYT is using a custom LLM tool to track trends within the “manosphere,” as reported by the Nieman Journalism Lab.

A friend of the blog told me a story about a Substacker who uses AI to summarize books and then publishes AI-generated content about those summaries, never reading the books herself, and yet has a ton of followers. I’d guess at least some of those are purchased, betting that a high follower count will beget more followers by suggesting clout and credibility she didn’t earn as a reader talking to fellow readers. And followers aren’t subscribers, but that’s the business bet.

People are lookie-loos: they get curious when something is doing numbers and creating activity, so inflating follower counts is a real and persistent strategy. None of this is new. But best practices still hold regardless of which technologies you layer on top. Marketing erodes trust when it prioritizes short-term gains over honesty and reliability.

It’s strange to live in a time when you can’t reliably distinguish someone who has engaged with ideas from someone who automated the appearance of engaging with them.

Anecdotally, I’m hearing about LLMs being weaponized in divorce and custody cases, including inundating the other party with slop to drive up their legal fees. Worse, the sycophancy is tuned to and confirms the aggrieved party’s grievances, regardless of their real-world relevance in court.

Brooke Warner grapples with the implications around authorship and ownership in relation to AI writing, and how it’s showing up in the small publishing business.

NYT on the trend of “Luddite teens.” I have a growing suspicion that various neo-Luddist trends will largely map to class identity. In some areas of the world, in some areas of the US, Facebook is the internet, yellow pages and water cooler, and the internet experience is entirely mediated by apps.

From 2021, notes on supply chains and chip shortages in a global economy.

Twitter on a vape, the great e-waste crisis.

A good essay on writing and communication in this time. Ultimately the pickle of AI writing is that meaning is something best made through collaboration.

Forgive me, but I’d like to propose we call this a “sloppelganger.”

Excited about Inkwell, a contemporary treatment of the classic RSS reader.

Doing numbers on Twitter/X this week: this paper tested 70+ LLMs on open-ended prompts and found they all produce strikingly similar outputs. Worse, the systems used to improve models actively penalize diversity, reinforcing the convergence.

I need someone to talk me into/out of the new MacBook Neo. Is this just a Chromebook for design people?

Through a new quiz, NYT asks readers to rate passages of writing against AI. Despite thinking I could spot the AI writing, my results were 50/50.

Telling the gods: I would love a micro.blog plugin that provides an easy way to bulk-manage post categories. If there is one, lmk.

Come look over my shoulder while I explore how and whether LLMs are good writing tools: Here’s a wee version of the LLM comparison exercise I did with my team. We’ll make it a two-fer so you can see how the “good writing” skill works in practice, though we’ll see how that actually goes.

One of the more useful things you can do with an LLM is hold up a few ideas side by side and apply lenses to them. I know this history pretty well, so I asked a series of LLMs: why is Wisconsin’s cultural identity and cohesion stronger than Indiana’s, from a historical and business perspective?

Here are the answers in one doc, for comparison.

Each LLM will give us more or less the same story, different flavor. Within the industry, the differences across the models reflect “model personality.” Asking “why” instead of “whether” will probably drive the answer to favor Wisconsin. Using multiple lenses (two states, historical + business, identity + cohesion) forces the LLM to cross-reference across more of its training data, which tends to produce a more comprehensive answer.

For all the chatter about consciousness and whatever, remember that an LLM is, at bottom, an enormous statistical model of human language and semantics, so being able to talk about language and communication, getting meta with the tool and how you think through language, helps a lot when using one. This is maybe the one thing I like about experimenting so hard with the tools. I’m thinking about the technical side of writing and enjoying it quite a lot.

Functionally: all of them acknowledge hard historical truths within the subject matter and don’t shy away from critical perspectives, which is good. Both Gemini and Copilot include in-line links, which lets you judge the output’s authority in the moment as a reader. I liked Copilot’s more than I expected here. Claude’s answers are more lyrical and do provide more context, and yet do not encourage checking against outside sources by providing links within the output. And you can see that even with the good writing skill calling out hard bans on certain structure, Claude plows right through them.

Model personality: Claude favors sociological answers to Copilot’s economic answers. Claude is also highly intellectual and narrative by comparison, and that narrative style can mask nuance by sinking relative context within the storytelling. Gemini simplifies, boosts and cheerleads where the others don’t, and really goes hard on Wisconsin’s reputation as a drinking and Packers state when there are stronger structural arguments in play. Copilot is tricky because it looks authoritative like a briefing, which also makes it easily “extractible” for the user, but every citation still needs to be verified unless this is one of those “good enough” tasks.

As a writer, something I find annoying across the whole spread is the semantic reveal. LLMs are semantic machines, and it is persistently revealed in ways that are weird to the human ear. All of them go out of their way to describe things as “structural,” “connective” as in “connective tissue,” “load-bearing” and “legible.”

Finally, I included a second tab where I asked Claude for analysis across the four outputs, where it suggests that my framing of the question is altogether kind of problematic. It shows how a strong prompt is sometimes also a bad approach.

There are a lot of possible takeaways here, but I’d rather set aside the question of which tool is “good” or “bad” or “better” and think more about the patterns across the tools and their implications.

Winer on Doctorow on RSS.

A note on Ezra Klein: I’ve been looking across the NYT’s breathless reporting on the AI industry and seeing very few women represented across the commentary. Klein’s tech coverage tends to center a fairly narrow circuit of sources: founders, researchers, and policy thinkers who are overwhelmingly male and concentrated in a few institutions, and generally assert the inevitability of disruptive AI as a matter of course. Meanwhile, Anthropic and others are anticipating that career ladders where women are concentrated will be impacted, some claiming professional women’s jobs will dry up or disappear entirely.

I looked over his podcast guests for the last year and noticed that his tech-focused guests are almost all white and almost all men. A rough count of eight podcasts (nine if you stretch) with accompanying commentary produced only two women, both economists. Klein’s brand is the guy who does the reading, the big ideas guy. So it’s worth asking what he’s reading and whose ideas he considers big. While he probably wouldn’t disagree with Gebru or Crawford or Noble on the substance, his current body of commentary is pulled toward the perspectives of people who have the most to gain from the technology’s expansion and the least exposure to its costs.

One of the tricky things about consumer AI tools like Claude and Gemini is that the experience varies widely depending on the person using it, and it’s not always clear why. I have spent a lot of time learning the tools so I can advise around them in my work, and this variance of experience has become a frustrating part of the deal.

I manage a team of writers and creatives at work, and we are expected to have familiarity with the tools, despite complex and sometimes hostile feelings about the political and environmental implications around this sector. That’s quite a pickle, organizationally, managerially. Borrowing from Haraway, I thought okay, what if we take these tools seriously as a team of writers and creatives and put our professional standards up against them?

Among other exercises, I did a couple of comparisons on my team that are helpful for creating discussion around the “plausibility” question. People dismiss LLM outputs as being merely plausible answers, rather than accurate or factual ones. And that’s correct, they are, and that’s the design. In many cases, plausibility is fine. Take Wikipedia, for example, which we understand to be a pretty good source, a plausible source, unless you’re writing a formal paper requiring original sources.

I digress – ultimately we needed to understand together that LLMs are not a WYSIWYG tool and talk through the implications.

I asked everyone to run the same paper through their LLM of choice, prompting it for a plain language summary. We then copied and pasted the summaries into a shared doc and compared and contrasted for discussion. We had several takeaways, including that the summaries were all similar in spirit but sometimes varied wildly in style and approach.

Knowing that algorithms are responsive and not static, we did it again later in the day, and copied and pasted our outputs into the shared doc. We compared and contrasted the difference between AM and PM. Again, it was similar in spirit but varied in style and approach. Some changed dramatically. One team member whose morning summary had been jokey and conversational received a much more staid and serious version in the afternoon.
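If you want to put a rough number on the AM/PM drift rather than just eyeballing it, a simple string-similarity score is enough to seed discussion. A minimal sketch using Python’s standard-library difflib (the two summaries here are invented placeholders, not our actual outputs):

```python
import difflib

# Hypothetical morning and afternoon summaries of the same paper.
am = "The study finds that remote work boosts productivity for focused tasks."
pm = "The paper reports remote work improves productivity on deep-focus tasks."

# ratio() returns a value in [0, 1]: 1.0 means identical text,
# lower values mean more divergence between the two runs.
ratio = difflib.SequenceMatcher(None, am, pm).ratio()
print(f"AM/PM similarity: {ratio:.2f}")
```

A crude measure, since it scores surface wording rather than meaning, but it makes “similar in spirit, varied in style” concrete enough to compare across the whole team’s outputs.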

At the time, I asked Claude to explain the variance: “Even with the same prompt and source material, LLMs don’t produce identical outputs each time. This is by design — there’s a degree of randomness (called ‘temperature’) in how the model selects words, which means each run produces a slightly different path through the text.”
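Claude’s description matches how sampling temperature is typically implemented: the model’s raw word scores (logits) are divided by a temperature before being turned into probabilities. A minimal sketch in pure Python with made-up example values, not any vendor’s actual code:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before the softmax:
    # T < 1 sharpens the distribution (more predictable word choices),
    # T > 1 flattens it (more varied, surprising word choices).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, temperature=0.5)
warm = softmax_with_temperature(logits, temperature=2.0)
```

At the lower temperature the top candidate dominates; at the higher one the probabilities spread out, which is exactly why two runs of the same prompt can wander down different paths.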

Anyway, this got our gears turning on how (and whether) to approach LLMs as a team and as individuals and led to good group discussion. (It’s important to create space for criticism and critical approaches here.) It also gave us more confidence as a team responding to this new layer of complexity in our work, and helping our professional contacts and peers think about how to approach the tools and when and whether to use them. There will be tasks where AI-based tools are “good enough,” and tasks where they are not.

The swirl of mystery and speculation around this sector has people up in arms, and it’s useful to have approaches that give people firsthand experience, and to see how the experience works for others. The god trick of the singular interface turns out to be a bear for navigating it in the workplace, where our work is foundational, prosocial and specific.