One of the tricky things about consumer AI tools like Claude and Gemini is that the experience varies widely depending on the person using them, and it's not always clear why. I have spent a lot of time learning the tools so I can advise on them in my work, and this variance in experience has become a frustrating part of the deal.

I manage a team of writers and creatives at work, and we are expected to be familiar with the tools, despite complex and sometimes hostile feelings about the political and environmental implications of this sector. That's quite a pickle, organizationally and managerially. Borrowing from Haraway, I thought: okay, what if we took these tools seriously as a team of writers and creatives and held them up against our professional standards?

Among other exercises, I ran a couple of comparisons with my team that are helpful for sparking discussion around the "plausibility" question. People dismiss LLM outputs as merely plausible answers rather than accurate or factual ones. And that's correct: they are, and that's by design. In many cases, plausibility is fine. Take Wikipedia, for example, which we understand to be a pretty good source, a plausible source, unless you're writing a formal paper that requires original sources.

I digress. Ultimately, we needed to understand together that LLMs are not a WYSIWYG tool, and to talk through the implications.

I asked everyone to run the same paper through their LLM of choice and prompt it for a plain-language summary. We then pasted the outputs into a shared doc and compared and contrasted them. The discussion produced several takeaways, including that the summaries were all similar in spirit but sometimes varied wildly in style and approach.

Knowing that algorithms are responsive and not static, we repeated the exercise later in the day and pasted the new outputs into the shared doc. We then compared the morning and afternoon versions. Again, the results were similar in spirit but varied in style and approach. Some changed dramatically. One team member whose morning summary had been jokey and conversational received a much more staid and serious version in the afternoon.

At the time, I asked Claude to explain the variance: "Even with the same prompt and source material, LLMs don't produce identical outputs each time. This is by design — there's a degree of randomness (called 'temperature') in how the model selects words, which means each run produces a slightly different path through the text."
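For anyone curious about the mechanics, here is a toy sketch in Python of temperature sampling as it's commonly described. The model gives each candidate next word a score. The scores are divided by the temperature and turned into probabilities (a softmax), and then one word is drawn at random. The word list, the scores and the `sample_next_word` helper are all made up for illustration. Real models work over vocabularies of tens of thousands of tokens with much more machinery, so treat this as an assumption-laden sketch rather than how any particular product works.

```python
import math
import random


def sample_next_word(scores, temperature, rng):
    """Hypothetical helper: draw one word from a word -> score dict
    using softmax probabilities scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    words = list(scores)
    # Dividing by the temperature: low values sharpen the distribution
    # toward the top-scoring word, high values flatten it.
    scaled = [scores[w] / temperature for w in words]
    top = max(scaled)
    # Subtract the max before exponentiating to avoid overflow.
    weights = [math.exp(s - top) for s in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(words, weights=weights, k=1)[0]


# Made-up scores a model might assign to the next word of a summary.
scores = {"study": 2.0, "paper": 1.5, "research": 1.0, "gist": 0.2}

rng = random.Random(0)
first_run = [sample_next_word(scores, 1.0, rng) for _ in range(5)]
second_run = [sample_next_word(scores, 1.0, rng) for _ in range(5)]
# At a low temperature, "study" wins almost every draw.
low_temp = [sample_next_word(scores, 0.1, rng) for _ in range(5)]
print(first_run, second_run, low_temp)
```

The same prompt (the same scores) can produce different word choices on each run. In a real chatbot, each sampled word also shapes the words that follow, which is one plausible way small differences could grow into the kind of shift in tone my team saw.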

Anyway, this got our gears turning on how (and whether) to approach LLMs as a team and as individuals, and it led to good group discussion. (It's important to create space for criticism and critical approaches here.) It also gave us more confidence as a team, both in responding to this new layer of complexity in our work and in helping our professional contacts and peers think about how to approach the tools, and when and whether to use them. There will be tasks where AI-based tools are "good enough," and tasks where they are not.

The swirl of mystery and speculation around this sector has people up in arms. It's useful to have approaches that give people firsthand experience and let them see how the experience works for others. The god trick of the singular interface turns out to be a bear to navigate in the workplace, where our work is foundational, prosocial and specific.
