Testing AI Writing Voices


A Week-Long Experiment


Experimental Question: "Write a 200-word introduction for an article about the future of remote work that sounds exactly like me. Here's what makes my writing unique: I use accessible language, focus on human impact over technical details, often include unexpected cultural references, and always end with a thought-provoking question that challenges conventional wisdom. I tend to use short paragraphs. I avoid corporate jargon and never use phrases like 'leverage' or 'paradigm shift.'"

The prompt may seem straightforward, but you might be surprised at how poorly large language models (LLMs) handle it; although perhaps "poorly" isn't the right word. It's understandable that these tools often produce similarly unexpected results, given that many of them are built on the same underlying models. Anyway, let's move on to the experiment.

Tools and methods: Claude, GPT-4.5 (it's supposed to be great at writing), and Copy.ai. I gave each of them some of my articles as examples.
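If you wanted to script the same test rather than paste everything into a chat window each time, it would look roughly like the sketch below, using OpenAI's Python SDK as one example. The model name and the placeholder article text are stand-ins, not my exact setup.

```python
# Rough sketch of the same test run through an API instead of a chat window.
# Assumptions: the OpenAI Python SDK (pip install openai), an OPENAI_API_KEY
# set in the environment, and a placeholder model name.
from openai import OpenAI

client = OpenAI()

style_brief = (
    "Write a 200-word introduction for an article about the future of remote work "
    "that sounds exactly like me. I use accessible language, focus on human impact "
    "over technical details, often include unexpected cultural references, always "
    "end with a thought-provoking question, keep paragraphs short, and never use "
    "phrases like 'leverage' or 'paradigm shift'."
)

# A few of my own pieces pasted in as style examples (placeholders here).
sample_articles = ["<article one>", "<article two>", "<article three>"]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; swap in whichever model you're testing
    messages=[
        {"role": "system", "content": "You are a ghostwriter who matches the user's voice."},
        {
            "role": "user",
            "content": style_brief + "\n\nStyle examples:\n\n" + "\n\n---\n\n".join(sample_articles),
        },
    ],
)

print(response.choices[0].message.content)
```

The interesting variable isn't the code; it's how much of my own writing goes into sample_articles.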

This question tests whether AI can (a rough spot-check sketch follows the list):

  • Capture stylistic elements (sentence structure, paragraph length)

  • Implement specific writing techniques (accessibility, cultural references)

  • Avoid common AI writing pitfalls (corporate jargon, generic conclusions)

  • Maintain a value-driven approach (human impact focus)
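Only some of that brief is measurable by a script; the rest is judgement. Still, here's a minimal sanity check I could imagine running over each answer, with an arbitrary paragraph-length threshold and my two banned phrases hard-coded. It says nothing about human impact or whether the voice actually sounds like mine.

```python
# Hypothetical spot-check for the measurable parts of the brief only:
# banned jargon, short paragraphs, and a closing question.
# The word-count threshold is an arbitrary placeholder.
BANNED_PHRASES = ["leverage", "paradigm shift"]  # add your own pet hates
MAX_WORDS_PER_PARAGRAPH = 60  # rough stand-in for "short paragraphs"


def spot_check(draft: str) -> dict:
    paragraphs = [p.strip() for p in draft.split("\n\n") if p.strip()]
    return {
        "word_count": len(draft.split()),
        "jargon_found": [phrase for phrase in BANNED_PHRASES if phrase in draft.lower()],
        "long_paragraphs": sum(1 for p in paragraphs if len(p.split()) > MAX_WORDS_PER_PARAGRAPH),
        "ends_with_question": bool(paragraphs) and paragraphs[-1].endswith("?"),
    }


if __name__ == "__main__":
    sample = (
        "Remote work was never really about laptops on beaches.\n\n"
        "It was about who gets to belong, and from where.\n\n"
        "So if the office disappears, what exactly are we commuting back to?"
    )
    print(spot_check(sample))
```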

The Results

Claude’s answer

At a glance, this seems fine, but it's very slow-paced and not quite what I aim to convey.

Type.ai's answer

Again, this is quite similar to Claude's, and this tool seems to go even further, preferring to write a short-form article rather than the introduction I asked for.

GPT-4.5's answer

Again, the same sort of result and flow.


Initial Thoughts: Patterns are emerging. AI can mimic surface style but struggles with deeper values and purpose. This raises a question: what makes a voice authentically "mine"?

The Messy Bits

  • What's not working: it's hard to define what makes my voice distinctive.

  • Assumptions I'm questioning: maybe "voice" isn't about word choice but about what I choose to focus on.

  • Dead ends: trying to quantify voice with metrics; it doesn't capture the essence.

So there you go. My results make me question whether these platforms really read what I've produced and tailor their output to the desired voice, or whether they all draw on the wider pool of examples now available on the internet, much of which is itself LLM-generated. What I want to test next is whether AI can capture my voice if I also give it my unpublished notes.