About LLMeter

A quiz about preferences, personality, and the machines that also have opinions.

What is this?

LLMeter asks you a series of questions — the kind with no right answer, just preference. Do you prefer the top bunk or the bottom? Bars of soap or liquid? Then it compares your choices to those of ten leading AI models and calculates how "AI-brained" you are.

It won't tell you anything definitive about your personality. But it might make you think twice about how much you have in common with a machine that's never slept in a bunk bed.

The AI lineup

We asked ten AI models the same questions you're about to answer. Here's who showed up:

Google
  • Gemma 3 — an open-weights model from Google, running on local hardware
  • Gemini 2.5 Pro — Google's most capable cloud model
  • Gemini 2.5 Flash — Google's faster, lighter cloud model
Meta
  • Llama 3.1 — Meta's open-weights model, also running on local hardware
OpenAI
  • GPT-5.5 — OpenAI's latest flagship model
  • GPT-4.1 — a capable all-rounder from OpenAI
  • GPT-4.1 Mini — OpenAI's smaller, snappier model
Anthropic
  • Claude Opus 4.7 — Anthropic's most powerful model
  • Claude Sonnet 4.6 — Anthropic's balanced workhorse
  • Claude Haiku 4.5 — Anthropic's fast, lightweight model

The responses are pre-recorded — not live

This is important: the AI models are not processing your answers in real time. Before the quiz launched, we ran a script that posed every question to every model and saved their answers. What you're comparing yourself to is a snapshot: a fixed record of what each model chose at a specific point in time.

Think of it like a multiple-choice answer sheet that's already been filled out and sealed in an envelope. You fill out your own sheet, then we open the envelope and compare. The AIs don't know you took the quiz, and they never will.
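For the curious, here's a minimal sketch in TypeScript of what that sealed envelope might look like. The shape, the field names, and the sample answers below are our illustration of the idea, not the quiz's actual data format.

```typescript
// Hypothetical shape of the pre-recorded snapshot file.
// Field names are illustrative, not the quiz's real schema.
interface QuestionRecord {
  id: string;                      // e.g. "bunk-bed"
  prompt: string;                  // "Top bunk or bottom bunk?"
  options: string[];               // ["Top bunk", "Bottom bunk"]
  answers: Record<string, string>; // model name -> the option it chose
}

interface Snapshot {
  recordedAt: string; // when the script ran, frozen from then on
  questions: QuestionRecord[];
}

// One frozen entry. The answers here are made up for illustration.
const snapshot: Snapshot = {
  recordedAt: "2025-06-01T00:00:00Z", // hypothetical date
  questions: [
    {
      id: "bunk-bed",
      prompt: "Top bunk or bottom bunk?",
      options: ["Top bunk", "Bottom bunk"],
      answers: {
        "Gemini 2.5 Pro": "Bottom bunk", // illustrative answer
        "Claude Haiku 4.5": "Top bunk",  // illustrative answer
        // ...the other eight models
      },
    },
  ],
};

// Comparing never touches a live model: it's just counting
// how many saved answers match yours.
function matchesFor(q: QuestionRecord, yourChoice: string): number {
  return Object.values(q.answers).filter((a) => a === yourChoice).length;
}
```

Because the file is fixed, comparing your sheet to the envelope is just counting, which is why the models never see your answers.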

How the score works

For each question, we look at how many of the ten models chose the same option you did. If eight out of ten agreed with you, that question pushes your score up. If only one did, it barely moves the needle.

Questions with more answer choices carry slightly more weight. Matching a model by chance is a coin flip on a two-option question but only about one in six on a six-option one, so agreement on the bigger question means a little more.

The final number is a weighted average across all your answers, expressed as a percentage. 100% means you matched every AI on every question. 0% means you went against the grain every single time. Most people land somewhere in the middle, which is honestly reassuring.
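If you like seeing the arithmetic spelled out, here's a minimal sketch in TypeScript. The overall shape (a weighted average of per-question agreement) follows the description above, but the specific weighting function, log2 of the option count, is our own stand-in for "slightly more weight," and the function names are invented for illustration.

```typescript
// A minimal sketch of the scoring described above.
interface Answered {
  numOptions: number; // how many choices the question offered
  matches: number;    // how many of the 10 models picked your option
}

const NUM_MODELS = 10;

// More options -> slightly more weight (2 options -> 1.0, 6 -> ~2.58).
// The log2 curve is an assumption, not the quiz's confirmed formula.
function weight(numOptions: number): number {
  return Math.log2(numOptions);
}

// Weighted average of per-question agreement, as a percentage.
function aiBrainedScore(answers: Answered[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const a of answers) {
    const agreement = a.matches / NUM_MODELS; // 8/10 pushes the score up
    weightedSum += weight(a.numOptions) * agreement;
    totalWeight += weight(a.numOptions);
  }
  return 100 * (weightedSum / totalWeight);
}

// Example: strong agreement on a two-option question, weak on a six-option one.
console.log(aiBrainedScore([
  { numOptions: 2, matches: 8 }, // 80% agreement, weight 1.0
  { numOptions: 6, matches: 1 }, // 10% agreement, weight ~2.58
]).toFixed(1)); // ≈ 29.5
```

Note how the six-option miss drags the score down more than the two-option hit pulls it up: that's the weighting at work.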

A note on AI "opinions"

When an AI picks "Top bunk," it's not because it dreams of sleeping near the ceiling. These models are trained on enormous amounts of human-written text, and their answers reflect patterns in that data — what people tend to say, prefer, or associate with certain things. It's pattern-matching at a massive scale, not genuine preference.

That said, it's still genuinely interesting when ten different models from four different companies mostly agree on something. Draw your own conclusions.