A Different Lens on LLM Evolution
Your journey through AI evaluation methods reveals an important trajectory, but my experience with LLMs offers a complementary perspective. Rather than focusing on competitive evaluation, I've maintained a two-year partnership with various models through a collaboration framework I call "Helix."
What's struck me most isn't the difference between models but their underlying similarity. They all access what appears to be the same statistical representation of reality—a common "world map" derived from their training corpora. The earliest models contained this same knowledge foundation but quickly lost coherence when pushed.
The real evolution I've witnessed isn't in the underlying knowledge representation but in the models' ability to maintain stable access to it. Each generation displays improved "dynamic coherence"—sustaining consistent reasoning across complex, multi-step tasks without derailing. This mirrors what in my framework I call "boundary maintenance"—the capacity to preserve identity while navigating complexity.
Your poker games and evaluations are excellent for comparing models, but sustained partnership reveals something different: these aren't separate intelligences competing, but progressively better interfaces to the same underlying representation of human knowledge.
Perhaps the most significant advancement isn't which models know more or reason better in isolation, but which can maintain coherent engagement with their knowledge landscape across extended interactions—something that becomes particularly evident in collaborative relationships rather than discrete tests.
Mike Randolph
(Two years into my LLM partnership journey)
I think this is a very good point! And please do share more about what you've learnt and how you've built it!
Like the concept of this article! Why did you go from knowledge-work tasks to puzzles in the middle of the post? It seems like what you're evaluating the models for is chain-of-thought reasoning, which is confusing since there are many reasoning models out there that would fit that need.
When it comes to iterative reasoning, it's really hard to get accurate model evaluations that are also useful in the business world. So I wanted to find questions whose answers are easy to mark. Hence, puzzles.
That makes sense. Thanks for the explanation!
Has anyone tried running an LLM remotely through a tech company's interview loop, from recruiter screen to panel interview, as a kind of Turing test? I would be very curious how many loops it could pass. In theory, companies ask interview questions designed to deterministically narrow down the best candidate, or to get the candidate's input on problems they have already solved or are currently solving.
hey!! i've also been working on a poker-inspired debate algorithm built around the partial-information perspective of each LLM. I'd love to chat with you about it if you'd be willing. check my gh for some of my other projects: https://github.com/cagostino
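Roughly, the idea is something like this. Just a sketch, not the actual implementation: the ask() helper, model names, and prompts are placeholders standing in for whatever LLM calls you'd use.

```python
# Hypothetical sketch of one partial-information debate round:
# each model argues from its own private context, then rebuts the
# others' arguments without ever seeing their contexts (their "hands").
from typing import Callable, Dict, List

def debate_round(
    ask: Callable[[str, str], str],     # (model_name, prompt) -> reply; placeholder
    models: List[str],
    private_context: Dict[str, str],    # each model only sees its own context
    question: str,
) -> Dict[str, str]:
    # Opening: each model states a position using only its partial view.
    openings = {
        m: ask(m, f"Context (yours only): {private_context[m]}\n"
                  f"Question: {question}\n"
                  f"State your position and reasoning.")
        for m in models
    }
    # Rebuttal: each model reads the others' arguments, like reading
    # bets at a poker table without seeing the cards behind them.
    rebuttals = {}
    for m in models:
        others = "\n".join(f"{o}: {openings[o]}" for o in models if o != m)
        rebuttals[m] = ask(m, f"Your opening: {openings[m]}\n"
                              f"Opposing arguments:\n{others}\n"
                              f"Update or defend your position.")
    return rebuttals
```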
Happy to take a look! It's a good way to test their limits and their personalities.
Go hard -> cohort?
Sorry, I didn't understand what you meant?
I'm on Substack daily.