Can LLMs beat Balatro?

In short: yes, and now I'm betting on Google to win the AI race.

Jordan Zhang Code

Balatro is a roguelike deckbuilder based on poker that was released in 2024. Players play poker hands to earn chips and pass rounds, trying to exceed an exponentially increasing chip requirement as rounds progress.

Balatro gameplay 1

1. Start playing regular poker hands to earn money.

Balatro gameplay 2

2. Buy and use power ups to stack the deck in your favor.

Balatro gameplay 3

3. Build OP combos and win the game.

Balatro is an interesting test bed for LLM reasoning because:

  • It is stochastic with an enormous branching factor, requiring probabilistic reasoning to make optimal decisions.
  • Long-range planning and execution are needed to curate combos and decks strong enough to keep up with the exponentially increasing chip requirement.
  • Being turn-based, with no real-time or visuospatial qualities, the game is playable entirely in text form.

An example screenshot and corresponding state JSON from Ante 7:

Ante 7 game state screenshot
{
  "hand_levels": {
    "Two Pair": {
      "times_played": 2,
      "chips": 20,
      "level": 1,
      "mult": 2
    },
    "Pair": {
      "times_played": 2,
      "chips": 10,
      "level": 1,
      "mult": 2
    },
    "High Card": {
      "times_played": 6,
      "chips": 5,
      "level": 1,
      "mult": 1
    },
    "Flush Five": {
      "times_played": 0,
      "chips": 160,
      "level": 1,
      "mult": 16
    },
    "Flush House": {
      "times_played": 0,
      "chips": 140,
      "level": 1,
      "mult": 14
    },
    "Five of a Kind": {
      "times_played": 0,
      "chips": 120,
      "level": 1,
      "mult": 12
    },
    "Straight Flush": {
      "times_played": 0,
      "chips": 180,
      "level": 3,
      "mult": 16
    },
    "Four of a Kind": {
      "times_played": 0,
      "chips": 60,
      "level": 1,
      "mult": 7
    },
    "Full House": {
      "times_played": 0,
      "chips": 40,
      "level": 1,
      "mult": 4
    },
    "Flush": {
      "times_played": 16,
      "chips": 95,
      "level": 5,
      "mult": 12
    },
    "Straight": {
      "times_played": 0,
      "chips": 30,
      "level": 1,
      "mult": 4
    },
    "Three of a Kind": {
      "times_played": 0,
      "chips": 30,
      "level": 1,
      "mult": 3
    }
  },
  "deck": [
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+8 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "8 of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Jack of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+7 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "7 of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+9 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "9 of Clubs",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Queen of Clubs",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+5 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "5 of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+3 chips X1.5 Mult while this card stays in hand",
      "secondary_description": "",
      "sells_for": 1,
      "name": "3 of Hearts",
      "enhancement": "Steel Card"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "10 of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips +30 extra chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "10 of Diamonds",
      "enhancement": "Bonus"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips +4 Mult",
      "secondary_description": "",
      "sells_for": 1,
      "name": "King of Diamonds",
      "enhancement": "Mult"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Jack of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+3 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "3 of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+3 chips",
      "secondary_description": "Retrigger this card 1 time",
      "sells_for": 1,
      "name": "3 of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+2 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "2 of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "King of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+9 chips $3 if this card is held in hand at end of round",
      "secondary_description": "Retrigger this card 1 time",
      "sells_for": 1,
      "name": "9 of Diamonds",
      "enhancement": "Gold Card"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+5 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "5 of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Jack of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+8 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "8 of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+3 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "3 of Clubs",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+11 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Ace of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+9 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "9 of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "King of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+9 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "9 of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+6 chips X1.5 Mult while this card stays in hand",
      "secondary_description": "",
      "sells_for": 1,
      "name": "6 of Hearts",
      "enhancement": "Steel Card"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+6 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "6 of Spades",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+8 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "8 of Clubs",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+3 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "3 of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+10 chips X1.5 Mult while this card stays in hand",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Jack of Diamonds",
      "enhancement": "Steel Card"
    },
    {
      "facing": "back",
      "cost": 1,
      "type": "deck",
      "main_description": "+5 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "5 of Hearts",
      "enhancement": "Default Base"
    }
  ],
  "owned_vouchers": [
    "v_magic_trick",
    "v_blank",
    "v_overstock_norm",
    "v_wasteful"
  ],
  "max_consumeables": 2,
  "round_number": 13,
  "state": "SELECTING_HAND",
  "tags": [],
  "blind_info": {
    "Boss": {
      "state": "Current",
      "chips_needed": 70000,
      "reward": 5,
      "boss_description": "First hand is drawn face down"
    },
    "Small": {
      "tag": "Top-up Tag",
      "tag_description": "Create up to 2 Common Jokers (Must have room)",
      "chips_needed": 35000,
      "reward": 3,
      "state": "Defeated"
    },
    "Big": {
      "tag": "Ethereal Tag",
      "tag_description": "Gives a free Spectral Pack",
      "chips_needed": 52500,
      "reward": 4,
      "state": "Defeated"
    }
  },
  "hands_left": 4,
  "discards_left": 0,
  "hand": [
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+11 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Ace of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "10 of Hearts",
      "enhancement": "Default Base"
    },
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Queen of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+10 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "Queen of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+7 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "7 of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+7 chips",
      "secondary_description": "",
      "sells_for": 1,
      "name": "7 of Diamonds",
      "enhancement": "Default Base"
    },
    {
      "facing": "front",
      "cost": 1,
      "type": "hand",
      "main_description": "+4 chips +4 Mult",
      "secondary_description": "",
      "sells_for": 1,
      "name": "4 of Diamonds",
      "enhancement": "Mult"
    }
  ],
  "dollars": 1,
  "jokers": [
    {
      "edition": "polychrome",
      "facing": "front",
      "cost": 0,
      "rarity": 1,
      "type": "joker",
      "main_description": "+10 Mult if played hand contains a Flush",
      "secondary_description": "X1.5 Mult",
      "sells_for": 4,
      "name": "Droll Joker",
      "enhancement": "Droll Joker"
    },
    {
      "facing": "front",
      "cost": 4,
      "rarity": 1,
      "type": "joker",
      "main_description": "Retrigger first played card used in scoring 2 additional times",
      "secondary_description": "",
      "sells_for": 2,
      "name": "Hanging Chad",
      "enhancement": "Hanging Chad"
    },
    {
      "facing": "front",
      "cost": 4,
      "rarity": 1,
      "type": "joker",
      "main_description": "+1 hand size",
      "secondary_description": "",
      "sells_for": 2,
      "name": "Juggler",
      "enhancement": "Juggler"
    },
    {
      "facing": "front",
      "cost": 7,
      "rarity": 3,
      "type": "joker",
      "main_description": "+250 Chips, -2 hand size",
      "secondary_description": "",
      "sells_for": 3,
      "name": "Stuntman",
      "enhancement": "Stuntman"
    }
  ],
  "ante": 7,
  "boss_blind_disabled": false,
  "chips": 0,
  "consumeables": [
    {
      "facing": "front",
      "cost": 3,
      "type": "joker",
      "main_description": "Destroys up to 2 selected cards",
      "secondary_description": "",
      "sells_for": 1,
      "name": "The Hanged Man",
      "enhancement": "The Hanged Man"
    }
  ],
  "max_jokers": 5,
  "seed": "422J6NUH",
  "can_reroll_boss": false,
  "played_hands": [
    {
      "ante": 1,
      "blind": "Boss",
      "chips_earned": 162,
      "hand_name": "Two Pair"
    },
    {
      "ante": 1,
      "blind": "Boss",
      "chips_earned": 336,
      "hand_name": "Pair"
    },
    {
      "ante": 1,
      "blind": "Boss",
      "chips_earned": 90,
      "hand_name": "Pair"
    },
    {
      "ante": 1,
      "blind": "Boss",
      "chips_earned": 24,
      "hand_name": "High Card"
    },
    {
      "ante": 2,
      "blind": "Big Blind",
      "chips_earned": 5158.5,
      "hand_name": "Flush"
    },
    {
      "ante": 2,
      "blind": "Boss",
      "chips_earned": 4032,
      "hand_name": "Flush"
    },
    {
      "ante": 3,
      "blind": "Boss",
      "chips_earned": 4531.5,
      "hand_name": "Flush"
    },
    {
      "ante": 4,
      "blind": "Big Blind",
      "chips_earned": 18150,
      "hand_name": "Flush"
    },
    {
      "ante": 4,
      "blind": "Boss",
      "chips_earned": 17991.75,
      "hand_name": "Flush"
    },
    {
      "ante": 4,
      "blind": "Boss",
      "chips_earned": 4230,
      "hand_name": "Two Pair"
    },
    {
      "ante": 5,
      "blind": "Big Blind",
      "chips_earned": 19355.25,
      "hand_name": "Flush"
    },
    {
      "ante": 5,
      "blind": "Boss",
      "chips_earned": 9936,
      "hand_name": "Flush"
    },
    {
      "ante": 5,
      "blind": "Boss",
      "chips_earned": 13020,
      "hand_name": "Flush"
    },
    {
      "ante": 6,
      "blind": "Big Blind",
      "chips_earned": 7938,
      "hand_name": "Flush"
    },
    {
      "ante": 6,
      "blind": "Big Blind",
      "chips_earned": 432,
      "hand_name": "High Card"
    },
    {
      "ante": 6,
      "blind": "Big Blind",
      "chips_earned": 236.25,
      "hand_name": "High Card"
    },
    {
      "ante": 6,
      "blind": "Big Blind",
      "chips_earned": 23740.5,
      "hand_name": "Flush"
    },
    {
      "ante": 6,
      "blind": "Boss",
      "chips_earned": 8028,
      "hand_name": "Flush"
    },
    {
      "ante": 6,
      "blind": "Boss",
      "chips_earned": 6487.5,
      "hand_name": "Flush"
    },
    {
      "ante": 6,
      "blind": "Boss",
      "chips_earned": 7288.5,
      "hand_name": "Flush"
    },
    {
      "ante": 6,
      "blind": "Boss",
      "chips_earned": 18892.5,
      "hand_name": "Flush"
    },
    {
      "ante": 7,
      "blind": "Small Blind",
      "chips_earned": 47994,
      "hand_name": "Flush"
    },
    {
      "ante": 7,
      "blind": "Big Blind",
      "chips_earned": 31122,
      "hand_name": "Flush"
    },
    {
      "ante": 7,
      "blind": "Big Blind",
      "chips_earned": 8160.75,
      "hand_name": "High Card"
    },
    {
      "ante": 7,
      "blind": "Big Blind",
      "chips_earned": 9447.75,
      "hand_name": "High Card"
    },
    {
      "ante": 7,
      "blind": "Big Blind",
      "chips_earned": 7985.25,
      "hand_name": "High Card"
    }
  ]
}

As I was interested in the agents' decision making abilities, and not their ability to navigate and click around the game's UI to do things, I elected to mod the game to stringify the state as JSON, rather than working with the agents through screenshots. Parts of the mod code are available in the experiment-code repository for your reference, though I have not had a chance to formally turn it into a Steamodded mod.
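As a rough illustration of why the JSON form is convenient, here is a minimal Python sketch that reads a trimmed copy of the Ante 7 state above and pulls out a few decision-relevant facts. This is not the mod code; the helper name `summarize_state` and the choice of summary fields are assumptions made for the example, and `base_score` is only the hand's base chips times base mult, before any card chips or joker effects are applied.

```python
import json

# A trimmed copy of the Ante 7 state shown above; only a few fields are kept.
STATE_JSON = """
{
  "ante": 7,
  "hands_left": 4,
  "discards_left": 0,
  "hand_levels": {
    "Flush": {"times_played": 16, "chips": 95, "level": 5, "mult": 12},
    "High Card": {"times_played": 6, "chips": 5, "level": 1, "mult": 1},
    "Pair": {"times_played": 2, "chips": 10, "level": 1, "mult": 2}
  },
  "blind_info": {
    "Small": {"chips_needed": 35000, "state": "Defeated"},
    "Big": {"chips_needed": 52500, "state": "Defeated"},
    "Boss": {"chips_needed": 70000, "state": "Current"}
  },
  "jokers": [
    {"name": "Droll Joker"}, {"name": "Hanging Chad"},
    {"name": "Juggler"}, {"name": "Stuntman"}
  ]
}
"""


def summarize_state(state: dict) -> dict:
    """Hypothetical helper: condense a state dict into a short summary."""
    levels = state["hand_levels"]
    # The hand type the run has leaned on most so far.
    favourite = max(levels, key=lambda name: levels[name]["times_played"])
    # The blind that is currently being played.
    current = next(
        name for name, blind in state["blind_info"].items()
        if blind["state"] == "Current"
    )
    return {
        "ante": state["ante"],
        "current_blind": current,
        "chips_needed": state["blind_info"][current]["chips_needed"],
        "most_played_hand": favourite,
        # Base chips x base mult only; card chips and jokers come on top.
        "base_score": levels[favourite]["chips"] * levels[favourite]["mult"],
        "jokers": [joker["name"] for joker in state["jokers"]],
    }


summary = summarize_state(json.loads(STATE_JSON))
print(summary)
```

For this state, the level 5 Flush has a base score of 95 × 12 = 1,140 against a Boss Blind requirement of 70,000. The gap has to be closed by card chips, enhancements, and jokers.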

The experimental setup was as follows:

  • I had models from OpenAI, Anthropic, and Google each play 10 games using the same set of 10 run seeds, to reduce the effect of run-to-run variance in the comparison.
  • The maximum thinking effort setting for each model was used.
  • I gave a system prompt that explained the general rules and flow of the game, which can be viewed here. The system prompt does not provide any information on specific instances of game objects like jokers, cards, consumables, or bosses, and does not give any strategies for playing the game.
  • Because the rules for scoring hands can involve very long sequences of cards triggering compounding special effects, I helped the agent gauge how many chips it might earn from a hand by providing a history of the last 7 hands played. This backward-looking memory of recently played hands is similar to how a human player gauges how "strong" their next hand might be.
  • In addition to the current state, I also provided the state, action taken, and agent's reasoning for the last 3 turns, to help maintain some tactical continuity across turns.
  • All games were played on White Stake (base/easiest difficulty) and Red Deck (first deck available in the game). Not the biggest challenge in the world, I know.
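The rolling context described in the setup above (last 7 hands, last 3 turns) can be sketched with two bounded queues. This is a minimal illustration and not the actual experiment code: the class name `AgentContext`, its method names, and the payload keys are all hypothetical, and the real prompt format may differ.

```python
import json
from collections import deque
from dataclasses import dataclass, field

HAND_HISTORY_LEN = 7  # "last 7 hands played", per the setup above
TURN_HISTORY_LEN = 3  # "last 3 turns", per the setup above


@dataclass
class AgentContext:
    """Hypothetical container for the per-turn context sent to the model."""
    recent_hands: deque = field(
        default_factory=lambda: deque(maxlen=HAND_HISTORY_LEN))
    recent_turns: deque = field(
        default_factory=lambda: deque(maxlen=TURN_HISTORY_LEN))

    def record_hand(self, hand_name: str, chips_earned: float) -> None:
        # Older entries fall off automatically once maxlen is reached.
        self.recent_hands.append(
            {"hand_name": hand_name, "chips_earned": chips_earned})

    def record_turn(self, state: dict, action: str, reasoning: str) -> None:
        self.recent_turns.append(
            {"state": state, "action": action, "reasoning": reasoning})

    def build_user_message(self, current_state: dict) -> str:
        # Serialize the current state plus both histories as one JSON string.
        payload = {
            "current_state": current_state,
            "recent_hands": list(self.recent_hands),
            "previous_turns": list(self.recent_turns),
        }
        return json.dumps(payload, indent=2)


ctx = AgentContext()
for i in range(10):
    ctx.record_hand("Flush", 1000 * i)
message = json.loads(ctx.build_user_message({"ante": 1}))
print(len(message["recent_hands"]))
```

Using `deque(maxlen=...)` keeps the prompt size bounded over a run of several hundred turns without any manual trimming.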

Here are the results, including the average cost per turn for each model:

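Model / Seed      | Cost per turn | 422J6NUH | PE93PR87 | H1Z58VZ6 | U15QWQGE | 5U4K91D8 | 2JW2IH4X | 22V4E3UE | M8KADSBQ | M3F6LPGY | 84N5GIYF
------------------|---------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|---------
claude-4.5-sonnet | $0.058        | Ante 5   | Ante 3   | Ante 5   | Ante 4   | Ante 6   | Ante 6   | Ante 6   | Ante 5   | Ante 2   | Ante 4
gpt-5.2           | $0.034        | Ante 7   | WON      | Ante 6   | WON      | Ante 7   | Ante 5   | Ante 5   | Ante 5   | Ante 4   | WON
gemini-3-flash    | $0.008        | WON      | WON      | WON      | WON      | WON      | WON      | Ante 7   | WON      | Ante 4   | WON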
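To make the comparison easier to read at a glance, the results can be tallied with a few lines of Python. The data below is copied from the results above; the variable and function names are only for illustration.

```python
# Outcomes per model, in the same seed order as the results above.
RESULTS = {
    "claude-4.5-sonnet": ["Ante 5", "Ante 3", "Ante 5", "Ante 4", "Ante 6",
                          "Ante 6", "Ante 6", "Ante 5", "Ante 2", "Ante 4"],
    "gpt-5.2": ["Ante 7", "WON", "Ante 6", "WON", "Ante 7",
                "Ante 5", "Ante 5", "Ante 5", "Ante 4", "WON"],
    "gemini-3-flash": ["WON", "WON", "WON", "WON", "WON",
                       "WON", "Ante 7", "WON", "Ante 4", "WON"],
}

# Average cost per turn in USD, from the same results.
COST_PER_TURN = {
    "claude-4.5-sonnet": 0.058,
    "gpt-5.2": 0.034,
    "gemini-3-flash": 0.008,
}


def count_wins(outcomes: list[str]) -> int:
    """Number of runs that ended in a win."""
    return sum(outcome == "WON" for outcome in outcomes)


wins = {model: count_wins(outcomes) for model, outcomes in RESULTS.items()}
print(wins)
# {'claude-4.5-sonnet': 0, 'gpt-5.2': 3, 'gemini-3-flash': 8}

# How many times more expensive per turn each model is than gemini-3-flash.
cost_ratio = {
    model: round(cost / COST_PER_TURN["gemini-3-flash"], 2)
    for model, cost in COST_PER_TURN.items()
}
print(cost_ratio)
```

In other words, over these 10 seeds gemini-3-flash won 8 games, gpt-5.2 won 3, and claude-4.5-sonnet won none, while gemini-3-flash was also the cheapest per turn.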

Select a run from the table above to view the turn-by-turn history and agent reasoning of that run.

An unexpected outcome

Notably, gemini-3-flash trounces both claude-4.5-sonnet and gpt-5.2, despite the latter two being bigger and more expensive. I originally planned to compare the "default" model tier for each company. Given that I started with Claude (and was seeing the above performances), I was even in the process of implementing Reflexion-, AlphaEvolve-, and Voyager-inspired approaches to help the agent develop for itself a set of useful gameplay policies. However, when by chance I swapped to gemini-3-flash for a quick point of comparison, I was shocked to see that it crushed its first run while claude-4.5-sonnet was struggling to reliably get to Ante 7. Not only that, it had built a deck made up almost entirely of clubs by the end of the run, showing that its strategy was highly coherent throughout the 284-turn run.

Balatro win

gemini-3-flash wins the game with a deck full of clubs.

Turn 9 of the gemini run on seed 5U4K91D8 is what tipped me off to what was happening. In its reasoning, gemini mentions offhandedly that the Blank Voucher, although its textual description (which is what the agent sees) reads "Does nothing?", actually unlocks the Antimatter Voucher for later purchase in the run, which gives the player an extra joker slot. The thing is, none of this information is in the system prompt or the state passed to the model.

Scrolling through the turn-by-turn history of gemini runs, it becomes clear that gemini simply knew more about the game than the other two models. As a result, it also had some gameplay habits that could technically be arrived at by reasoning alone, but the other models could not reliably replicate. It:

Because these are idiosyncrasies of Balatro but not poker, the other two models had to rely on sheer comprehension and reasoning to play the game, and to an extent had to forget their knowledge of poker that was being induced by the overall vocabulary of the game, to perform effectively.

Why Google is going to win the AI race

Given how big the performance gap is, and the clues that point to it being knowledge-based rather than reasoning-based, the source of the gap must be the pretraining data mix. When I originally started the agent development, I searched for a textual guide (e.g. on Reddit) to see if I could get Claude to win if I gave it a manually written strategy. While coherent written guides that went beyond individual tips and tricks were hard to find, YouTube content of this sort was abundant, probably outnumbering written content for Balatro by at least 10x.

And not just for Balatro: through YouTube, Google possesses high-fidelity multimodal content on pretty much any topic you can imagine, let alone the user-action data emitted by the rest of its portfolio, from Gmail to Gsuite to Maps to the search engine itself. Among the major foundation model companies, Google alone (since Microsoft is not in this race) has products in which its existing users live, work, and play. That is a massive advantage over model builders that start from scraping the internet, with no existing product portfolio to act as a data source and distribution channel.

The Bitter Lesson-adjacent truth, highlighted by this experiment, is that the team that has better data is going to have better results for 10x cheaper inference than the team that must make up for this gap with more clever reasoning techniques, and Google is just much more likely to have any particular topic be in distribution than the other model companies.

What's next?

Admittedly this initial experiment was done at the easiest difficulty setting, and with a single deck. As far as Balatro goes, we've only made it past the lobby with LLMs. There are 7 more difficulties, which add challenges like:

Achieving good win rates on higher difficulties may require the Reflexion/Voyager/AlphaEvolve-style approaches that were unnecessary at base difficulty.

Moreover, Balatro remains an interesting test bed for studying long-range reasoning in LLMs, and it has an extensive community of modders who are constantly creating new game mechanics that are out-of-distribution even for Gemini. In general, games are great tests of reasoning and long-term planning, and creating a general game-playing framework for benchmarking LLMs is a promising direction for the future.