I think the article is somewhat over-representing the difficulty here. Once you're at the team selection screen and choosing your lineup, there are only 15 possible combinations to choose from. Once you factor in that many/most teams are designed around one or two specific synergies, and that your opponent's team is only partially known (you see their Pokemon species but not the moves, stat distributions, etc), which puts huge error bars around whatever prediction you're trying to make, it usually turns out that you're really only picking from 1-3 realistic choices, and there's a very paper-scissors-rock nature to it that you can't really "learn" in the ML sense.
I think you could have gotten equivalent results on such a predictor using much simpler regressions and/or heuristics, once you've already fixed the matchup.
(Also, I just think it's funny how the paper keeps citing "(Zheng, 2020)", etc, like it's a scholarly article or something. Aaron Zheng is a VGC YouTuber and what is being cited is just an online guide a la GameFAQs)
The soft prediction metric seems especially ridiculous to me. If I'm not mistaken, just picking at random gets better results than their ML selection at >= 5 predictions (1-(2/3)^5 ≈ 0.868 > 0.8438).
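For anyone who wants to check the arithmetic, here is a rough sketch. It assumes (my assumption, following the "1-3 realistic choices" argument) that each random guess independently hits with probability 1/3, and it takes 0.8438 as the reported score being compared against. The function name `random_hit_rate` is made up for illustration.

```python
def random_hit_rate(k: int, p_single: float = 1 / 3) -> float:
    """Chance that at least one of k independent random guesses is correct.

    p_single is the assumed per-guess hit rate (1/3 here, i.e. three
    equally plausible lineups); it is an assumption, not a measured value.
    """
    return 1.0 - (1.0 - p_single) ** k


REPORTED_SCORE = 0.8438  # figure quoted in the comment above

for k in range(1, 6):
    rate = random_hit_rate(k)
    marker = "beats" if rate > REPORTED_SCORE else "below"
    print(f"k={k}: {rate:.4f} ({marker} {REPORTED_SCORE})")
```

Under these assumptions the random baseline only crosses 0.8438 at the fifth guess.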
However:
> your opponent's team is only partially known (you see their Pokemon species but not the moves, stat distributions, etc)
That's not true in the main competitive live format (e.g. NAIC 2025, which is the main case study here). These tournaments are "open team sheet", i.e. moves, abilities and held items are known (but not IVs/EVs).
I'm not sure whether this is the case on Smogon though, which means they might even be mixing two completely different datasets...
> but not IVs/EVs
And even then these can be guessed or even inferred using previous battles as an indicator.
> Once you're at the team selection screen and choosing your lineup, there are only 15 possible combinations to choose from.
Nit: there are 15 possible lineups (i.e. combinations of 2 Pokemon to start the battle with), but there are 90 possible teams if you also factor in the other 2 Pokemon in the back.
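A quick way to sanity-check both counts is to enumerate them. This is just a sketch with placeholder names for a six-Pokemon team; it treats a selection as an unordered pair of leads plus an unordered pair in the back.

```python
from itertools import combinations

# Placeholder six-Pokemon team; the names are purely illustrative.
team = ["A", "B", "C", "D", "E", "F"]

# Which 4 of 6 to bring, ignoring who leads: C(6, 4) = 15.
brings = list(combinations(team, 4))

# Lead pairs alone: C(6, 2) = 15.
leads = list(combinations(team, 2))

# Full selections: a lead pair plus a back pair drawn from the remaining four.
selections = [
    (lead, back)
    for lead in leads
    for back in combinations([p for p in team if p not in lead], 2)
]

print(len(brings), len(leads), len(selections))
```

So the 15 shows up twice (4-of-6 brings and 2-of-6 lead pairs), and the 90 comes from 15 lead pairs times 6 possible back pairs.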
Most of my experience is with pre team preview singles (where there was an entirely different meta of blindly choosing a lead that would match up favorably against the set of other common leads), but my understanding was that VGC has a handful of Pokemon (Smeargle...) with a P_lead/P_bring ratio of 1.
Oh yes, I am so excited to see this!
I have recently started watching a lot of WolfeyVGC, so things like the graph showing Incineroar as the most used ring so true.
There are a lot of other things that Smogon does, like the best hacked Pokemon (i.e. you can hack abilities / movesets but not anything else, and some are banned, like Wonder Guard), and there Blissey with the transform ability is the strongest.
Honestly, Pokemon VGC isn't that balanced. There's Incineroar and, IIRC, before it there was Thundurus. But it's still balanced decently enough that the game works. WolfeyVGC is an absolute delight to watch!
I think that even though there are limited choices once the teams are given, the problem of learning these teams is interesting given the sheer variety of possible teams. A good model would probably need to learn something useful about competitive Pokémon.
I might try my hand at this problem using the open sheet format for more data.