Physicist, Startup Founder, Blogger, Dad

Wednesday, December 06, 2017

AlphaZero: learns via self-play, surpasses best humans and machines at chess

AlphaZero taught itself chess through 4 hours of self-play, surpassing the best humans and the best (old-style) chess programs in the world.
Chess24: 20 years after DeepBlue defeated Garry Kasparov in a match, chess players have awoken to a new revolution. The AlphaZero algorithm developed by Google and DeepMind took just four hours of playing against itself to synthesise the chess knowledge of one and a half millennium and reach a level where it not only surpassed humans but crushed the reigning World Computer Champion Stockfish 28 wins to 0 in a 100-game match. All the brilliant stratagems and refinements that human programmers used to build chess engines have been outdone, and like Go players we can only marvel at a wholly new approach to the game. ...
ArXiv preprint:
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games (1)). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play.

No comments:

Blog Archive