<Mastering the game of Go without human knowledge> — does anyone have the full text of this newly published Nature paper to share?

When AI can think, artificial intelligence puts many people out of work!!
Rebooting the AI revolution: as artificial intelligence puts many out of work, we must forge new economic, social and educational systems, argues Yuval Noah Harari. The second Renaissance: Ian Goldin calls on scientists to help society to weather the disruptive transformations afoot. As automation changes employment, researchers should gather the evidence to help map the implications.

Abstract: A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

Journal: Nature. Volume 550, pages 354–359 (19 October 2017). DOI: 10.1038/nature24270

From Wikipedia, the free encyclopedia:

AlphaGo is a
computer program that plays the board game Go. It was developed by Alphabet Inc.'s Google DeepMind in London. In October 2015, it became the first computer Go program to beat a human professional Go player on a full-sized 19×19 board. In March 2016, it beat Lee Sedol in a five-game match, the first time a computer Go program had beaten a 9-dan professional without handicaps. Although it lost to Lee Sedol in the fourth game, Lee resigned the final game, giving a final score of 4 games to 1 in favour of AlphaGo. In recognition of the victory, AlphaGo was awarded an honorary 9-dan by the Korea Baduk Association. The lead-up and the challenge match with Lee Sedol were documented in a documentary film of the same name, directed by Greg Kohs. It was chosen by Science as one of the Breakthrough of the Year runners-up on 22 December 2016. At the 2017 Future of Go Summit, AlphaGo beat Ke Jie, the world No. 1 ranked player at the time, in a three-game match. After this, AlphaGo was awarded professional 9-dan by the Chinese Weiqi Association. After the match between AlphaGo and Ke Jie, AlphaGo retired while DeepMind continues AI research in other areas.

AlphaGo uses a Monte Carlo tree search algorithm to find its moves based on knowledge previously "learned" by machine learning, specifically by an artificial neural network (a deep learning method), through extensive training, both from human and computer play.

Go is considered much more difficult for computers to win than other games such as chess, because its much larger branching factor makes it prohibitively difficult to use traditional AI methods such as alpha-beta pruning, tree traversal and heuristic search.
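For a rough sense of the scale involved (standard textbook estimates, not figures from this article, with b the average branching factor and d a typical game length in plies):

```latex
% Game-tree size grows as b^d:
\underbrace{35^{80} \approx 10^{123}}_{\text{chess}}
\qquad
\underbrace{250^{150} \approx 10^{359}}_{\text{Go}}
```

Exhaustive or heavily pruned search is hopeless at the second scale, which is why Go programs turned to Monte Carlo methods and learned evaluation instead.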
Almost two decades after IBM's computer Deep Blue beat world chess champion Garry Kasparov in their 1997 match, the strongest Go programs using artificial intelligence techniques had only reached about amateur 5-dan level, and still could not beat a professional Go player without a handicap. In 2012, the software program Zen, running on a four-PC cluster, beat Masaki Takemiya (9p) two times at five- and four-stone handicaps. In 2013, Crazy Stone beat Yoshio Ishida (9p) at a four-stone handicap.

According to AlphaGo's David Silver, the AlphaGo research project was formed around 2014 to test how well a neural network using deep learning can compete at Go. AlphaGo represents a significant improvement over previous Go programs. In 500 games against other available Go programs, including Crazy Stone and Zen, AlphaGo running on a single computer won all but one. In a similar matchup, AlphaGo running on multiple computers won all 500 games played against other Go programs, and 77% of games played against AlphaGo running on a single computer. The distributed version of October 2015 was using 1,202 CPUs and 176 GPUs.

In October 2015, the distributed version of AlphaGo defeated the European Go champion Fan Hui, a 2-dan (out of 9 dan possible) professional, five to zero. This was the first time a computer Go program had beaten a professional human player on a full-sized board without a handicap. The announcement of the news was delayed until 27 January 2016 to coincide with the publication of a paper in the journal Nature describing the algorithms used.
AlphaGo played South Korean professional Go player Lee Sedol, ranked 9-dan, one of the best players at Go, with five games taking place at the Four Seasons Hotel in Seoul, South Korea on 9, 10, 12, 13, and 15 March 2016, which were video-streamed live. Aja Huang, a DeepMind team member and amateur 6-dan Go player, placed stones on the board for AlphaGo, which ran through Google's cloud computing with its servers located in the United States. The match used Chinese rules with a 7.5-point komi, and each side had two hours of thinking time plus three 60-second byoyomi periods. The version of AlphaGo playing against Lee used a similar amount of computing power as in the Fan Hui match; one news report said that it used 1,920 CPUs and 280 GPUs.

At the time of play, Lee Sedol had the second-highest number of Go international championship victories in the world. While there is no single official method of ranking in international Go, some sources ranked Lee Sedol as the fourth-best player in the world at the time. AlphaGo was not specifically trained to face Lee.

The first three games were won by AlphaGo following resignations by Lee. However, Lee beat AlphaGo in the fourth game, winning by resignation at move 180. AlphaGo then continued to achieve a fourth win, winning the fifth game by resignation. The prize was US$1 million. Since AlphaGo won four out of five and thus the series, the prize was donated to charities, including UNICEF. Lee Sedol received $150,000 for participating in all five games and an additional $20,000 for his win.

In June 2016, at a presentation held at a university in the Netherlands, Aja Huang of the DeepMind team revealed that the team had rectified the problem that occurred during the fourth game of the match between AlphaGo and Lee: after move 78 (dubbed the "hand of God" by many professionals), the program would now play accurately and maintain Black's advantage. Before the error that resulted in the loss, AlphaGo had been leading throughout the game; Lee's move was credited not as a winning move in itself, but as one that diverted and confused the program's computation. Huang explained that AlphaGo's policy network for finding the most accurate move order and continuation did not precisely guide AlphaGo to the correct continuation after move 78, since its value network did not rate Lee's move 78 as the most likely, and therefore, when the move was made, AlphaGo could not make the right adjustment to the logical continuation.

On 29 December 2016, a new account named "Magist" from South Korea began to play games with professional players on the Tygem server.
It changed its account name to "Master" on 30 December, then moved to the FoxGo server on 1 January 2017. On 4 January, DeepMind confirmed that "Magist" and "Master" were both played by an updated version of AlphaGo. As of 5 January 2017, AlphaGo's online record was 60 wins and 0 losses, including three victories over Go's top-ranked player, Ke Jie, who had been quietly briefed in advance that Master was a version of AlphaGo. After losing to Master, Gu Li offered a bounty of 100,000 yuan (US$14,400) to the first human player who could defeat Master. Master played at the pace of 10 games per day. Many quickly suspected it to be an AI player due to little or no rest between games. Its adversaries included many world champions, as well as national champions and world-championship runners-up, among them Li Qincheng, Tang Weixing, Meng Tailing, Dang Yifei, Huang Yunsong, Gu Zihao, Shin Jinseo, and An Sungjoon. All 60 games except one were fast-paced games with three 20- or 30-second byo-yomi periods. Master offered to extend the byo-yomi to one minute when playing with Nie Weiping, in consideration of his age. After winning its 59th game, Master revealed itself in the chatroom to be controlled by Dr. Aja Huang of the DeepMind team, then changed its nationality to United Kingdom. After these games were completed, the co-founder of Google DeepMind, Demis Hassabis, said in a tweet "we're looking forward to playing some official, full-length games later [2017] in collaboration with Go organizations and experts".

Go experts were extremely impressed by AlphaGo's performance and by its nonhuman play style. Ke Jie stated that "After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong... I would go as far as to say not a single human has touched the edge of the truth of Go."

In May 2017, AlphaGo played three games with Ke Jie, the world No. 1 ranked player, and two exhibition games with several top Chinese professionals at the Future of Go Summit in Wuzhen: a best-of-3 match versus world number 1 Ke Jie (23, 25 & 27 May); AlphaGo versus a collaborating team of top Chinese professionals; and Pair Go, human plus AlphaGo versus human plus AlphaGo. The organizers offered 1.5 million dollars in winner prizes for the three-game match between Ke Jie and AlphaGo, while the losing side took 300,000 dollars. AlphaGo won all three games against Ke Jie and was awarded professional 9-dan by the Chinese Weiqi Association. After winning its three-game match against Ke Jie, the world's top Go player, AlphaGo retired; DeepMind disbanded the team that worked on the game while continuing AI research in other areas. After the Summit, DeepMind published 50 full-length AlphaGo vs AlphaGo matches as a gift to the Go community.
An early version of AlphaGo was tested on hardware with various numbers of CPUs and GPUs, running in asynchronous or distributed mode. Two seconds of thinking time was given to each move. The resulting Elo ratings are listed below; in matches with more time per move, higher ratings are achieved.

Configuration and performance
Configuration | Search threads | No. of CPUs | No. of GPUs | Elo rating
Single (p. 10–11) | 40 | 48 | 1 | 2,181
Single | 40 | 48 | 2 | 2,738
Single | 40 | 48 | 4 | 2,850
Single | 40 | 48 | 8 | 2,890
Distributed | 12 | 428 | 64 | 2,937
Distributed | 24 | 764 | 112 | 3,079
Distributed | 40 | 1,202 | 176 | 3,140
Distributed | 64 | 1,920 | 280 | 3,168

In May 2016, Google unveiled its own proprietary hardware, the "tensor processing unit" (TPU), which it stated had already been deployed in multiple internal projects at Google, including the AlphaGo match against Lee Sedol.

At the Future of Go Summit in May 2017, DeepMind disclosed that the version of AlphaGo used in the Summit was AlphaGo Master, and revealed that it had measured the strength of different versions of the software. AlphaGo Lee, the version used against Lee, could give AlphaGo Fan, the version used in AlphaGo vs. Fan Hui, three stones, and AlphaGo Master was in turn three stones stronger.

Configuration and strength
Version | Hardware | Elo rating | Matches
AlphaGo Fan | Distributed | nearly 3,000 | 5:0 against Fan Hui
AlphaGo Lee | 50 TPUs, distributed | about 3,750 | 4:1 against Lee Sedol
AlphaGo Master | Single machine with TPU v2 | about 4,750 | 60:0 against top professionals
As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network", both implemented using deep neural network technology. A limited amount of game-specific feature-detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks.

The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves. Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play. To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the match against Lee, the resignation threshold was set to 20%.
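A minimal sketch of how the tree search can be guided by those two networks, following the selection rule described in DeepMind's papers (the `Edge` bookkeeping structure and the constant `c_puct` are illustrative assumptions, not DeepMind's code):

```python
import math
from dataclasses import dataclass

@dataclass
class Edge:
    P: float        # prior probability of the move, from the policy network
    N: int = 0      # how many times the search has visited this move
    W: float = 0.0  # accumulated value-network evaluations below this move

    @property
    def Q(self) -> float:
        """Mean evaluation of the move observed so far."""
        return self.W / self.N if self.N else 0.0

def select_move(edges: dict[str, Edge], c_puct: float = 1.0) -> str:
    """Pick the move maximising Q + U, where the exploration bonus
    U = c_puct * P * sqrt(total visits) / (1 + N) favours moves the
    policy network rates highly but the search has explored little."""
    total = sum(e.N for e in edges.values())
    return max(edges, key=lambda a: edges[a].Q
               + c_puct * edges[a].P * math.sqrt(total) / (1 + edges[a].N))
```

Applying this rule repeatedly walks down the tree; the value network then scores the leaf position, and the result is backed up along the path taken.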
Toby Manning, the match referee for AlphaGo vs. Fan Hui, has described the program's style as "conservative". AlphaGo's playing style strongly favours a greater probability of winning by fewer points over a lesser probability of winning by more points. Its strategy of maximising its probability of winning is distinct from what human players tend to do, which is to maximise territorial gains, and explains some of its odd-looking moves. It makes many opening moves that have never or seldom been made by humans, while avoiding many second-line opening moves that human players like to make. It likes to use shoulder hits, especially if the opponent is over-concentrated.

AlphaGo's March 2016 victory was a major milestone in artificial intelligence research. Go had previously been regarded as a hard problem in machine learning that was expected to be out of reach for the technology of the time. Most experts thought a Go program as powerful as AlphaGo was at least five years away; some experts thought that it would take at least another decade before computers would beat Go champions. Most observers at the beginning of the 2016 matches expected Lee to beat AlphaGo.

With games such as checkers (which has been "solved" by the Chinook team), chess, and now Go won by computers, victories at popular board games can no longer serve as major milestones for artificial intelligence in the way that they used to. Deep Blue's Murray Campbell called AlphaGo's victory "the end of an era... board games are more or less done and it's time to move on."

When compared with Deep Blue or with Watson, AlphaGo's underlying algorithms are potentially more general-purpose, and may be evidence that the scientific community is making progress towards artificial general intelligence. Some commentators believe AlphaGo's victory makes for a good opportunity for society to start discussing preparations for the possible future impact of machines with general-purpose intelligence. (As noted by entrepreneur Guy Suter, AlphaGo itself only knows how to play Go and doesn't possess general-purpose intelligence: "[It] couldn't just wake up one morning and decide it wants to learn how to use firearms.") In March 2016, AI researcher Stuart Russell stated that "AI methods are progressing much faster than expected, (which) makes the question of the long-term outcome more urgent," adding that "in order to ensure that increasingly powerful AI systems remain completely under human control... there is a lot of work to do." Some scholars, such as Stephen Hawking, warned (in May 2015, before the matches) that some future self-improving AI could gain actual general intelligence, leading to an unexpected AI takeover; other scholars disagree. AI expert Jean-Gabriel Ganascia believes that "things like 'common sense'... may never be reproducible", and says "I don't see why we would speak about fears. On the contrary, this raises hopes in many domains such as health and space exploration." Another computer scientist said, "I don't think people should be scared... but I do think people should be paying attention."

Go is a popular game in China, Japan and Korea, and the 2016 matches were watched by perhaps a hundred million people worldwide. Many top Go players characterized AlphaGo's unorthodox plays as seemingly questionable moves that initially befuddled onlookers but made sense in hindsight: "All but the very best Go players craft their style by imitating top players. AlphaGo seems to have totally original moves it creates itself." AlphaGo appeared to have become unexpectedly much stronger, even when compared with its October 2015 match, in which a computer had beaten a Go professional for the first time without the advantage of a handicap. The day after Lee's first defeat, Jeong Ahram, the lead Go correspondent for one of South Korea's biggest daily newspapers, said "Last night was very gloomy... Many people drank alcohol." The Korea Baduk Association, the organization that oversees Go professionals in South Korea, awarded AlphaGo an honorary 9-dan title for exhibiting creative skills and pushing forward the game's progress.

China's Ke Jie, an 18-year-old generally recognized as the world's best Go player at the time, initially claimed that he would be able to beat AlphaGo, but declined to play against it for fear that it would "copy my style". As the matches progressed, Ke Jie went back and forth, stating that "it is highly likely that I (could) lose" after analysing the first three matches, but regaining confidence after AlphaGo displayed flaws in the fourth match.

Toby Manning, the referee of AlphaGo's match against Fan Hui, and Hajin Lee, secretary general of the International Go Federation, both reason that in the future, Go players will get help from computers to learn what they have done wrong in games and improve their skills.

After game two, Lee said he felt "speechless": "From the very beginning of the match, I could never manage an upper hand for one single move. It was AlphaGo's total victory." Lee apologized for his losses, stating after game three that "I misjudged the capabilities of AlphaGo and felt powerless." He emphasized that the defeat was "Lee Se-dol's defeat" and "not a defeat of mankind". Lee said his eventual loss to a machine was "inevitable", but stated that "robots will never understand the beauty of the game the same way that we humans do." Lee called his game-four victory a "priceless win that I (would) not exchange for anything."

Facebook has also been working on its own Go-playing system, darkforest, also based on combining machine learning and Monte Carlo tree search. Although a strong player against other computer Go programs, as of early 2016 it had not yet defeated a professional human player; darkforest has lost to Crazy Stone and Zen and is estimated to be of similar strength to them. DeepZenGo, a system developed with support from the video-sharing website Dwango and the University of Tokyo, lost 2–1 in November 2016 to Go master Cho Chikun, who holds the record for the largest number of Go title wins in Japan.

Epilogue
"If you can't do something smart, do something right." — Shepherd Book
I hope the blog posts of my engaging, practical and personal stories make your day a little more pleasant. Unless marked as sponsor/advertiser content, I receive no compensation for any of my posts or for my distribution of referenced articles (cited solely for cross-reference and to credit those who inspired me to write), so I have no conflict of interest: freedom to speak, and freedom for you to redistribute at will, with no need to ask me.
In Nature today: AI from zero to one, teaching itself to crush AlphaGo 100–0 | An in-depth analysis
Zhishe (知社学术圈)
A public-interest academic platform founded by scholars returned from overseas
Last year, a child read every game record in the world, studied them diligently, pondered deeply, and improved until he defeated world champion Lee Sedol 4–1; from then on he had no rival among humans. His name was AlphaGo.

This year, his younger brother started with nothing but a board and the black and white stones. He never saw a single game record and had no one to teach him. Starting from zero, playing against himself and working everything out on his own, he defeated his elder brother AlphaGo 100–0. His name is AlphaGo Zero (阿法元).
This great breakthrough by DeepMind was published in Nature today under the title "Mastering the game of Go without human knowledge", causing a sensation. Zhishe invited several artificial-intelligence experts in China and abroad to provide in-depth analysis and commentary; a video interview with DeepMind's Dr. David Silver appears at the end of this article. Special thanks to Nature and DeepMind for authorizing the information and materials.

The landmark paper that Nature put online today details the latest research results of Google's DeepMind team. A major goal of artificial intelligence is to reach superhuman proficiency in challenging domains through pure self-learning, without any prior knowledge. Last year, AlphaGo defeated a human world champion at Go for the first time on behalf of artificial intelligence, but the refinement of its play was built on the computer first studying massive numbers of historical human game records to absorb human skill, and only then training itself to surpass it.
[Figure: growth in AlphaGo Zero's playing strength, compared by Elo rating]
But today we discover that humans may actually have taught AlphaGo badly! The new-generation AlphaGo Zero starts completely from scratch: it needs no guidance from historical game records and no human prior knowledge of any kind, relying entirely on reinforcement learning and self-play to work the game out for itself. Its strength grew far beyond AlphaGo's; it won every game, routing AlphaGo 100–0.

To reach this level, AlphaGo Zero needed only three days on 4 TPUs, playing 4.9 million games against itself. Its elder brother AlphaGo needed 48 TPUs and several months, learning from thirty million game positions, to defeat humans.
The first and corresponding author of the paper is Dr. David Silver of DeepMind, head of the AlphaGo project. He says AlphaGo Zero is far more powerful than AlphaGo, because it is no longer constrained by human knowledge and can instead discover new knowledge and develop new strategies:
This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself. AlphaGo Zero also discovered new knowledge, developing unconventional strategies and creative new moves that echoed and surpassed the novel techniques it played in the games against Lee Sedol and Ke Jie.
DeepMind co-founder and CEO Demis Hassabis said the new technique could be applied to important problems such as protein folding and the development of new materials:
AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data. Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials.
Two American players commented on AlphaGo Zero's games in Nature: its opening and endgame play is no different from that of professionals, so the crystallized wisdom of several thousand years of human play does not appear to be entirely wrong. Its middle game, however, looks truly strange:
the AI's opening choices and end-game methods have converged on ours — seeing it arrive at our sequences from first principles suggests that we haven't been on entirely the wrong track. By contrast, some of its middle-game judgements are truly mysterious.
For a deeper look at AlphaGo Zero's technical details, Zhishe interviewed Professor Yiran Chen, an artificial-intelligence expert at Duke University. He told Zhishe:

DeepMind's newly released AlphaGo Zero lowers the complexity of training and removes the dependence on human-annotated samples (historical human games), making deep learning far more practical for complex decision-making. Personally, I find the most interesting part to be the proof that human experience, because of the limited size of the sample space it draws on, tends to converge to local optima without knowing it (or without being able to escape them), whereas machine learning can break through this limit. People had vaguely sensed this before; now it stands before us as a hard, quantified fact!
He explained further:

The data in the paper show that although learning from human players' moves yields better playing strength early in training, in the later stages the strength attainable this way only matches that of the original AlphaGo, whereas AlphaGo Zero, which never learns human play, ultimately performs better. This may indicate that human game data steer the algorithm toward a local optimum, while genuinely better or optimal play differs from human play in some essential ways; humans, in effect, "misled" AlphaGo. Interestingly, when AlphaGo Zero forgoes human data and starts from completely random play, the training process still heads steadily toward convergence, with no sign of failing to converge.
How does AlphaGo Zero teach itself? Duke University doctoral student Chunpeng Wu walked Zhishe through the technical details:
The AlphaGo that defeated Lee Sedol was essentially built from traditional reinforcement-learning techniques plus deep neural networks (DNNs), whereas AlphaGo Zero absorbs recent research results and makes major improvements.

First, before AlphaGo Zero, deep-learning-based reinforcement-learning methods fell into two classes according to the number of network models used. One class uses a single DNN to carry out the entire decision process end-to-end (DQN, for example); such methods are lightweight and better suited to discrete action decisions. The other class uses several DNNs to learn the policy, the value function and so on separately (for example, the AlphaGo that defeated Lee Sedol); such methods are more complex but more general across decision problems. AlphaGo Zero combines the strengths of the two: like DQN, it uses a single DNN for the decision process, but that one DNN produces two outputs, a policy and a value, and a Monte Carlo tree search then completes the selection of the current move.
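A minimal PyTorch sketch of such a dual-head network; the 17-plane input encoding follows the paper's general description, but the layer sizes here are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """One shared trunk, two heads: a move policy and a position value."""
    def __init__(self, board: int = 19, ch: int = 64, in_planes: int = 17):
        super().__init__()
        # Shared feature-extraction trunk (the real system uses a deep
        # residual tower here; two plain conv layers keep the sketch short).
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        # Policy head: one logit per board point, plus one for "pass".
        self.policy = nn.Sequential(
            nn.Conv2d(ch, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board * board, board * board + 1),
        )
        # Value head: a single scalar in [-1, 1] predicting the winner.
        self.value = nn.Sequential(
            nn.Conv2d(ch, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board * board, 1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.policy(h), self.value(h)
```

During search, the policy head supplies the prior over moves at each tree node, and the value head replaces random rollouts as the evaluator of leaf positions.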
Second, AlphaGo Zero no longer uses historical human games; training starts from completely random play. As deep-learning research and applications have deepened in recent years, one drawback of DNNs has become increasingly clear: training consumes enormous numbers of human-annotated samples, which is simply not feasible in small-sample domains such as medical image processing. Methods that reduce the need for samples and human annotation, such as few-shot learning and transfer learning, have therefore attracted wide attention. AlphaGo Zero attempts to eliminate the dependence on human-annotated samples within a two-player self-play training process, which had not been done before.
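The data-generation idea can be sketched as follows; this is a hedged illustration in which `env` (a Go rules engine), `run_mcts` and `replay_buffer` are hypothetical stand-ins, not DeepMind's API:

```python
import random

def self_play_game(env, net, run_mcts, replay_buffer):
    """Play one game of pure self-play and label every position with
    (search probabilities, final outcome): the only training signal used."""
    env.reset()
    history = []                      # (features, search_probs, player_to_move)
    while not env.game_over():
        pi = run_mcts(env, net)       # MCTS visit counts -> move distribution
        history.append((env.features(), pi, env.to_play()))
        move = random.choices(range(len(pi)), weights=pi)[0]
        env.play(move)                # sampling keeps the self-play games varied
    z = env.winner()                  # +1 if Black won, -1 if White won
    for features, pi, player in history:
        # Store the outcome from the perspective of the side to move.
        replay_buffer.append((features, pi, z if player == "black" else -z))
```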
Third, AlphaGo Zero's network architecture incorporates the latest advances, adopting the residual structure of ResNet as its basic module. Among recent popular designs, ResNet increased network depth while GoogLeNet increased network width. Many papers have shown that, at the same prediction accuracy, ResNet's residual structure runs faster than GoogLeNet's Inception structure; speed was probably one consideration in AlphaGo Zero's adoption of residual modules.
Duke University doctoral student Zhiyao Xie elaborated further:

DeepMind's new algorithm AlphaGo Zero begins to shed the dependence on human knowledge: at the start of learning it no longer needs to study the moves of human players first, and its input no longer contains hand-crafted features.
In network architecture, the new algorithm differs from the previous AlphaGo in two major ways. First, instead of training the move-policy network and the win-rate value network separately as before, the new architecture outputs both the move policy and the win-rate value for the current position at the same time. In effect, the policy and value networks share most of the earlier feature-extraction layers, and only the last few layers of the output stage remain independent of each other. The training loss function likewise contains both a policy part and a value part. This design clearly saves training time and, more importantly, the merged policy-value network may adapt to a wider variety of situations.
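Concretely, the Nature paper trains the two heads jointly with a single loss: for network output (p, v) = f_θ(s), search probabilities π and game outcome z,

```latex
l \;=\; (z - v)^2 \;-\; \boldsymbol{\pi}^{\mathrm{T}} \log \mathbf{p} \;+\; c\,\lVert\theta\rVert^2
```

that is, the mean-squared value error plus the policy cross-entropy, with an L2 regularisation term weighted by c.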
The other major difference is that the feature-extraction layers use 20 or 40 residual modules, each containing two convolutional layers (see the sketch below). Compared with the roughly 12 convolutional layers used previously, the residual modules raise the network depth substantially. That AlphaGo Zero no longer needs hand-crafted features is probably also because the deeper network can extract features directly from the board more effectively. According to the data in the paper, these two architectural improvements contribute roughly equally to the gain in playing strength.
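One such residual module can be sketched in PyTorch as follows (the batch-normalisation placement follows standard ResNet practice, and the 256-channel width is an assumption for illustration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 conv layers whose output is added back to the input."""
    def __init__(self, ch: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(x + y)  # the skip connection keeps gradients healthy
```

Stacking 20 or 40 of these modules after an initial convolution yields the feature-extraction tower described above.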
Thanks to these improvements, AlphaGo Zero's performance and training efficiency have both risen greatly: with only 4 TPUs and 72 hours of training, it can beat the original AlphaGo, whose training took several months. That the algorithm performs better after giving up human players' moves and hand-crafted features demonstrates the powerful feature-extraction capability of deep neural networks and their ability to find better solutions. More importantly, by shedding its dependence on human experience and assistance, a deep reinforcement-learning algorithm of this kind might be applied far more readily in domains that humans understand poorly or that lack large quantities of labelled data.
What is the significance of this work? Professor Tao Hong of the University of North Carolina at Charlotte, an artificial-intelligence expert, also shared his view with Zhishe:

I read this paper very carefully from beginning to end. First, the value of the work itself should be affirmed. Going from game records (supervised learning) to discarding the records is a major contribution. Defeating the current strongest player (the pre-transformation AlphaGo) is advancing the state of the art. The design and training methods of the neural network are both improved, which is the novelty. And from an application standpoint, AI products may in future no longer require large amounts of costly manual preparation work, which is the significance!
Professor Hong then briefly reviewed the history of artificial neural networks:

Artificial neural networks appeared back in the 1940s, flared up briefly and then could not sustain themselves, partly because people found they could not solve the XOR problem and were too troublesome to train. In the 1970s, Paul Werbos, during his doctoral studies, used the backpropagation algorithm to train neural networks, improving efficiency; multilayer networks solved the XOR problem and ushered neural networks into a new era. In the 1980s and 1990s, neural-network research caught fire, and academia published thousands upon thousands of papers on it, from design to training to optimisation to applications in every industry. Professor Jim Burke, an IEEE Life Fellow who retired five years ago, used to tell stories of that era: at power-systems conferences, whatever engineering problem was under discussion, a group of people would always say it could be solved with neural networks, and of course in the end nothing came of it. Simply put, people dug pits, poured in water and blew bubbles, and when there was nothing left to hype, they went elsewhere to dig new pits. In academia at the end of the last century, if you didn't say you worked on neural networks you were almost embarrassed to greet colleagues, much like deep learning and big-data analytics today.
Professor Hong then offered a not especially optimistic outlook for artificial intelligence:

Returning to AlphaGo and Go: riding the wave of big data, data mining, machine learning, neural networks and artificial intelligence have suddenly caught fire again. Is there substance to the fire this time? I believe there is: massive data, gains in computing power, and improvements in algorithms. It is comparable to applying backpropagation to neural networks back then, a genuine breakthrough.

How long the fire ultimately burns, though, depends on how many real problems neural networks can solve. After the great fire twenty years ago, the practical problems actually "solved" by neural networks were few and far between. One of the better-known ones is electric load forecasting, that is, forecasting electricity consumption, which happens to be my specialty. Because neural networks were so overheated in those years, research attention almost completely abandoned traditional statistical methods. By the time I entered the field to write my doctoral thesis, I could trounce the various neural-network and genetic-algorithm offerings on the market with a traditional multiple-regression model. My consistent view: do not blindly chase whatever is currently fashionable; first take stock of the situation, consider what you are good at and what you have accumulated, and identify the right pit before you jump in.
Satinder Singh, director of the artificial-intelligence laboratory at the University of Michigan, expressed a view similar to Professor Hong's: this is not the beginning of any end, because compared with humans and even animals, what AI knows and what it can do remain extremely limited:

This is not the beginning of any end because AlphaGo Zero, like all other successful AI so far, is extremely limited in what it knows and in what it can do compared with humans and even other animals.
Still, Professor Singh praised AlphaGo Zero highly: it is a major achievement, and it shows that reinforcement learning, without relying on human experience, can do better:

The improvement in training time and computational complexity of AlphaGo Zero relative to AlphaGo, achieved in about a year, is a major achievement… the results suggest that AIs based on reinforcement learning can perform much better than those that rely on human expertise.
Professor Yiran Chen took the thinking about the future of artificial intelligence a step further:

AlphaGo Zero uses no human annotation; given only the human-specified rules of Go, it can work out masterful play by itself. Intriguingly, the paper also lets us watch the process by which AlphaGo Zero masters the game, for example how it gradually learns common josekis and opening patterns, such as an early 3-3 point. This should also help Go enthusiasts understand AlphaGo's playing style.
Beyond the technical innovation, AlphaGo Zero once again raises a question worth pondering for every artificial-intelligence researcher: in future development, how exactly should we regard the role of human experience? Among the moves AlphaGo Zero taught itself, some agree with human play; the differences lie mainly in the middle-game struggle. AlphaGo Zero can already serve as a Go teacher for humans, guiding them through moves they have never seen before, without being wholly bound to the experience of the Go masters. In other words, AlphaGo Zero has once again broken the mystique of human experience: the experience formed in human minds can itself be probed and learned.
Professor Chen closed with an interesting proposition:

A challenge we may have to face in the future is this: in decision problems tied to everyday life, human experience and machine experience will exist side by side, and machine experience may differ greatly from human experience. How, then, should we choose between them and make use of them?
David Silver, however, is not worried about this, and is full of confidence in the future. He points out:

If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society.
Below is the video interview with DeepMind's Dr. David Silver, with Chinese subtitles produced by Nature's Shanghai office.

Which breakthrough do you consider more pivotal: AlphaGo taking humans as its teachers and finally defeating them, or AlphaGo Zero teaching itself and defeating AlphaGo? Leave a comment and share your views on where artificial intelligence is heading.

For more information, see the Nature paper: /articles/doi:10.1038/nature24270