[University of Auckland] A Relatively Objective Analysis of Machine Learning and Statistics

多野哥哥

Original poster

Posted 2013-10-23 09:58:13

Last edited by 多野哥哥 on 2013-10-23 09:59

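Obviously I couldn't write anything this impressive myself, but here are some of my reactions to it, along with a few excerpts...
The original article is at the following address: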
http://brenocon.com/blog/2008/12/statistics-vs-machine-learning-fight/

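## A chart comparison

## As mentioned earlier in the article: machine learning is largely built on the probability theory of statistics...
## Below is the point I find most apt, especially the last few sentences...
## In practice, the two sides have different goals...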
I’ll also note that there are definitely a number of topics in ML that aren’t very related to statistics or probability. Max-margin methods: if all we care about is prediction, why bother using a probability model at all? Why not just optimize the spatial geometry instead? SVM’s don’t require a lick of probability theory to understand. (Of course probability-based approaches are huge in ML, but it’s important to remember they’re not the only game in town, and there is no necessary reason they must be.) And then there are non-traditional settings such as online learning, reinforcement learning, and active learning, where the structure of access to information is in play. There are certainly plenty of things in statistics that aren’t considered part of ML — say, regression diagnostics and significance testing. Finally, many ML problems involve large, high dimensional data and models, where computational issues are very important. For example, in statistical machine translation, alignment models are described with probability theory and fit to data, but their structure is complex enough that optimal inference is intractable, and how you do approximate inference (EM, Viterbi, beam search, etc.) is a very major issue.
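To make the max-margin point in the excerpt concrete, here is a minimal sketch of a linear classifier trained purely on geometry: stochastic subgradient descent on the L2-regularised hinge loss, with no probability model anywhere. It is not from the original article. The function names, the toy data, and the settings (`lr`, `lam`, `epochs`) are all made up for illustration, and the sketch assumes the separating line passes through the origin so that no bias term is needed.

```python
import random


def hinge_loss(margin):
    """Hinge loss max(0, 1 - margin); zero once a point clears the margin."""
    return max(0.0, 1.0 - margin)


def train_max_margin(points, labels, lr=0.1, lam=0.01, epochs=50, seed=0):
    """Stochastic subgradient descent on the L2-regularised hinge loss.

    Assumption for brevity: the separating line passes through the
    origin, so there is no bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regulariser: shrink the weights a little on every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            # Hinge subgradient: only points inside the margin push on w.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Classify by which side of the hyperplane x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy, made-up data: two clusters on opposite sides of the origin.
points = [(2, 2), (3, 3), (2, 3), (3, 2),
          (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w = train_max_margin(points, labels)
predictions = [predict(w, x) for x in points]
```

On this toy data `predictions` matches `labels`. The classifier only ever measures which side of a line a point falls on and how far it is from that line, and at no point estimates a probability.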

This point is also quite interesting:
I think this is reflective of the differences in institutional culture between CS and Stats. There’s an interesting John Langford post on part of the issue, which he calls “The Stats Handicap”. He points out that stats Ph.D.’s have a big disadvantage in the job market because statistics has an old-school journal-oriented publishing culture, so students publish much less and have less experience engaging with a research community. CS is conference-oriented: certain conferences have a higher prestige than many journals (e.g. NIPS in ML, CHI in HCI), and this results in faster turnaround, dissemination, and collaboration. (I’ve heard others make similar comparisons between CS and psychology.) I’d expect any discipline with a larger conference emphasis to have better courses since they should reward presentation/teaching skills, or at least encourage practice, more than in journal world.

## Using machine learning algorithms (many of which, of course, were refined on the basis of statistical theory) for data mining
## Below are some views on statistics and data mining
Another issue is the definition of statistics itself. In 1997, Jerome Friedman wrote an extremely interesting analysis of the situation: “Data Mining and Statistics: What’s the Connection?”. He points out, quite correctly, the statistical impoverishment of some common approaches to data mining. You can certainly blame statistics for not marketing its ideas well enough, or blame CS for ignoring statistics.

## Some views below: statisticians have taken such a beating, so how can they console themselves, Ah Q style?
That is not to say statistics is not important; it’s incredibly important. He quotes Efron (poster's note: the main contributor to the bootstrap in statistics) as saying “Statistics has been the most successful information science.” However, information science is becoming bigger and broader and more exciting, thanks to computation and ever-increasing amounts of data. What should statisticians do? Friedman continues (light editing and emphasis is mine):

One view says that our field should concentrate on that small part of information science that we do best, namely probabilistic inference based on mathematics. If this view is adopted, we should become resigned to the fact that the role of Statistics as a player in the “information revolution” will steadily diminish over time.

Another point of view holds that statistics ought to be concerned with data analysis. The field should be defined in terms of a set of problems — rather than a set of tools — that pertain to data. Should this point of view ever become the dominant one, a big change would be required in our practice and academic programs.
First and foremost, we would have to make peace with computing. It’s here to stay; that’s where the data is. This has been one of the most glaring omissions in the set of tools that have so far defined Statistics. Had we incorporated computing methodology from its inception as a fundamental statistical tool (as opposed to simply a convenient way to apply our existing tools) many of the other data related fields would not have needed to exist. They would have been part of our field.

Friedman wrote this article more than 10 years ago. All his observations about the importance and increasing prevalence of data and computing power are even more true today than back then. Has the field of statistics changed? Not clear. (I’d appreciate seeing evidence to the contrary.)

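## To sum up, and speaking honestly: the econometrics taught in the University of Auckland economics department also works as a kind of "quasi-statistical analysis".
## By "quasi-statistical analysis" I mean this: the statistics department is where you learn why the methods work, but the other departments all use them, hand you relevant data, and show you how to apply them...
## Statistics at Auckland often disappoints people. They expect it to be taught like actuarial science in Australia, all probability models, or like statistics in China, mostly mathematics.
## It isn't! Statistics at Auckland now mainly serves natural sciences such as biology and medicine. If you want statistics that leans toward the social sciences, better to escape the sea of suffering early and choose economics, sociology, or psychology (psychology at Auckland actually leans more toward brain/cognitive science).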
I know that I’m interested in quantitative information science, including statistics and data analysis. Machine learning has many strengths, but it is definitely an odd way to go about analysis. But there’s a good case that statistics, as traditionally defined, is only going to have a smaller role in the future. “Data mining” sounds more relevant, but does it even exist as a coherent subject? Maybe it’s time to study a more applied statistical field like econometrics.


多野哥哥


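Reply #1

Posted 2013-10-23 10:19:09

Below is some discussion from students who are in neither computing nor statistics but who use both in their work. That beats the one-sided logic of a statistics student saying statistics is best or a CS student saying CS is best, each one praising their own wares.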

Chemometrics:
I come from yet another closely related field: chemometrics which is usually defined as applying statistics to chemical problems/data. Never heard machine learning in the place of statistics here. But chemometrics is heavily focused on prediction (also DoE, but far less about hypothesis testing)

I don't think it is fair to exclude prediction from statistics.

I rather see a difference in the approach (Ahmed's culture): My guess would be that machine learning is maybe more pragmatic than "pure statistics": if machine learning has an algorithm that solves a problem that's good. Statisticians tend to want thorough theoretical foundations as well. Chemometrics would also be more on the pragmatic side.
(Source: personal experience with chemometrics, where e.g. partial least squares regression has an extremely successful track of records for some 30 years now, including industrial application. Statistics now start to take the approach seriously because finally some statisticians bothered to have a look at the mathematical properties - before it was just an algorithm that happened to work very well with the chemometric data sets).


ucksil


Reply #2

Posted 2013-10-24 13:21:36

Last edited by ucksil on 2014-12-23 11:49

.......................................
