
Second Try: Sentiment Analysis in Python
Introduction
After my first attempt at using R for sentiment analysis, I started talking with a friend here at school about my work. Jackson and I decided that we'd like to give it a better shot and really try to get some meaningful results. After a lot of research, we decided to shift languages to Python (even though we both know R). We made this shift because Python has a number of very useful libraries for text processing and sentiment analysis, plus it's easy to code in. We launched right into tutorials and coding, and this post is about that process and our results.
We also met with Professor Potts, a professor of linguistics here at Stanford. Prior to meeting with him, we consulted his work extensively and found it incredibly useful. We had a fantastic chat with Professor Potts and he helped us grasp some of the concepts we were working on.
If you'd like to jump straight to seeing the full code, you can head over to the code repository.
One of the resources we got a lot of mileage out of was StreamHacker's blog, especially the articles on text classification, precision and recall, and selecting high-information features. More to follow about each of those elements.
Another great discovery was the Natural Language Toolkit (NLTK). This is an incredible library for Python that can do a huge amount of text processing and analysis. This would end up forming the basis for our program.
During our first attempt, we basically just tried to convert my program in R into Python. We quickly realized that not only did Python have more efficient ways to do some of the steps, but it was also missing some functionality that I used in the R version. So instead, we started fresh, basing our approach on StreamHacker's code.
An important piece of sentiment analysis terminology: “features” are whatever you’re analyzing in an attempt to correlate to the labels. For example, in this code, the features will be the words in each review. Other algorithms could use different types of features — some algorithms use bigrams or trigrams (strings of two or three consecutive words, respectively) as the features.
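As a quick illustration (not something we use in this post), a bigram-based feature selector in the same style as the word-based ones below could look like this, using nltk.bigrams to pair up consecutive words:
def make_bigram_dict(words):
    #hypothetical selector: each pair of consecutive words becomes a feature
    return dict([(bigram, True) for bigram in nltk.bigrams(words)])

#make_bigram_dict(['an', 'engrossing', 'film'])
#{('an', 'engrossing'): True, ('engrossing', 'film'): True}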
An idea from StreamHacker that we really liked was writing a function to evaluate different feature selection mechanisms. That means that we would be able to write different methods to select different subsets of the features (read: words) in the reviews and then evaluate those methods.
As an aside, here are the imports we used for the project, so I won’t have to reference them again:
import re, math, collections, itertools
import nltk, nltk.classify.util, nltk.metrics
from nltk.classify import NaiveBayesClassifier
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist
We began to write our feature evaluator (again, many thanks to StreamHacker):
def evaluate_features(feature_select):
    #reading pre-labeled input and splitting into lines
    posSentences = open('polarityData\\rt-polaritydata\\rt-polarity-pos.txt', 'r')
    negSentences = open('polarityData\\rt-polaritydata\\rt-polarity-neg.txt', 'r')
    posSentences = re.split(r'\n', posSentences.read())
    negSentences = re.split(r'\n', negSentences.read())

    posFeatures = []
    negFeatures = []
    #word/punctuation splitting via stackoverflow.com/questions/367155/splitting-a-string-into-words-and-punctuation
    #breaks up the sentences into lists of individual words (as selected by the input mechanism)
    #and appends 'pos' or 'neg' after each list
    for i in posSentences:
        posWords = re.findall(r"[\w']+|[.,!?;]", i)
        posWords = [feature_select(posWords), 'pos']
        posFeatures.append(posWords)
    for i in negSentences:
        negWords = re.findall(r"[\w']+|[.,!?;]", i)
        negWords = [feature_select(negWords), 'neg']
        negFeatures.append(negWords)
This is the same polarity data that was used in my first attempt with R, so check that out if you're curious about the data. Essentially, that block of code splits up the reviews by line and then builds posFeatures and negFeatures variables, which contain the output of our feature selection mechanism (we'll see how that works in a minute) with 'pos' or 'neg' appended, depending on whether the review it is drawing from is positive or negative.
The next bit of code separates the data into training and testing data for a Naive Bayes classifier, which is the same type of classifier I used before.
#selects 3/4 of the features to be used for training and 1/4 to be used for testing
posCutoff = int(math.floor(len(posFeatures)*3/4))
negCutoff = int(math.floor(len(negFeatures)*3/4))
trainFeatures = posFeatures[:posCutoff] + negFeatures[:negCutoff]
testFeatures = posFeatures[posCutoff:] + negFeatures[negCutoff:]
Now, thanks to NLTK, I can very simply train my classifier:
classifier = NaiveBayesClassifier.train(trainFeatures)
Pretty cool, huh? The last thing this function needs to do is check how well the classifier does when it tries to classify the testing data. This code is a little challenging so I’ll walk through it thoroughly.
First, I have to initialize referenceSets and testSets, to be used shortly. referenceSets will contain the actual values for the testing data (which we know because the data is prelabeled) and testSets will contain the predicted output.
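That initialization isn't shown in the snippets here; a defaultdict of sets (using the collections module imported earlier) is the natural fit:
referenceSets = collections.defaultdict(set)
testSets = collections.defaultdict(set)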
Next, for each one of the testFeatures (the reviews that need testing), I iterate through three things: an arbitrary 'i', to be used as an identifier; the features (or words) in the review; and the actual label ('pos' or 'neg').
I add the ‘i’ (the unique identifier) to the correct bin in referenceSets. I then predict the label based on the features using the trained classifier and put the unique identifier in the predicted bin in testSets.
for i, (features, label) in enumerate(testFeatures):
    referenceSets[label].add(i)
    predicted = classifier.classify(features)
    testSets[predicted].add(i)
This gives me a big list of identifiers in referenceSets['pos'], which are the reviews known to be positive (and the same for the negative reviews). It also gives me a list of identifiers in testSets['pos'], which are the reviews predicted to be positive (and similarly for predicted negatives). What this allows me to do is compare these lists and see how well the predictor did. Here's where one of the StreamHacker articles (on precision and recall) really helped.
The essence of those two terms: precision is a measure of false positives, so a higher precision means fewer reviews that aren't in the desired label get labeled as being in it. A high recall means fewer reviews that are in the desired label get put in the wrong label. As you can imagine, these metrics are closely related. Here's the code (again using the NLTK library) to print out the positive and negative recall and precision, as well as the accuracy (a less specific measure that just shows what percentage of reviews the classifier got right). NLTK also has a cool function that shows the features (words) that were most helpful to the classifier in determining whether a review was positive or negative. So here's the code:
print 'train on %d instances, test on %d instances' % (len(trainFeatures), len(testFeatures))
print 'accuracy:', nltk.classify.util.accuracy(classifier, testFeatures)
print 'pos precision:', nltk.metrics.precision(referenceSets['pos'], testSets['pos'])
print 'pos recall:', nltk.metrics.recall(referenceSets['pos'], testSets['pos'])
print 'neg precision:', nltk.metrics.precision(referenceSets['neg'], testSets['neg'])
print 'neg recall:', nltk.metrics.recall(referenceSets['neg'], testSets['neg'])
classifier.show_most_informative_features(10)
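For intuition, here's roughly what those precision and recall calls boil down to, given the two sets of identifiers (a sketch for illustration, not part of our program):
def set_precision(reference, test):
    #of the reviews predicted to have this label, the fraction that really do
    return float(len(reference & test)) / len(test)

def set_recall(reference, test):
    #of the reviews that really have this label, the fraction we predicted correctly
    return float(len(reference & test)) / len(reference)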
The Basic Method
So after all that, we can start to figure out our feature selection mechanism. This basically means the way we select which words to train the classifier on.
In my previous post, I didn’t actually train the classifier on words at all. The classifier was trained on the number of words in each category from very negative to very positive. With this Python program, Jackson and I chose to look at the individual words themselves rather than counting positive and negative words.
We did this because there is inherent error in picking positive and negative words: there's a huge loss of information, since sentence-long reviews get reduced down to just a few numbers. With this method, we're keeping a lot more of the information in the review.
The most obvious feature selection mechanism is just to look at all the words in each review — it’s simple to code and provides a great base case. Here’s all we need:
def make_full_dict(words):
    return dict([(word, True) for word in words])
This just builds a dictionary (what we need for the evaluate_features method) that has each of the words in the review followed by 'True'.
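For example, with a made-up three-word review:
print make_full_dict(['an', 'engrossing', 'film'])
#{'engrossing': True, 'an': True, 'film': True}   (dictionary order may vary)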
Then we can just run the testing method:
print 'using all words as features'
evaluate_features(make_full_dict)
Here’s the output:
using all words as features
train on 7998 instances, test on 2666 instances
accuracy: 0.
pos precision: 0.
pos recall: 0.
neg precision: 0.
neg recall: 0.
Most Informative Features
engrossing = True        17.0 : 1.0
quiet = True             15.7 : 1.0
mediocre = True          13.7 : 1.0
absorbing = True         13.0 : 1.0
portrait = True          12.4 : 1.0
refreshing = True        12.3 : 1.0
flaws = True             12.3 : 1.0
inventive = True         12.3 : 1.0
triumph = True           11.7 : 1.0
refreshingly = True      11.7 : 1.0
That might seem overwhelming, but we can go through it bit by bit!
First, we see that the accuracy is 77%, which already seems a lot better than my first attempt with R. Then we see that the precisions and recalls are all pretty close to each other, which means the classifier is treating the two classes fairly evenly. No problems there, and we don't need to read a whole lot more into it.
Then we see the most informative features — for example, if ‘engrossing’ is in a review, there’s a 17:1 chance the review is positive.
As a side note, there are a couple of interesting ones in there. For example, 'flaws' being in a review strongly indicates that the review is positive. Perhaps that's because people rarely use "flaws" in a very negative sense, typically opting for stronger words. However, it is common to hear "it had some flaws, but…"
The Next Step
Moving on, the next way to select features is to only take the n most informative features — basically, the features that convey the most information. Again (for the millionth time), thanks to StreamHacker here.
We first need to find the information gain of each word. This is a big chunk of code, but we’ll break it up.
First, we broke up the words in a similar way as in the evaluate_features function and made them iterable (so that we could iterate through them):
def create_word_scores():
    #splits sentences into lines
    posSentences = open('polarityData\\rt-polaritydata\\rt-polarity-pos.txt', 'r')
    negSentences = open('polarityData\\rt-polaritydata\\rt-polarity-neg.txt', 'r')
    posSentences = re.split(r'\n', posSentences.read())
    negSentences = re.split(r'\n', negSentences.read())

    #creates lists of all positive and negative words
    posWords = []
    negWords = []
    for i in posSentences:
        posWord = re.findall(r"[\w']+|[.,!?;]", i)
        posWords.append(posWord)
    for i in negSentences:
        negWord = re.findall(r"[\w']+|[.,!?;]", i)
        negWords.append(negWord)
    posWords = list(itertools.chain(*posWords))
    negWords = list(itertools.chain(*negWords))
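Those last two itertools.chain calls flatten the per-sentence word lists into one long list of words; for example, with made-up data:
print list(itertools.chain(*[['a', 'fine', 'film'], ['dull', '.']]))
#['a', 'fine', 'film', 'dull', '.']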
Then we set up an overall frequency distribution of all the words, which can be visualized as a huge histogram with the number of each word in all the reviews combined. However, with just this line, all we do is initialize the frequency distribution — it’s actually empty:
word_fd = FreqDist()
We’ll also need a conditional frequency distribution — a distribution that takes into account whether the word is in a positive or negative review. This can be visualized as two different histograms, one with all the words in positive reviews, and one with all the words in negative reviews. Like above, this is just an empty conditional frequency distribution. Nothing is in there yet.
cond_word_fd = ConditionalFreqDist()
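To make the relationship between the two distributions concrete, here's a toy example with made-up words (the real filling loops follow; this uses the same .inc API as the rest of the post, which newer NLTK versions replace with fd[word] += 1):
demo_fd = FreqDist()
demo_cfd = ConditionalFreqDist()
for label, w in [('pos', 'great'), ('pos', 'great'), ('neg', 'dull')]:
    demo_fd.inc(w)          #overall count, regardless of label
    demo_cfd[label].inc(w)  #count within the 'pos' or 'neg' histogram only
print demo_fd['great'], demo_cfd['pos']['great'], demo_cfd['neg']['great']   #2 2 0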
Then, we essentially fill out the frequency distributions, incrementing (with .inc) the counter of each word within the appropriate distribution.
for word in posWords:
    word_fd.inc(word.lower())
    cond_word_fd['pos'].inc(word.lower())
for word in negWords:
    word_fd.inc(word.lower())
    cond_word_fd['neg'].inc(word.lower())
The next thing we need in order to find the highest-information features is the count of words in positive reviews, the count of words in negative reviews, and the total word count:
pos_word_count = cond_word_fd['pos'].N()
neg_word_count = cond_word_fd['neg'].N()
total_word_count = pos_word_count + neg_word_count
The last thing we need to do is use a chi-squared test (also from NLTK) to score the words. We find each word's positive information score and negative information score, add them up, and fill a dictionary correlating the words and scores, which we then return out of the function. A chi-squared test, as you can read on Wikipedia, is a great way to see how much information a given input conveys.
word_scores = {}
for word, freq in word_fd.iteritems():
    pos_score = BigramAssocMeasures.chi_sq(cond_word_fd['pos'][word], (freq, pos_word_count), total_word_count)
    neg_score = BigramAssocMeasures.chi_sq(cond_word_fd['neg'][word], (freq, neg_word_count), total_word_count)
    word_scores[word] = pos_score + neg_score
return word_scores
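To get a feel for what the scorer rewards, here are two calls with hypothetical counts (not from our data): a word that occurs 90 times out of 100 in positive reviews scores high, while a word split evenly across a balanced corpus scores near zero.
#chi_sq(count of word in pos, (total count of word, total words in pos), total words)
print BigramAssocMeasures.chi_sq(90, (100, 500000), 1000000)   #large: informative word
print BigramAssocMeasures.chi_sq(50, (100, 500000), 1000000)   #about 0: uninformative word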
We then make another function that finds the best n words, given a set of scores (which we’ll calculate using the function we just made) and an n:
def find_best_words(word_scores, number):
    best_vals = sorted(word_scores.iteritems(), key=lambda (w, s): s, reverse=True)[:number]
    best_words = set([w for w, s in best_vals])
    return best_words
Are you a little confused by that 'lambda'? I certainly was when I saw it. Essentially, it lets you define a small throwaway function inline that returns something; in this case, it pulls out each word's score so the words can be sorted into the correct order. You can read up on lambda functions for more information.
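Here's the same sort on a toy dictionary of made-up scores, just to show what the key function is doing:
demo_scores = {'engrossing': 17.0, 'the': 0.1, 'mediocre': 13.7}
print sorted(demo_scores.iteritems(), key=lambda (w, s): s, reverse=True)
#[('engrossing', 17.0), ('mediocre', 13.7), ('the', 0.1)]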
Finally, we can make a feature selection mechanism that returns ‘True’ for a word only if it is in the best words list:
def best_word_features(words):
    return dict([(word, True) for word in words if word in best_words])
Last, I ran it using the best 10, 100, 1000, 10000, and 15000 words (note that best_words is set globally each time through the loop below so that best_word_features can see it). Here's the code to do that:
word_scores = create_word_scores()
numbers_to_test = [10, 100, 1000, 10000, 15000]
#tries the best_word_features mechanism with each of the numbers_to_test of features
for num in numbers_to_test:
    print 'evaluating best %d word features' % (num)
    best_words = find_best_words(word_scores, num)
    evaluate_features(best_word_features)
Here’s my output (I’ve cut out the informative features list because it’s the same for all of them, including the one using all the features):
evaluating best 10 word features
train on 7998 instances, test on 2666 instances
accuracy: 0.
pos precision: 0.
pos recall: 0.
neg precision: 0.
neg recall: 0.
evaluating best 100 word features
train on 7998 instances, test on 2666 instances
accuracy: 0.
pos precision: 0.
pos recall: 0.
neg precision: 0.
neg recall: 0.
evaluating best 1000 word features
train on 7998 instances, test on 2666 instances
accuracy: 0.
pos precision: 0.
pos recall: 0.
neg precision: 0.
neg recall: 0.
evaluating best 10000 word features
train on 7998 instances, test on 2666 instances
accuracy: 0.
pos precision: 0.
pos recall: 0.
neg precision: 0.
neg recall: 0.
evaluating best 15000 word features
train on 7998 instances, test on 2666 instances
accuracy: 0.
pos precision: 0.
pos recall: 0.
neg precision: 0.
neg recall: 0.
The Conclusions
There’s lots to read into here. Obviously, using very few features didn’t do great in terms of accuracy, precision, and recall because there wasn’t enough data to build the model off of. Using 1000 features is about as good as using all the features, but at 10000 and 15000, there’s a pretty huge increase over the base case, getting up to ~85% accuracy and similar precision and recall statistics.
That means that using intelligent feature selection increased the accuracy by around 8 percentage points, which seems like quite a significant jump. Jackson and I were very happy about that.
We’re also happy about the results as a whole — classifying reviews with over 80% accuracy is pretty impressive and we can see lots of applications for this technology.
Of course, there are tons of ways to improve these results. These include (but are not limited to):
Adding different feature selection mechanisms
Pre-processing the text to get rid of unimportant words or punctuation
Doing deeper analysis of the sentences as a whole
Trying a different classifier than the Naive Bayes Classifier (a quick sketch of that swap follows this list)
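On that last point, NLTK's other classifiers share the same train/classify interface, so swapping one in is nearly a one-line change. For example (an untested sketch; decision trees are slow to train on this much data):
from nltk.classify import DecisionTreeClassifier
classifier = DecisionTreeClassifier.train(trainFeatures)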
A disclaimer applies: we're just learning all of this, and fairly independently too. There's a decent chance that there's a mistake or an inappropriate conclusion somewhere. If there is, please don't hesitate to let me know. I'd also love to hear from you if you have any other input on the project!
For the full code, check out the code repository.
Pieter pointed out in the comments section that there's a linear relationship between the accuracy and the log of the number of features. I decided to take a look at this in R. Here's the code I used.
library(ggplot2)
features <- c(10, 50, 100, 250, 500, 750, , , 1, 15000)
logFeatures <- log(features)
accuracy <- c(0.3, 0.5, 0.3, 0.9, 0.4, 0.4, 0.1087)
data <- data.frame(logFeatures, accuracy)
ggplot(data, aes(x=logFeatures, y=accuracy)) +
    geom_point(shape=1) +
    geom_smooth(method=lm)
This takes advantage of the great ggplot2 library for R. It simply puts in the feature counts, calculates their log, puts in the accuracy, and plots the two variables against each other. Here's the output:
I noticed here that the last 4 values seem a bit out of line, so I ran the script again with those removed to see how well the relationship held up through 5000 features:
This is quite a strong correlation. Very interesting how well that worked out. Thanks for the heads up, Pieter!