Predicting usefulness of online reviews using stochastic gradient boosting and randomized trees
Kumar, M. and Upadhyay, S.
This paper presents our analysis of online user reviews from different business categories posted on Yelp, an internet rating and review service. We use business-, reviewer-, and review-level data to generate predictive features for estimating the number of useful votes an online review is expected to receive. Unstructured text data are mined using natural language processing techniques and combined with structured features to train two different machine learning algorithms: Stochastic Gradient Boosted Regression Trees and Extremely Randomized Trees. The results from both algorithms are ensembled to generate better-performing predictions. The approach described in this paper mirrors the one used by one of the authors in a Kaggle competition hosted by Yelp. Out of 352 participants, the author placed 3rd on the final leaderboard.
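The abstract does not say how the two models' outputs are ensembled. As a minimal sketch only, assuming a simple weighted average of per-review predictions, the combination step might look like the following. The function name, the default weight, the clipping at zero, and the sample numbers are all hypothetical illustrations, not the authors' method or data.

```python
def blend_predictions(preds_a, preds_b, weight_a=0.5):
    """Weighted average of two models' predictions for the same reviews.

    Assumption: a plain weighted mean; the paper's actual ensembling
    rule is not given in the abstract.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    # Assumption: clip at zero, since a vote count cannot be negative.
    return [max(0.0, weight_a * a + weight_b * b)
            for a, b in zip(preds_a, preds_b)]


# Hypothetical predicted useful-vote counts for three reviews.
gbrt_preds = [2.0, 0.5, 4.0]         # e.g. from a gradient boosted model
extra_trees_preds = [1.0, 1.5, 6.0]  # e.g. from a randomized-trees model

blended = blend_predictions(gbrt_preds, extra_trees_preds)
```

In practice the blending weight would typically be chosen on held-out data rather than fixed at 0.5.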
Cite as: Kumar, M. and Upadhyay, S. (2013). Predicting usefulness of online reviews using stochastic gradient boosting and randomized trees. In Proc. Eleventh Australasian Data Mining Conference (AusDM13), Canberra, Australia. CRPIT, 146. Christen, P., Kennedy, P., Liu, L., Ong, K.L., Stranieri, A. and Zhao, Y. Eds., ACS. 65-72.