• Yoonkang Low

Sentiment analysis & modelling for web scraped phone reviews using Python


This work demonstrates how Python modules can be used to scrape web data (specifically Amazon product reviews of Sony Xperia handsets), run text and language analysis to understand and illustrate the key features of the reviews, and finally test different modelling techniques for predicting sentiment.

You'll find the necessary notebooks on Yoon's GitHub page.

The notebook for scraping the reviews can be found here

The notebook for the text analysis and sentiment model can be found here

In this second notebook, the following Python modules are used:

  • Pandas - data manipulation and data analysis

  • Matplotlib / seaborn / wordcloud - data visualisation

  • re (regex) - identify patterns

  • scikit-learn (sklearn) - feature extraction / statistical modelling / model validation / model measurement

  • NLTK - language processing
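To give a flavour of where `re` fits in, here is a minimal sketch of cleaning a scraped review before analysis. It is not taken from the notebook: the function name `clean_review` and the sample text are hypothetical, and the real cleaning steps may differ.

```python
import re


def clean_review(text):
    """Lowercase a review, drop HTML-like tags and non-letters, return tokens."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML-like tags left over from scraping
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    return text.split()


# Hypothetical sample review, for illustration only.
tokens = clean_review("Great phone!<br/>Battery lasts 2 days.")
print(tokens)
```

The resulting token list could then be passed on to NLTK for stop-word removal or stemming, or joined back into a string for feature extraction.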

The topics covered in this second notebook can be summarised as follows:

  1. Data quality checks

  2. Derive new features

  3. Preliminary data analysis with wordcloud and charts

  4. Text features extraction

  5. Sentiment modelling

  6. List important features from the best model

  7. Further development
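Steps 4 to 6 can be sketched roughly as below. This is only an illustration under stated assumptions: the toy reviews and labels are made up, and TF-IDF features with logistic regression are one common choice, not necessarily the techniques compared in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data; 1 = positive sentiment, 0 = negative sentiment.
reviews = [
    "great phone excellent camera",
    "excellent battery great screen",
    "love phone great value",
    "terrible battery poor screen",
    "poor camera terrible phone",
    "awful value terrible support",
]
labels = [1, 1, 1, 0, 0, 0]

# Step 4 (text feature extraction) and step 5 (sentiment model) in one pipeline.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
model.fit(reviews, labels)

# Step 6: map each term to its coefficient and rank terms by weight.
vocab = model.named_steps["tfidf"].vocabulary_  # term -> column index
coefs = model.named_steps["clf"].coef_[0]       # one weight per column
weights = {term: coefs[idx] for term, idx in vocab.items()}
ranked = sorted(weights, key=weights.get)
print("Most negative terms:", ranked[:3])
print("Most positive terms:", ranked[-3:])

# Score unseen (again hypothetical) reviews.
predictions = model.predict(["great excellent phone", "terrible poor screen"])
print(predictions)
```

On real review data you would hold out a test set and compare several models with scikit-learn's validation tools before reading anything into the term weights.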

If you have any feedback or questions, Yoon will be happy to help!


Yoonkang Low is a freelance Analytical and Data Science consultant with over 9 years' experience, including time with analytical agencies and at Amazon.

#python #sentimentanalysis #modelling #webscraping

© 2018 - Analytics Link