To classify customer reviews of an online sales platform as positive or negative, specific requirements must be met, and limitations should be considered.

Requirements for Reliable Classification

  • Large datasets are required to generate reliable results, ideally thousands of reviews. These need to be split into training (70-80%) and testing (20-30%) sets.
  • The language that signals a positive or negative review is specific to the platform. The words that indicate positive or negative sentiment are product specific and can therefore vary from product to product on the platform.
  • The datasets need to cover the full range of products on the platform; otherwise they are not representative of the whole platform.
  • Any abbreviations or specialist terms need to be explained, and they could differ from product to product on the platform.
  • Language nuances need to be handled consistently, as with the IMDB movie reviews. The words that matter for the platform's products will differ from those that matter in the IMDB movie reviews.
  • The raw text of each review needs to be cleaned (misspellings, bad characters, etc.) so that it is consistent and noise is reduced.
  • The criteria for a positive and a negative review need to be clearly defined.
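
As a rough illustration of the cleaning and splitting requirements above, here is a minimal sketch using only the Python standard library. The function names (clean_review, split_dataset), the sample reviews and the 80/20 split are assumptions made for the example, not part of any particular platform or toolkit. The cleaning step only handles case, stray characters and whitespace; correcting misspellings would need a separate step.

```python
import random
import re


def clean_review(text: str) -> str:
    """Lowercase the text, replace anything that is not a letter, digit,
    apostrophe or space with a space, then collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labelled reviews: (raw text, label)
reviews = [(f"Review #{i}: GREAT!!  product\t:)", "positive") for i in range(10)]
cleaned = [(clean_review(text), label) for text, label in reviews]
train, test = split_dataset(cleaned)
print(len(train), len(test))
```

In practice the split should also keep the mix of products and labels similar in both sets, which this simple shuffle does not guarantee.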

Limitations of the Approach

  • The criteria for a positive and a negative review are sentiment based and very difficult to keep consistent over time, so they can drift, leading to unreliable results. What was once a good review could now be a bad review, especially in the middle ground, where reviews use phrases like “the product was ok”.
  • The model might not understand the sentiment of a review correctly: “Great product when it worked” is not a positive review but has very positive words in it, so testing is critical.
  • Reducing sentiment to a binary “positive” or “negative” is difficult and prone to inaccuracy, because there are many shades of sentiment from very good to very bad and everything in between.
  • The sheer volume of data required to create a model could be impractical for the online platform, especially the effort of setting up training and testing data.
  • Maintaining the model also takes time and requires headcount.
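
To make the misreading and binary-label limitations concrete, the sketch below uses a deliberately naive word-counting classifier. It is not the model discussed here; the word lists and the function name naive_sentiment are hypothetical and chosen only to show how such failures can arise.

```python
# Hypothetical word lists, for illustration only
POSITIVE_WORDS = {"great", "good", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "broken", "terrible"}


def naive_sentiment(review: str) -> str:
    """Count positive and negative words and force a binary label."""
    words = review.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    # A binary scheme has no neutral class, so a score of 0 must go somewhere
    return "positive" if score > 0 else "negative"


print(naive_sentiment("Great product when it worked"))  # prints "positive"
print(naive_sentiment("The product was ok"))            # prints "negative"
```

The first review is labelled positive because of the word "great", even though the customer is complaining. The second is a middle-ground review that the binary scheme pushes into "negative" only because of how the tie is broken.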

Possible Improvements

  • Existing models are likely to be available for this kind of classification. Several of them could be tested and compared to find a fit.
  • The training and testing data of other models could also be compared to the training and testing data of this experiment to measure quality.
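
One way to compare candidate models is to score each of them on the same held-out test set. The sketch below assumes every candidate can be wrapped as a function that takes review text and returns a label. The two candidates and the four test reviews are hypothetical stand-ins, not real models or real data.

```python
def accuracy(classify, labelled_reviews):
    """Fraction of reviews for which the classifier returns the expected label."""
    correct = sum(classify(text) == label for text, label in labelled_reviews)
    return correct / len(labelled_reviews)


# Hypothetical held-out test set: (cleaned text, expected label)
test_set = [
    ("great value works well", "positive"),
    ("broke after a week", "negative"),
    ("great product when it worked", "negative"),
    ("love it", "positive"),
]


# Stand-ins for existing models; real candidates would be wrapped the same way
def always_positive(text):
    return "positive"


def keyword_model(text):
    return "negative" if any(w in text for w in ("broke", "worked")) else "positive"


candidates = {"always_positive": always_positive, "keyword_model": keyword_model}
results = {name: accuracy(model, test_set) for name, model in candidates.items()}
best = max(results, key=results.get)
print(results, best)
```

Accuracy alone can be misleading when one class dominates the reviews, so precision and recall for each class are worth reporting alongside it.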
