This module introduces the main methods for analyzing and mining the opinions and personal evaluations of users, based on Big Data generated on the web or by other sources. Emphasis is placed on text mining methods applied to text originating from social media. Lessons are supported by case studies developed in the SoBigData.eu lab.
Topic-oriented and opinion-oriented text analysis: differences and peculiarities. The machine learning pipeline for automatic text analysis. Building and using lexical resources. Feature engineering for sentiment analysis and opinion mining (SAOM). Recognizing, defining, and solving problems of classification, regression, information extraction, and quantification. Differences between individual and aggregated analysis. Evaluation of models. State of the art in research and market products.
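The difference between individual and aggregated analysis can be illustrated with a minimal sketch of quantification. It assumes a binary classifier whose true-positive and false-positive rates have been measured on held-out data; the predictions and the rates below are hypothetical values chosen only for illustration. Classify-and-count reports the fraction of documents predicted positive, while the adjusted count corrects that fraction for the classifier's known error rates.

```python
def classify_and_count(predictions):
    """Estimate positive prevalence as the fraction of items predicted positive (1)."""
    return sum(predictions) / len(predictions)


def adjusted_count(cc_estimate, tpr, fpr):
    """Correct a classify-and-count estimate using the classifier's error rates.

    The observed positive rate satisfies: observed = tpr * p + fpr * (1 - p),
    so the true prevalence is p = (observed - fpr) / (tpr - fpr).
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    adjusted = (cc_estimate - fpr) / (tpr - fpr)
    # Clip to [0, 1], since the correction can overshoot on small samples.
    return min(1.0, max(0.0, adjusted))


# Hypothetical per-document predictions (1 = positive opinion, 0 = negative).
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

cc = classify_and_count(predictions)
# Hypothetical error rates, assumed to come from a validation set.
acc = adjusted_count(cc, tpr=0.8, fpr=0.1)

print(f"classify and count: {cc:.3f}")
print(f"adjusted count:     {acc:.3f}")
```

The point of the sketch is that a classifier tuned for per-document accuracy can still give a biased estimate of class prevalence, which is why quantification is treated as a separate problem.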
Statistical relevance analysis
Lexical resources: SentiWordNet and other sentiment lexicons. Polarity: IMDB and Twitter datasets. Spam detection: Tripadvisor and Yelp datasets. Regression: Amazon and Tripadvisor datasets. Quantification: Amazon datasets.
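As a rough illustration of how a sentiment lexicon can be used for polarity detection, the sketch below scores a text by summing the polarities of its words. The lexicon, the negator list, and the function names are hypothetical and tiny; real resources such as SentiWordNet are far larger and assign scores to word senses rather than to surface forms.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (not taken from any real resource).
TOY_LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "terrible": -1.0, "boring": -1.0,
}
# Hypothetical negator list; a negator flips only the token right after it.
NEGATORS = {"not", "never", "no"}


def polarity_score(text):
    """Sum the lexicon scores of the tokens, flipping the sign after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        value = TOY_LEXICON.get(token, 0.0)
        score += -value if negate else value
        negate = False  # negation applies to the next token only
    return score


def polarity_label(text):
    """Map the numeric score to a coarse polarity label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["An excellent, great movie", "The film was not good"]:
    print(review, "->", polarity_label(review))
```

A lexicon-only scorer like this is usually a baseline or a source of features for a learned model, not a complete polarity classifier.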
Recognition of SAOM problems in practical contexts. Choice of the best-fitting model for their formalization. Definition of the external resources required to solve the problem. Choice of appropriate software tools and implementation of ad hoc solutions. Choice and use of machine learning algorithms for the creation of SAOM models. Evaluation and analysis of results.