Abstract: Advanced data analysis techniques are gaining popularity. With modern statistics, data mining, and machine learning engines, products, and packages, data science has become a black box: it is possible to use data science without knowing how it works. However, not knowing how the algorithms work can lead to many problems, such as choosing the wrong algorithm for a task or misinterpreting the results. This seminar explains how the most popular data mining algorithms work, when to use each one, and the advantages and drawbacks of each. Demonstrations and labs show how to use the algorithms in T-SQL, R, and Python.
Algorithms explained include Naïve Bayes, Decision Trees, Neural Networks, Logistic Regression, Perceptron Model, Linear Regression, Regression Trees, Ordinal Regression, Poisson Regression, Principal Component Analysis, Support Vector Machines, Hierarchical Clustering, K-Means Clustering, Expectation-Maximization Clustering, Association Rules, Sequence Clustering, Auto-Regressive Trees with Cross-Prediction (ARTXP), Auto-Regressive Integrated Moving Average (ARIMA), and Time Series.
The seminar also introduces basic statistics, including descriptive statistics, correlations, and linear associations, and briefly touches on information theory. All of these methods help you understand the data before further analysis and support advanced data profiling. The course also covers mining unstructured data, specifically text.
1. Introduction to data mining and/or machine learning
2. Descriptive statistics for data overview
3. Classification, prediction and estimation algorithms
4. Forecasting, unsupervised algorithms, and text mining
Note: This recorded class is available as a video course, with the content presented in modular videos.