Abstract: If you want to learn how to get information from your data with Transact-SQL, or shortly T-SQL language, then this course is the course for you. It will teach you how to calculate statistical measures from descriptive statistics including centers, spreads, skewness and kurtosis of a distribution, find the associations between pairs of variables, including calculating the linear regression formula, calculate the confidence level with definite integration, find the amount of information in your variables, and do also some machine learning or data science analysis, including predictive modeling and text mining.
T-SQL language in latest editions of SQL Server, Azure SQL Database, and Azure Synapse Analytics, has so many business intelligence (BI) improvements that it might become your primary analytic database system. Many database developers and administrators are already proficient with T-SQL. Occasionally they need to analyze the data with statistical or data science methods, but they do not want to or have time to learn a completely new language for these tasks. In addition, they need to analyze huge amounts of data, where specialized languages like R and Python might not be fast enough. SQL Server has been optimized for work with big datasets for decades.
In order to get the maximum out of these language constructs, you need to learn how to properly use them. This in-depth course shows extremely efficient statistical queries that use the window functions and are optimized through algorithms that use mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained as well.
Any serious analysis starts with data preparation. The course introduces some common data preparation tasks and shows how to implement them in T-SQL.
No analysis is good without good data quality. The course introduces data quality issues, and shows how you can check for completeness and accuracy with T-SQL, and how to measure improvements of data quality over time. And since the talk is already about the time, the course shows how you can optimize queries with temporal data, for example when you search for overlapping intervals. More advanced time-oriented information includes hazard and survival analysis.
Then the course switches to the currently most fashionable topic, the data science. Some of quite advanced algorithms can also be implemented in T-SQL. The reader learns about the market basket analysis with association rules using different measures like support and confidence, and even sequential market basket analysis, when there is a sequence in the basket. Then the course shows how to develop predictive models with a mixture of k-nearest neighbor and decision trees algorithms and with Bayesian inference analysis.
Analyzing text, or text mining, is another modern topic. However, many people do not realize that you can do really a lot of text mining also in pure T-SQL. SQL Server can also become a text mining engine. The course shows how to analyze text in multiple natural languages with pure T-SQL, using also features from the full-text search (FTS).
- Descriptive Statistics
- Associations between Pairs of Variables
- Data Preparation
- Data Quality
- Time-Oriented Data
- Time-Oriented Analyses
- Data Mining
- Text Mining