Technical Tutorials

This year we provided 12 free technical tutorials for our attendees. Use this chance to advance your Data Science skills with intensive hands-on training in key areas including data visualization, machine learning, AI, data engineering, and many other in-demand areas.

Our partners provided the following technical tutorials:

  • Application of Machine Learning Clustering Algorithms on Geospatial Data, by TomTom
  • Kickstarting with Cloud Analytics
  • Time Series Forecast with PyFlux, by ThingsSolver

Also, in partnership with fellow data scientists and data science communities from Europe, we provided the following technical tutorials:

  • Foundation of visualisation with Tableau
  • Advanced Visualisation with Tableau
  • Deep Learning with Python
  • Data Mining with GDPR
  • Data Analytics with R
  • Google Data Studio
  • Predictive modeling
  • Advanced Analytics and more

You will learn

Topics

Predictive Analytics, Machine Learning, Natural Language Processing, Deep Learning, Probabilistic Programming, Artificial Intelligence, Data Wrangling, Data Engineering, Data Analytics, Time Series, Geospatial Data, Big Data

Languages

R, Python, SQL, Pig/Hive, Jupyter

Tools

Tableau, Google Data Studio, Amazon Web Services, RStudio, PostgreSQL, scikit-learn, Elasticsearch, PyFlux

Important notice:

R is widely known as one of the most popular languages for Data Analytics. If we want to understand what the data are telling us, we need to understand the data itself, the problem at hand, and the questions we need to ask to gain deeper insights.

This is why the first part of the technical tutorial covers the following topics:
- Data types, conditional statements, loops, and functions
- Importing & Cleaning Data with R
- Data Manipulation with R

In the second part of the technical tutorial we shall go through analytical processes and what Data Analytics really means. We shall cover the following topics:
- Introduction to Data Analysis
- Exploratory Data Analysis

Prerequisites and required knowledge:
You need to have R version 3.5.1 and RStudio version 1.1.456 installed.

About lecturer:

We will deploy a small multi-node Redshift cluster (7.5 TB capacity, 700 GB RAM, and 96 virtual cores). Then we will load a data set of several hundred million rows into it and start querying it, demonstrating how to work with data in a cloud data warehouse. At the end, participants will connect to the database and be able to query it themselves.

Cloudwalker is a certified Amazon partner with twenty years of experience in analytics and big data services.

Prerequisites and required knowledge:

About lecturer:

This tutorial is designed for the beginner Tableau user. It is for anyone who works with data – regardless of technical or analytical background. This course is designed to help you understand and use the important concepts and techniques in Tableau to move from simple to complex visualizations and learn how to combine them in interactive dashboards.

This course includes a workbook containing key concepts on each topic covered and hands-on activities to reinforce the skills and knowledge attained. It also includes a digital student resources folder containing Tableau workbooks and data sources to support the hands-on activities. At the end of this course, you will be able to:
- Connect to your data
- Edit and save a data source
- Understand Tableau terminology
- Use the Tableau interface/paradigm to effectively create powerful visualizations
- Create basic calculations, including basic arithmetic calculations, custom aggregations and ratios, date math, and quick table calculations
- Represent your data using the following visualization types: dual-axis and combined charts with different mark types, highlight tables, and scatter plots
- Build dashboards

Prerequisites and required knowledge:

About lecturer:

This tutorial will cover the process from structuring data to feature-extraction techniques like PCA, while improving coding skills with tips and tricks. As an introduction, we will get acquainted with the different data structures in R (tibble, data frame, data table). The concept of pipeline coding will be explained, emphasizing the elegance and readability of this style of coding.
We will explore different packages for preprocessing data prior to modelling. In dplyr we will learn the basic functions for wrangling and manipulating data, while improving our pipeline coding skills. Using the new recipes package, we will sequentially apply different preprocessing steps, from transforming the data to feature extraction. By going through all these steps we will round up the pre-modelling process. Each step will be explained thoroughly, with additional theoretical background.
All of this will be applied to real-life data sets from different industries (telecommunications, finance, etc.).
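The pipeline style described above can also be sketched outside R. Below is a minimal, hypothetical plain-Python analogue of a dplyr-style chain (a `pipe` helper plus filter/mutate/arrange steps); the tutorial itself uses R and the `%>%` operator, so this is only an illustration of the idea.

```python
# Hypothetical plain-Python analogue of dplyr-style pipeline coding.
# Each step takes a table (list of dicts) and returns a new one, so a
# chain of steps reads top-to-bottom like an R `%>%` pipeline.

def pipe(value, *steps):
    """Thread `value` through each step in order."""
    for step in steps:
        value = step(value)
    return value

def filter_rows(pred):
    return lambda rows: [r for r in rows if pred(r)]

def mutate(**cols):
    return lambda rows: [
        {**r, **{name: f(r) for name, f in cols.items()}} for r in rows
    ]

def arrange(key):
    return lambda rows: sorted(rows, key=key)

customers = [
    {"name": "ana", "minutes": 120, "bill": 30},
    {"name": "bob", "minutes": 40, "bill": 25},
    {"name": "eva", "minutes": 300, "bill": 60},
]

result = pipe(
    customers,
    filter_rows(lambda r: r["minutes"] > 50),
    mutate(rate=lambda r: r["bill"] / r["minutes"]),
    arrange(lambda r: r["rate"]),
)
print([r["name"] for r in result])   # ['eva', 'ana']
```

The point mirrors the one made about elegance above: the nested-call version of the same logic would read inside-out, while the piped version reads in the order the steps happen.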

Prerequisites and required knowledge:

About lecturer:

This technical tutorial will cover basic concepts of time series analysis, like time series decomposition, stationarity analysis, and trend and seasonality smoothing. Afterwards, some of the most popular algorithms used for time series forecasting will be presented and explored. The workshop will include programming in Python, using the PyFlux time series forecasting library.

We shall go through three parts:

Part 1: Intro to time series analysis
Part 2: Intro to the PyFlux library
Part 3: Hands-on example

The main goal of this tutorial is to introduce participants to the main concepts of time series analysis, as well as to the forecasting methods available in the PyFlux library.
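Before reaching for PyFlux, the Part 1 concepts can be illustrated in a few lines of plain Python. The toy example below (hypothetical data, standard library only) extracts a trend with a centered moving average and inspects the detrended remainder — the idea behind trend and seasonality smoothing.

```python
# Minimal sketch of trend smoothing: extract the trend of a series
# with a centered moving average, then inspect the detrended
# residual (seasonality + noise).

def moving_average(series, window):
    """Centered moving average; None where the window doesn't fit."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            out.append(None)
        else:
            out.append(sum(series[i - half:i + half + 1]) / window)
    return out

# Toy series: linear trend plus a period-4 seasonal pattern.
season = [3, -1, -2, 0]
series = [0.5 * t + season[t % 4] for t in range(12)]

trend = moving_average(series, 5)
detrended = [s - t for s, t in zip(series, trend) if t is not None]
print(trend)
print(detrended)
```

The detrended values still carry the repeating seasonal shape, which is exactly what a decomposition separates out before forecasting.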

Prerequisites and required knowledge:

About lecturer:

The goal of this tutorial is to walk through the process of building a model that can identify which telecom customers are likely to churn in the next month. Participants will work with a real telco data set and learn how to solve the problems that come with it.

The tutorial consists of the following steps: Data Analysis, Data Cleaning, Feature Engineering, Solving the problem of imbalanced classes, Modeling, and Evaluation. Different machine learning models and their pros and cons will be discussed in the Modeling part. Besides their practical application, PCA, balancing techniques, and the machine learning models will also be discussed from a theoretical point of view.
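One simple balancing technique of the kind mentioned above is random oversampling of the minority class. The sketch below (hypothetical toy data, standard library only) duplicates minority rows until the classes are balanced; in practice a dedicated library would typically handle this step.

```python
import random

# Random oversampling: duplicate random minority-class rows until
# every class is as large as the majority class.

def oversample(rows, label_index, rng):
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_index], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Toy churn data: (monthly_spend, churned) with a 6:2 class imbalance.
data = [(50, 0), (60, 0), (55, 0), (70, 0), (65, 0), (52, 0),
        (20, 1), (25, 1)]
balanced = oversample(data, 1, random.Random(42))
print(sum(1 for r in balanced if r[1] == 1), "churners of", len(balanced))
```

Oversampling must be applied only to the training split, never before the train/test split, or the evaluation will leak duplicated rows.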

Prerequisites and required knowledge:

About lecturer:

In this technical tutorial, "Python Fundamentals", we will learn what python actually is, why is it so popular and why do you need to learn it. Also we will go through some examples of real life python solutions, which will give you a strong base, so that later you can develop your python skills on your own.

Prerequisites and required knowledge:

About lecturer:

This technical tutorial is aimed at data scientists who wish to apply and understand Machine Learning methods in practice. From a basic self-implemented linear regression all the way to convolutional and recurrent neural networks, we will give a brief overview of the field, have some fun with interesting data sets, and try to explain the field's remarkable theoretical background with a practical, dynamic approach that you rarely find in textbooks and libraries.

We shall be using Python 3 with a few libraries (listed in detail on GitHub) and Jupyter Notebook/Lab. Course attendees are advised to prepare their laptops with the listed libraries ahead of the course, as they will also have a few tasks to complete; Jupyter Notebooks will be prepared and published on GitHub before the course starts: https://github.com/termNinja/ds4-pmtml
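As a taste of the "self-implemented linear regression" starting point, a minimal sketch under the usual assumptions (one feature, mean-squared-error loss, batch gradient descent) might look like this — the actual course material lives in the GitHub repository above:

```python
# Fit y = w*x + b by batch gradient descent on mean squared error,
# using only the standard library.

def fit_linear(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]            # exactly y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

The same gradient-descent loop, with more parameters and more interesting loss surfaces, is what the neural-network material later in the course generalizes.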

Prerequisites and required knowledge:

About lecturer:

Geospatial clustering can be defined as the process of grouping geographical locations so that locations within a group are close together compared to those in other groups. It is an important part of geospatial data processing, since it provides insights into the distribution of the data and the characteristics of spatial clusters.

This tutorial will introduce machine learning concepts suitable for solving clustering problems, and cover the road from raw data acquisition, preparation and visualization to the forming and labeling of clusters. The tutorial will also present and compare several approaches to solve the clustering problem using concrete machine learning algorithms. Practical examples will be based on clustering of geographical locations, but can be generalized to cover other clustering problems as well.

The aim of this tutorial is to enable participants to use existing machine learning algorithms found in modern libraries or even to roll their own algorithm implementation for a specific clustering problem.
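As a hedged illustration of "rolling your own", here is a minimal k-means on (latitude, longitude) pairs in plain Python. The coordinates are made up, and plain Euclidean distance is only a rough approximation that holds over small areas; real geospatial pipelines would use haversine distance or projected coordinates.

```python
import math

# Minimal k-means on (lat, lon) points. Naive choices throughout:
# first-k initialization and Euclidean distance (only roughly valid
# for nearby locations).

def kmeans(points, k, iters=100):
    centroids = list(points[:k])
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        centroids = [                          # update step
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two made-up location groups (roughly Amsterdam and Belgrade).
pts = [(52.37, 4.90), (52.38, 4.91), (52.36, 4.89),
       (44.79, 20.45), (44.80, 20.46), (44.78, 20.44)]
centroids, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))   # [3, 3]
```

A production version would add smarter initialization and a convergence check, but the assignment/update alternation above is the whole algorithm.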

Prerequisites and required knowledge:

About company: TomTom created the easy-to-use navigation device, one of the most influential inventions of all time. Since then, we have grown from a start-up into a global technology company. We design and develop innovative navigation products, software, and services that power hundreds of millions of applications across the globe. This includes industry-leading location-based and mapmaking technologies, embedded automotive navigation solutions, personal navigation devices and apps, and the most advanced telematics fleet management and connected car services.

Combining our own R&D expertise with business and technology partnerships, we continue to shape the future, leading the way with autonomous driving, smart mobility and smarter cities. Headquartered in Amsterdam with offices in 37 countries, we offer advanced digital maps that cover 137 countries, and our hyper-detailed and real-time TomTom Traffic information service reaches more than five billion people in 69 countries.

To read our story, visit www.tomtom.com

Since the focus of data analysis can be seen as the discovery of general or significant patterns in data, aspects of data privacy, security, and ethics need to be discussed. We will start with a discussion of the impact of the 4th industrial revolution and the rise of data-driven business models. Ethical issues in the age of data will be introduced via the fundamentals of privacy and data protection. The new model will be explained through the General Data Protection Regulation, applicable across the EU since 25 May 2018.
The focus will be on the rights of data subjects, data protection principles, and the impact of GDPR on the organization of business systems, software development, and data mining. We will discuss the steps necessary to implement GDPR and comply with legal regulation in order to be able to release data products. Participants will also learn how to use a new self-assessment tool that will help them assess the cost and time necessary for compliance.

Prerequisites and required knowledge:

About lecturer:

R is a powerful tool that can give us many of the insights and overviews needed for advanced analytics. To fully grasp the power of R, we need to understand why and how something is happening. In most cases this is not an easy task, and it requires a serious understanding of the data we are using for analysis. That is why we need to use visualisation as part of the process, understand the difference between correlation and causation, and know how to make good inferences based on the data we have.

In this technical tutorial we shall go through the following topics:
- Data visualisation with R
- Correlation and Regression
- Foundations of Inference
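Although the tutorial itself works in R (where this is just `cor(x, y)`), the correlation topic can be sketched in a few lines of plain Python: Pearson's r measures the strength of linear association between two samples.

```python
import math

# Pearson's correlation coefficient r, computed from scratch.
# In R this is simply cor(x, y).

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # ~ 1.0, perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))    # ~ -1.0, perfect negative
```

A high r says the variables move together linearly; as the tutorial stresses, it says nothing by itself about whether one causes the other.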

Prerequisites and required knowledge:

About lecturer:

In this technical tutorial you will get an overview of Power BI Desktop and learn how to:

* Connect to data sources in Power BI Desktop
* Clean and transform your data with the Query Editor
* Work with more advanced data sources and transformations
* Clean irregularly formatted data
* Explore time-based data with visual hierarchies and drill-down
* Publish and share reports

Prerequisites and required knowledge:

About lecturer: