Understanding a data pipeline with emphasis on ETL
Day 1, 10:45 am-12:15 pm (lecture)
This lecture introduces some basic concepts of data science through a deep dive into data pipelines, from data extraction to the destination where the data is consumed by individuals or downstream systems. We will walk through the significance of ETL in data science, explore key technologies, and see a simple demo. We will also explore data mining in Python by connecting to MySQL and running queries. Finally, we will visualize data with Matplotlib.
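As a rough illustration of the extract, transform, load flow described above, here is a minimal sketch in Python. It is not the lecture's demo: the in-memory SQLite database stands in for MySQL so the example runs without a server, and the sample rows, table name, and function names are all made up for illustration.

```python
import sqlite3

# Hypothetical raw records standing in for an extracted source file.
RAW_ROWS = [
    ("site_a", "2021-01", " 12 "),
    ("site_b", "2021-01", "7"),
    ("site_a", "2021-02", ""),   # missing value, dropped during transform
    ("site_b", "2021-02", "9"),
]


def extract():
    """Extract: return a copy of the raw rows from the (pretend) source."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: trim whitespace, drop empty counts, cast counts to int."""
    cleaned = []
    for site, month, count in rows:
        count = count.strip()
        if not count:
            continue
        cleaned.append((site, month, int(count)))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (site TEXT, month TEXT, n INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_site(conn):
    """Query the destination, as a downstream consumer might."""
    cur = conn.execute(
        "SELECT site, SUM(n) FROM visits GROUP BY site ORDER BY site"
    )
    return cur.fetchall()


# SQLite in memory is an assumption made here in place of MySQL.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = total_by_site(conn)
print(totals)
```

The list in `totals` could then be passed to a plotting library such as Matplotlib, for example as the labels and heights of a bar chart.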
Wiza Msuku is the data lake team lead at the Elizabeth Glaser Pediatric AIDS Foundation. He is responsible for the Ministry of Health central data repository for the HIV program and related programs, covering the full data management life cycle of both structured and unstructured data.
He leads a team whose vision is to leverage big data, advanced analytics, and machine learning to predict patient outcomes and improve care and treatment. Prior to this, he worked with TNM and AIRTEL Malawi in data management roles.
He holds a master's degree in Communications Management from Buckinghamshire University and a Bachelor of Science degree in Computing and Information Systems from London Metropolitan University. He is also a certified Oracle database administrator and a Python fanatic.