Program: Malawi Data Science Bootcamp

When	Theme
Day 1	Introduction to data science, data, data manipulation and visualization
Day 2+3	Statistical and machine learning for data science
Day 4	Big data analytics
Day 5	Responsible Data Science + group sessions

We are currently adding abstracts and titles from the program

Web scraping for data science

Day 1, 1:30-3:00 pm (lecture)

Ralph Tambala

In each data science project, data collection is the first and most important stage. Some websites and cloud services provide APIs to ease data collection by external users. However, most websites do not. A data science endeavour can be hampered by a lack of APIs or publicly available data for …

Understanding a data pipeline with emphasis on ETL

Day 1, 10:45-12:15 pm (lecture)

Wiza Msuku 🐦 @ugly_as_eva

This lecture introduces some basic concepts of data science through a deep dive into data pipelines, from data extraction to the destination where the data is consumed by individuals or downstream systems. We will walk through the significance of ETL in data science, explore key …
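As a rough illustration of the extract, transform and load stages mentioned in this abstract, here is a minimal sketch in Python. It is not taken from the lecture: the sample data, the `payments` table and the function names `extract`, `transform` and `load` are all made up for illustration, and an in-memory SQLite database stands in for the destination.

```python
import csv
import io
import sqlite3

# Made-up raw input; in practice this could come from a file, an API or a scraper.
RAW_CSV = """name,amount
alice, 10
bob,25
carol,
"""


def extract(text):
    """Extract: read rows from the raw CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows with a missing amount, cast to int."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["name"].strip(), int(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # hypothetical destination
load(transform(extract(RAW_CSV)), conn)

# A downstream consumer can now query the loaded data.
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions makes each one easy to test and to swap out, for example replacing the CSV source with a different extractor.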

Practical session on web scraping

Day 1, 4:15-5:15 pm (tutorial)

Ralph Tambala

The importance of datasets for machine learning and data science

Day 1, 9:30-10:30 am (lecture)

Amelia Taylor 🐦 @AT_poly_AI

The growth in the use of technology is evident in all sectors of human activity, including education, business, social life and government. This is due to many factors. The two most often quoted are ‘the availability of massive datasets’ and ‘cheap computing power’. Data science aims at applying a …

Linear regression for machine learning

Day 2, 1:00-4:30 pm (lecture and exercises)

Winnie Wezi Mkandawire

Linear regression is one of the fundamental supervised machine learning algorithms. While it is relatively simple and might not seem fancy compared to other machine learning algorithms, it remains widely used across various domains such as Biology, Social Sciences, Finance, and …

Basic machine learning algorithms and their application

Day 2, 10:30-12:00 pm (lecture)

Akuzike Banda

Machine learning has grown rapidly in recent years. It focuses on building systems with the ability to learn and improve from experience without being explicitly programmed to do so. Algorithms are used to make all of this happen. Machine learning …

Practical session on support vector machines and random forest

Day 3, 1:00 - 2:30 pm (tutorial)

Tiwonge Msulira Banda 🐦 @tiwobanda

This practical session will give students the opportunity to use two classification algorithms, Support Vector Machines (SVM) and Random Forest (RF), on real datasets to make classification predictions.

Mathematics and statistics concepts for machine learning and data science (application in healthcare)

Day 3, 10:30-12:00 pm (lecture)

Richard J Munthali 🐦 @RichardMunthali

This lecture will aim to systematically help participants “master” and “recognize” the core concepts in statistics and mathematics that are used at different stages of machine learning and data science projects. Concepts like statistics and probability, descriptive …

Mathematics and statistics concepts for machine learning and data science (application in healthcare)

Day 3, 3:00-4:30 pm (tutorial)

Richard J Munthali 🐦 @RichardMunthali

In this session, we will go through practical examples with healthcare data using Python and Jupyter notebooks, from exploratory data analysis (EDA) to model building, model selection, saving the best model and using it to make predictions on new data. We will use some statistical and mathematical concepts to …

Support vector machines and random forest

Day 3, 8:30 - 10:00 am (lecture)

Tiwonge Msulira Banda 🐦 @tiwobanda

This lecture will continue our journey with supervised machine learning, with a focus on classification problems. Two algorithms will be covered, namely Support Vector Machines (SVM) and Random Forest (RF). The two algorithms will be used to classify phishing emails from a real dataset. We will go …

Databases for data scientists

Day 4, 1:00-2:30 pm (tutorial)

Maria Maistro

We will create a small database: create and populate tables, run some simple SQL queries, connect to the database with Python.
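The steps listed in this abstract (create and populate a table, run a simple query, connect from Python) can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the tutorial may use a different database system, and the `sessions` table and its columns are hypothetical. The rows reuse session titles from this program.

```python
import sqlite3

# Connect to a throwaway in-memory database (a file path would persist the data).
conn = sqlite3.connect(":memory:")

# Create a table.
conn.execute(
    "CREATE TABLE sessions ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, day INTEGER NOT NULL)"
)

# Populate it; the ? placeholders keep values separate from the SQL text.
conn.executemany(
    "INSERT INTO sessions (title, day) VALUES (?, ?)",
    [
        ("Web scraping for data science", 1),
        ("Linear regression for machine learning", 2),
        ("Databases for data scientists", 4),
    ],
)
conn.commit()

# Run a simple query.
rows = conn.execute(
    "SELECT title FROM sessions WHERE day >= ? ORDER BY day", (2,)
).fetchall()
day_two_onwards = [title for (title,) in rows]
print(day_two_onwards)

conn.close()
```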

Big data for development use cases

Day 4, 10:00-11:30 am (lecture and exercises)

Rachel Sibande 🐦 @RachelSibande

This lecture will delve into practical examples of how big data has been applied in real life. It will highlight diverse use cases in health, food security monitoring and disaster management, among others, from over 5 countries. The lecture will focus on the practical use of Mobile …

Databases for data scientists

Day 4, 8:30-10:00 am (lecture)

Maria Maistro

This lecture will be an introduction to databases. We will briefly cover the entity-relationship model (entities, relationships and attributes), the relational model (relation, primary key, foreign key) and SQL queries (select, where, limit, join, etc.).
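To make the relational concepts named in this abstract concrete, the sketch below models two entities as relations linked by a foreign key and then combines `SELECT`, `WHERE`, `LIMIT` and `JOIN` in one query. The schema (`speaker`, `session`) is an assumption made for illustration, not the lecture's own example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE speaker (
    speaker_id INTEGER PRIMARY KEY,   -- primary key: identifies each speaker
    name       TEXT NOT NULL          -- attribute of the speaker entity
);
CREATE TABLE session (
    session_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    day        INTEGER NOT NULL,
    -- foreign key: each session must point at an existing speaker
    speaker_id INTEGER NOT NULL REFERENCES speaker(speaker_id)
);
INSERT INTO speaker VALUES (1, 'Maria Maistro'), (2, 'Ralph Tambala');
INSERT INTO session VALUES
    (1, 'Databases for data scientists', 4, 1),
    (2, 'Web scraping for data science', 1, 2),
    (3, 'Practical session on web scraping', 1, 2);
""")

# select + join + where + limit in a single query
query = """
SELECT session.title, speaker.name
FROM session
JOIN speaker ON session.speaker_id = speaker.speaker_id
WHERE session.day = ?
ORDER BY session.session_id
LIMIT 2
"""
day_one = conn.execute(query, (1,)).fetchall()
print(day_one)

# A row that references a non-existent speaker is rejected by the foreign key.
try:
    conn.execute("INSERT INTO session VALUES (4, 'Orphan session', 2, 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```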

Introduction to big data

Day 5, 08:30-12:00 pm (lecture and exercises)

Michael Zimba

There are three paramount factors that drive a (data) computing paradigm, namely: capacity, throughput and latency. Loosely put, capacity refers to how much data we can store, throughput defines how fast we can transmit data, and latency is the time lapse between issuing an instruction and starting …
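A back-of-the-envelope sketch can show how these three factors interact. It assumes a deliberately simplified model that is not from the lecture: moving data takes a fixed start-up delay (latency) plus the data size divided by the throughput. The function names and all the numbers are hypothetical.

```python
MB = 10**6
GB = 10**9


def fits(size_bytes, capacity_bytes):
    """Capacity check: can the store hold this much data at all?"""
    return size_bytes <= capacity_bytes


def transfer_time_seconds(size_bytes, throughput_bytes_per_s, latency_s):
    """Simplified model: fixed start-up delay plus streaming time."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return latency_s + size_bytes / throughput_bytes_per_s


# Hypothetical link: 100 MB/s throughput, 50 ms latency.
small = transfer_time_seconds(1 * MB, 100 * MB, 0.05)   # latency dominates
large = transfer_time_seconds(10 * GB, 100 * MB, 0.05)  # throughput dominates

print(f"1 MB:  about {small:.2f} s")
print(f"10 GB: about {large:.2f} s")
print(fits(10 * GB, 500 * GB))
```

Under this model, small requests are limited mostly by latency, while large transfers are limited mostly by throughput.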