Program: Malawi Data Science Bootcamp
When | Theme |
---|---|
Day 1 | Introduction to data science, data, data manipulation and visualization |
Day 2+3 | Statistical and machine learning for data science |
Day 4 | Big data analytics |
Day 5 | Responsible Data Science + group sessions |
We are currently adding titles and abstracts from the program.
Understanding a data pipeline with emphasis on ETL
Day 1, 10:45 am-12:15 pm (lecture)
Wiza Msuku 🐦 @ugly_as_eva
In this lecture, we will introduce some basic concepts of data science. The lecture is a deep dive into data pipelines, from data extraction to the destination where the data is consumed by individuals or downstream systems. We will provide a walkthrough of the significance of ETL in data science, explore key …
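The extract, transform and load steps can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the lecture's own material: the CSV text, the `district`/`amount` columns and the SQLite `sales` table are hypothetical stand-ins for a real source and destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an extracted file (figures are made up).
RAW_CSV = """district,amount
Lilongwe, 120.5
Blantyre,80
Zomba,
Lilongwe,30
"""

def extract(text):
    """Extract: read CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: strip whitespace, drop rows with a missing amount, convert types."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append((row["district"].strip(), float(amount)))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a (hypothetical) destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (district TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A downstream consumer queries the loaded data.
totals = conn.execute(
    "SELECT district, SUM(amount) FROM sales GROUP BY district ORDER BY district"
).fetchall()
print(totals)
```

Keeping each stage in its own function makes it easy to swap the in-memory CSV for a file or API response, or SQLite for another destination, without touching the other stages.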
Web scraping for data science
Day 1, 1:30-3:00 pm (lecture)
Ralph Tambala
In every data science project, data collection is the first and most important stage. Some websites and cloud services provide APIs that make it easier for external users to collect data; however, most websites do not. A data science endeavour can be hampered by a lack of APIs or publicly available data for …
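When no API is available, scraping usually means parsing the HTML of a page. Here is a minimal sketch using Python's built-in `html.parser` module, assuming a hypothetical page with a simple price table. The page is embedded as a string so the example runs offline; a real scraper would download it first and should respect the site's terms of use and `robots.txt`.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice it could be downloaded first,
# e.g. with urllib.request.urlopen(url).read().decode("utf-8").
PAGE = """
<html><body>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Maize</td><td>350</td></tr>
    <tr><td>Rice</td><td>900</td></tr>
  </table>
</body></html>
"""

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Keep only non-blank text that sits inside a <td> cell.
        if self._in_cell and data.strip():
            self.rows[-1].append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)
# Drop rows without <td> cells (the header row uses <th>).
records = [row for row in scraper.rows if row]
print(records)
```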
Practical session on web scraping
Day 1, 4:15-5:15 pm (tutorial)
Ralph Tambala
In every data science project, data collection is the first and most important stage. Some websites and cloud services provide APIs that make it easier for external users to collect data; however, most websites do not. A data science endeavour can be hampered by a lack of APIs or publicly available data for …
The importance of datasets for machine learning and data science
Day 1, 9:30-10:30 am (lecture)
Amelia Taylor 🐦 @AT_poly_AI
The growth in the use of technology is evident in all sectors of human activity, such as education, business, social life and government. This is due to many factors; the two most often cited are ‘the availability of massive datasets’ and ‘cheap computing power’. Data science aims at applying a …
Basic machine learning algorithms and their application
Day 2, 10:30 am-12:00 pm (lecture)
Akuzike Banda
In recent years, there has been rapid growth in the area of machine learning. Machine learning focuses on building systems that can learn and improve from experience without being explicitly programmed to do so. Algorithms are what make all of this happen. Machine learning …
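As one illustration of an algorithm that learns from labelled examples rather than hand-written rules, here is a minimal k-nearest-neighbours classifier in plain Python. It is a sketch, not necessarily one of the algorithms covered in the lecture, and the features, labels and data points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled examples: (hours studied, hours slept) -> outcome.
train = [
    ((1.0, 4.0), "fail"), ((2.0, 5.0), "fail"), ((1.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"), ((7.0, 8.0), "pass"), ((8.0, 6.5), "pass"),
]

print(knn_predict(train, (6.5, 7.5)))
print(knn_predict(train, (1.2, 5.0)))
```

Adding more labelled examples to `train` changes future predictions without changing the code, which is the sense in which such a system learns from experience.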
Linear regression for machine learning
Day 2, 1:00-4:30 pm (lecture and exercises)
Winnie Wezi Mkandawire
Linear regression is one of the fundamental supervised machine learning algorithms. Although it is relatively simple and may not seem as sophisticated as other machine learning algorithms, it remains widely used across domains such as biology, the social sciences, finance, and …
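For a single input variable, ordinary least squares has a closed-form solution: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x (equivalently, their covariance over the variance of x), and the intercept makes the line pass through the point of means. A minimal plain-Python sketch with made-up data points:

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted value on the fitted line."""
    return intercept + slope * x

# Hypothetical, slightly noisy observations.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_linear_regression(xs, ys)
print(round(b0, 2), round(b1, 2))
print(round(predict(b0, b1, 6), 2))
```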
Mathematics and statistics concepts for machine learning and data science (application in healthcare)
Day 3, 10:30 am-12:00 pm (lecture)
Richard J Munthali 🐦 @RichardMunthali
This lecture will aim to systematically help participants “master” and “recognize” the core concepts in statistics and mathematics that are used at different stages of machine learning and data science projects. Concepts like statistics and probability, descriptive …
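Descriptive statistics and simple empirical probabilities can be computed with Python's standard `statistics` module. Here is a minimal sketch on hypothetical blood-pressure readings, chosen only to echo the healthcare theme; the 140 mmHg threshold is an illustrative assumption.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg) for ten patients.
readings = [118, 122, 130, 125, 140, 135, 128, 119, 150, 133]

summary = {
    "mean": statistics.mean(readings),
    "median": statistics.median(readings),
    "stdev": statistics.stdev(readings),             # sample standard deviation
    "quartiles": statistics.quantiles(readings, n=4),  # three cut points
}

# Empirical probability that a reading is at or above the illustrative threshold.
p_high = sum(r >= 140 for r in readings) / len(readings)

print(summary)
print(p_high)
```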
Practical session on support vector machines and random forest
Day 3, 1:00-2:30 pm (tutorial)
Tiwonge Msulira Banda 🐦 @tiwobanda
This practical session will give students the opportunity to use two classification algorithms, Support Vector Machines (SVM) and Random Forest (RF), on real datasets to make classification predictions.
Mathematics and statistics concepts for machine learning and data science (application in healthcare)
Day 3, 3:00-4:30 pm (tutorial)
Richard J Munthali 🐦 @RichardMunthali
In this session, we will work through practical examples with healthcare data using Python and Jupyter notebooks, from exploratory data analysis (EDA) to model building, model selection, and saving the best model and using it to predict new data. We will use some statistical and mathematical concepts to …
Support vector machines and random forest
Day 3, 8:30-10:00 am (lecture)
Tiwonge Msulira Banda 🐦 @tiwobanda
This lecture will continue our journey with supervised machine learning, with a focus on classification problems. Two algorithms will be covered, namely Support Vector Machines (SVM) and Random Forest (RF). The two algorithms will be used to classify phishing emails from a real dataset. We will go …
Big data for development use cases
Day 4, 10:00-11:30 am (lecture and exercises)
Rachel Sibande 🐦 @RachelSibande
This lecture will delve into practical examples of how big data has been applied in real life. It will highlight diverse use cases from more than five countries in health, food security monitoring and disaster management, among others. The lecture will focus on the practical use of Mobile …
Databases for data scientists
Day 4, 1:00-2:30 pm (tutorial)
Maria Maistro
We will create a small database: create and populate tables, run some simple SQL queries, and connect to the database with Python.
Databases for data scientists
Day 4, 8:30-10:00 am (lecture)
Maria Maistro
This lecture will be an introduction to databases. We will briefly cover the entity-relationship model (entities, relationships and attributes), the relational model (relation, primary key, foreign key) and SQL queries (select, where, limit, join, etc.).
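These concepts map directly onto a few lines of SQL. Here is a minimal sketch using Python's built-in `sqlite3` module. The participant, session and enrolment tables and their rows are hypothetical, chosen to show primary keys, foreign keys and a `SELECT ... JOIN ... WHERE ... LIMIT` query run from Python.

```python
import sqlite3

# Connect to a throwaway in-memory database (a file path would persist it).
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Two entities (participant, session) and the relationship between them.
conn.executescript("""
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE session (
    session_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    day        INTEGER NOT NULL
);
-- Each enrolment column is a foreign key into one of the entity tables.
CREATE TABLE enrolment (
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    session_id     INTEGER NOT NULL REFERENCES session(session_id),
    PRIMARY KEY (participant_id, session_id)
);
""")

# Populate the tables with hypothetical rows.
conn.executemany("INSERT INTO participant VALUES (?, ?)",
                 [(1, "Chikondi"), (2, "Thoko"), (3, "Mphatso")])
conn.executemany("INSERT INTO session VALUES (?, ?, ?)",
                 [(1, "Web scraping", 1), (2, "Databases", 4)])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2), (3, 1)])

# SELECT ... JOIN ... WHERE ... ORDER BY ... LIMIT
rows = conn.execute("""
    SELECT p.name, s.title
    FROM enrolment AS e
    JOIN participant AS p ON p.participant_id = e.participant_id
    JOIN session     AS s ON s.session_id = e.session_id
    WHERE s.day = 4
    ORDER BY p.name
    LIMIT 10
""").fetchall()

fk_rejected = False
try:
    # Participant 99 does not exist, so the foreign key check fails.
    conn.execute("INSERT INTO enrolment VALUES (99, 1)")
except sqlite3.IntegrityError:
    fk_rejected = True

print(rows, fk_rejected)
```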
Introduction to big data
Day 5, 8:30 am-12:00 pm (lecture and exercises)
Michael Zimba
There are three paramount factors that drive a (data) computing paradigm, namely capacity, throughput and latency. Loosely put, capacity refers to how much data we can store, throughput defines how fast we can transmit data, and latency is the time lapse between issuing an instruction and starting …
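These definitions combine in a common back-of-the-envelope model, assumed here for illustration: fetching data takes a fixed latency plus the data size divided by throughput. Here is a minimal Python sketch with hypothetical link and disk figures. It shows why many small requests can be far slower than one large transfer of the same total size.

```python
def transfer_time_s(size_bytes, throughput_bytes_per_s, latency_s):
    """Simplified model: fixed start-up latency plus size / throughput."""
    return latency_s + size_bytes / throughput_bytes_per_s

MB = 10**6
GB = 10**9
TB = 10**12

# Hypothetical link: 100 MB/s throughput, 50 ms latency.
one_big = transfer_time_s(10 * GB, 100 * MB, 0.05)

# The same 10 GB split into 100,000 sequential requests of 100 KB each
# pays the latency on every request.
many_small = 100_000 * transfer_time_s(10 * GB / 100_000, 100 * MB, 0.05)

# Capacity: how many 10 GB datasets fit on a hypothetical 4 TB disk.
datasets_that_fit = 4 * TB // (10 * GB)

print(f"one 10 GB transfer:     {one_big:.2f} s")
print(f"100,000 small requests: {many_small:.2f} s")
print(f"datasets that fit:      {datasets_that_fit}")
```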