Introduction to big data

Day 5, 08:30-12:00 (lecture and exercises)
Michael Zimba

Three paramount factors drive a (data) computing paradigm: capacity, throughput and latency. Loosely put, capacity is how much data we can store, throughput is how fast we can transmit data, and latency is the time lapse between issuing an instruction and starting to receive data. From 1956 to 2021, capacity per unit volume increased 200 billion times, whereas throughput increased only 20 thousand times and latency improved a mere 150 times. This orders-of-magnitude discrepancy among the three factors has necessitated a shift to a computing paradigm that deploys massive parallelism and batch processing. In Big Data, we consider a portfolio of technologies designed to store, manage and analyze data that is too large to fit on a single machine, while accommodating the growing discrepancy among capacity, throughput and latency. In our talk, we will introduce the fundamental concepts of big data and discuss legacy Hadoop technologies, before winding up with the Apache Spark ecosystem. Our emphasis will be on the unifying, modular architecture that underlies this diversity of technologies and is strikingly common to all of them.
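As a rough illustration of why this discrepancy pushes towards parallelism, the growth factors quoted above can be combined in a short Python sketch. It assumes, as a simplification not stated in the abstract, that the time to read a full store is proportional to capacity divided by throughput, and that data can be split evenly across machines. The function name and the figure of 1000 workers are hypothetical.

```python
# Growth factors quoted in the abstract (1956 to 2021).
CAPACITY_GROWTH = 200_000_000_000   # capacity per unit volume: 200 billion times
THROUGHPUT_GROWTH = 20_000          # throughput: 20 thousand times


def relative_scan_time(capacity_growth, throughput_growth, workers=1):
    """Factor by which the time to read all stored data has grown.

    Assumes scan time is proportional to capacity / throughput and that
    the data is split evenly across `workers` machines read in parallel.
    """
    return capacity_growth / throughput_growth / workers


# One machine reading everything sequentially.
single = relative_scan_time(CAPACITY_GROWTH, THROUGHPUT_GROWTH)

# The same data spread over a hypothetical cluster of 1000 workers.
cluster = relative_scan_time(CAPACITY_GROWTH, THROUGHPUT_GROWTH, workers=1000)

print(f"single machine: scan time grows {single:,.0f}x")
print(f"1000 workers:   scan time grows {cluster:,.0f}x")
```

Under these assumptions a full scan on one machine takes about ten million times longer than it would if throughput had kept pace with capacity. Spreading the data over many machines claws back a factor equal to the number of workers.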

Speaker biography

Michael Zimba studied Electrical and Electronics Engineering (BSc (distinction), University of Malawi, 2005); Information Theory, Coding and Cryptography (MSc (distinction), Mzuzu University, 2009); and Computer Science and Technology (PhD (Doctor of Engineering), Hunan University, 2012). He is a Senior Lecturer in the Department of Information and Communication Technology, Mzuzu University. From 2018 to 2021 he was the Dean of the Faculty of Science, Technology and Innovation. Prior to joining the Department of Information and Communication Technology, Michael was a Lecturer in Electronics, Signal Processing and Communication Physics in the Department of Physics and Electronics, Mzuzu University.

At the regional level, Michael is a contributing expert on Artificial Intelligence to the African Union Development Agency – New Partnership for Africa’s Development (AUDA-NEPAD). AUDA-NEPAD is an arm of the African Union High Level Panel on Emerging Technologies (APET). AUDA-NEPAD organizes international seminars and conferences on emerging technologies and their impact.

At the national level, Michael is a founding member of the UNDP-led National Innovation Coordinating Team, charged with harnessing Science, Technology and Innovation (STI) nationally. He is involved in organizing discussion fora on STI. Michael is also engaged by the National Council for Higher Education on quality assurance for ICT systems and ICT education.

Michael’s teaching and research interests include Data Science; Artificial Intelligence (AI) and Machine/Deep Learning; 5G/B5G Mobile and Wireless Communication; Intelligent and Smart Systems; Emerging Technologies, their Impact and Enabling Policies; Digital Multimedia Signal Processing, Analysis and Forensics; System Automation, Modeling and Simulation; Watermarking, Steganography and Cryptology.