Web scraping for data science

Day 1, 1:30-3:00 pm (lecture)

Ralph Tambala

Data collection is the first stage of most data science projects, and often the most important one, because every later stage depends on the data gathered. Some websites and cloud services provide APIs that make data collection easier for external users, but most do not. The absence of an API, or of publicly downloadable data, can hamper a data science project. In this session, we will look at how data can be extracted from the Internet using a technique known as web scraping. We will use Python to parse a live webpage and collect the data of interest from it. We will also look at how to prepare and save the data for later use. Finally, we will discuss the ethical and legal implications of web scraping.
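As a rough illustration of the parse, prepare and save steps described above, here is a minimal sketch that uses only the Python standard library. It is not the session's own material: the HTML snippet, the item names and prices, and the names TableParser, scrape_table, prepare and to_csv are all made up for this example. The page is embedded as a string so that the sketch runs offline.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content with made-up values, embedded so the sketch
# runs without network access. For a live page, the HTML could instead be
# fetched with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>Notebook</td><td> 1,250.00 </td></tr>
  <tr><td>Pen</td><td>150.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Parse: return the table as a list of rows of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def prepare(rows):
    """Prepare: split off the header and convert prices to floats."""
    header, *body = rows
    records = [[name, float(price.replace(",", ""))] for name, price in body]
    return header, records


def to_csv(header, records):
    """Save: serialise the records as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()


rows = scrape_table(SAMPLE_HTML)
header, records = prepare(rows)
csv_text = to_csv(header, records)
print(csv_text)
```

In practice, the CSV text would be written to a file rather than printed. Before fetching a real site, it is worth checking its terms of use and its robots.txt file, which the standard library's urllib.robotparser module can read.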

Speaker biography

Ralph Tambala is a Lecturer of Computer Science at the Malawi University of Science and Technology (MUST). He loves coding, appreciates art and enjoys a good read. His research interests include Data Science, Machine Learning and Natural Language Processing.