Welcome to my data playground, a repository where I explore and learn to use new tools and technologies for dealing with a variety of data problems. I've decided to publish my projects to this repository so that I can share what I've learned, and hopefully help others who are new to these topics.
I'll use real-world examples as much as possible, because there is often a significant gap between the toy examples in tutorials and training courses and the data problems you meet in the real world.
I'll primarily use Python for my examples, but may explore other tools and languages in the future.
1. Wrangling COVID-19 data
In this project, I dive into web scraping and data wrangling techniques.
- I'll use the Python libraries Requests and Beautiful Soup to automatically download PDFs containing daily COVID-19 data from the WHO website.
- A second implementation uses Selenium, which is essential for dynamic websites that only build their HTML with JavaScript once the page is opened in a browser. Think of websites built with frameworks like React, Angular, and Vue.
- Once downloaded, I'll read the table from the PDF using tabula-py and clean up the data using pandas.
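
To give a feel for the first step, here is a minimal sketch of collecting PDF links from a page with Beautiful Soup and downloading them with Requests. It is an illustration rather than the project's actual code: the function names `find_pdf_links` and `download_pdfs`, the `pdfs` output folder, and whatever page URL you pass in are all assumptions, and the real WHO pages may need different link filtering.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of all links in `html` that point to a PDF.

    Hypothetical helper for illustration; not part of any library.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page they were found on.
        url = urljoin(base_url, anchor["href"])
        # Check the URL path only, so "report.pdf?download=true" still counts.
        if urlparse(url).path.lower().endswith(".pdf") and url not in links:
            links.append(url)
    return links


def download_pdfs(page_url: str, out_dir: str = "pdfs") -> list[Path]:
    """Fetch `page_url` and save every PDF it links to into `out_dir`.

    `page_url` and `out_dir` are placeholders chosen for this sketch.
    """
    response = requests.get(page_url, timeout=30)
    response.raise_for_status()

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    saved = []
    for url in find_pdf_links(response.text, page_url):
        pdf = requests.get(url, timeout=60)
        pdf.raise_for_status()
        path = target / Path(urlparse(url).path).name
        path.write_bytes(pdf.content)
        saved.append(path)
    return saved
```

Keeping the link extraction separate from the download means the parsing logic can be tried on a saved HTML file without touching the network. This approach only sees links present in the raw HTML response, which is why a browser-driven tool like Selenium is needed for pages that build their links with JavaScript.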
Got any questions?
Reach out to me here or open an issue. Happy to have a chat! 😃