1. Beer Crawl

    Many websites like Kaggle and the UC Irvine Machine Learning Repository offer a vast selection of clean data sets on which experienced and aspiring data scientists alike can experiment. However the web is full of interesting data that has not been organized in neat tables ready for extraction. For instance one can think of plane ticket prices, real estate listings, stock prices, product reviews, social media feeds, etc. The list goes on!

    In such cases web scraping becomes a very handy skill to have as a data scientist. I've always thought that the idea of creating a script to crawl webpages in order to extract information sounded fun, but I didn't know how to approach this problem until I came across this book, which pedagogically outlines how to use the Python library BeautifulSoup to pull data out of HTML and XML files. I won't get into the specifics of my …

    Read more →

blogroll

social