1. Legendary Pokemon Classifier

    There is no denying that Pokemon had a big influence on all the kids of my generation. I remember being 8 or 9 and looking forward to finishing school so I could catch up on the adventures of Ash and Pikachu. I also remember the fun I would have playing Pokemon Stadium on Nintendo 64 with my cousins on the weekend. The phenomenal popularity of Pokemon GO last year further confirmed that the nostalgia factor is still strong for a lot of people, even to this day.

    I was browsing Kaggle for datasets to practice classification algorithms when I came across one describing the first 6 generations of Pokemon with a total of 721 Pokemon, of which 46 are legendary. Bingo! I thought. This dataset is not only a fun way to experiment with classifiers to predict whether a Pokemon is legendary or not, but also provides a way to simulate an end-to-end machine learning project. Moreover, evaluating the performance of our models will require careful thinking since only a small fraction (6.4% to be exact) of the Pokemon are legendary.
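To see why evaluation needs care here, consider a minimal sketch that uses only the counts quoted above (721 Pokemon, 46 legendary). The always-"not legendary" baseline and the helper functions are hypothetical illustrations, not part of the original analysis; the baseline reaches roughly 93.6% accuracy while never finding a single legendary Pokemon.

```python
# Counts taken from the text: 721 Pokemon, 46 of them legendary.
TOTAL = 721
LEGENDARY = 46

# Labels: 1 = legendary, 0 = not legendary.
y_true = [1] * LEGENDARY + [0] * (TOTAL - LEGENDARY)

# Hypothetical baseline that always predicts "not legendary".
y_pred = [0] * TOTAL


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (legendary) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    # Guard against division by zero when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


baseline_accuracy = accuracy(y_true, y_pred)
baseline_precision, baseline_recall = precision_recall(y_true, y_pred)

print(f"accuracy:  {baseline_accuracy:.3f}")
print(f"precision: {baseline_precision:.3f}")
print(f"recall:    {baseline_recall:.3f}")
```

Metrics such as precision and recall expose what accuracy hides on a dataset this imbalanced.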

  2. Dogs vs. Cats - Classification with VGG16

    Convolutional neural networks (CNNs) are the state of the art when it comes to computer vision. As such we will build a CNN model to distinguish images of cats from those of dogs by using the Dogs vs. Cats Redux: Kernels Edition dataset.

    Pre-trained deep CNNs typically generalize well to different but similar datasets with the help of transfer learning. The reason is simple: the filters in the earlier convolutional layers of a CNN usually capture low-level features such as straight lines, whereas filters that recognize complex objects such as faces are activated deeper in the network. It is therefore possible to reuse the pre-trained weights associated with low-level shape recognition as they are and retrain only the deepest layers of the network, a procedure called fine-tuning or transfer learning, to perform classification tasks on different types of images.
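The idea of keeping early layers fixed and retraining only the last one can be sketched in miniature. This is a toy illustration in plain Python, not VGG16 and not the workflow of any particular course: the two "pre-trained" filters, the 2-D inputs, and the labeling rule are all made-up assumptions chosen so the example runs in a fraction of a second.

```python
import math
import random

# Hypothetical "pre-trained" layer: fixed filters that are never updated.
# They stand in for the early convolutional layers of a network like VGG16.
FROZEN_FILTERS = [[1.0, 1.0], [-1.0, -1.0]]


def frozen_features(x):
    """Apply the frozen filters followed by a ReLU."""
    return [max(0.0, f[0] * x[0] + f[1] * x[1]) for f in FROZEN_FILTERS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=20, learning_rate=0.1, seed=0):
    """Train only the final logistic layer on top of the frozen features."""
    rng = random.Random(seed)
    weights = [0.0] * len(FROZEN_FILTERS)
    bias = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            feats = frozen_features(x)
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, feats)]
            bias -= learning_rate * error
    return weights, bias


def predict(x, weights, bias):
    feats = frozen_features(x)
    score = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1 if score > 0 else 0


# Toy "new task" (an assumption): label a point 1 when x0 + x1 > 0, else 0.
rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
dataset = [(p, 1 if p[0] + p[1] > 0 else 0) for p in points]

head_weights, head_bias = train_head(dataset)
train_accuracy = sum(
    predict(x, head_weights, head_bias) == y for x, y in dataset
) / len(dataset)
print(f"training accuracy: {train_accuracy:.2f}")
```

Only `head_weights` and `head_bias` change during training; `FROZEN_FILTERS` plays the role of the reused layers and is left untouched.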

    For this competition we follow the process described in the fast.ai deep learning course.

  3. Stochastic Optimization

    In a world where data can be collected continuously and storage is cheap, the growing size of interesting datasets can become a problem unless we have the right tools for the task. With streaming data, for instance, it may be impossible to wait until the "end" before fitting our model, since the end may never come. It may also be impractical to load all of the data, possibly scattered across many different servers, into memory before using it. In both cases it would be preferable to update the model each time some new data (or a small batch of it) arrives. Similarly, we might find ourselves in an offline situation where the number of training examples is very large and traditional approaches, such as batch gradient descent, become too slow for our needs.

    Stochastic gradient descent (SGD) offers an easy solution to all of these problems.
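As a rough sketch of the "update on each new example" idea, the snippet below fits a one-variable linear model with SGD while consuming a data stream one pair at a time. The synthetic stream, the true parameters (2.0 and -1.0), the squared-error loss, and the learning rate are assumptions picked for illustration only.

```python
import random


def sgd_linear_regression(stream, learning_rate=0.05):
    """Fit y ~ w * x + b by updating on one (x, y) pair at a time."""
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y
        # Gradient of 0.5 * error**2 with respect to w and b.
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


def synthetic_stream(n, true_w=2.0, true_b=-1.0, noise=0.1, seed=0):
    """Yield n noisy samples of a line; stands in for data arriving over time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, true_w * x + true_b + rng.gauss(0.0, noise)


# The generator is consumed lazily, so the full dataset is never held in memory.
w, b = sgd_linear_regression(synthetic_stream(20000))
print(f"w = {w:.2f}, b = {b:.2f}")
```

Each update touches a single example, so the cost per step does not depend on how much data has been or will be seen.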
