09 Jun 2016
In this blog post, I’ll explain my approach to the San Francisco Crime Classification competition, in which I participated over the past two months. The competition was hosted by Kaggle, a free online platform for predictive modelling and analytics. I finished in the top 60 out of 2,335 participants, which is my best personal result so far. This was a knowledge competition, meaning that submissions were evaluated on the whole test set. There was therefore no danger of overfitting the leaderboard, because the true (final) leaderboard score was calculated after every submission (no secrets). Furthermore, no ranking points were awarded, so there was no particular gain except for learning new methods for tackling machine learning problems.
Continue reading...
11 Apr 2016
This blog post shows how to use the Theano library to perform linear and logistic regression. I won’t go into the details of what linear or logistic regression is, because the purpose of this post is mainly to show how Theano can be used for regression tasks; details on both methods can be found on Wikipedia. For this post, I created a small Python package, Regression_theano, which resides in my GitHub repository. So, assuming that you,
Continue reading...
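To give a rough idea of what a regression package like this computes, here is a minimal logistic-regression sketch trained by batch gradient descent on the mean cross-entropy loss. It is written in plain Python rather than Theano so that it runs without extra dependencies, and the gradient is derived by hand, whereas Theano would derive it symbolically. The function names and the toy data are hypothetical illustrations, not part of Regression_theano.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean cross-entropy (log) loss.

    X is a list of feature vectors (lists of floats), y a list of 0/1 labels.
    Returns the weight vector w and the bias b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # d(log loss)/d(logit) = predicted probability - label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the gradient averaged over all n samples
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(w, b, X, y):
    """Mean cross-entropy of the model on (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


# Toy 1-D data (hypothetical): class 1 for positive feature values
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([round(predict_proba(w, b, x)) for x in X])
```

Linear regression follows the same loop with the sigmoid removed and the squared error as the loss. In Theano, one would instead declare symbolic variables for the weights and let `theano.tensor.grad` produce the gradient expression.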
14 Mar 2016
This blog post is about using random search, combined with resampling, to find the optimal parameters of various algorithms in R. A randomized search simply samples parameter settings a fixed number of times from a specified subset of a learning algorithm's hyperparameter space. This method has been found to be more effective than an exhaustive search (grid search) in high-dimensional spaces. Each sampled setting is evaluated with a resampling method such as cross-validation or bootstrapping, so that the chosen parameters optimize the algorithm's performance on held-out data and therefore generalize better.
Continue reading...
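The procedure described above can be sketched in a few lines. The example below is in Python rather than R, and the model (one-dimensional ridge regression without intercept), the log-uniform range for the penalty `lam`, and all helper names are hypothetical choices made only for illustration. Each of the `n_iter` sampled settings is scored by k-fold cross-validation, and the one with the lowest held-out error is kept.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean squared error of ridge(lam), averaged over k held-out folds."""
    errors = []
    for test in k_fold_indices(len(xs), k):
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in test) / len(test))
    return sum(errors) / len(errors)


def random_search(xs, ys, n_iter=20, low=-3.0, high=3.0, seed=42):
    """Sample lam = 10**u with u ~ Uniform(low, high) n_iter times and
    keep the setting with the lowest cross-validated MSE."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        lam = 10 ** rng.uniform(low, high)
        trials.append((cv_mse(xs, ys, lam), lam))
    best_score, best_lam = min(trials)
    return best_lam, best_score, trials


# Hypothetical toy data: y = 2x plus Gaussian noise
data_rng = random.Random(1)
xs = [data_rng.uniform(-1, 1) for _ in range(50)]
ys = [2 * x + data_rng.gauss(0, 0.3) for x in xs]
best_lam, best_score, trials = random_search(xs, ys)
print(f"best lambda = {best_lam:.4g}, CV MSE = {best_score:.4f}")
```

Unlike a grid search, the cost is fixed by `n_iter` regardless of how many hyperparameters are tuned; swapping `cv_mse` for a bootstrap estimate would give the bootstrapping variant instead.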
14 Feb 2016
This blog post is about feature selection in R, but first a few words about R. R is a free programming language offering a wide variety of statistical and graphical techniques. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. A standard R installation comes with a core set of packages, which can be extended with more than 7,801 additional packages (as of January 2016). Packages can be downloaded from the Comprehensive R Archive Network (CRAN) or from other sources such as GitHub or Bioconductor. Many of these packages are written in R itself; however, a nice feature of R is that it can be linked to lower-level programming languages (such as C or C++) for computationally intensive tasks. More information about R can be found here.
Continue reading...
31 Jan 2016
In my first blog post, I’ll explain how I created my blog. I had no experience building websites and somehow assumed it would be difficult. However, after a lot of ‘googling’, I finally managed it.
I use Lanyon, a sliding-sidebar theme based on Poole, which is a Jekyll setup. For anyone who, like me a week ago, doesn’t know what Poole or Jekyll is, the following two links give more details:
Continue reading...