mlampros Organizing and Sharing thoughts, Receiving constructive feedback

Clustering using the ClusterR package

This blog post is about clustering and specifically about my recently released package on CRAN, ClusterR. The following notes and examples are based mainly on the package vignette.

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
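To make the idea of "objects in the same group are more similar to each other" concrete, here is a minimal k-means sketch in plain Python. It is only an illustration under simplifying assumptions, not the ClusterR implementation: the function name kmeans, the Euclidean distance, and the naive initialisation (evenly spaced input points as starting centroids) are all choices made for this example.

```python
import math

def kmeans(points, k, n_iter=20):
    """Group `points` into k clusters with Lloyd's algorithm (illustrative sketch)."""
    # Naive initialisation (an assumption of this sketch): evenly spaced input points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Final labels: index of the closest centroid for every point.
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Two visually separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

Points that end up with the same label are closer to their own centroid than to any other, which is the "more similar to each other" criterion expressed through Euclidean distance.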

Continue reading...

Kernel k nearest neighbors

This blog post is about my recently released package on CRAN, KernelKnn. The package consists of three functions: KernelKnn, KernelKnnCV and knn.index.dist. It also includes two data sets (housing data, ionosphere), which will be used here to illustrate the functionality of the package.

k nearest neighbors

In pattern recognition, k nearest neighbors (KNN) is a non-parametric method used for classification and regression. Although KNN is among the 10 most influential algorithms in data mining, it is considered one of the simplest in machine learning.
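The simplicity of plain KNN classification can be seen in a short Python sketch. This is an illustration only, not the KernelKnn code: the function name knn_predict, the Euclidean distance, and the unweighted majority vote are assumptions made for this example (the package additionally supports kernel-weighted neighbors).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Unweighted majority vote over the neighbor labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training data (hypothetical): two classes in two dimensions.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
```

There is no training phase: the method is non-parametric because all the work happens at prediction time, by comparing the query against the stored observations.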

Continue reading...

OpenImageR, an image processing toolkit

This blog post is about my recently released package on CRAN, OpenImageR. The package supports functions for image pre-processing, filtering and image recognition, and it uses RcppArmadillo extensively to reduce the execution time of computationally intensive functions. OpenImageR can be split into three parts: basic functions (convolution, cropImage, down_sample_image, flipImage, gamma_correction, imageShow, image_thresholding, List_2_Array, MinMaxObject, NormalizeObject, readImage, resizeImage, rgb_2gray, rotateFixed, rotateImage, writeImage), image filtering (Augmentation, delationErosion, edge_detection, translation, uniform_filter, ZCAwhiten) and image recognition (average_hash, dhash, hash_apply, HOG, HOG_apply, invariant_hash, phash). The following code snippets explain the functionality of the OpenImageR package in more detail,

Continue reading...

San Francisco Crime Classification competition

In this blog post, I’ll explain my approach to the San Francisco Crime Classification competition, in which I participated over the past two months. The competition was hosted by Kaggle, a free online platform for predictive modelling and analytics. I finished in the top 60 out of 2335 participants, which is my best personal result so far. This competition belongs to the knowledge competitions, meaning that the submissions of the participants are evaluated on the whole test data, so there was no danger of overfitting the leaderboard: after every submission the true (final) leaderboard score was calculated (no secrets). Furthermore, there were no ranking points, so there was no particular gain other than learning new methods for tackling machine learning problems.

Continue reading...

Linear and logistic regression in Theano

This blog post shows how to use the Theano library to perform linear and logistic regression. I won’t go into the details of what linear or logistic regression is, because the purpose of this post is mainly to show how to use Theano in regression tasks. However, details on linear and logistic regression can be found on Wikipedia. For the purpose of this blog post, I created a small Python package, Regression_theano, which resides in my GitHub repository. So, assuming that you,

Continue reading...