mlampros Organizing and Sharing thoughts, Receiving constructive feedback

Regularized Greedy Forest in R

This blog post is about my newly released RGF package (the post consists mainly of the package vignette). The RGF package is a wrapper of the Regularized Greedy Forest Python package, which also includes a multi-core implementation (FastRGF). Porting from Python to R was made possible by the reticulate package, and the installation requires basic knowledge of Python. Installation is straightforward on Linux but can be somewhat cumbersome on Macintosh and Windows (on Windows the package can currently be used only from within the command prompt). Detailed installation instructions for all three operating systems can be found in the README.md file and in the rgf_python GitHub repository.

Continue reading...

Statoil / C-CORE Iceberg Classifier Competition

Over the last two months, I participated in a machine learning competition organized by Kaggle (a platform for predictive modeling and analytics), where I finished in the top 1% on the private leaderboard (24th out of 3343 participants). I thought it would be worth writing a blog post both to share my experience and insights and to keep a reference of key features for satellite imagery (Sentinel-1 satellite data, specifically the HH band, transmitted and received horizontally, and the HV band, transmitted horizontally and received vertically) in case it proves useful in the future.

Continue reading...

Geospatial Queries using Pymongo in R

Ever since I submitted the geojsonR package, I have been interested in running geospatial MongoDB queries using GeoJSON data. I decided to use PyMongo (through the reticulate package) after opening two GitHub issues here and here. In my opinion, the PyMongo library is huge and covers a lot of ground; my intention, however, was simply to be able to run geospatial queries from within R.

The GeoMongo package

The GeoMongo package allows the user:

  • to insert and query only GeoJSON data using the geomongo R6 class
  • to read data in either JSON (through the geojsonR package) or BSON format (I’ll explain later when BSON is necessary for inserting data)
  • to validate a JSON instance against a schema using the json_schema_validator() function (input parameters are R named lists)
  • to run MongoDB console commands using the mongodb_console() function, which takes advantage of the base R system() function. For instance, MongoDB console commands are necessary for bulk import / export of data, as documented here and here.
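
To give a rough idea of what a geospatial query on GeoJSON data looks like, here is a minimal Python sketch that builds a MongoDB filter document using the `$geoWithin` and `$geometry` operators. It only constructs the filter as a plain dictionary and does not connect to a database or go through GeoMongo. The helper name `geo_within_filter`, the field name `location`, and the coordinates are assumptions made for illustration.

```python
def geo_within_filter(field, ring):
    """Build a MongoDB filter matching documents whose `field`
    lies inside the polygon described by `ring`.

    `ring` is a list of [longitude, latitude] pairs; GeoJSON requires
    the first and last positions of a polygon ring to be identical.
    """
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("a polygon ring needs at least 4 positions and must be closed")
    return {
        field: {
            "$geoWithin": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        }
    }


# Hypothetical bounding square, given as [longitude, latitude] pairs.
square = [[0.0, 0.0], [3.0, 0.0], [3.0, 3.0], [0.0, 3.0], [0.0, 0.0]]
query = geo_within_filter("location", square)
```

With PyMongo, a dictionary like `query` would typically be passed as the filter argument of a collection's `find()` method.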

Continue reading...

Fuzzy string Matching using fuzzywuzzyR and the reticulate package in R

I recently released another R package on CRAN, fuzzywuzzyR, which ports the fuzzywuzzy Python library to R. “fuzzywuzzy does fuzzy string matching by using the Levenshtein Distance to calculate the differences between sequences (of character strings).”
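
For readers unfamiliar with the Levenshtein distance, the following is a minimal sketch of the classic dynamic-programming computation in plain Python. It is not the code that fuzzywuzzy or fuzzywuzzyR runs internally; the function name `levenshtein` is chosen here purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are more alike, which is the basis for using such scores as similarity features.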

This is nothing new, since similar packages such as stringdist already exist in R. Why create the package, then? Well, I intend to participate in a recently launched Kaggle competition, and one popular method for building features (predictors) is fuzzy string matching, as explained in this blog post. My second aim was to use the reticulate package (newly released by RStudio), which “provides an R interface to Python modules, classes, and functions” and makes the process of porting Python code to R far less cumbersome.

First, I’ll explain the functionality of the fuzzywuzzyR package and then I’ll give some examples of how to take advantage of the reticulate package in R.

Continue reading...

Processing of GeoJson data in R

This blog post is about geojsonR, my recently released package on CRAN. The following notes and examples are based mainly on the package vignette.

“GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes, based on JavaScript Object Notation. The features include points (therefore addresses and locations), line strings (therefore streets, highways and boundaries), polygons (countries, provinces, tracts of land), and multi-part collections of these types. GeoJSON features need not represent entities of the physical world only; mobile routing and navigation apps, for example, might describe their service coverage using GeoJSON. The GeoJSON format differs from other GIS standards in that it was written and is maintained not by a formal standards organization, but by an Internet working group of developers.”
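
As a small illustration of the feature types named in the quote, the sketch below builds a GeoJSON FeatureCollection containing a point, a line string, and a polygon, using only Python's standard `json` module. The coordinates and property values are made up for the example, and the helper `geometry_types` is hypothetical. Note that GeoJSON positions are written as [longitude, latitude] and that a polygon ring must end with the position it started with.

```python
import json

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {   # a point: a single location
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
            "properties": {"name": "example location"},
        },
        {   # a line string: e.g. a street or boundary
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0]],
            },
            "properties": {"name": "example street"},
        },
        {   # a polygon: one closed outer ring
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[0.0, 0.0], [3.0, 0.0], [3.0, 3.0], [0.0, 3.0], [0.0, 0.0]]
                ],
            },
            "properties": {"name": "example tract of land"},
        },
    ],
}


def geometry_types(collection):
    """List the geometry type of every feature in a FeatureCollection."""
    return [feature["geometry"]["type"] for feature in collection["features"]]


# Serialize to GeoJSON text and parse it back again.
geojson_text = json.dumps(feature_collection, indent=2)
round_tripped = json.loads(geojson_text)
print(geometry_types(round_tripped))  # ['Point', 'LineString', 'Polygon']
```

The non-spatial attributes mentioned in the quote live in each feature's `properties` member.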

Continue reading...