Machine Learning: Linear regression and gradient descent – Part 1

The purpose of this article is to understand how gradient descent works by applying it to linear regression. We will start with a quick introduction to linear regression before moving on to the estimation techniques. Please feel free to skip the background section if you are already familiar with linear regression. I chose to write on this topic because “a good understanding of gradient descent and…

Our data platform with Docker

This article has been co-authored by Rafi Syed and Sree Pratheep. As a continuation of our earlier article on Docker, here is a brief account of how we started our journey with Docker. Our goal was a completely self-sufficient development-cum-integration environment that would make the development experience smoother and reduce the ramp-up time for any newcomer to the team. We also wanted to break the dependency on…

Tools in the data armoury: R vs Spark

The purpose of this article is twofold. The first is to give a quick performance comparison between R and Spark. The second is to introduce you to Spark’s ML library.

Background

As R is inherently single-threaded, it may not be wise to compare Spark and R in terms of performance. Though it is not an ideal comparison, some of the numbers below will definitely excite someone…

Why we chose Docker to build Crayon’s data processing platform

Historically, enterprise software vendors have struggled to control the deployment environment across their clients’ installation bases. Without proper control over the deployment environment, troubleshooting any issue on a client’s setup was a nightmare. Initially, companies delivered an appliance that included a customized OS image. That gave them complete control over the environment, from the hardware and OS through to the software applications installed as part…