Learning Machine Learning: A beginner's journey

I have been learning about Machine Learning and Deep Learning (ML/DL) for the past year. I think ML/DL is here to stay. I don't think it's some fad or bubble! Here's why:

The results of ML/DL speak for themselves. It's hard to argue against success.

ML/DL has been growing organically since 1985 (with the backpropagation algorithms) and went through a second phase of acceleration after 2005 (with the widespread availability of big data and distributed data-processing platforms). The rise of ML/DL follows a steadily ascending curve, not the short-lived spike of a bubble. Since it has grown slowly over many years, I'm betting it will be around for at least as long.

ML/DL has co-evolved with its applications. It has advanced mostly on the practice side, through trial and error, and its theory still lags behind, unable to explain many of the results. By Nassim Taleb's classification, ML/DL is antifragile.


Behind all this is the big market for ML/DL. Big money provides big incentives and has been attracting a lot of smart people. So many smart people can't be wrong.

There is certainly a lot of hype about ML/DL as well. ML/DL has proved viable for a specific set of applications, and it is an exaggeration to claim that general AI has arrived. We are far from it. But that's a good thing, because it means we'll have a lot of juicy problems to work on.

That's why I'm doubling down on ML/DL.

Here are my first impressions from learning about ML/DL. ML/DL uses a very different toolkit and approach than the distributed systems field. I was initially surprised and intrigued by how experimental and trial-and-error-driven ML/DL is. ML/DL deals with noisy/fuzzy/messy real-world data, so the field has naturally devised statistical and probabilistic tools. Verification is done solely by showing performance on a held-out test set. The data set is king. Debugging is a mess, and what the model learns is very opaque. On the other hand, I really like the dynamism of the ML/DL field. There are lots of resources and platforms, and lots of interesting applications.

My interest in ML/DL lies in its interactions with distributed systems. I am not interested in writing image/text/speech processing applications. I learned about ML/DL to think about two questions:

How can we build better distributed systems/architectures to improve the performance of ML/DL systems/applications?

How can we use ML/DL to build better distributed systems?

These are big questions that will take a long time to answer properly, so I hope to revisit them later. Below I describe how I went about learning ML/DL, and in the coming days I look forward to writing a short summary of introductory ML/DL concepts and mechanisms.

How did I go about learning ML/DL

In January, I started following Andrew Ng's machine learning course on Coursera. (Alternatively, here is Ng's course material for CS 229 at Stanford.) After the kids were asleep, I spent an hour each night following the videos of Ng's class. Andrew Ng has a nice and simple way of explaining ML concepts. He is a very good teacher.

On a side note, if you want to learn a bit about Ng's thinking process and his approach to life, creativity, and failure, I highly recommend this interview. It's a very good read.

I really liked the first three weeks of Ng's course: Introduction to Linear Regression, Linear Regression with Multiple Features, and Logistic Regression and Regularization. But as the course went into logistic regression with non-linear decision boundaries, I began to get overwhelmed by the amount of information and complexity. And as the course moved on to neural networks, I started to get lost. For example, I could not build a good mental model and diagram of forward and backward propagation in neural networks, so those parts didn't stick with me. (I wasn't following the programming assignments well either.)

I think the problem was that Ng explained the neural network concepts in a generic way. It all seemed very abstract to me. It could have worked better if he had settled on a small concrete use case and explained the concepts through that.
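For example, a toy network like the one sketched below would have made the forward and backward passes much easier for me to trace. (This is my own minimal NumPy illustration, not code from the course; the weights and sizes are made up for the example.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 2 inputs -> 2 hidden units -> 1 output, one training example.
x = np.array([0.5, -1.0])            # input features
y = 1.0                              # target

W1 = np.array([[0.1, -0.2],
               [0.4,  0.3]])         # hidden-layer weights (2x2)
W2 = np.array([0.7, -0.5])           # output-layer weights (2,)

# Forward propagation: compute activations layer by layer.
z1 = W1 @ x                          # hidden pre-activations
h  = sigmoid(z1)                     # hidden activations
z2 = W2 @ h                          # output pre-activation
y_hat = sigmoid(z2)                  # prediction
loss = 0.5 * (y_hat - y) ** 2        # squared-error loss

# Backward propagation: push the error back through the same layers.
dL_dyhat = y_hat - y                           # dL/d(prediction)
dL_dz2 = dL_dyhat * y_hat * (1 - y_hat)        # through the output sigmoid
dL_dW2 = dL_dz2 * h                            # gradient for output weights
dL_dh  = dL_dz2 * W2                           # error sent back to hidden layer
dL_dz1 = dL_dh * h * (1 - h)                   # through the hidden sigmoid
dL_dW1 = np.outer(dL_dz1, x)                   # gradient for hidden weights

print(loss, dL_dW1, dL_dW2)
```

Tracing one example like this by hand (forward to get the loss, backward to get the gradients) was what eventually made the chain-rule picture click for me.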

Recently, I began auditing the deep learning course on Udacity taught by Vincent Vanhoucke, a Google engineer. This course provides a gentle introduction to deep learning. It begins with multinomial logistic classification; since I knew about logistic regression, I could follow it easily. I liked the softmax function, one-hot encoding, and cross-entropy ideas because they are all very practical and solid concepts. The course introduces these with a use case of notMNIST letter classification for the first 10 letters.
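To make those three ideas concrete, here is a minimal sketch in plain NumPy (my own illustration, not the course's code) of how softmax, one-hot encoding, and cross-entropy fit together for a 10-class classifier:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)      # shift for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def one_hot(label, num_classes=10):
    """Encode class index k as a vector with a single 1 at position k."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

def cross_entropy(probs, target_one_hot):
    """Penalize the model for assigning low probability to the true class."""
    return -np.sum(target_one_hot * np.log(probs + 1e-12))

# Example: raw scores from a linear layer for a 10-class problem (letters A..J).
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, -0.5, 1.5, 0.2, -2.0])
probs = softmax(logits)
target = one_hot(3)                 # suppose the true class is 'D' (index 3)
loss = cross_entropy(probs, target)
print(probs.round(3), loss)
```

Softmax gives the predicted distribution, one-hot encoding gives the target distribution, and cross-entropy measures the distance between the two; that is the whole loss function of the classifier.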

Then, using the same notMNIST example, the course introduced Rectified Linear Units (ReLU) as a simple way to add non-linearity, and showed how to build a deep network from them.
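Roughly, the construction looks like the sketch below (again my own NumPy illustration rather than the course's own code; the layer sizes and random inputs are just placeholders): a linear layer, a ReLU non-linearity, another linear layer, and a softmax on top.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """ReLU: pass positives through, clamp negatives to zero (the non-linearity)."""
    return np.maximum(0.0, x)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Toy dimensions: 28x28 images flattened to 784 features, 10 classes,
# one hidden layer of 128 ReLU units (sizes are illustrative only).
n_in, n_hidden, n_out = 784, 128, 10
W1 = rng.normal(0, 0.01, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_hidden, n_out)); b2 = np.zeros(n_out)

def forward(X):
    hidden = relu(X @ W1 + b1)        # linear layer + ReLU non-linearity
    return softmax(hidden @ W2 + b2)  # linear layer + softmax output

X_batch = rng.normal(size=(32, n_in))   # stand-in for a batch of flattened images
print(forward(X_batch).shape)           # (32, 10): class probabilities per image
```

Stacking more of these linear + ReLU pairs before the softmax is what makes the network "deep"; without the ReLU, the stacked linear layers would collapse into a single linear model.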
