This resource is designed primarily for beginner to intermediate data scientists and analysts who want to identify and apply machine learning algorithms to the problems that interest them.

A common question asked by a newbie when faced with a wide variety of machine learning algorithms is, "Which algorithm should I use?" The answer to the question depends on several factors, including:




The size, quality, and nature of the data.

The available computational time.

The urgency of the task.

What you want to do with the data.

Even an experienced data scientist cannot tell which algorithm will perform best before trying different algorithms. We are not advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms to try first, depending on a few clear factors.

Machine Learning Algorithms Cheat Sheet

The Machine Learning Algorithms Cheat Sheet helps you choose from a variety of machine learning algorithms to find the right one for your specific problem. This article walks you through the process of using the sheet.

Since the cheat sheet is designed for beginner data scientists and analysts, we'll make some simple assumptions when we talk about algorithms.

The algorithms suggested here are the result of feedback and suggestions compiled from multiple data scientists, machine learning experts, and developers. There are many issues on which we have not reached agreement; for these, we try to highlight the commonalities and reconcile the differences.

Additional algorithms will be added later as our library grows to include a more complete set of available methods.

How to Use the Cheat Sheet

Read the path and algorithm labels on the chart as "if <path label> then use <algorithm>". For example:

If you want to perform dimension reduction, then use principal component analysis.

If you need a numerical prediction quickly, use decision trees or linear regression.

If you need a hierarchical result, use hierarchical clustering.
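The reading rule above can be sketched as a plain lookup. This is only an illustration in Python: the `CHEAT_SHEET` dictionary, its key wording, and the `suggest` helper are hypothetical names, and the mapping covers just the three examples listed here, not the full chart.

```python
# Hypothetical mapping from path labels to suggested algorithms.
# It contains only the three examples given in the text.
CHEAT_SHEET = {
    "dimension reduction": ["principal component analysis"],
    "numeric prediction, speed": ["decision trees", "linear regression"],
    "hierarchical result": ["hierarchical clustering"],
}


def suggest(path_label):
    """Return the algorithms to try first for a path label, or an empty list."""
    return CHEAT_SHEET.get(path_label, [])


print(suggest("numeric prediction, speed"))
print(suggest("a path that is not on the chart"))
```

An empty list for an unknown label mirrors the case where no branch of the chart fits the problem.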

Sometimes more than one branch will apply, and other times none of them will be a perfect match. It is important to remember that these paths are intended to be rule-of-thumb recommendations, so some of the recommendations are not exact. Many data scientists I spoke to said that the only sure way to find the best algorithm is to try them all.
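Trying several algorithms usually means fitting each candidate on the same training data and scoring it on data it has not seen. The sketch below assumes a made-up data set and two deliberately simple candidates (a mean baseline and a one-nearest-neighbor predictor); the function names are illustrative and are not part of the cheat sheet.

```python
# Made-up labeled data for illustration: y grows roughly linearly with x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 17.2]

# Hold out the last two points so every candidate is scored on unseen data.
train_x, train_y = xs[:6], ys[:6]
test_x, test_y = xs[6:], ys[6:]


def fit_mean(x, y):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = sum(y) / len(y)
    return lambda value: mean_y


def fit_nearest_neighbor(x, y):
    """Candidate: predict the target of the closest training input."""
    pairs = list(zip(x, y))
    return lambda value: min(pairs, key=lambda pair: abs(pair[0] - value))[1]


def mean_squared_error(model, x, y):
    """Average squared gap between predictions and true targets."""
    return sum((model(a) - b) ** 2 for a, b in zip(x, y)) / len(x)


candidates = {"mean baseline": fit_mean, "nearest neighbor": fit_nearest_neighbor}
scores = {
    name: mean_squared_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores)
print("best on held-out data:", best)
```

Any other candidate with the same `fit(x, y)` shape can be added to `candidates` without changing the comparison loop.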

Types of Machine Learning Algorithms

This section provides an overview of the most popular types of machine learning. If you are familiar with these categories and would like to discuss specific algorithms, you can skip this section and go to "When to use specific algorithms" below.

Supervised Learning

Supervised learning algorithms make predictions based on a set of examples. For example, historical sales can be used to estimate future prices. With supervised learning, you have an input variable that consists of labeled training data and a desired output variable. You use an algorithm to analyze the training data and learn a function that maps the input to the output. This inferred function maps new, unknown examples by generalizing from the training data to anticipate results in unseen situations.
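As a minimal sketch of learning a function from labeled examples, the code below fits a straight line by ordinary least squares and applies it to an input that never appeared in training. The data values, the choice of a linear model, and the name `learn_linear_function` are assumptions made for illustration.

```python
# Made-up labeled training data: (units sold, price) pairs.
# The first element is the input, the second is the desired output.
training_data = [(100, 20.0), (150, 22.5), (200, 25.0), (250, 27.5)]


def learn_linear_function(examples):
    """Fit price = slope * units + intercept by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in examples) / sum(
        (x - mean_x) ** 2 for x, _ in examples
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


predict = learn_linear_function(training_data)
# 300 units is not in the training data; the learned function generalizes to it.
print(predict(300))
```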

Classification: When the data are being used to predict a categorical variable, supervised learning is also called classification. This is the case when assigning a label or indicator, such as dog or cat, to an image. When there are only two labels, this is called binary classification. When there are more than two categories, the problem is called multi-class classification.

Regression: When predicting continuous values, the problem becomes a regression problem.

Forecasting: This is the process of making predictions about the future based on past and present data. It is most commonly used to analyze trends. A common example is an estimate of next year's sales based on the sales of the current and previous years.
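The task types above differ mainly in what is being predicted. The sketch below contrasts a categorical target (classification) with a future value of a series (forecasting). It assumes made-up data and two simple stand-in techniques chosen for brevity, a nearest-centroid classifier and a drift forecast; neither is prescribed by the cheat sheet, and all names are illustrative.

```python
# Classification: the target is a category.
# Made-up (weight_kg, height_cm) examples with two labels.
labeled_animals = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]


def nearest_centroid_classifier(examples):
    """Average the features per label, then predict the label of the closest average."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

    def classify(features):
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    return classify


classify = nearest_centroid_classifier(labeled_animals)
# Only two labels exist here, so this is binary classification.
print(classify((6.0, 24.0)))


# Forecasting: the target is a future value of a series.
# Made-up yearly sales, oldest first.
yearly_sales = [100.0, 110.0, 125.0, 135.0]


def drift_forecast(series):
    """Extend the series by its average year-over-year change (drift method)."""
    average_change = (series[-1] - series[0]) / (len(series) - 1)
    return series[-1] + average_change


print(drift_forecast(yearly_sales))
```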