# What is Machine Learning?

## What is deep learning? And how does linear regression fit in?

**Machine learning** is when computers digest a whole lot of observations of the world and learn patterns or relationships within those observations. They then use the learned patterns and relationships to predict new observations (outcomes) about the world. Machine learning includes the field of statistical learning, a well-developed subject that has been producing wonderful tools for a long time.

Both **deep learning** (the latest craze in machine learning) and **linear regression** (very old school stuff) are forms of statistical learning. I would go on to say that any machine learning algorithm is a form of statistical learning — because the core principles of statistical learning are necessary to train a machine on the data you feed it. (Training on data is the learning part of machine learning.) I’m actually not aware of any forms of machine learning that aren’t statistical learning, but I’m keeping my mind open here.

**Is linear regression really machine learning?**

Linear regression is just one of those boring things you did in statistics. How could that be machine learning? When you perform a linear regression (with just about any statistical software), the machine is quite literally learning the parameters of a linear model from a training data set. The learned linear model takes observations (the Xs) and predicts outcomes (the Ys) — and it has learned this prediction ability from the data you fed it.

When you feed that learned model a new observation (a new X), it will predict a new Y. If you also add that new observation, along with its measured outcome, to the training data set and refit, you will update your model parameters (i.e., adaptively learn a model that is a better predictor than the last one). This sort of thing is what is happening when an “AI” checkers-playing program (or any machine learning algorithm) gets smarter over time.
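
To make that loop concrete, here is a minimal sketch in plain Python. The function names (`fit_line`, `predict`) and the numbers are made up for illustration, and it assumes a single X feature so the least-squares fit has a simple closed form.

```python
def fit_line(xs, ys):
    """Learn slope and intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a Y for a new X using the learned parameters."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: the Xs and the Ys.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

model = fit_line(train_x, train_y)  # the "learning" step
guess = predict(model, 5.0)         # predict a Y for an unseen X

# Once the real outcome for X = 5.0 has been measured, add it and refit.
train_x.append(5.0)
train_y.append(10.5)
updated_model = fit_line(train_x, train_y)

print("first model:", model, "prediction at X=5:", guess)
print("updated model:", updated_model)
```

The second fit lands on slightly different parameters than the first, which is all "learning from new data" means here.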

**Enough with the Xs and Ys. Can I have an example please?**

Suppose you wanted an algorithm that could predict which drugs would kill a cell. To do that, you would have to feed the algorithm a training data set composed of drug structural features (the Xs) and their effect on the cell (the Ys) so it can parameterize (learn) the underlying predictive (mathematical) model. The model parameters capture the relationship between drug structural features and cell effects. Then you present a new drug to the model (a drug is “new” because it is a new combination of structural features that the algorithm has never seen before). The model will predict the cell-killing effect of that new drug. If the prediction is right, we instinctively feel like the machine is “smart” and has “learned” from the data. All it has really done is fit a model. If it is wrong, we think it’s about as smart as a paper clip.

You could use a linear model for such a learning algorithm (that’s linear regression), or you could use a fancy-pants model like a neural network (certain forms of which are deep learning algorithms). And there are tons of other types of models too. No matter what modeling approach you choose, the training / prediction paradigm is the same.
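
Here is a rough sketch of that shared train-then-predict pattern. Both classes are hypothetical toys written for this post: `LinearModel` is one-feature linear regression, and `NearestNeighborModel` is a deliberately simple stand-in for a "fancier" model (it is not a neural network). The data is made up. The point is only that the calling code does not care which model it is handed.

```python
class LinearModel:
    """One-feature linear regression fitted by least squares."""

    def fit(self, xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        return self.slope * x + self.intercept


class NearestNeighborModel:
    """Stand-in for a different model: returns the outcome of the closest training X."""

    def fit(self, xs, ys):
        self.points = list(zip(xs, ys))
        return self

    def predict(self, x):
        nearest_x, nearest_y = min(self.points, key=lambda p: abs(p[0] - x))
        return nearest_y


# Made-up data: one drug structural feature (X) and a measured cell effect (Y).
feature = [0.0, 1.0, 2.0, 3.0]
effect = [0.1, 0.9, 2.1, 2.9]

for model in (LinearModel(), NearestNeighborModel()):
    model.fit(feature, effect)                        # train
    print(type(model).__name__, model.predict(2.6))   # predict for a "new drug"
```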

So every time you run a regression analysis, believe it or not, you are quite literally performing machine learning. It’s not as sexy as a self-driving car, but linear regression really is honest-to-goodness machine learning. Yup, you’ve been a data scientist for years and you didn’t even know it.

In the sciences, plain old regression is often more than adequate. In fact, for the “shallow and wide” (a.k.a., under-determined) data so common in the life sciences, where you have far more measured features than samples, linear regression is often your best option. The constraints of a linear model help you pull out something reasonable from your data when neural networks couldn’t even get out of the starting gate. At a machine learning conference I attended in March 2018 (sponsored by ATUM and Autodesk), the presenters showed that linear regression methods beat the neural network models in predicting phenotypes from nucleic acid sequences. (Lasso regression was the winner. It was introduced in the mid-1990s.) In other cases, the neural networks were kicking butt in a serious way, particularly in image processing.
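
For the curious, here is a bare-bones sketch of lasso regression on a tiny "wide" data set (3 samples, 6 features). It is my own illustration, not the presenters' method: it assumes no intercept and no feature scaling, uses plain cyclic coordinate descent, and the data and the penalty value `lam` are made up. Lasso adds a penalty on the size of the coefficients, which tends to push the unhelpful ones to exactly zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(rows, ys, coefs, lam):
    """Half the mean squared error plus the L1 penalty on the coefficients."""
    n = len(rows)
    sq_err = sum(
        (y - sum(c * x for c, x in zip(coefs, row))) ** 2
        for row, y in zip(rows, ys)
    )
    return sq_err / (2 * n) + lam * sum(abs(c) for c in coefs)


def fit_lasso(rows, ys, lam, sweeps=200):
    """Minimize the lasso objective by cyclic coordinate descent."""
    n, p = len(rows), len(rows[0])
    coefs = [0.0] * p
    residual = list(ys)  # equals y - X @ coefs while coefs are all zero
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in rows]
            scale = sum(x * x for x in col) / n
            if scale == 0.0:
                continue  # an all-zero feature carries no information
            # Fit of feature j to the residual with its own contribution added back.
            rho = sum(x * (r + coefs[j] * x) for x, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, lam) / scale
            delta = new_coef - coefs[j]
            residual = [r - delta * x for r, x in zip(residual, col)]
            coefs[j] = new_coef
    return coefs


# Made-up wide data: more features than samples; Y is driven by feature 0 only.
rows = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 1.0, 0.0, 0.0],
]
ys = [2.0, 0.0, 2.0]

coefs = fit_lasso(rows, ys, lam=0.1)
print("coefficients:", coefs)
print("nonzero count:", sum(1 for c in coefs if c != 0.0))
```

Real toolkits handle intercepts, scaling, and choosing the penalty for you, so treat this as a look under the hood rather than something to use on real data.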

**So what is deep learning?**

Deep learning is a form of machine learning that uses a neural network as the model — in particular layers of neural networks that form “deep” stacks of interconnected nodes. In deep learning, there are so many neural layers that the model can have millions of parameters requiring millions of CPU hours to solve them. These models were essentially unsolvable (and therefore useless in practical applications) until about 10 years ago when GPUs became widespread (providing horsepower) and a method was invented to solve them quickly (here). The AI craze that we know today is based on the impressive (almost magical) performance of such deep learning models and training methods on tough problems like the classification of images. Such problems were not handled very well by prior machine learning methods. For a nice summary of the recent history of all this, check out this seminar by John Kaufhold, the founder of Deep Learning Analytics.
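
To get a feel for how quickly the parameter count grows as you stack layers, here is a small back-of-the-envelope sketch. It assumes plain fully connected layers (every node wired to every node in the next layer, plus one bias per node); the layer sizes are hypothetical and `count_parameters` is just a helper made up for this post.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per receiving node
    return total


# Hypothetical layer sizes, for illustration only.
small = count_parameters([784, 512, 512, 10])
wider = count_parameters([784, 2048, 2048, 2048, 10])

print("small network:", small)  # 669706
print("wider, deeper network:", wider)
```

Every one of those numbers has to be learned from data, which is why training horsepower matters so much.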

**What’s “supervised” and “unsupervised” learning?**

When you feed a model both the observations (the Xs) and the outcomes (the Ys) and ask it to learn the relationship between them, that is “supervised learning”. You are supervising the machine with examples of what it should predict given a set of observations.

Unsupervised learning is when you feed only observations (just Xs) with no outcomes (no Ys) and you ask the machine to group or decompose the observations into a set of patterns. Each pattern is representative of general characteristics of some portion of the observations. Unsupervised learning is essentially a grouping problem (clustering, in the jargon): the learning here is done by finding a model that can sort the observations into an optimal set of patterns. What is “optimal” is defined by the algorithm designer. It might, for example, be defined as any model that produces a desired number of patterns, or it might be defined as a model that can regenerate the original data most accurately, or it might be a combination of both criteria.

Once you have learned a model this unsupervised way, you can (as usual) feed it a new observation and it will sort that observation into a pattern (e.g. a “cat” pattern) or decompose it into a set of patterns (e.g., visual features or body parts that define a cat). That’s the prediction part of unsupervised learning. This kind of learning is what picture analysis algorithms are typically doing, e.g., segregating pictures into hot dogs and “not hot dogs” within a pile of pictures of both.
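
As a rough illustration of learning patterns without any Ys, here is a minimal k-means sketch in plain Python. K-means is my choice of example (one of many unsupervised methods): the designer fixes the number of patterns `k`, and the algorithm looks for pattern centers that sit as close as possible to the data. The function names, the naive starting centers, and the points are all made up.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def fit_kmeans(points, k, iterations=20):
    """Learn k pattern centers from unlabeled points (plain k-means)."""
    centers = [tuple(p) for p in points[:k]]  # naive start: the first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        new_centers = []
        for old, group in zip(centers, groups):
            if group:
                new_centers.append(tuple(sum(dim) / len(group) for dim in zip(*group)))
            else:
                new_centers.append(old)  # keep a center that attracted no points
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers


# Made-up observations (just Xs, no Ys): two loose blobs.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (2.0, 1.5), (8.5, 9.0)]

centers = fit_kmeans(points, k=2)
print("learned centers:", centers)

# "Prediction": sort a new observation into one of the learned patterns.
print("new point goes to pattern", nearest_center((2.0, 2.5), centers))
```

Nobody told the algorithm which blob was which; it found the two groups on its own and can now sort new points into them.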

**What is AI?**

AI is a vaguely defined term used in popular culture to refer to an array of machine learning methodologies that when combined into software applications and devices seem to endow machines with “smarts”. If there were anything approximating a definition of **AI**, it would be the Turing test. But that’s not what it means in our popular culture today.

**What kinds of tools do machine learning?**

There are tons of toolkits now for doing machine learning (Apache Spark MLlib, JMP, TensorFlow, SAS, R, scikit-learn, Amazon’s ML tools, etc.). It’s kind of a pick-your-favorite world out there. All of them get you to the same place ultimately. Here is a list of some of them.

Part of why machine learning has become so much a part of popular culture today is because of these toolsets. They’re easily accessible, free and open — meaning that almost anyone can tinker with them just like they used to tinker with radios in the 1950s.

**If you are curious about how to apply machine learning to your science, give us a shout at advice@riffyn.com.**