Understanding the Basics of Deep Learning and Neural Networks

Last week I had the opportunity to visit my graduate school alma mater, The University of Arizona where I studied artificial intelligence and image processing many years ago. I remember signing up for my first semester classes and electing to challenge myself with Professor Neifeld's neural network class. It already had the reputation of being one of the toughest classes requiring students to understand both the mathematical theory and real-world application of neural networks to solve classification and other problems.

Neural Networks Before Cloud Computing


Of course back then there wasn't cloud computing or easy access to parallel computing methods or deep learning Python libraries. As students, we had to program the algorithms by hand starting with the mathematics of a single neuron, the iterations to loop through all the neurons in each layer, and the algorithms to implement the backpropagation learning algorithms. You were more likely to screw up programming the mathematics before even having had the chance to tune the network properly to solve the challenge.

Needless to say, I learned how to program many neural networks. I assumed when one failed, it was because I had selected the wrong algorithm rather than a flawed implementation.

The reason we have deep learning today is because cloud computing enables us to program multiple layers of thousands of neurons. And instead of programming the intricacies of the algorithms, AI developers are more focussed on how to present  datasets to AI algorithms, selecting algorithms, tuning the learning algorithms, and evaluating the behaviors.

A Simple Explanation of Neural Networks


But as an "old dog" of neural networks, it gives me the opportunity to explain what they are in semi-layman's terms.

Remember linear regression? You applied an algorithm to optimize the linear equation y = mx + b given a dataset of x and y values. Neural networks operate on a similar principle but are nonlinear and approximate a complex curve to fit multidimensional data. In other words, the main differences is that the simple linear regression model is working with one dependent and one independent variable (x and y) to determine slope and intercept (m and b) while neural networks can have many thousands of inputs (like an image), usually a few outputs (is the image a cat?) and can approximate highly complex, multi-dimensional surfaces depending on the number neurons and topology (number of layers) of the network.

How does the network work? Each neuron operates like a transistor with an activation function that dictates its output given a set of inputs. When the network is presented with an input, the first layer of neurons compute their output and feedforward them to the next layer. This is repeated until all layers of the network are computed and the final layer shares its result.

Of course the initial result is likely to be wrong and the network has to be tuned. Given a dataset of known inputs and outputs, backpropagation is one of the many "learning" algorithms used to tune the neurons (adjust their activation functions) so that the network outputs the correct values for the inputs.

If you programmed the network by hand, you'd have to sequence through all the layers of the network and all the neurons in each later to compute their activation functions, capture outputs, and arrive at the network's output. You'd then have to compute the backpropagation algorithm and apply iteratively for every datapoint in your training set. So while the code was relatively straightforward, the computation was very slow and inefficient. Cloud computing and parallel computing models solves this issue.

Neural Networks Today


Of course today, it's unlikely that a programmer or data scientist would be programming the model by hand as there are many libraries and APIs that have these services available. The researcher still has to pick the topology of the network, activation functions, learning algorithm, and other parameters. More importantly, the trainer has to select a way to present the problem set to the network (the inputs and outputs) and identify appropriate training and test datasets.

And that's just one example of "supervised" learning where there is a training set that can be used to teach the network, There is a whole class of unsupervised learning where the network is trained based on the quality of its response.

So while cloud computing and the availability of deep learning APIs has made neural networks available to the masses, it's still not a straightforward undertaking. AI still requires significant investment in agile experimentation to test approaches, validate conclusions, and configure the next set of experiments.

But the results are impressive and many companies with strategic datasets are exploring the science and business value of deep learning. Just in the news the last few weeks - measuring store visits, sports analytics, healthcare research including diagnosing cancer, and even beekeeping. With a forecast to grow to $10.2B by 2025, I suspect we'll see a lot more investment and experimentation over the next several years.  

No comments:

Post a Comment

Comments on this blog are moderated and we do not accept comments that have links to other websites.

Share

About Isaac Sacolick

Isaac Sacolick is President of StarCIO, a technology leadership company that guides organizations on building digital transformation core competencies. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO, a digital transformation influencer, and has over 900 articles published at InfoWorld, CIO.com, his blog Social, Agile, and Transformation, and other sites. You can find him sharing new insights @NYIke on Twitter, his Driving Digital Standup YouTube channel, or during the Coffee with Digital Trailblazers.