Neural Networks

A neural network is an adaptive, parallel and distributed information processing system. Neural network research is generally considered a branch of Artificial Intelligence, and its techniques have found a broad range of applications in engineering and programming domains, from signal processing to financial analysis. These systems are called "neural" because they draw on an analogy to information processing in the brain, which is achieved by networks of cells called neurons. The art of building neural networks, sometimes called Neurocomputing, Connectionism or Parallel Distributed Processing (PDP), actually consists of a varied range of techniques which differ in many ways, and can take the form of programs or specialized hardware.

Neural networks were first conceived by Warren McCulloch and Walter Pitts in 1943. The McCulloch-Pitts neuron is a binary switch, or logic gate, which could be linked into networks in order to compute logical functions. Donald Hebb (1949) first proposed a mechanism for learning in networks of logic gates based on conditioned response. In Hebbian learning, a connection strength between two neurons is increased whenever the two neurons fire at the same time, and decays over time otherwise. Thus, subnetworks involved in a shared function will grow more tightly coupled.
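The two ideas in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the particular weights and thresholds chosen for each gate, and the rate and decay values in the Hebbian update are hypothetical choices, not taken from McCulloch and Pitts or from Hebb.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts style unit: output 1 if the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Logic gates built from single units (weights and thresholds are illustrative choices).
def logic_and(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)


def logic_or(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)


def logic_not(a):
    # A single inhibitory (negative) connection.
    return mp_neuron([a], [-1], threshold=0)


def hebbian_update(weight, pre, post, rate=0.1, decay=0.01):
    """Strengthen the connection when both units fire together; otherwise let it decay."""
    if pre == 1 and post == 1:
        return weight + rate
    return weight * (1.0 - decay)
```

Linking such units, for example feeding the output of logic_and into logic_not, yields further logical functions, which is the sense in which the units can be "linked into networks."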

The first neural network computer, the SNARC, was built by Marvin Minsky, the future founder of the MIT Artificial Intelligence Laboratory, in 1951 for his doctoral thesis at Princeton. Frank Rosenblatt's (1958) Perceptron was the first adaptive threshold-gate neuron model, and was proven to have the ability to learn any function which it could represent. The Perceptron was studied both in the form of Cornell University's Mark I Perceptron, and in various digital computer simulations throughout the 1960s. Another early algorithm for which iterative convergence on a good answer could be proven was the Least Mean Squares Algorithm, developed by Bernard Widrow and Marcian "Ted" Hoff at Stanford in 1959. Widrow and Hoff constructed the ADALINE (adaptive linear neuron) analog computer as a realization of this algorithm, and Hoff later went on to invent the microprocessor at Intel Corporation. Carver Mead has also done a great deal of work on VLSI designs for implementing neural networks.
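The Least Mean Squares idea can be illustrated with a short sketch: a linear unit nudges its weights in proportion to the error on each example. The function name, learning rate, epoch count and the small data set below are assumptions made for the example, and the code is a software illustration rather than a description of the ADALINE hardware.

```python
def lms_train(samples, rate=0.1, epochs=500):
    """Fit a linear unit y = w.x + b with the Widrow-Hoff (LMS) update."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - output
            # Move each weight a small step in the direction that reduces the squared error.
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias


# Hypothetical training data generated from target = 2*x1 - x2 + 0.5.
samples = [([0.0, 0.0], 0.5), ([1.0, 0.0], 2.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], 1.5)]
weights, bias = lms_train(samples)
```

Because these examples are exactly linear, the learned weights and bias approach 2, -1 and 0.5; with noisy data the rule would instead settle near a least-squares fit.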

What all neural networks have in common is that they can be represented by a set of computational elements (mathematically idealized neurons) which are linked together by weighted connections. A more formal way to represent a neural network is as a graph, where the information processing capacity rests in the weights of the edges, and each node computes its own function. The differences among the various neural network techniques lie in the connective architectures of the networks, the mathematical functions which the neuronal elements compute, and the means of training the networks to get them to perform as desired. There are two main types of architectures: feedforward networks, which can be represented by a directed acyclic graph, and recurrent or feedback networks, whose graphs contain loops or cycles.
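The graph view can be made concrete with a small sketch. The dictionary layout, the node names, the weights and the choice of a sigmoid node function below are all assumptions made for illustration; the point is only that each node combines the weighted outputs of its predecessors, and that evaluation terminates because a feedforward graph has no cycles.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Each non-input node lists its incoming weighted edges as {source: weight}, plus a bias.
network = {
    "h1": {"inputs": {"x1": 1.0, "x2": -1.0}, "bias": 0.0},
    "h2": {"inputs": {"x1": -1.0, "x2": 1.0}, "bias": 0.0},
    "out": {"inputs": {"h1": 2.0, "h2": 2.0}, "bias": -2.0},
}


def evaluate(network, node, input_values, cache=None):
    """Compute a node's activation by recursively evaluating its predecessors.

    The recursion ends only because the graph is acyclic (a feedforward network).
    """
    if cache is None:
        cache = {}
    if node in input_values:
        return input_values[node]
    if node not in cache:
        spec = network[node]
        total = spec["bias"] + sum(
            weight * evaluate(network, source, input_values, cache)
            for source, weight in spec["inputs"].items()
        )
        cache[node] = sigmoid(total)
    return cache[node]


y = evaluate(network, "out", {"x1": 1.0, "x2": 0.0})
```

A recurrent network could not be evaluated this way, since following a cycle would never bottom out at the inputs; such networks are instead run repeatedly until their state settles.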

In general, feedforward neural networks perform one version or another of statistical pattern analysis and classification. That is to say, they induce statistical patterns from training data to learn a representative function, then apply this function to classify future examples. A classification is simply a mapping function from inputs to outputs; in other words, it maps the objects to be classified into their types or classes. Consider, for example, the problem of classifying all 2-dimensional geometric shapes into one of the types in the set: [square, circle, triangle, other]. A total mapping function would assign every member of the set of 2-dimensional geometric shapes to one of these four types. There are many possible mapping functions, however, and only a few of these will classify the inputs in a desirable way.

Supervised learning offers a way to automatically learn a good mapping function by presenting the network with correctly classified examples. A network is typically trained until it is able to correctly classify most of the examples it has seen, up to some desired percentage. The accuracy of a trained network on new examples will depend on the size of the training data set, the number of times it sees the training set or number of training epochs, the learning algorithm used to modify the network after each epoch, the complexity of the classification space, the complexity of the example space, and how representative the training data is of the real world. A network can also be overtrained if it is trained for too many epochs and begins to lose the ability to generalize to new examples when they do not look exactly like training examples.
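One common guard against overtraining, which is not described in this article and is offered here only as an illustration, is to hold back some correctly classified examples as a validation set and stop training once the error on them stops improving. The sketch below assumes the per-epoch validation errors have already been recorded; the function name, the patience parameter and the error values are all hypothetical.

```python
def best_epoch_with_patience(validation_errors, patience=3):
    """Return (epoch, error) of the best epoch seen before training is stopped.

    Scans per-epoch validation errors and stops once `patience` consecutive
    epochs have passed without improving on the best error so far.
    """
    best_epoch, best_error = 0, validation_errors[0]
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_error


# Made-up validation-error curve: it improves, then rises as the network overtrains.
curve = [0.40, 0.31, 0.25, 0.22, 0.21, 0.23, 0.26, 0.30, 0.35]
epoch, error = best_epoch_with_patience(curve)
```

On this made-up curve the function selects epoch 4, the point after which further training only makes the network fit the training examples more closely at the expense of new ones.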

Other learning techniques simply learn statistical patterns in the data. Such techniques include grouping data points into clusters and learning to map examples to key or prototypical examples which represent the "center" of a class. These techniques are called "unsupervised learning" because they learn representations of the data without receiving corrections from a supervisor.
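Clustering around class "centers" can be illustrated with k-means, a standard statistical clustering procedure rather than a neural network as such; it is used here as a swapped-in example of the idea. The data points, the starting centers and the iteration count are assumptions chosen for the sketch.

```python
def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging the centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Move each center to the mean of its cluster (keep it in place if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```

No labels are supplied at any point; the two groups emerge from the distances between the data points alone.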

The first feedforward networks contained a single layer of neurons connected to inputs, while more recent multilayer network architectures contain an input layer, an output layer, and one or more hidden layers. Single-layer networks, such as the Perceptron, are limited to learning mapping functions for data which are linearly separable: if the data points for two classes were plotted on a 2-dimensional plane, it would be possible to divide the classes using a straight line. The concept of linear separability extends to higher-dimensional spaces, where the dividing line becomes a hyperplane, and the capacity of a network to separate classes can be quantified using a measure called the VC-dimension (Vapnik 1998). The addition of hidden layers greatly increases the complexity of the possible mapping functions a network can learn; such networks can classify data even when they are not linearly separable. Training multilayer networks only became practical with the introduction of the Error Backpropagation Algorithm, which propagates output errors back through multiple hidden layers. Backpropagation was independently discovered at least three times (Amari 1967, Bryson and Ho 1969, Werbos 1972).
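The limit imposed by linear separability can be seen in a small experiment. The sketch below trains a single threshold unit with the perceptron rule on AND (linearly separable) and on XOR (not linearly separable), then shows a two-layer network that computes XOR. Note the assumptions: the hidden-layer weights are set by hand rather than learned by backpropagation, and all function names and parameter values are illustrative.

```python
def step(z):
    return 1 if z >= 0 else 0


def train_perceptron(samples, rate=1.0, max_epochs=100):
    """Perceptron rule for one threshold unit; returns (weights, bias, converged)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            output = step(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = target - output
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        if mistakes == 0:
            return weights, bias, True
    return weights, bias, False


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

and_converged = train_perceptron(AND)[2]  # a separating line exists, so training succeeds
xor_converged = train_perceptron(XOR)[2]  # no separating line exists, so it never succeeds


def xor_two_layer(x1, x2):
    """Hand-set hidden layer: XOR = (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or = step(x1 + x2 - 1)       # fires if at least one input is on
    h_and = step(x1 + x2 - 2)      # fires only if both inputs are on
    return step(h_or - h_and - 1)  # fires if h_or is on and h_and is off
```

The single unit fails on XOR no matter how long it trains, while two hidden units are enough to represent it; backpropagation is one way of finding such hidden-layer weights automatically instead of setting them by hand.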

Recurrent networks were first introduced by the physicist John Hopfield (1982). The two principal types of recurrent neural networks are Boltzmann Machines and Hopfield Networks. These architectures draw on the mathematics of thermodynamics and physical dynamics, respectively, to define a phase space. This phase space contains points which are "attractors" and represent equilibrium points for the network. Once configured, the network is allowed to run as a dynamic system, with quantities of energy cycling through the network repeatedly until the network settles into a stable equilibrium state. In these networks, the inputs are represented by the initial configuration parameters, and the output is represented by the final steady state.
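A minimal sketch of a Hopfield-style network shows how a stored pattern acts as an attractor. It assumes units that take the values +1 and -1, a Hebbian outer-product rule for setting the weights, and one-at-a-time unit updates; the function names and the eight-unit pattern are invented for the example.

```python
def train_hopfield(patterns):
    """Set weights by the Hebbian outer-product rule for +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def energy(weights, state):
    """Energy of a configuration; the update rule below never increases it."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n))


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing (a stable state)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, 1, 1, -1, 1, -1, 1, 1]  # the stored pattern with two units flipped
settled = recall(weights, noisy)
```

Here the corrupted pattern is the input (the initial configuration), and the state the network settles into is the output; in this example it is the stored pattern, which has lower energy than the corrupted one.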

Both kinds of neural networks can be viewed as optimization techniques: feedforward networks find optimal mapping functions between inputs and outputs, while recurrent networks find the lowest energy state of a physical system. Unfortunately, all neural network architectures are prone to finding only local maxima or minima rather than global optima, and special mathematical techniques are employed by the various architectures to increase their chances of finding better solutions.
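The difference between a local and a global minimum can be seen with plain gradient descent on a one-variable function. The function, step size and starting points below are assumptions picked so that the two starting points fall into different valleys; restarting from several points and keeping the best result is shown only as one simple remedy, not as the technique any particular architecture uses.

```python
def f(x):
    # A curve with two valleys: a shallow one near x = 1.13 and a deeper one near x = -1.30.
    return x**4 - 3 * x**2 + x


def gradient(x):
    return 4 * x**3 - 6 * x + 1


def descend(x, rate=0.01, steps=2000):
    """Follow the negative gradient downhill from the starting point x."""
    for _ in range(steps):
        x -= rate * gradient(x)
    return x


x_right = descend(2.0)   # settles in the shallower valley (a local minimum)
x_left = descend(-2.0)   # settles in the deeper valley (the global minimum)

# One simple remedy: try several starting points and keep the lowest result.
best = min([x_right, x_left], key=f)
```

Each run stops at the bottom of whichever valley it started in, and nothing in the local slope tells the run from the right that a deeper valley exists elsewhere.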

Neural networks are "distributed" in the sense that what the network represents is distributed over all of the elements of the network. Consider, for example, a neural network which classifies geometric shapes into circles and squares. In such a network, there will not be a single node which represents squares and another which represents circles. Instead there will be many nodes, all of which together represent both squares and circles; each node might represent something more like the "flatness," "roundness," or "cornerness" of a small geometric region. On its own, the information processed by each node is not particularly useful or interesting. It is only when the outputs of all the nodes are taken together that a useful result can be obtained.

Similarly, these networks are "parallel" in the sense that each of the nodes acts on its own and (at least in theory) simultaneously. Whereas most programming techniques employ algorithms which work by proceeding in sequential steps, the computational elements of a neural network depend only upon the information they receive as inputs and their local memory, and only affect those elements which receive their outputs. In the case of a network which classifies circles and squares, the "flatness" nodes and the "roundness" nodes for all the different regions can compute their functions independently and simultaneously, that is, in parallel, without depending on knowing the judgements of their neighbors. Thus each neuron acts on its own and on local information, while the network as a whole performs a useful function.

Neural network systems are argued to be superior to traditionally programmed systems because they do not require the programmer to completely understand the processes which the system carries out, and because the systems can be quite flexible and robust. Because neural networks learn from training data, the programmer does not have to "discover" the underlying patterns of the data; the network can learn these for itself. Because neural network representations are distributed, they can give good results even if some nodes are removed or are faulty. And by providing a mapping of every possible input to some output, they offer a robust form of generalizing from training data to new data. Thus, even though a network has not seen a particular example, as long as it is sufficiently similar to previous examples from the same class, the network is likely to classify this new example correctly.

The most notable applications of neural networks involve various forms of pattern recognition and control systems. The neural networks of the 1960s were able to recognize handwritten characters and digits, as well as spoken phonemes and words, though not very robustly. NETtalk was a neural network which learned to read written text aloud by finding a mapping from text to phonemes (Sejnowski and Rosenberg 1987). It is said to have sounded like a babbling baby as it learned, and did much to popularize neural network research in the media. ALVINN (Autonomous Land Vehicle in a Neural Network) is a neural network which learned to map images of a road from a video camera to control signals for a steering wheel, accelerator and brake of a car (Pomerleau 1993). ALVINN's successor successfully drove a car across the United States in 1995 (the neural net was in control 98% of the way). An enormous amount of effort has been put into building neural networks to predict financial trends in stock and commodities markets. These networks can perform quite well but are still susceptible to the great uncertainties and capriciousness of economic markets. More practical applications have included forgery analysis of signatures on credit card purchases, the analysis of an individual's credit risk, credit and insurance fraud detection, and the visual inspection of manufactured parts for flaws.

by Peter M. Asaro

For Further Research

Anderson, James A., and Edward Rosenfeld, Editors. Talking Nets: An Oral History of Neural Networks. Cambridge, MA: MIT Press, 1998.

Anderson, James A., and Edward Rosenfeld, Editors. Neurocomputing: Foundations of Research. Cambridge, MA: MIT Press, 1988.

Anderson, James A., A. Pellionisz, and Edward Rosenfeld, Editors. Neurocomputing 2: Directions for Research. Cambridge, MA: MIT Press, 1990.

Arbib, M. A., Editor. The Handbook of Brain Theory and Neural Networks. Cambridge, MA: Bradford Books/MIT Press, 1995.

Bishop, C. M. Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press, 1995.

Duda, R. O., and P. E. Hart. Pattern Classification and Scene Analysis. New York, NY: Wiley, 1973.

Fukunaga, K. Introduction to Statistical Pattern Recognition. 2nd edition. San Diego, CA: Academic Press, 1990.

Haykin, Simon. Neural Networks: A Comprehensive Foundation. 2nd edition. New York, NY: Macmillan College Publishing, 1999.

Kearns, M. J., and U. V. Vazirani. An Introduction to Computational Learning Theory. Cambridge, MA: MIT Press, 1994.

McClelland, John, David Rumelhart, and the PDP Research Group, Editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2. Cambridge, MA: MIT Press, 1986.

Rumelhart, David, John McClelland, and the PDP Research Group, Editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. Cambridge, MA: MIT Press, 1986.

Vapnik, Vladimir N. Statistical Learning Theory. New York, NY: Wiley, 1998.

Widrow, B., and M. A. Lehr. "30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation." Proceedings of the IEEE 78(9): 1415-1442, 1990.


Amari, S. "A Theory of Adaptive Pattern Classifiers." IEEE Transactions on Electronic Computers EC-16: 299-307, 1967.

Hopfield, John J. "Neural Networks and Physical Systems with Emergent Collective Computational Abilities." Proceedings of the National Academy of Sciences 79: 2554-2558, 1982.

McCulloch, Warren S., and Walter Pitts. "A Logical Calculus of the Ideas Immanent in Nervous Activity." Bulletin of Mathematical Biophysics 5: 115-133, 1943. Reprinted in The Collected Works of Warren S. McCulloch, vol. 1, Edited by Rook McCulloch, Salinas, CA: Intersystems Publications, 1989: 343-361.

Minsky, Marvin, and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.

Pitts, Walter, and Warren S. McCulloch. "On How We Know Universals: The Perception of Auditory and Visual Forms." Bulletin of Mathematical Biophysics 9: 127-147, 1947. Reprinted in Embodiments of Mind and The Collected Works of Warren S. McCulloch, vol. 2, Edited by Rook McCulloch, Salinas, CA: Intersystems Publications, 1989: 530-550.

Rosenblatt, Frank. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, DC: Spartan Books, 1962.

Pomerleau, D. Neural Network Perception for Mobile Robot Guidance. New York, NY: Kluwer Academic Publishers, 1993.

Sejnowski, T., and C. R. Rosenberg. "Parallel Networks that Learn to Pronounce English Text." Complex Systems 1: 145-168, 1987.

Werbos, P. "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences." Ph.D. thesis, Cambridge, MA: Harvard University, 1972.