Perceptron: Basic Principles of Deep Neural Networks

Article information

Cardiovasc Prev Pharmacother. 2021;3(3):64-72
Publication date (electronic) : 2021 July 31
doi : https://doi.org/10.36011/cpp.2021.3.e9
1Department of Artificial Intelligence and Software Technology, Sunmoon University, Asan, Korea
2Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
3Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
Correspondence to Hun-Sung Kim, MD, PhD Department of Medical Informatics, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Korea. E-mail: 01cadiz@hanmail.net
Received 2021 July 14; Accepted 2021 July 31.

Abstract

Big data, artificial intelligence, machine learning, and deep learning have received considerable attention in the medical field, and attempts to apply machine learning to areas where medical decisions are difficult or necessary are continuously being made. Many of the problems to which machine learning is applied are now addressed with deep learning; hence, physicians should also have basic knowledge in this regard. Deep neural networks are among the most actively studied methods in machine learning, and the perceptron, one of the earliest artificial neural network models, can be considered their starting point. A perceptron receives multiple inputs and produces one output. Each input is assigned a weight (ω), and the larger the weight, the more important the corresponding input becomes. In other words, a perceptron is an algorithm with both inputs and an output: when an input is provided, the output is produced according to a set rule. In this paper, the decision rule of the perceptron and its basic principles are examined. The intent is to provide a wide range of physicians with an understanding of the latest machine-learning methodologies based on deep neural networks.

INTRODUCTION

Machine learning improves a program's information-processing ability through learning, that is, through repeated experience with the data input into the computer program. It has previously been described,1) informally, as “a field in which a machine attempts to derive a high-performance solution to a specific problem by applying a learning strategy to given data.” Along with extensive research on various types of machine learning, information technology (IT) related to data production, storage, and management has developed.2)3) With this development, machine learning is being used successfully in various fields, including drug discovery,4) medical diagnosis,5) and the prediction of protein structure6) and function.7)

Machine learning can be largely grouped into supervised learning, unsupervised learning, and reinforcement learning.8)9) In supervised learning, the correct answer is provided in advance together with the already secured data, and the model learns from it.10) Good examples are “if your blood pressure is high, then you have hypertension” or “if you have high blood pressure and diabetes, then the incidence of cardiovascular disease increases.” Supervised learning follows this “if–then” structure. Within supervised learning there is “classification,” which divides data according to a set rule, and “regression,” which predicts a result by finding a pattern.11)

In this article, the classification problem is discussed from the perspective of supervised learning, which is the most basic of the machine-learning techniques and the most familiar to physicians. The decision tree is the most representative example: its classification steps are repeated until a final diagnosis is made. Deep neural networks are among the most actively researched and widely used methodologies in supervised learning. Machine learning is approached herein through a detailed explanation of the working principle of the “perceptron,” which is the basis of deep neural networks. This article is also intended to help physicians in various fields to understand machine learning intuitively.

PRELIMINARY KNOWLEDGE ON TRAINING DATA EXPRESSED BY VECTORS

To gain insight into machine learning, it is necessary to understand the vectors representing data and the basic operations on these vectors. Supervised learning refers to the process of deriving a function that provides an appropriate output when specific data are input. The data used in this process are called “training data,” and a typical form of training data is shown in Table 1. Table 1 shows the data of 8 patients and consists of the results of 2 tests (tests A and B) that each patient received, together with the resulting diagnosis (+1: cardiovascular disease, −1: normal). Each test is referred to as a “feature,” and the diagnosis result is referred to as a “label.” In summary, given the feature list (the result values of tests A and B), a function that predicts the appropriate label (whether the patient has cardiovascular disease or is normal) is derived from the training data in Table 1. This process is called “supervised learning,” and “learning” is the process of finding, as a function, the pattern between the features and the label existing in the training data.

Table 1. Typical form of the training data in supervised learning: diagnosis results, where “cardiovascular disease” and “normal” are denoted by +1 and −1, respectively

Patient     Test A (feature)   Test B (feature)   Diagnosis (label)   Interpretation
Patient A   1                  3                  −1                  Normal
Patient B   7                  2                  +1                  Cardiovascular disease
Patient C   5                  3                  +1                  Cardiovascular disease
Patient D   1                  1                  −1                  Normal
Patient E   6                  3                  +1                  Cardiovascular disease
Patient F   3                  2                  −1                  Normal
Patient G   2                  1                  −1                  Normal
Patient H   6                  5                  +1                  Cardiovascular disease

The training data in Table 1 can be expressed in a 2-dimensional space, as shown in Figure 1. The data point C(5, 3) in Figure 1, expressed on the coordinate system, can be interpreted as a movement of 5 along the test A axis and 3 along the test B axis with respect to the origin. Therefore, Table 1 can be described as vectors in a 2-dimensional space as follows (Figure 1): [{A[1,3], −1}, {B[7,2], +1}, {C[5,3], +1}, {D[1,1], −1}, {E[6,3], +1}, {F[3,2], −1}, {G[2,1], −1}, {H[6,5], +1}].

Figure 1.

Training data expressed by vector in 2-dimensional space.
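For readers who wish to experiment, the training data in Table 1 can be written down directly as vectors in code. The following is a minimal sketch in Python with NumPy (not part of the original article); the variable names X and y are our own choices.

import numpy as np

# Feature vectors from Table 1: one row per patient, columns are the results of tests A and B.
X = np.array([
    [1, 3],  # Patient A
    [7, 2],  # Patient B
    [5, 3],  # Patient C
    [1, 1],  # Patient D
    [6, 3],  # Patient E
    [3, 2],  # Patient F
    [2, 1],  # Patient G
    [6, 5],  # Patient H
])

# Labels from Table 1: +1 = cardiovascular disease, -1 = normal.
y = np.array([-1, +1, +1, -1, +1, -1, -1, +1])

print(X.shape)  # (8, 2): 8 patients, each described by a 2-dimensional feature vector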

In general, a vector v, described in the form v = [v₁, v₂, v₃, …, vₙ], is a quantity having a direction and a magnitude. Each vᵢ is a component of v, and v is called an n-dimensional vector. If each component of v is multiplied by a positive number α, a vector scaled by a factor of α, while maintaining the direction of v, is obtained; if α is negative, the resulting vector has a magnitude |α| times that of v and points in the opposite direction. The notation |v| is used to indicate the magnitude of the vector and is calculated using Equation 1. One of the advantages of examining the training data from a vector point of view is that the various operators defined on vectors can be employed.

(1) |v| = √(Σᵢ₌₁ⁿ vᵢ²)
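As a quick numerical check of Equation 1, the magnitude of a vector is the square root of the sum of its squared components. The snippet below is an illustrative sketch in Python/NumPy, using the data point C(5, 3) from Figure 1.

import numpy as np

v = np.array([5, 3])  # the data point C(5, 3) from Figure 1

magnitude = np.sqrt(np.sum(v ** 2))  # Equation 1: |v| = sqrt(5^2 + 3^2)
print(magnitude)            # about 5.83
print(np.linalg.norm(v))    # the same value using NumPy's built-in norm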

BASIC OPERATORS REQUIRED TO UNDERSTAND PERCEPTRONS

In this section, the addition, subtraction, and dot product of vectors are briefly discussed. Addition, subtraction, and dot product for 2 given vectors v and u are defined as shown in Table 2. As can be seen in Table 2, the result of addition or subtraction of 2 n-dimensional vectors is another n-dimensional vector, whereas the result of the dot product is a single scalar value. For addition and subtraction, where the result of the operation is a new vector, the meaning of the operation can be understood and interpreted as a physical movement, as shown in Figure 2. Considering the addition of 2 vectors, v + u, as an example, it is interpreted as the direction and magnitude obtained by moving along vector v from the origin and then additionally moving along u (the starting point of a vector is not necessarily the origin; however, it is assumed so for convenience).

Table 2. Definitions of “addition,” “subtraction,” and “dot product” of vectors

Operation      Definition                                    Example
Addition       v + u = <v₁+u₁, v₂+u₂, …, vₙ+uₙ>              <2, 5> + <1, 1> = <3, 6>
Subtraction    v − u = <v₁−u₁, v₂−u₂, …, vₙ−uₙ>              <2, 5> − <1, 1> = <1, 4>
Dot product    v·u = Σᵢ₌₁ⁿ (vᵢ × uᵢ)                          <2, 5>·<1, 1> = 2×1 + 5×1 = 7
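The definitions and examples in Table 2 can be reproduced directly; the sketch below (Python/NumPy, for illustration only) verifies the three operations for the example vectors <2, 5> and <1, 1>.

import numpy as np

v = np.array([2, 5])
u = np.array([1, 1])

print(v + u)          # [3 6]  -> addition: another 2-dimensional vector
print(v - u)          # [1 4]  -> subtraction: another 2-dimensional vector
print(np.dot(v, u))   # 7      -> dot product: a single scalar (2*1 + 5*1)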

Figure 2.

Understanding the meaning of the operations as physical movements. (A) Two-dimensional sample vectors v and u, (B) geometric representation of vector addition, and (C) geometric representation of vector subtraction.

However, because the result of the dot product is a single number (a scalar), there is a limit to understanding its meaning as a movement. Equation 2 defines the dot product of 2 vectors as follows:

(2) v·u = |v| × |u| × cos θ

where θ is the difference in direction between v and u.

Equation 2 can be interpreted as multiplying the magnitudes of the 2 vectors and then reflecting the difference θ in their directions. Because the cosine has a value of 1 when θ is 0° (cos 0° = 1), if the directions of the 2 vectors are the same, the dot product is simply the product of their magnitudes. As the directions of the 2 vectors diverge, the value of the dot product decreases, and if θ is an obtuse angle, the dot product becomes negative. As the magnitude of a vector cannot be negative, the sign of the dot product of 2 vectors is ultimately determined by the angle θ formed between them.
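Equation 2 implies that the sign of the dot product is determined by the angle θ: positive for an acute angle, zero at 90°, and negative for an obtuse angle. The short sketch below (Python/NumPy, with made-up vectors) recovers θ from the dot product and shows the sign changing as the angle becomes obtuse.

import numpy as np

def angle_between(v, u):
    # Rearranging Equation 2: cos(theta) = (v . u) / (|v| * |u|)
    cos_theta = np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
    return np.degrees(np.arccos(cos_theta))

v = np.array([1, 0])
u_acute = np.array([1, 1])    # roughly the same direction as v
u_obtuse = np.array([-1, 1])  # pointing away from v

print(np.dot(v, u_acute), angle_between(v, u_acute))    # 1 (positive), 45.0 degrees
print(np.dot(v, u_obtuse), angle_between(v, u_obtuse))  # -1 (negative), 135.0 degrees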

PERCEPTRON AND THE DECISION RULE

The perceptron is a machine-learning methodology that can be regarded as the progenitor of deep neural networks.12)13) It separates 1-dimensional data with a point, 2-dimensional data with a line, 3-dimensional data with a plane, and higher-dimensional data with a hyperplane. In this manner, the perceptron is a representative linear classifier model that bisects the space in which the training data exist.14)

As shown in Figure 3, the perceptron is a model that classifies the training data existing in a 2-dimensional space using a straight line. A straight line in the 2-dimensional space can be expressed in various ways: in slope–intercept form, y = αx + β, using the slope α and the intercept β, or in point–slope form, y − y₀ = α(x − x₀), using the slope α and a point (x₀, y₀) through which the line passes. In this section, the most common (standard) form, αx + βy = c, is used for ease of explanation. Because the constants α and β are weights multiplied by the values of x and y, α is substituted with ω₁ and β with ω₂. Furthermore, because the variable names x and y are arbitrary, replacing them with x₁ and x₂ allows the straight line in Figure 3 to be expressed as ω₁x₁ + ω₂x₂ = c.

Figure 3.

Model that classifies learning data in a 2-dimensional space using straight lines.

Among the training data shown in Figure 3, the 4 blue dots belonging to the normal class are the points for which the result of substituting into ω₁x₁ + ω₂x₂ is smaller than c, and the 4 red dots can be interpreted as the points for which the result is larger than c. In other words, the perceptron shown in Figure 3 can be regarded as a model that divides the given space into −1 and +1 regions based on the straight line ω₁x₁ + ω₂x₂ = c. This is described in more detail as follows.

(3) If ω₁x₁ + ω₂x₂ < c, then −1; if ω₁x₁ + ω₂x₂ > c, then +1
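As a concrete illustration of Equation 3, the sketch below applies the rule to the 8 patients in Table 1. The weights ω₁ = 1, ω₂ = 1 and threshold c = 6.5 are hypothetical values chosen for illustration (they are not taken from Figure 3); they happen to classify this small data set perfectly.

import numpy as np

# Training data from Table 1: [test A, test B] and labels (+1 disease, -1 normal).
X = np.array([[1, 3], [7, 2], [5, 3], [1, 1], [6, 3], [3, 2], [2, 1], [6, 5]])
y = np.array([-1, +1, +1, -1, +1, -1, -1, +1])

# Hypothetical straight line w1*x1 + w2*x2 = c (illustrative values only).
w = np.array([1.0, 1.0])
c = 6.5

# Equation 3: predict -1 when w1*x1 + w2*x2 < c, and +1 when it is > c.
predictions = np.where(X @ w < c, -1, +1)
print(predictions)                      # [-1  1  1 -1  1 -1 -1  1]
print(np.array_equal(predictions, y))   # True: every patient is classified correctly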

The decision rule of the perceptron expressed in Equation 3 is limited to 2-dimensional data, that is, to data described by 2 features, x₁ and x₂. If the classification rule of the perceptron is generalized to data existing in an n-dimensional space, it is expressed as Equation 4.

(4) If Σᵢ₌₁ⁿ ωᵢxᵢ < c (that is, Σᵢ₌₁ⁿ ωᵢxᵢ − c < 0), then −1; if Σᵢ₌₁ⁿ ωᵢxᵢ > c (that is, Σᵢ₌₁ⁿ ωᵢxᵢ − c > 0), then +1

In addition, if a hypothetical (bias) feature xₙ₊₁, whose value is always +1, and a corresponding weight ωₙ₊₁ = −c are introduced, Equation 4 is transformed into Equation 5.

(5) If Σᵢ₌₁ⁿ⁺¹ ωᵢxᵢ < 0, then −1; if Σᵢ₌₁ⁿ⁺¹ ωᵢxᵢ > 0, then +1

Considering the decision rule of the perceptron expressed in Equation 5, the given data are classified according to the sign of Σᵢ₌₁ⁿ⁺¹ ωᵢxᵢ. This operation and its result are the same as the dot product of the 2 (n+1)-dimensional vectors ω and x introduced earlier. Therefore, the decision rule of the perceptron can be described in the simplified form of Equation 6 using the dot product, and the perceptron itself can be interpreted as an (n+1)-dimensional vector ω.

(6) If sign(ω·x) < 0, then −1; if sign(ω·x) > 0, then +1
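Equations 5 and 6 can be implemented compactly: a constant feature +1 is appended to every data point and a weight ωₙ₊₁ = −c is appended to the weight vector, after which the predicted label is simply the sign of the dot product ω·x. The sketch below (Python/NumPy, reusing the hypothetical weights from the previous example) shows one way to express this.

import numpy as np

X = np.array([[1, 3], [7, 2], [5, 3], [1, 1], [6, 3], [3, 2], [2, 1], [6, 5]])
y = np.array([-1, +1, +1, -1, +1, -1, -1, +1])

# Append the hypothetical bias feature x_{n+1} = +1 to every data point (Equation 5).
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Extend the weight vector with w_{n+1} = -c (illustrative w = [1, 1], c = 6.5).
w = np.array([1.0, 1.0, -6.5])

# Equation 6: the predicted label is the sign of the dot product w . x.
predictions = np.sign(X_aug @ w)
print(predictions)                      # [-1.  1.  1. -1.  1. -1. -1.  1.]
print(np.array_equal(predictions, y))   # True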

PERCEPTRON LEARNING ALGORITHM (PLA)

The classification rule used by the perceptron to classify given data has been discussed above. In this section, the derivation of a perceptron that perfectly classifies the given training data (the red and blue dots), like the straight line in Figure 3, is discussed. In other words, the PLA is a method of finding, among countless candidate perceptrons in the space, a perceptron ω that perfectly separates “−1” from “+1.”

Taking as input the training data D = {(x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₖ, yₖ)} composed of k data points, the PLA learns and returns the perceptron ω through the following process (Table 3). Like other machine-learning methodologies, the PLA randomly generates a perceptron ω and then updates it to learn (line 1). Each data point in the training data (line 5) is classified according to the perceptron decision rule described in Equation 6, and whether the prediction matches the actual value is checked (line 6). If even one data point is misclassified, the perceptron ω is updated (line 8). If all data are correctly classified, the update is complete, the training process ends, and ω is returned (line 13). In the end, the process of updating the current perceptron ω to ω + y × x using a misclassified data point (x, y) is the core of learning, and understanding why this operation works is essential. The update is performed when the condition described in line 6, sign(ω·x) ≠ sign(y), is satisfied.

Table 3. Perceptron learning algorithm

Input: training data D = {(x₁, y₁), (x₂, y₂), …, (xₖ, yₖ)}
Output: perceptron ω
1:  Randomly generate ω
2:  flag ← true
3:  While flag = true do
4:    flag ← false
5:    For each (x, y) ∈ D do
6:      If sign(ω·x) ≠ sign(y) then
7:        flag ← true
8:        ω ← ω + y × x
9:        Break
10:     End if
11:   End for
12: End while
13: Return ω
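A runnable version of the procedure in Table 3 might look like the sketch below (Python/NumPy). Details such as the random initialization and the cap on the number of passes are implementation choices of ours, not part of the algorithm in Table 3; the code assumes the bias feature +1 has already been appended to each data point, as in Equation 5.

import numpy as np

def perceptron_learning_algorithm(X, y, max_passes=10000, seed=0):
    # X: (k, n+1) array whose last column is the constant bias feature +1.
    # y: k labels, each +1 or -1.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])                      # line 1: randomly generate w
    for _ in range(max_passes):                          # safeguard for non-separable data (our addition)
        misclassified = False
        for x_i, y_i in zip(X, y):                       # line 5: scan the training data
            if np.sign(np.dot(w, x_i)) != np.sign(y_i):  # line 6: decision rule of Equation 6
                w = w + y_i * x_i                        # line 8: update rule
                misclassified = True
                break                                    # line 9: restart the scan after an update
        if not misclassified:                            # every data point is classified correctly
            break
    return w                                             # line 13: return the learned perceptron

# Table 1 data with the bias feature appended.
X = np.array([[1, 3], [7, 2], [5, 3], [1, 1], [6, 3], [3, 2], [2, 1], [6, 5]])
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
y = np.array([-1, +1, +1, -1, +1, -1, -1, +1])

w = perceptron_learning_algorithm(X_aug, y)
print(w)
print(np.array_equal(np.sign(X_aug @ w), y))  # True once the data are perfectly separated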

Figure 4A shows the case where the dot product of the data point x and the current perceptron ω is positive although the correct answer is −1 (y = −1). As mentioned, the sign of the dot product of 2 vectors is determined by the angle θ between them; as shown in Figure 4A, a positive sign means the angle is acute. In other words, the current perceptron misclassifies the data point x because the angle θ formed by the 2 vectors is acute. In this case, if ω is updated according to the perceptron update rule in line 8 (Table 3), the angle formed by the updated ω and x becomes obtuse, as shown in Figure 4A, and x is correctly classified.

Figure 4.

Perceptron update effect, where blue w: current perceptron, x: data misclassified by w, green w: updated perceptron. (A) Representation of the update mechanism for w that classifies x whose label is −1 as +1 and (B) representation of the update mechanism for w that classifies x whose label is +1 as −1.

Figure 4B can be interpreted in a similar way. It shows an error because θ, the difference in direction between the current perceptron ω and a data point x whose correct answer is +1 (y = +1), is an obtuse angle; if θ were acute, the error would not occur. In this case, if ω is updated according to the perceptron update rule ω ← ω + y × x, the angle between the updated ω and the data point x becomes acute, confirming that the data point is correctly classified.
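The geometric argument in Figure 4 can also be checked numerically: for a misclassified data point, the update ω ← ω + y × x shifts the value of ω·x toward the correct sign (by exactly y × (x·x)). The sketch below uses made-up numbers corresponding to the situation in Figure 4A, where the true label is −1 but ω·x is initially positive.

import numpy as np

# Made-up current perceptron and a data point whose correct label is -1.
w = np.array([2.0, 1.0])
x = np.array([3.0, 1.0])
y = -1

print(np.dot(w, x))           # 7.0 > 0: the angle between w and x is acute, so x is misclassified

w_updated = w + y * x         # update rule from line 8 of Table 3
print(w_updated)              # [-1.  0.]
print(np.dot(w_updated, x))   # -3.0 < 0: the angle is now obtuse, so x is classified as -1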

CONCLUSION

The decision rules and learning algorithm of the perceptron, the progenitor of deep neural networks, were examined. The perceptron is an algorithm that derives a hyperplane dividing a given space into 2 subspaces,15) such that only data belonging to the same label exist in each divided space. However, data that cannot be separated by a point, straight line, plane, or hyperplane are easily encountered in the real world. To solve this nonlinear problem, a network structure composed of multiple perceptrons arranged in multiple layers, called a “deep neural network,”16)17) has been proposed. Deep neural networks are composed of more diverse components than the perceptron, such as an input layer, hidden layers, an output layer, and activation functions.18)19) However, the goal of learning is the same, in that the task is to divide the space into a form suitable for the given training data.

Various deep neural networks are already being used in many fields of cardiovascular disease research, and there are many actual clinical applications.20)21) The decision rules and principles of the perceptron, introduced in this article, are the basis for understanding the latest machine-learning methodologies based on deep neural networks, such as convolutional neural networks22) and recurrent neural networks.23)

Notes

Funding

This research was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. 2019M3E5D3073104).

Conflict of Interest

The author has no financial conflicts of interest.

Author Contributions

Conceptualization: Kim HS; Data curation: Kim EH, Kim HS; Formal analysis: Kim EH, Kim HS; Writing - original draft: Kim EH, Kim HS; Writing - review & editing: Kim HS.

References

1. Mitchell TM. Machine Learning. 1st ed. New York, NY: McGraw-Hill; 1997.
2. Kim HS, Yoon KH. Lessons from use of continuous glucose monitoring systems in digital healthcare. Endocrinol Metab 2020;35:541–8.
3. Lee H, Kim HS. Logistic regression and least absolute shrinkage and selection operator. Cardiovasc Prev Pharmacother 2020;2:142–6.
4. Lee M, Kim H, Joe H, Kim HG. Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery. J Cheminform 2019;11:46.
5. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016;6:26094.
6. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AW, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature 2020;577:706–10.
7. Xu J, Zhang H, Zheng J, Dovoedo P, Yin Y. eCAMI: simultaneous classification and motif identification for enzyme annotation. Bioinformatics 2020;36:2068–75.
8. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. Englewood Cliffs, NJ: Prentice Hall; 2010.
9. Herold D, Lutter D, Schachtner R, Tome AM, Schmitz G, Lang EW. Comparison of unsupervised and supervised gene selection methods. Annu Int Conf IEEE Eng Med Biol Soc 2008;2008:5212–5.
10. Goudbeek M, Swingley D, Smits R. Supervised and unsupervised learning of multidimensional acoustic categories. J Exp Psychol Hum Percept Perform 2009;35:1913–33.
11. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002;35:352–9.
12. Lorencin I, Anđelić N, Španjol J, Car Z. Using multi-layer perceptron with Laplacian edge detector for bladder cancer diagnosis. Artif Intell Med 2020;102:101746.
13. Castro W, Oblitas J, Santa-Cruz R, Avila-George H. Multilayer perceptron architecture optimization using parallel computing techniques. PLoS One 2017;12:e0189369.
14. Metsker O, Magoev K, Yanishevskiy S, Yakovlev A, Kopanitsa G, Zvartau N. Identification of diabetes risk factors in chronic cardiovascular patients. Stud Health Technol Inform 2020;273:136–41.
15. Rossi F, Conan-Guez B. Theoretical properties of projection based multilayer perceptrons with functional inputs. Neural Process Lett 2006;23:55–70.
16. Ghosh K, Stuke A, Todorović M, Jørgensen PB, Schmidt MN, Vehtari A, Rinke P. Deep learning spectroscopy: neural networks for molecular excitation spectra. Adv Sci (Weinh) 2019;6:1801367.
17. Shafi N, Bukhari F, Iqbal W, Almustafa KM, Asif M, Nawaz Z. Cleft prediction before birth using deep neural network. Health Informatics J 2020;26:2568–85.
18. Kim EH, Oh SK, Pedrycz W. Design of double fuzzy clustering-driven context neural networks. Neural Netw 2018;104:1–14.
19. Li Y, Zhang T. Deep neural mapping support vector machines. Neural Netw 2017;93:185–94.
20. Hong S, Zhou Y, Wu M, Shang J, Wang Q, Li H, Xie J. Combining deep neural networks and engineered features for cardiac arrhythmia detection from ECG recordings. Physiol Meas 2019;40:054009.
21. Acharya UR, Oh SL, Hagiwara Y, Tan JH, Adam M, Gertych A, Tan RS. A deep convolutional neural network model to classify heartbeats. Comput Biol Med 2017;89:389–96.
22. Cai L, Gao J, Zhao D. A review of the application of deep learning in medical image classification and segmentation. Ann Transl Med 2020;8:713.
23. Mishra S, Banerjee M. Automatic caption generation of retinal diseases with self-trained RNN merge model. In: Chaki R, Cortesi A, Saeed K, Chaki N, eds. Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, Vol. 1136. Singapore: Springer; 2020. p. 1–10.

