In the second post, Support Vector Machines (SVM) Fundamentals Part-II, I discussed margin maximization, support planes and support vectors. Let's have a quick recap:
- In a basic sense, SVM is a two-class classifier; it learns from the data lying on the support planes.
- Data points lying on the support planes are called support vectors.
- A Support Vector Machine aims to maximize Dt (the margin), so it is a maximum-margin classifier with the objective function (the primal problem): minimize ||w||²/2.
Before discussing the learning process of support vector machines (SVM), you must understand what the Karush–Kuhn–Tucker (KKT) conditions are, so I recommend getting some familiarity with them before going ahead. Here are some references:
At this point I assume that you have a basic understanding of KKT conditions. For the sake of clarity, I will state them briefly.
min f(x) subject to gi(x) <= 0, i = 1, …, n

The Lagrangian for this problem is L(x, λ) = f(x) + λ1g1(x) + λ2g2(x) + … + λngn(x).
Let's say x* is a local optimum point. The KKT conditions for x* to be a local minimum are:
- ∇L(x*, λ) = 0 (stationarity)
- gi(x*) <= 0 for all i (primal feasibility)
- λi gi(x*) = 0 for all i (complementary slackness)
- λi >= 0 for all i (dual feasibility)
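To make these conditions concrete, here is a tiny check in Python (an illustrative toy example of my own, not taken from the post): minimize f(x) = x² subject to x >= 1. The optimum is x* = 1 with multiplier λ = 2, and all four conditions hold there.

```python
# Toy KKT check (illustrative sketch): minimize f(x) = x**2 subject to g(x) = 1 - x <= 0.
# Lagrangian: L(x, lambda) = x**2 + lambda*(1 - x).
x_star, lam = 1.0, 2.0                  # candidate optimum and its multiplier

stationarity = 2 * x_star - lam         # dL/dx = 2x - lambda, must be 0
primal = 1 - x_star                     # g(x*), must be <= 0
slackness = lam * (1 - x_star)          # lambda * g(x*), must be 0
dual = lam                              # lambda, must be >= 0

print(stationarity == 0, primal <= 0, slackness == 0, dual >= 0)  # all True
```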
Let's take the case of a one-dimensional dataset. I am using the one-dimensional case for the sake of simplicity; it can easily be extended to the multidimensional case. Also, the data patterns here are linearly separable. I will show afterwards how the formulation extends to non-linearly separable cases.
Suppose there are three patterns. Patterns 1 (value = 2) and 2 (value = 4) belong to class # and pattern 3 (value = 10) belongs to class *, as shown in the figure below.
In this case w will be a single value instead of a vector, so we have to minimize ||w||²/2. The constraints are:

2w - c <= -1 ; 4w - c <= -1 ; 10w - c >= 1
So the Lagrangian is

min L = ||w||²/2 - λ1(-2w + c - 1) - λ2(-4w + c - 1) - λ3(10w - c - 1).
Taking the derivative of L with respect to w and c and setting it to zero (the stationarity condition) gives

w + 2λ1 + 4λ2 - 10λ3 = 0 (1) ; -λ1 - λ2 + λ3 = 0 (2)

If a pattern lies on its support plane, its constraint holds with equality:

-2w + c - 1 = 0 (3) ; -4w + c - 1 = 0 (4) ; 10w - c - 1 = 0 (5)

The complementary slackness conditions are

λ1(-2w + c - 1) = 0 (6) ; λ2(-4w + c - 1) = 0 (7) ; λ3(10w - c - 1) = 0 (8)

Taking patterns 2 and 3 as the support vectors and solving (4) and (5), we get 6w = 2, i.e. w = 1/3 and c = 7/3. With these values -2w + c - 1 = 2/3 ≠ 0, so (6) implies λ1 = 0; then from (2) λ2 = λ3 and from (1) λ2 = λ3 = 1/18.
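If you prefer to let a computer do the algebra, here is a small symbolic check of the same KKT system (a sketch of my own; it assumes sympy is installed):

```python
# Symbolic check of the KKT system for the 1-D example above (requires sympy).
import sympy as sp

w, c, l1, l2, l3 = sp.symbols('w c lambda1 lambda2 lambda3')

# Lagrangian from the post: L = w^2/2 - l1(-2w + c - 1) - l2(-4w + c - 1) - l3(10w - c - 1)
L = w**2 / 2 - l1*(-2*w + c - 1) - l2*(-4*w + c - 1) - l3*(10*w - c - 1)

equations = [
    sp.diff(L, w),     # (1)  w + 2*l1 + 4*l2 - 10*l3 = 0
    sp.diff(L, c),     # (2) -l1 - l2 + l3 = 0
    -4*w + c - 1,      # (4)  pattern 2 lies on its support plane
    10*w - c - 1,      # (5)  pattern 3 lies on its support plane
    l1,                #      pattern 1 is not a support vector (lambda1 = 0)
]
solution = sp.solve(equations, [w, c, l1, l2, l3], dict=True)[0]
print(solution)        # w = 1/3, c = 7/3, lambda1 = 0, lambda2 = lambda3 = 1/18
```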
Now the decision boundary is wx - c = 0, i.e. x/3 - 7/3 = 0, or x = 7. It is clear from the figure above that x = 7 separates the two classes with maximum margin (3), since pattern 2 lies on the boundary of class # and pattern 3 lies on the boundary of class *. The same reasoning can be extended to higher dimensions.
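You can also verify the result numerically with scikit-learn (this library check is my addition, not part of the derivation). A very large C approximates the hard-margin formulation used above; sklearn's decision function is w·x + b, so b plays the role of -c here.

```python
# Numerical check of the 1-D example with a (nearly) hard-margin linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0], [4.0], [10.0]])   # the three 1-D patterns
y = np.array([-1, -1, 1])              # class # -> -1, class * -> +1

clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w = clf.coef_[0][0]                    # ~ 1/3
b = clf.intercept_[0]                  # ~ -7/3 (b corresponds to -c)
print(w, b, -b / w)                    # decision boundary at x = 7
print(clf.support_vectors_)            # patterns 4 and 10 are the support vectors
```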
Non-Linearly Separable Case:
Let's take the case of the data shown below.
As is evident from the figure above, class # and class * are not linearly separable. But when we project this data into two dimensions (x, x²), it becomes linearly separable, as shown in the figure below.
Now we can use the formulation discussed above to classify the data with maximum margin. The trick here is to take advantage of the higher-dimensional space without actually computing the projection explicitly, and that is the beauty of support vector machines (SVM).
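Here is a small sketch of that idea with made-up 1-D values (the exact data from the figures isn't reproduced here, so these numbers are only illustrative): class # sits between two groups of class *, no single threshold separates them in one dimension, but the projection x → (x, x²) makes them linearly separable, and a degree-2 polynomial kernel achieves the same separation without building the 2-D features explicitly.

```python
# Non-linearly separable 1-D example (illustrative data, not from the original figures).
import numpy as np
from sklearn.svm import SVC

x = np.array([-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])      # class * -> +1, class # -> -1

# Explicit projection into (x, x^2), then the same linear maximum-margin formulation.
X2 = np.column_stack([x, x**2])
clf = SVC(kernel='linear', C=1e6).fit(X2, y)
print(clf.predict(X2))                       # matches y: the lifted data is separable

# The kernel trick gives the same effect without constructing X2 at all.
clf_k = SVC(kernel='poly', degree=2, coef0=1, C=1e6).fit(x.reshape(-1, 1), y)
print(clf_k.predict(x.reshape(-1, 1)))       # also matches y
```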
This completes the "Support Vector Machines Fundamentals" series of blog posts. For any queries you can mail me at firstname.lastname@example.org.
Thanks for reading the post.
- Support Vector Machines (SVM) Fundamentals Part-II (panthimanshu17.wordpress.com)
- Support Vector Machines (SVM) Fundamentals Part-I (panthimanshu17.wordpress.com)