Support Vector Machines (SVM) Fundamentals Part-III

In the second post, Support Vector Machines (SVM) Fundamentals Part-II, I discussed margin maximization, support planes and support vectors. Let's have a quick recap:

  • In a basic sense, SVM is a two-class classifier. It learns from the data lying on the support planes.
  • Data points lying on the support planes are called support vectors.
  • A support vector machine aims to maximize the margin, so it is a maximum-margin classifier with the objective: minimize ||w||²/2.

Before discussing the learning process of support vector machines (SVM), you must understand what the Karush–Kuhn–Tucker (KKT) conditions are, so I recommend getting some understanding of them before going ahead. Here are some references:

http://www.onmyphd.com/?p=kkt.karush.kuhn.tucker
http://apmonitor.com/me575/index.php/Main/KuhnTucker
http://www.aass.oru.se/Research/Learning/drdv_dir/seminar_optim_2011/kkt.pdf

At this point I assume you have a basic understanding of the KKT conditions. For the sake of clarity, here they are in brief.

min f(x) subject to gi(x) <= 0,  i = 1, ..., n

The Lagrangian for this problem is

L(x, λ) = f(x) + λ1 g1(x) + λ2 g2(x) + ... + λn gn(x)

Let's say x* is a local optimum. The KKT conditions for x* to be a local minimum are:

∇_x L(x*, λ) = 0     (stationarity)

gi(x*) <= 0   for all i     (primal feasibility)

λi gi(x*) = 0   for all i     (complementary slackness)

λi >= 0   for all i     (dual feasibility)
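To make these conditions concrete, here is a minimal sketch that solves the KKT system with SymPy for a toy problem of my own (not part of the original derivation): min f(x) = x² subject to g(x) = 1 - x <= 0, whose optimum is x* = 1 with multiplier λ = 2.

```python
# Toy example: min f(x) = x^2 subject to g(x) = 1 - x <= 0.
# The only KKT point that is primal- and dual-feasible is x* = 1, lambda* = 2.
import sympy as sp

x, lam = sp.symbols('x lambda', real=True)
f = x**2
g = 1 - x                                 # constraint g(x) <= 0
L = f + lam * g                           # Lagrangian

stationarity = sp.Eq(sp.diff(L, x), 0)    # dL/dx = 0
slackness = sp.Eq(lam * g, 0)             # lambda * g(x) = 0

candidates = sp.solve([stationarity, slackness], [x, lam], dict=True)
# Keep candidates that also satisfy primal (g <= 0) and dual (lambda >= 0) feasibility.
kkt_points = [s for s in candidates if g.subs(s) <= 0 and s[lam] >= 0]
print(kkt_points)                         # [{x: 1, lambda: 2}]
```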

Let's take the case of a one-dimensional data set. I am taking the one-dimensional case for the sake of simplicity; it extends easily to the multidimensional case. For now the data patterns are also linearly separable; I will show afterwards how this extends to cases that are not linearly separable.

Suppose there are three patterns. Patterns 1 (value = 2) and 2 (value = 4) belong to class #, and pattern 3 (value = 10) belongs to class *, as shown in the figure below.

[Figure: one-dimensional case with three patterns]

In this case w will be a single value instead of a vector, so we have to minimize ||w||²/2 (here just w²/2) subject to the constraints:

2w - c <= -1;   4w - c <= -1;   10w - c >= 1

so the Lagrangian will be

min L = ||w||²/2 - λ1(-2w+c-1) - λ2(-4w+c-1) - λ3(10w-c-1),

where each constraint has been rewritten as an expression that must be non-negative (for example, 2w - c <= -1 becomes -2w + c - 1 >= 0).

Setting the derivatives of L with respect to w and c to zero (the stationarity condition), we get

w+2λ1+4λ2-10λ3=0  (1)

λ1+λ2-λ3=0   (2)

Primal feasibility requires

-2w+c-1 >= 0   (3);    -4w+c-1 >= 0   (4);    10w-c-1 >= 0   (5)

and complementary slackness gives

λ1(-2w+c-1) = 0   (6);    λ2(-4w+c-1) = 0   (7);    λ3(10w-c-1) = 0   (8)

Patterns 2 and 3 are the closest points of the two classes, so they lie on the support planes and constraints (4) and (5) hold with equality. Solving -4w+c-1 = 0 and 10w-c-1 = 0 gives 6w = 2, i.e. w = 1/3 and c = 7/3. With these values -2w+c-1 = 2/3 > 0, so (6) forces λ1 = 0; then (2) gives λ2 = λ3, and substituting into (1) yields λ2 = λ3 = 1/18, which also satisfies dual feasibility.
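If you want to check this algebra, here is a minimal sketch using SymPy (my own verification, not part of the original post) that solves the same system under the assumption used above, namely that patterns 2 and 3 are the support vectors, so λ1 = 0 and constraints (4) and (5) hold with equality.

```python
# Reproduce the hand calculation: lambda1 = 0 (pattern 1 is not a support
# vector) and constraints (4), (5) are active for the two support vectors.
import sympy as sp

w, c, l2, l3 = sp.symbols('w c lambda2 lambda3', real=True)
l1 = 0

solution = sp.solve([
    sp.Eq(w + 2*l1 + 4*l2 - 10*l3, 0),    # (1)  dL/dw = 0
    sp.Eq(l1 + l2 - l3, 0),               # (2)  dL/dc = 0
    sp.Eq(-4*w + c - 1, 0),               # (4)  pattern 2 lies on its support plane
    sp.Eq(10*w - c - 1, 0),               # (5)  pattern 3 lies on its support plane
], [w, c, l2, l3])

print(solution)                           # {w: 1/3, c: 7/3, lambda2: 1/18, lambda3: 1/18}
# Slack of pattern 1 is positive, so lambda1 = 0 was consistent with (6):
print(-2*solution[w] + solution[c] - 1)   # 2/3
```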

Now the decision boundary is wx - c = 0, so x/3 - 7/3 = 0, or x = 7. It is obvious from the figure above that the point x = 7 separates the two classes with the maximum margin (3 on each side), since pattern 2 is the boundary point for class # and pattern 3 is the boundary point for class *. The same reasoning extends to higher dimensions.
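As a further sanity check (again my own addition), a linear SVM fitted with scikit-learn on the same three patterns should recover essentially the same w and the boundary at x = 7; a very large C is used here to approximate the hard-margin case.

```python
# Fit a (nearly) hard-margin linear SVM on the three one-dimensional patterns.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0], [4.0], [10.0]])      # patterns 1, 2, 3
y = np.array([-1, -1, 1])                 # class '#' -> -1, class '*' -> +1

clf = SVC(kernel='linear', C=1e6).fit(X, y)

w = clf.coef_[0][0]                       # approximately 1/3
b = clf.intercept_[0]                     # approximately -7/3 (sklearn writes w*x + b)
print(w, -b / w)                          # slope and boundary, roughly 0.333 and 7.0
print(clf.support_vectors_)               # [[4.], [10.]] -- the support vectors
```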

Not-Linearly Separable Case: 

Let's take the case of the data shown below.

[Figure: not-linearly separable case]

As is evident from the figure above, class # and class * are not linearly separable. But when we project this data into two dimensions, (x, x²), it becomes linearly separable, as shown in the figure below.

[Figure: two-dimensional projection of the one-dimensional data]

Now we can use the formulation discussed above to classify the data with maximum margin. The trick is to take advantage of the higher-dimensional space without ever computing coordinates in it explicitly, and that is the beauty of support vector machines (SVM).
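Here is a minimal sketch of that idea with made-up one-dimensional data in the spirit of the figures above (the exact values are my own, chosen so that class * surrounds class #): no single threshold separates the classes, but after the explicit map x -> (x, x²) a linear SVM does, and a degree-2 polynomial kernel achieves the same thing without ever constructing the two-dimensional features.

```python
# Hypothetical 1-D data: class '*' (+1) lies on both sides of class '#' (-1).
import numpy as np
from sklearn.svm import SVC

x = np.array([-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Explicit projection into two dimensions: (x, x^2).
X2 = np.column_stack([x, x**2])
print(SVC(kernel='linear', C=1e6).fit(X2, y).score(X2, y))           # 1.0 -> separable after the map

# The kernel trick: a degree-2 polynomial kernel works directly on the 1-D data.
X1 = x.reshape(-1, 1)
print(SVC(kernel='poly', degree=2, C=1e6).fit(X1, y).score(X1, y))   # 1.0
```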

This completes the "Support Vector Machines Fundamentals" series of blog posts. For any queries you can mail me at panthimanshu17@gmail.com.

Thanks for reading the post.


