Let’s talk about supervised learning. In supervised learning, we ask the machine to
learn from our data after we specify a target variable.
This reduces the machine’s task to finding a pattern in the input
data that predicts the target variable.
We address two cases of the target variable. The first case occurs when the target
variable can take only nominal values: true or false; reptile, fish, mammal, plant, fungi. This case is called classification. The second case occurs when the target variable
can take an infinite number of numeric values, such as 0.100, 42.001, 1000.743, ….
This case is called regression. We’ll study regression later on. Let us first
focus on classification by looking at a very popular machine-learning technique, the k-Nearest Neighbors (kNN) algorithm. You should also have prior knowledge of R basics.
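To make the two cases concrete, here is a small sketch, written in Python as an assumption (the same idea carries over to R). The toy rows and the helper `task_type` are hypothetical and only illustrate the difference between a nominal and a numeric target; they are not part of any library.

```python
# Hypothetical toy data: each row pairs input features with a target value.

# Classification: the target is a nominal label.
animals = [
    # (lays_eggs, has_fur) -> class label
    ((True, False), "reptile"),
    ((True, False), "fish"),
    ((False, True), "mammal"),
]

# Regression: the target is a real number.
measurements = [
    # (input_a, input_b) -> numeric target
    ((1.0, 2.0), 0.100),
    ((3.5, 1.0), 42.001),
    ((9.0, 7.5), 1000.743),
]


def task_type(targets):
    """Guess the task from the target values.

    Simplifying assumption: numeric targets mean regression and anything
    else means classification. Real data can break this rule (for example,
    class labels stored as integer codes).
    """
    numeric = all(
        isinstance(t, (int, float)) and not isinstance(t, bool)
        for t in targets
    )
    return "regression" if numeric else "classification"


animal_task = task_type([label for _, label in animals])
measurement_task = task_type([value for _, value in measurements])
```

Under these assumptions, `animal_task` comes out as classification and `measurement_task` as regression.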
Have you ever seen movies categorized into genres? What defines these genres, and
who says which movie goes into what genre? The movies in one genre are similar
but based on what? I’m sure if you asked the people involved with making the movies,
they wouldn’t say that their movie is just like someone else’s movie, but in some
way you know they’re similar. What makes an action movie similar to another action
movie and dissimilar to a romance movie? Do people kiss in action movies, and do
people kick in romance movies? Yes, but there’s probably more kissing in romance
movies and more kicking in action movies. Perhaps if you measured kisses, kicks,
and other things per movie, you could automatically figure out what genre a movie
belongs to.
Let’s discuss our first machine-learning algorithm: k-Nearest Neighbors. It is easy to grasp and very effective. We’ll first discuss
the theory and how you can use the concept of a distance measurement to classify items.
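As a rough illustration of a distance measurement, the sketch below describes each movie by two counts, kicks and kisses, and compares movies with the Euclidean distance. The choice of Euclidean distance, the Python language, and all the counts are assumptions made for this example; the text does not prescribe a particular measure.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical (kicks, kisses) counts per movie.
action_movie = (100, 5)
romance_movie = (2, 90)
unknown_movie = (90, 10)

d_action = euclidean_distance(unknown_movie, action_movie)
d_romance = euclidean_distance(unknown_movie, romance_movie)

# The smaller distance marks the more similar known movie.
closest_genre = "action" if d_action < d_romance else "romance"
```

With these made-up counts, the unknown movie sits much closer to the action movie than to the romance movie, so `closest_genre` is `"action"`.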
The first machine-learning algorithm we’ll look at is k-Nearest Neighbors (kNN). It
works like this: we have an existing set of example data, our training set. We have
labels for all of this data—we know what class each piece of the data should fall into.
When we’re given a new piece of data without a label, we compare that new piece of
data to every piece of the existing data. We then take the most similar
pieces of data (the nearest neighbors) and look at their labels. We look at the top k
most similar pieces of data from our known dataset; this is where the k comes from. (k
is an integer and it’s usually less than 20.) Lastly, we take a majority vote from the k
most similar pieces of data, and the majority is the new class we assign to the data we
were asked to classify.
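The steps just described can be sketched in a few lines. This is a minimal illustration in Python, not a reference implementation: the function name `knn_classify`, the use of Euclidean distance as the similarity measure, and the toy (kicks, kisses) data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(query, training_points, training_labels, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    if len(training_points) != len(training_labels):
        raise ValueError("every training point needs exactly one label")
    if not 1 <= k <= len(training_points):
        raise ValueError("k must be between 1 and the training set size")

    # Step 1: compare the query to every piece of existing data.
    distances = []
    for point, label in zip(training_points, training_labels):
        d = math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point)))
        distances.append((d, label))

    # Step 2: keep the k most similar pieces of data (smallest distances).
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]

    # Step 3: majority vote. On a tie, the label seen first among the
    # nearest neighbors wins, which is an arbitrary choice made here.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set: (kicks, kisses) per movie and its genre.
points = [(3, 104), (2, 100), (1, 81), (101, 10), (99, 5), (98, 2)]
labels = ["romance", "romance", "romance", "action", "action", "action"]

predicted_a = knn_classify((18, 90), points, labels, k=3)
predicted_b = knn_classify((95, 8), points, labels, k=3)
```

Here the query `(18, 90)` has three romance movies as its nearest neighbors and `(95, 8)` has three action movies, so the votes are unanimous. Because every query is compared against the whole training set, the cost of one classification grows with the size of that set.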
Where to learn:
K-Nearest Neighbor Algorithm in R, a 15-minute walkthrough. This video introduces the k-NN (k-nearest neighbor) model in R using the famous iris dataset. It is part of a tutorial series on R, Data Science, and Machine Learning.