The K-Nearest Neighbors (KNN) algorithm is a machine learning algorithm used for both classification and regression tasks. It is non-parametric and a lazy learner, meaning it makes no assumptions about the underlying distribution of the data and does not learn a model during training; instead, it stores all of the training data.
The KNN algorithm works as follows:
Given a new input instance, the KNN algorithm finds the K training instances (i.e., data points) that are closest to the new instance in the feature space. Any distance metric that measures the similarity or dissimilarity between instances can be used; Euclidean distance is the most common choice.
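For example, the Minkowski distance generalizes the two most common choices: p=1 gives the Manhattan distance and p=2 gives the Euclidean distance. A minimal sketch in NumPy (the function name minkowski_distance is ours, for illustration):

import numpy as np

def minkowski_distance(a, b, p=2):
    # Minkowski distance between two feature vectors:
    # p=1 -> Manhattan distance, p=2 -> Euclidean distance
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan)
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean)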
For classification tasks, KNN assigns the class label that is most common among the K nearest neighbors; for regression tasks, it predicts the average of the target values of the K nearest neighbors.
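To make the decision rule concrete, here is a minimal from-scratch sketch (not the scikit-learn implementation shown later): it computes Euclidean distances from the new point to every training point, takes the K nearest, and returns either the majority class or the mean target. The helper name knn_predict is our own:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5, task='classification'):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    if task == 'classification':
        # Majority vote among the k nearest neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Regression: average of the k nearest target values
    return y_train[nearest].mean()

X_train = np.array([[0, 0], [1, 1], [2, 2], [8, 8], [9, 9]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1, 2]), k=3))  # -> 0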
The value of K is a hyperparameter that must be set prior to training the model. If K is too small, the model can be sensitive to noise in the data and overfit the training set; if K is too large, the model becomes overly biased towards the most common class or value in the data and may underfit.
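A common way to choose K in practice is cross-validation: evaluate several candidate values and keep the one with the best validation score. A short sketch using scikit-learn's cross_val_score (the built-in Iris dataset is used here only for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a few odd values of K (odd values help avoid tied votes)
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f'K={k}: mean accuracy = {scores.mean():.3f}')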
KNN can be used with any number of features, although its performance tends to degrade in very high-dimensional feature spaces, and it is especially useful when the underlying distribution of the data is unknown or difficult to model. KNN is often used as a baseline against which the performance of more complex models is compared.
The following example trains a 5-nearest-neighbor classifier with scikit-learn and plots its decision regions. The variables X_train_std, y_train, X_combined_std, and y_combined, as well as the plot_decision_regions helper, are assumed to be defined earlier (e.g., from a standardized Iris train/test split):

from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

# With metric='minkowski', p=2 is equivalent to Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5,
                           p=2,
                           metric='minkowski')
knn.fit(X_train_std, y_train)

# plot_decision_regions is a custom plotting helper (defined earlier);
# test_idx highlights the test samples (indices 105-149) in the plot
plot_decision_regions(X_combined_std, y_combined,
                      classifier=knn,
                      test_idx=range(105, 150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()