what is classification?
Classification is the process of using a model to predict unknown values (output
variables), using a number of known values (input variables). For example we might
want to predict whether a stock market is currently a bull or a bear market, based
on a number of market indicators, or we might want to predict whether a patient
has a certain disease given a number of symptoms.
In order to perform classification, first we need to model the relationship between
the input variables and the output variables we are predicting. This process involves
learning a model using data in which both the input variables and the output variables
are present. Expert opinion can also be used to build/enhance a model. This model
can subsequently be used on unseen data in which only the input data is present,
in order to predict the output variables.
Classification is termed a supervised learning approach, because a model is trained
specifically for the purpose of predicting the output variable.
Typically, the term classification is concerned with predicting discrete variables.
The term regression is used when predicting continuous variables.