Data science: the power of data

Data science is a set of tools that allows us to extract knowledge from data. It is an interdisciplinary field that combines skills in statistics, mathematics, programming, data mining, machine learning and data visualization with knowledge of the business or industry to which it is applied.

Data mining:

It is the process of exploring and analyzing very large volumes of data, by automatic or semi-automatic means, in order to discover behavior patterns and rules that matter to a company.

Types of models.

Predictive or supervised
(you know what you are looking for):

  • Classification Model: Seeks to predict the class (a discrete value) of a record from existing information.
    Example: Determine whether a transaction is normal or fraudulent.
  • Regression Model: Seeks to predict a continuous value from existing information.
    Example: Estimate the demand for products.
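
As a rough illustration of the two supervised tasks above, the sketch below uses only the Python standard library. The transaction amounts, labels, prices and demand figures are invented toy values, and the nearest-neighbor rule and least-squares line are stand-ins chosen for brevity, not methods named in this article.

```python
# Toy supervised-learning sketch; all data values are invented for illustration.

def nearest_neighbor_class(train, x):
    """Classification: return the label of the training point closest to x."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def fit_line(xs, ys):
    """Regression: least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Classification: label a transaction by its amount (hypothetical history).
history = [(20, "normal"), (35, "normal"), (50, "normal"),
           (900, "fraudulent"), (1200, "fraudulent")]
label = nearest_neighbor_class(history, 1000)

# Regression: estimate demand (units) from price (hypothetical history).
prices = [1.0, 2.0, 3.0, 4.0]
demand = [90.0, 80.0, 70.0, 60.0]
intercept, slope = fit_line(prices, demand)
estimated_demand = intercept + slope * 5.0
```

In both cases the known answers (the labels, the past demand) are what make the task supervised: the model is fitted to examples where the target is already recorded.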

Descriptive or unsupervised
(you do not know in advance what you are looking for):

  • Grouping (Cluster): Seeks to form groups whose members share common characteristics.
    Example: Identify people who have the same shopping habits.
  • Association rule: Seeks to identify rules that describe events occurring together.
    Example: Product offers such as "Noodles with Tomato Sauce".
  • Correlational Analysis: Seeks to identify correlations between variables of interest.
    Example: Identify the factors associated with developing lung cancer.
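
The three descriptive tasks above can each be sketched in a few lines of standard-library Python. Everything here is an illustrative assumption: the spending figures, shopping baskets and variable pairs are invented, the grouping uses a tiny one-dimensional k-means with two groups, the association rule is scored by its confidence, and the correlation is Pearson's coefficient.

```python
# Toy unsupervised-learning sketch; all data values are invented for illustration.
import math

def two_means(values, iterations=10):
    """Grouping: 1-D k-means with k=2, started from the smallest and largest value."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups]
    return groups

def confidence(baskets, antecedent, consequent):
    """Association rule: share of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

def pearson(xs, ys):
    """Correlational analysis: Pearson correlation coefficient of two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Grouping: monthly spend per customer (hypothetical).
groups = two_means([10, 12, 11, 95, 100, 105])

# Association: how often tomato sauce is bought when noodles are bought.
baskets = [
    {"noodles", "tomato sauce"},
    {"noodles", "tomato sauce", "cheese"},
    {"noodles", "bread"},
    {"bread", "milk"},
]
rule_confidence = confidence(baskets, "noodles", "tomato sauce")

# Correlation: two hypothetical variables of interest.
r = pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

None of these functions is given a target to predict; each only describes structure already present in the data, which is what makes the tasks unsupervised.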

Example case: Estimate the price of automobiles from their characteristics, based on historical data.

Training and Model Validation Files

File of vehicles with known prices, used to train the model

File of vehicles whose prices are to be estimated
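
The two files differ only in whether the price is known. A minimal sketch of reading them with Python's csv module follows. The column names (year, mileage_km, price) and the rows are hypothetical, since the article does not show the file layout, and in-memory strings stand in for the real files.

```python
import csv
import io

# Hypothetical file contents; the real columns and values are not given in the article.
TRAINING_CSV = """year,mileage_km,price
2015,80000,9500
2018,40000,14000
2020,15000,19000
"""
TO_ESTIMATE_CSV = """year,mileage_km
2017,60000
2019,30000
"""

def read_rows(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]

training_rows = read_rows(TRAINING_CSV)
rows_to_estimate = read_rows(TO_ESTIMATE_CSV)

# Training rows carry the target (price); rows to estimate carry only the features.
features = [name for name in training_rows[0] if name != "price"]
X_train = [[row[name] for name in features] for row in training_rows]
y_train = [row["price"] for row in training_rows]
X_new = [[row[name] for name in features] for row in rows_to_estimate]
```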

Apply the model

Two models are applied: the first is Linear Regression and the second is a Multilayer Perceptron neural network. They are then compared on their predictive power and their percentage of adjustment.
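
The comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the article's actual workflow: the data are invented (one feature, vehicle age, with age and price both scaled to the 0-1 range), the perceptron has a single hidden layer of four tanh units trained by stochastic gradient descent, and mean absolute error on held-out vehicles stands in for the comparison measure.

```python
import math
import random

def fit_linear(xs, ys):
    """Least-squares line for one feature; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_mlp(xs, ys, hidden=4, epochs=2000, lr=0.05, seed=0):
    """One-hidden-layer perceptron (tanh units) trained with plain SGD on squared error."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            for j in range(hidden):
                # Backpropagate through the tanh unit before updating w2[j].
                grad_z = err * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_z * x
                b1[j] -= lr * grad_z
            b2 -= lr * err
    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2
    return predict

def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Invented data: scaled vehicle age -> scaled price.
train_x = [0.0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 1.0]
train_y = [1.00, 0.85, 0.72, 0.61, 0.45, 0.39, 0.30, 0.25]
test_x = [0.4, 0.7, 0.9]    # held-out vehicles, not used for fitting
test_y = [0.52, 0.34, 0.27]

models = {
    "linear regression": fit_linear(train_x, train_y),
    "multilayer perceptron": fit_mlp(train_x, train_y),
}
results = {name: mean_absolute_error(m, test_x, test_y) for name, m in models.items()}
best = min(results, key=results.get)  # model with the lowest held-out error
```

Scoring both models on vehicles that were held out of training keeps the comparison fair; which model comes out ahead depends on the data and the training settings.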

Model evaluation

According to the prediction results, the best model is the Multilayer Perceptron, because it achieves 95% prediction with a 3% adjustment.
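
The article does not say how the "prediction" and "adjustment" percentages are computed. One common pair of measures that yields figures of this kind is the coefficient of determination (R squared) and the mean absolute percentage error (MAPE). The sketch below assumes that reading, and the prices in it are invented.

```python
def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Invented prices for four vehicles and a model's estimates for them.
actual_prices = [10000.0, 15000.0, 20000.0, 25000.0]
estimated_prices = [10500.0, 14500.0, 20500.0, 24000.0]

r2 = r_squared(actual_prices, estimated_prices)
error_pct = mape(actual_prices, estimated_prices)
```

Under this reading, a higher R squared and a lower MAPE both point to the better model.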

Deliverables

Implementation of the Model to Estimate the Price