Monday, November 30, 2020

Feature Engineering - Numerical Data Scaling

Numerical data can be scaled so that each feature has a proportionate influence on the prediction.


Common techniques for scaling

So how do we do it, exactly? How can we bring different features onto the same scale?

Keep in mind that not all ML algorithms are sensitive to the scale of their input features: tree-based models, for example, are largely unaffected, while distance-based methods such as k-nearest neighbors are not. Here are the scaling and normalizing transformations most commonly used in data science and ML projects:

  • Mean/variance standardization
  • MinMax scaling
  • Maxabs scaling
  • Robust scaling
  • Normalizer
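
To make the list above concrete, here is a minimal sketch of the five transformations written directly in NumPy. The toy matrix X and the function names are assumptions made up for illustration; they are not from the original post. In practice you would more likely reach for the scikit-learn classes in sklearn.preprocessing (StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, and Normalizer), which follow the same ideas.

```python
import numpy as np

# Hypothetical toy feature matrix: rows are observations, columns are features.
# The second column is on a much larger scale and contains an outlier (1000).
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
    [4.0, 400.0],
    [5.0, 1000.0],
])

# Note: none of these helpers guard against constant columns or all-zero rows,
# which would cause a division by zero.

def standardize(X):
    """Mean/variance standardization: zero mean and unit variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def minmax_scale(X):
    """MinMax scaling: map each column onto the range [0, 1]."""
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def maxabs_scale(X):
    """Maxabs scaling: divide each column by its largest absolute value."""
    return X / np.abs(X).max(axis=0)

def robust_scale(X):
    """Robust scaling: center on the median, divide by the interquartile range."""
    q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
    return (X - median) / (q3 - q1)

def normalize_rows(X):
    """Normalizer: rescale each row (observation) to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

for name, fn in [
    ("standardize", standardize),
    ("minmax", minmax_scale),
    ("maxabs", maxabs_scale),
    ("robust", robust_scale),
    ("normalize_rows", normalize_rows),
]:
    print(name)
    print(np.round(fn(X), 3))
```

Note that the first four transformations work column by column (per feature), while the normalizer works row by row (per observation). Robust scaling relies on the median and quartiles rather than the mean and variance, so the outlier in the second column distorts it less.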
==================================================================

So how do you encode nominal variables? The one-hot encoding method is a good choice. Here’s how it works.



In this example, you have one column called home Type with three different levels: House, Apartment, and Condo. The data frame has five observations for that feature.

With one-hot encoding, you convert this one column of home Type into three columns: a column for House, a column for Apartment, and a column for Condo. You encode each observation with either a 1 or 0: 1 to indicate the home type of that particular observation, or 0 for the other options.
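
The conversion described above can be sketched in a few lines of plain Python. The five observations below are made up for illustration (the original example's actual values are not shown here), and one_hot_encode is a hypothetical helper name, not a library function.

```python
# Hypothetical data: five observations of the home-type feature.
home_types = ["House", "Apartment", "Condo", "House", "Condo"]

def one_hot_encode(values, levels):
    """Return one row per observation, with a 1 in the matching level's column."""
    return [[1 if value == level else 0 for level in levels] for value in values]

# One output column per level of the original feature.
levels = ["House", "Apartment", "Condo"]
encoded = one_hot_encode(home_types, levels)

print(levels)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, in the column for that observation's home type. For real projects, pandas.get_dummies or scikit-learn's OneHotEncoder do the same job on whole data frames.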


============================================================


Topics related to this subdomain

Here are some topics you may want to study for more in-depth information related to this subdomain:

  • Scaling
  • Normalizing
  • Dimensionality reduction
  • Date formatting
  • One-hot encoding
