Monday, November 30, 2020

Feature Engineering - Numerical Data Scaling

Numerical features can be scaled to ensure that each has a proportionate influence on the model's predictions.


Common techniques for scaling

So how do we do it, exactly? How can we bring different features onto the same scale?

Keep in mind that not all ML algorithms are sensitive to differences in the scale of input features. Here are the scaling and normalizing transformations most commonly used in data science and ML projects (a sketch of all five follows the list):

  • Mean/variance standardization
  • MinMax scaling
  • Maxabs scaling
  • Robust scaling
  • Normalizer
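
All five of these are available in scikit-learn's preprocessing module. Here is a minimal sketch on a made-up two-feature matrix (the data values are assumptions for illustration):

import numpy as np
from sklearn.preprocessing import (StandardScaler, MinMaxScaler,
                                   MaxAbsScaler, RobustScaler, Normalizer)

# Toy data: two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 800.0]])

scalers = {
    "Mean/variance standardization": StandardScaler(),  # zero mean, unit variance
    "MinMax scaling": MinMaxScaler(),        # rescales each feature to [0, 1]
    "Maxabs scaling": MaxAbsScaler(),        # divides by the max absolute value
    "Robust scaling": RobustScaler(),        # uses median and IQR, resists outliers
    "Normalizer": Normalizer(),              # scales each ROW to unit norm
}

for name, scaler in scalers.items():
    print(name)
    print(scaler.fit_transform(X))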
==================================================================

So how do you encode nominal variables? The one-hot encoding method is a good choice. Here’s how it works.



In this example, you have a single column called home type, with three different levels: House, Apartment, and Condo. The data frame has five observations for that particular feature.

With one-hot encoding, you convert this single home type column into three columns: a column for House, a column for Apartment, and a column for Condo. You encode each observation with either a 1 or a 0: 1 to indicate the home type of that particular observation, and 0 for the other options.
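
Here is a minimal sketch of that example with pandas (the column name home_type and the five values are assumptions chosen to match the description above):

import pandas as pd

df = pd.DataFrame({"home_type": ["House", "Apartment", "Condo",
                                 "House", "Apartment"]})

# get_dummies() replaces the single home_type column with one 0/1
# column per level: House, Apartment, and Condo
encoded = pd.get_dummies(df, columns=["home_type"])
print(encoded)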


============================================================


Topics related to this subdomain

Here are some topics you may want to study for more in-depth information related to this subdomain:



  • Scaling

  • Normalizing

  • Dimensionality reduction

  • Date formatting

  • One-hot encoding

Monday, November 16, 2020

Univariate Analysis

VISUALIZING UNIVARIATE CONTINUOUS DATA


Univariate plots are of two types:

1) Enumerative plots

2) Summary plots


Univariate enumerative plots:

These plots enumerate/show every observation in the data and provide information about the distribution of the observations for a single variable. We now look at different enumerative plots (a minimal sketch follows the list below).


Examples:

1. UNIVARIATE SCATTER PLOT  

2. LINE PLOT (with markers)

3. STRIP PLOT

4. SWARM PLOT 
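
As a minimal sketch, here is how two of these enumerative plots (strip and swarm) can be drawn with seaborn; the randomly generated sample is an assumption for illustration:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

values = np.random.normal(loc=50, scale=10, size=100)  # toy continuous variable

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.stripplot(x=values, ax=axes[0])   # every observation, with jitter
axes[0].set_title("Strip plot")
sns.swarmplot(x=values, ax=axes[1])   # every observation, no overlap
axes[1].set_title("Swarm plot")
plt.show()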


Univariate summary plots:

These plots give a more concise description of the location, dispersion, and distribution of a variable than an enumerative plot. A summary plot cannot show every individual data value, but it represents the whole data set efficiently, making it easier to draw conclusions about the entire distribution (a sketch follows the list below).



  5. HISTOGRAMS

  6. DENSITY PLOTS

  7. RUG PLOTS

  8. BOX PLOTS

  9. distplot()

  10. VIOLIN PLOTS
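
Here is a minimal sketch of several of these summary plots with seaborn (the random sample is an assumption; note that newer seaborn versions deprecate distplot() in favor of histplot() and displot()):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

values = np.random.normal(loc=50, scale=10, size=500)  # toy continuous variable

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
sns.histplot(values, kde=True, ax=axes[0, 0])  # histogram with a density curve
axes[0, 0].set_title("Histogram + density")
sns.rugplot(values, ax=axes[0, 1])             # one tick per observation
axes[0, 1].set_title("Rug plot")
sns.boxplot(x=values, ax=axes[1, 0])           # median, quartiles, outliers
axes[1, 0].set_title("Box plot")
sns.violinplot(x=values, ax=axes[1, 1])        # box plot + density in one view
axes[1, 1].set_title("Violin plot")
plt.show()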



VISUALIZING CATEGORICAL VARIABLES:

11. BAR CHART

12. PIE CHART
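
A minimal sketch of both charts using pandas plotting; the categories and counts are made-up assumptions:

import pandas as pd
import matplotlib.pyplot as plt

home_type = pd.Series(["House", "Apartment", "Condo",
                       "House", "Apartment", "House"])
counts = home_type.value_counts()  # frequency of each category

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
counts.plot(kind="bar", ax=axes[0], title="Bar chart")
counts.plot(kind="pie", ax=axes[1], title="Pie chart", autopct="%1.0f%%")
plt.show()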



https://www.analyticsvidhya.com/blog/2020/07/univariate-analysis-visualization-with-illustrations-in-python/




Monday, November 9, 2020

Hyperparameters for each ML algorithm

In machine learning, a hyperparameter (sometimes called a tuning or training parameter) is any parameter whose value is set at the onset of the learning process, whereas the values of the model's other parameters are computed during training.
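
A minimal sketch of the distinction with scikit-learn's LogisticRegression: C is a hyperparameter chosen before training, while coef_ and intercept_ are parameters learned from the data.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

model = LogisticRegression(C=1.0, max_iter=200)  # hyperparameters, set up front
model.fit(X, y)                                  # training computes the parameters

print(model.coef_)       # learned weights
print(model.intercept_)  # learned biases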


Key hyperparameters for some common algorithms:

  • K-Nearest Neighbors: K (n_neighbors), leaf_size, weights, and metric

  • Decision Trees and Random Forests: n_estimators, max_depth, min_samples_split, min_samples_leaf, and criterion

  • AdaBoost and Gradient Boosting: n_estimators, learning_rate, and base_estimator (AdaBoost) / loss (Gradient Boosting)

  • Support Vector Machines: C, kernel, and gamma


Specifically, I will focus on the hyperparameters that tend to have the greatest effect on the bias-variance tradeoff. It is very important to keep this tradeoff in mind, as well as the tradeoff between computational cost and scoring metrics; ideally, we want a model with low bias and low variance to limit overall error.
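
Here is a minimal sketch of tuning some of the random forest hyperparameters listed above with scikit-learn's GridSearchCV (the grid values and the iris data set are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5],
}

# 5-fold cross-validated search over every combination in the grid
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best hyperparameter combination found
print(search.best_score_)   # its mean cross-validated accuracy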

https://medium.com/swlh/the-hyperparameter-cheat-sheet-770f1fed32ff

The Hyperparameter Cheat Sheet: A quick guide to hyperparameter tuning utilizing Scikit-Learn's GridSearchCV, and the bias/variance trade-off

J.P. Rinfret

=====================================================================


https://towardsdatascience.com/model-parameters-and-hyperparameters-in-machine-learning-what-is-the-difference-702d30970f6


Examples of hyperparameters used in the scikit-learn package

1. Perceptron Classifier

from sklearn.linear_model import Perceptron
Perceptron(max_iter=40, eta0=0.1, random_state=0)


2. Train/Test Split

from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.4, random_state=0)


3. Logistic Regression Classifier

from sklearn.linear_model import LogisticRegression
LogisticRegression(C=1000.0, random_state=0)


4. KNN (k-Nearest Neighbors) Classifier

from sklearn.neighbors import KNeighborsClassifier
KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')


5. Support Vector Machine Classifier

from sklearn.svm import SVC
SVC(kernel='linear', C=1.0, random_state=0)


6. Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier
DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)

7. Lasso Regression

from sklearn.linear_model import Lasso
Lasso(alpha=0.1)

8. Principal Component Analysis

from sklearn.decomposition import PCA
PCA(n_components=4)