Sunday, March 28, 2021

Visualization Books

Visualization Analysis and Design, Tamara Munzner (A K Peters): much of the course material is taken from this book

Making Data Visual, Danyel Fisher & Miriah Meyer (O'Reilly): very good introductory text


Monday, January 18, 2021

DataFrames vs Datasets vs RDD

 https://www.youtube.com/watch?v=9yNmTucj6HU&list=PLfxl5dzojKr4K_NtVFKDnecP3mUp8K6Fj&index=5

Tuesday, January 12, 2021

SPARK Websites

 https://www.composablesystems.org/17-400/fa2020/schedule/


https://www.composablesystems.org/17-400/fa2020/#course-information



https://heather.miller.am/index.html#teaching


https://heather.miller.am/teaching/cs4240/spring2018/


Friday, December 25, 2020

Deep Learning Hyperparameter Tuning example

 https://www.kaggle.com/jamesleslie/titanic-neural-network-for-beginners


Notebook: titanic-neural-network-for-beginners


Summary:


create_model is the key function in the whole pipeline: it builds a network from the hyperparameters passed in.

def create_model(lyrs=[8], act='linear', opt='Adam', dr=0.0):
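A minimal sketch of what create_model might look like, assuming a Keras Sequential binary classifier; the input dimension of 11 and the sigmoid output are assumptions here, not confirmed from the notebook:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def create_model(lyrs=[8], act='linear', opt='Adam', dr=0.0):
    # One Dense hidden layer per entry in lyrs, all using activation act.
    model = Sequential()
    model.add(Dense(lyrs[0], input_dim=11, activation=act))  # 11 features: assumption
    for units in lyrs[1:]:
        model.add(Dense(units, activation=act))
    model.add(Dropout(dr))                      # dropout rate dr is grid-searched
    model.add(Dense(1, activation='sigmoid'))   # binary survival prediction
    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy'])
    return model
```

Because every tunable knob (layer sizes, activation, optimizer, dropout) is a keyword argument, GridSearchCV can vary each one simply by passing different values to this factory function.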


GridSearchCV was used to find the best hyperparameter values.

Hyperparameters tuned: batch_size, epochs, optimizer, layers, and dropout rate



Hyperparameter Tuning


GridSearchCV - batch size and epochs

batch_size = [16, 32, 64]

epochs = [50, 100]


Best: 0.822671 using {'batch_size': 32, 'epochs': 50}

===================================================

GridSearchCV - Optimization Algorithm

optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Nadam']


Best: 0.822671 using {'opt': 'Adam'}


===================================================


GridSearchCV - Hidden neurons

layers = [[8],[10],[10,5],[12,6],[12,8,4]]


Best: 0.822671 using {'lyrs': [8]}


===================================================



GridSearchCV - Dropout

drops = [0.0, 0.01, 0.05, 0.1, 0.2, 0.5]


Best: 0.824916 using {'dr': 0.2}


===================================================
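The selection step GridSearchCV performs at each stage above can be sketched in plain Python: score every combination in the grid and keep the best one. The scoring function below is a toy stand-in for cross-validated accuracy (the numbers are illustrative, not the notebook's actual results):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every combination in param_grid and
    return the best score together with the winning parameters."""
    names = list(param_grid)
    best_score, best_params = float('-inf'), None
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Toy scores standing in for cross-validated accuracy per combination.
scores = {(16, 50): 0.81, (16, 100): 0.80,
          (32, 50): 0.82, (32, 100): 0.81,
          (64, 50): 0.79, (64, 100): 0.78}

best, params = grid_search(
    {'batch_size': [16, 32, 64], 'epochs': [50, 100]},
    lambda batch_size, epochs: scores[(batch_size, epochs)])
# best == 0.82, params == {'batch_size': 32, 'epochs': 50}
```

The notebook repeats this pattern four times, freezing the winners from each stage before searching the next grid, which keeps the number of model fits small at the cost of possibly missing a better joint combination.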


model = create_model(lyrs=[8], dr=0.2)


training = model.fit(X_train, y_train, epochs=50, batch_size=32,
                     validation_split=0.2, verbose=0)




Still have a few questions:

a. The initial model reported val_acc of 86.53%, but the final trained model reported acc of only 83.16%. Why did accuracy drop after tuning?

b. Why were batch_size and epochs fixed first and each remaining hyperparameter then tuned separately, rather than adding one hyperparameter after another to a joint search?