
Data Science, Machine Learning & Artificial Intelligence



Intro ------------------------------

How to learn "Machine Learning" and "Artificial Intelligence"
updated September 10, 2018

All my slides are here:

I have uploaded some stuff here:

You need basic working knowledge of Unix / Linux.
 - buy yourself a Macbook and learn to work from terminal.
 - book - Learning the UNIX Operating System O'Reilly
 - tutorials on youtube
     grep, find, set vs env, running script vs sourcing script 
 - vi editor:
   - tutorials on youtube

  ipython notebook (Jupyter)
  python versions 2.7 vs 3.x
  numpy, pandas DataFrame, pd.read_csv(file), df.to_csv(file)
 - calculus: 
     e = 2.71828... = lim (1+1/n)**n as n -> infinity
     derivatives and integrals, 
     Dirac Delta Function,
     multivariate calculus, partial derivatives, 
 - linear algebra: 
     vector, matrix, dot-product of vectors and matrices,
     vector spaces, base vectors, matrices of space transformations,
     determinant, rank, 
     eigen vectors, eigen values, 
     inverse matrix, 
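
Two items from the calculus and linear algebra lists above can be checked numerically. This is only a small sketch in plain Python (no numpy); the function names e_approx and eigenvalues_2x2 are made up for illustration, and the 2x2 helper assumes the matrix has real eigenvalues.

```python
import math

def e_approx(n):
    """Approximate e with (1 + 1/n)**n; the error shrinks as n grows."""
    return (1 + 1 / n) ** n

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix [[a, b], [c, d]].

    They are the roots of the characteristic polynomial
    lambda**2 - trace*lambda + det = 0.
    Assumes the discriminant is non-negative (real eigenvalues).
    """
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

for n in (10, 1000, 100000):
    print(n, e_approx(n), math.e - e_approx(n))

# Symmetric example matrix (chosen for illustration)
print(eigenvalues_2x2([[2, 1], [1, 2]]))
```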
Probability & Statistics:

 - probability: 
     combinatorics, subsets,  C(n,m) = n!/(m!*(n-m)!)
     Venn Diagrams, 
     Conditional Probability, Bayes Theorem,
     random variable, probability distribution, expected value, 
     variance, standard deviation,
     discrete and continuous cases,
     PDF (Probability Density Function) vs CDF (Cumulative Distribution Function),
     Binomial Distribution, 
     Poisson and Exponential Distributions, 
     Uniform Distribution,
     Central Limit Theorem and Normal (Gaussian) Distribution
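
A few of the probability items above (the C(n,m) formula, Bayes Theorem, the Binomial Distribution) fit in a short sketch. The helper names and the numbers in the Bayes example (1% prior, 99% hit rate, 5% false alarm rate) are made up for illustration.

```python
import math

def n_choose_m(n, m):
    """C(n, m) = n! / (m! * (n - m)!), the number of m-element subsets."""
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

def bayes_posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(A | positive test) from P(A), P(positive | A), P(positive | not A)."""
    evidence = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
    return p_pos_given_true * prior / evidence

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return n_choose_m(n, k) * p**k * (1 - p) ** (n - k)

print(n_choose_m(52, 5))                 # number of 5-card hands
print(bayes_posterior(0.01, 0.99, 0.05)) # hypothetical rare-condition test
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))  # probabilities sum to 1
```

The Bayes example shows why a positive result for a rare condition is still more likely to be a false alarm: the posterior stays well below 50%.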

 - statistics: 
     Sample mean (average), Sample Standard Deviation (why /(N-1) ?), Median 
     Linear Regression - draw line through points, 
     OLS = Ordinary Least Squares (minimizing quadratic error), 
     R2  = Coefficient of Determination
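
On the "why /(N-1) ?" question above: the sample mean is computed from the same data, so deviations from it are on average a bit smaller than deviations from the true mean, and dividing by N-1 instead of N compensates for that (Bessel's correction). A minimal sketch with a made-up sample:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

population_std = math.sqrt(sum_sq / n)       # divide by N
sample_std = math.sqrt(sum_sq / (n - 1))     # divide by N-1
median = statistics.median(data)

print(mean, median, population_std, sample_std)
# The standard library makes the same distinction:
print(statistics.pstdev(data), statistics.stdev(data))
```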

     Confidence interval, z-score
       not needed: Student’s t-distribution, Chi-squared distribution, F-distribution, Gamma distribution
     Stochastic Processes, Time Series Analysis, 
     Random Walk, Brownian motion, Diffusion,
     Poisson Process/distribution, exponential distribution
     white noise, gaussian noise
     Markov Process
     Monte Carlo method, MCMC (Markov Chain Monte Carlo)
     Correlation function, Autocorrelation
     Fourier Analysis, filtering to reduce noise
     Extracting Signal from Noise by synchronization (S/N improves ~sqrt(N)),

You need to learn the meaning of these words:
Data Science (DS) - use computer to process data from different sources
     (CSV files, Databases), apply statistics, make graphs 
Machine Learning (ML) - subset of DS to extract patterns from data
                        to do predictions. 
  ML Example - Linear Regression (draw straight line through points)

     Input data array of points data = [(x0,y0), (x1,y1), ...]
     model function                                
          def linear_regression(x, params):
              a, b = params          # slope and intercept
              y = a*x + b
              return y

     training of the model: find the two numbers (slope a and intercept b)
                            that give the best fit between the model and the data
                            (minimize the error)
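
The training step described above can be sketched with the closed-form Ordinary Least Squares solution. The function names fit_line and r_squared and the data points are made up for illustration; in practice a library such as scikit-learn would do this.

```python
def fit_line(points):
    """Return (slope a, intercept b) minimizing the sum of squared errors."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def r_squared(points, a, b):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

# Made-up noisy points scattered around a straight line
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
a, b = fit_line(data)
print(a, b, r_squared(data, a, b))
```

Here the 5 (x,y) pairs are reduced to 2 numbers (slope, intercept), and R2 close to 1 says the line explains most of the variation in y.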

  ML - many types of models (linear regression and logistic regression,
       support vector machines, K-nearest neighbors, K-means clustering,
       decision trees (RandomForest, XGBoost, etc.), Neural Networks, etc.) 
Deep Learning (DL) - ML implemented using multi-layered structures (Networks)
Artificial Intelligence (AI) - DL algorithm trained to perform functions
  that are usually associated only with humans 
  (vision, speech comprehension, autonomous driving, etc.) 
     data cleaning, scaling, normalization
     synthetic data augmentation
     highly-unbalanced data (minority & majority class, imputing data in minority class)
     sparse matrix, sparse matrix data representation (Yale format)

     feature engineering (extraction/preparation)
Dimensional Reduction, 
PCA (Principal Components Analysis)  
LDA (Linear Discriminant Analysis) statistical method to find a linear combination of features to achieve separation of two or more classes. Used for dimensionality reduction before classification. Similar to PCA (Principal Component Analysis).
Note: LDA also stands for Latent Dirichlet Allocation - a generative probabilistic model (to find topics in texts).
Regression - regress vs progress: simplify (for example, from 100 (x,y) pairs to 2 numbers (slope, intercept))
Linear regression OLS = Ordinary Least Squares
Logistic regression

classification: 1 var to 1 binary, multi-var to 1 binary, or multi-var to several classes (multinomial)
We model the log-odds (logit) as a linear combination of some predictors:
  logit(p) = log(p/(1-p)) = b0 + b1*x1 + b2*x2 + . . . + bN*xN
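
The logit formula above can be inverted to get the predicted probability: p = 1/(1 + exp(-z)), where z is the linear combination. A minimal sketch; the function names and the coefficient values are made up for illustration (real coefficients come from training).

```python
import math

def logit(p):
    """Log-odds of probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_proba(x, b0, coeffs):
    """p for features x, given intercept b0 and coefficients b1..bN."""
    z = b0 + sum(b * xi for b, xi in zip(coeffs, x))
    return sigmoid(z)

print(sigmoid(logit(0.8)))                          # round trip returns ~0.8
print(predict_proba([1.0, 2.0], -1.0, [0.5, 0.25])) # hypothetical coefficients
```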

Bayesian theorem / approach  
K Nearest Neighbors (KNN) - tuning "K"  
K-means clustering  
SVM = Support Vector Machines  
Decision Tree Ensemble Methods, bagging, boosting, Random Forest, XGBoost

     training and test data, overfitting
     Regularization (ridge regression, LASSO)
       dropout, adding noise

     bias-variance tradeoff
     Determining feature importance

     Supervised vs unsupervised Machine Learning

     Stochastic Gradient Descent
NLP = Natural Language Processing
sentiment analysis
text as a 'bag of words'
TF-IDF = Term frequency–inverse document frequency
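
A minimal bag-of-words TF-IDF sketch, assuming one common variant: tf = count / document length, idf = log(N / document frequency). Libraries often use smoothed variants, so their numbers can differ. The documents and the function name tf_idf are made up for illustration.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]

def tf_idf(documents):
    """Return one {word: weight} dict per document."""
    tokenized = [d.lower().split() for d in documents]   # bag of words
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))       # count each word once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

weights = tf_idf(docs)
for word, w in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{word:5s} {w:.3f}")
```

A word that appears in every document ("the") gets weight 0, while words specific to one document get the highest weights.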


Outlier/anomaly Detection

     ROC curve = Receiver Operating Characteristic curve
        True Positive Rate vs False Positive Rate (TPR vs FPR)
     Precision P = TP/(TP+FP)   (note: 1-FPR is Specificity, not Precision)
     Recall = TPR
     F1 score = 2/(1/TPR + 1/P)   

     confusion matrix = actual (0,1) vs predicted (0,1)

                | predicted No | predicted Yes
    Actual No   | TN=50        |  FP=10    
    Actual Yes  | FN=5         |  TP=100

    TP, TN, FP, FN = True/False Positive/Negative
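
The metrics can be computed directly from the confusion matrix above, using the standard definitions (Precision = TP/(TP+FP), Recall = TPR = TP/(TP+FN), FPR = FP/(FP+TN)). The variable names are chosen for illustration.

```python
# Counts taken from the confusion matrix above
tn, fp, fn, tp = 50, 10, 5, 100

tpr = tp / (tp + fn)            # True Positive Rate = Recall
fpr = fp / (fp + tn)            # False Positive Rate
precision = tp / (tp + fp)      # share of predicted Yes that are actual Yes
recall = tpr
f1 = 2 / (1 / precision + 1 / recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"TPR={tpr:.3f} FPR={fpr:.3f} precision={precision:.3f} "
      f"F1={f1:.3f} accuracy={accuracy:.3f}")
```

One (TPR, FPR) pair like this is a single point on the ROC curve; the full curve comes from sweeping the decision threshold.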

ML/DL libraries and tools:
     Scikit-learn (sklearn)
     TF = TensorFlow
     PyTorch
     Keras
     MXNet (Apache DL library)
     fastText
     NLTK - Natural Language Toolkit
     CNTK = Microsoft Cognitive Toolkit
     H2O - parallel training and execution on cloud
     SageMaker - on Amazon Cloud
     Google AI - Cloud AutoML, Cloud Machine Learning Engine, Cloud TPUs, BigQuery ML, ...
     Microsoft Azure ML

     Daily time-series - detecting seasonality with Fourier Transforms
     removing trend
     Moving-Window Averages
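
A moving-window average with a window equal to the seasonal period averages the seasonality out and leaves the trend. A minimal sketch on a synthetic daily series; the trend slope, the weekly pattern and the function name are made up for illustration.

```python
def moving_average(values, window):
    """Average over each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Synthetic data: linear trend plus a weekly pattern that sums to zero
weekly = [3, 1, 0, -1, -2, -2, 1]
series = [0.5 * day + weekly[day % 7] for day in range(28)]

trend = moving_average(series, 7)            # 7-day window removes the weekly part
detrended = [s - t for s, t in zip(series[3:], trend)]  # window centers start at day 3

print(trend[:5])
print(detrended[:7])
```

After the trend is subtracted, what remains is the weekly pattern, which is what a Fourier Transform would then show as a peak at a 7-day period.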

     Recommendation Engine
     Collaborative Filtering

     Neural Networks (NN)
     perceptron, multilayer perceptron   
     The XOR problem, hidden layers, non-linearity, ReLU
     Feed-Forward Network
     nodes, connections, weights, biases, activations
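
The XOR problem: no single straight line separates the XOR outputs, so a single perceptron cannot represent it, but one hidden layer with a non-linearity can. A minimal sketch with two ReLU hidden nodes; the weights are picked by hand for illustration, not learned.

```python
def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)

def xor_net(x1, x2):
    """Feed-forward network: 2 inputs -> 2 hidden ReLU nodes -> 1 output."""
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)   # weights (1, 1), bias 0
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)   # weights (1, 1), bias -1
    return 1.0 * h1 - 2.0 * h2             # output weights (1, -2), bias 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

Without the ReLU the two layers would collapse into one linear function, which is why the non-linearity matters.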

     learning as optimization problem
     cost function, objective function, loss function, regret
     back-propagation, SGD (Stochastic Gradient Descent)
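
Learning as optimization can be sketched with Stochastic Gradient Descent on the simplest model, a straight line with a squared-error loss. This is only an illustration: the function name, learning rate, epoch count and noise-free data are assumptions, and a neural network would get its gradients from back-propagation instead of the two hand-written derivatives here.

```python
import random

def sgd_fit_line(points, lr=0.1, epochs=200, seed=0):
    """Fit y = a*x + b by SGD, one data point per update."""
    rng = random.Random(seed)
    points = list(points)
    a, b = 0.0, 0.0                      # initial weights
    for _ in range(epochs):
        rng.shuffle(points)              # visit the points in random order
        for x, y in points:
            err = (a * x + b) - y        # prediction error for this point
            # loss = err**2, so d(loss)/da = 2*err*x and d(loss)/db = 2*err
            a -= lr * 2 * err * x
            b -= lr * 2 * err
    return a, b

# Noise-free data generated from y = 2x + 1 (made up for illustration)
data = [(i / 19, 2.0 * (i / 19) + 1.0) for i in range(20)]
a, b = sgd_fit_line(data)
print(a, b)
```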

     Boltzmann Machine
     RBM (Restricted Boltzmann Machine; became popular around 2006 with Hinton's work),
     Deep Belief Network
CNN - Convolutional Neural Network
        layers: convolutional, pooling, fully-connected
     Short overview and comparison of CNNs:

 - LeNet-5 (1998 - by Yann LeCun, 60K parameters), 
       MNIST database - handwritten digits (60K training, 10K testing) 
       ImageNet - 14 Mln images (the ILSVRC competition subset uses 1000 classes)
 - AlexNet (2012 - Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever, 60 Mln parameters)
 - VGGNet (2014, 138 Mln parameters)
 - GoogLeNet (2014, inception blocks, 22 layers, 4 Mln parameters)
 - ResNet (2015 - Residual Neural Network, up to 152 layers, Microsoft Research; the 50-layer variant has about 25 Mln parameters)
 - Google AutoML, NASNet architecture (2017 - ) 

YOLO - You Only Look Once - real time object detection
Image segmentation (semantic segmentation)
U-Net: Convolutional Networks for Biomedical Image Segmentation
Variational AutoEncoder

word2vec, embeddings (king - man + woman ~= queen)
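
The "king - man + woman ~= queen" arithmetic can be shown with toy vectors. The 2-dimensional embeddings below are invented by hand (first axis roughly "royalty", second roughly "gender"); real word2vec vectors are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings, NOT real word2vec output
toy = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c, vectors):
    """Word closest to a - b + c, excluding the three input words."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = {w: cosine(target, v) for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=candidates.get)

print(analogy("king", "man", "woman", toy))
```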
RNN - Recurrent NN
BRNN - Bi-directional RNN
Exploding/Vanishing Gradient problem
LSTM - Long Short Term Memory (1997 - Sepp Hochreiter and Jürgen Schmidhuber)
GRU - Gated Recurrent Unit (2014, Univ. of Montreal, Canada) - simpler than LSTM
Attention (2014)
Attention Is All You Need (2017, Transformer)
Read this:
Google Translate (paper with 31 authors):
     deep LSTM network
     8 encoder and 8 decoder layers
     parallelism - decreasing training time
     using attention and residual connections
     attention mechanism - connects the bottom layer of the decoder to the top layer of the encoder
     low-precision arithmetic for inference computations
Chatbots: (Natural Language Comprehension => Understanding Intent => Action)
     Amazon Connect + Amazon Lex
     Google Dialogflow

     regularization - overfitting & dropout
     Changing learning rate dynamically, ADAM optimizer
     batch-processing, minibatch, batch-size, SGD

     Cross Entropy 
     Softmax classification (uses same formula, but different meaning)
     KL-divergence (Kullback–Leibler divergence, also called relative entropy) 
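
Softmax, cross entropy and KL-divergence are closely related: softmax turns raw scores into probabilities, cross entropy compares them with the true distribution, and KL(p||q) = CrossEntropy(p,q) - Entropy(p). A minimal sketch in plain Python; the scores and function names are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """p = true distribution, q = predicted distribution."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = softmax([2.0, 1.0, 0.1])     # predicted class probabilities
p = [1.0, 0.0, 0.0]              # one-hot true label (class 0)

print(q)
print(cross_entropy(p, q), -math.log(q[0]))   # equal for a one-hot label
print(kl_divergence(p, q))
```

For a one-hot label the entropy of p is 0, so cross entropy and KL-divergence give the same number, which is why classification losses are usually written as cross entropy.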

     GANs (Generative Adversarial Networks)
     Generative Adversarial Examples

Reinforcement Learning, Deep Reinforcement Learning
       Agent, Policy, Reward, Regret

AlphaGo, DeepMind

multi-armed bandit problem - a fixed limited set of resources 
       must be allocated between competing (alternative) choices 
       in a way that maximizes their expected gain, 
       when each choice's properties are only partially known 
       at the time of allocation, and may become better understood 
       as time passes or by allocating resources to the choice.
       The name comes from imagining a gambler at a row of 
       slot machines (sometimes known as "one-armed bandits").
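
The multi-armed bandit description above can be sketched with an epsilon-greedy strategy: most of the time play the arm with the best estimated payoff, and with a small probability epsilon try a random arm. This is one simple strategy among several; the payout probabilities, parameters and function name are made up for illustration.

```python
import random

def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Play `steps` rounds; return (pull counts, estimated payoffs, total reward)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    values = [0.0] * n_arms      # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda i: values[i])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total_reward += reward
    return counts, values, total_reward

probs = [0.2, 0.5, 0.8]          # hidden from the agent
counts, values, total_reward = epsilon_greedy(probs)
regret = max(probs) * sum(counts) - total_reward   # gap to always playing the best arm
print(counts, [round(v, 2) for v in values], total_reward, round(regret, 1))
```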


Good courses about Deep Learning (DL) and ML:

About Coursera:

5 courses - all videos on youtube:

courses 1,2,3 of 5 - 98 videos:
  course 1: video 1-41
  course 2: video 41-70
  course 3: video 71-98

course 4 of 5 - 43 videos
Convolutional Neural Network (CNN)

course 5 of 5 - 33 videos
Recurrent Neural Networks (RNN)
 - videos on youtube (parts 1,2 - 14 videos):

You can also read this book:
or find youtube videos where people discuss chapters of this book.
The book is comprehensive - but difficult to read.
You will need to do a lot of internet browsing to clarify things.

Audio interviews:
   (TWiML&AI) podcast

For Russian speaking - good channel on Youtube

Google's Tensor Processing Units (TPUs):

Here are some links related to ML & AI

Nice 2-year-old tutorial with pictures:

New online publication:

Christopher Olah has a great blog with
very clear explanations of DL concepts

Nice short online book about DL:

Good 1-hr lecture by Yann LeCun:

Stanford - 15 lectures ( CS231n ) Fei-Fei Li & Andrej Karpathy & Justin Johnson

Oxford - Deep Learning lectures - Nando de Freitas

Hinton lectures (Neural Networks for Machine Learning)

Ian Goodfellow PhD Defense Presentation

GANs - short 5min video by Siraj Raval (he has lots of videos)

The Great A.I. Awakening - by Gideon Lewis-Kraus, Dec. 14, 2016

Google Translate research paper (Sep 2016)

Andrew Ng: Artificial Intelligence is the New Electricity

Andrew Ng - The State of Artificial Intelligence (Dec 15, 2017)

Follow on Facebook:
 - Yann LeCun
 - Adversarial Training
 - Deep AI
 - Deep Learning Patterns, Methodology and Strategy
 - Montreal.AI
 - …

Newsletters, meetups, courses:
 - NYC Artificial Intelligence & Deep
 - - (in Russian)
 - ...

TensorFlow is an open source software library
Written by Google Brain Team (C++, Python)
Available for Linux, Mac, Windows

Deep Learning Frameworks Compared (5:10):
TFLearn – a beginners wrapper around TensorFlow: