Data Science, Machine Learning & Artificial Intelligence
How to learn "Machine Learning" and "Artificial Intelligence"
updated September 10, 2018
All my slides are here:
https://goo.gl/3v8DAS
I have uploaded some stuff here:
http://myhash.com/ai/
===========================================
Unix:
You need basic working knowledge of Unix / Linux.
- buy yourself a Macbook and learn to work from terminal.
- book - Learning the UNIX Operating System O'Reilly
- tutorials on youtube
grep, find, set vs env, running script vs sourcing script
- vi editor:
- http://www.levselector.com/vi.html
- tutorials on youtube
===========================================
Python:
anaconda: https://www.anaconda.com/download/
ipython,
ipython notebook (Jupyter)
python versions 2.7 vs 3.x
numpy, pandas DataFrame, pd.read_csv(file), df.to_csv(file)
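A minimal sketch of the pd.read_csv / df.to_csv round trip mentioned above. The CSV text and the column names (x, y, y_over_x) are made up for illustration, and io.StringIO stands in for a real file path so the snippet runs without touching disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "x,y\n1,2.0\n2,4.1\n3,5.9\n"

# Read CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Add a derived column using vectorized arithmetic.
df["y_over_x"] = df["y"] / df["x"]

# Write the DataFrame back out as CSV (index=False drops the row index).
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```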
===========================================
Math:
- calculus:
e = 2.71828... = lim (1+1/n)**n as n -> infinity
derivatives and integrals,
Dirac Delta Function,
multivariate calculus, partial derivatives,
gradient,
- linear algebra:
vector, matrix, dot-product of vectors and matrices,
vector spaces, base vectors, matrices of space transformations,
determinant, rank,
eigen vectors, eigen values,
inverse matrix,
tensor
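A small numeric check of two items from the list above: the limit definition of e, and eigenvalues / eigenvectors of a matrix. The matrix A is an arbitrary example chosen for illustration, not something from the original notes.

```python
import math
import numpy as np

# e as a limit: (1 + 1/n)**n approaches e as n grows.
n = 1_000_000
e_approx = (1 + 1 / n) ** n
print(e_approx, math.e)

# Eigenvalues and eigenvectors of a small symmetric matrix (arbitrary example).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column v of `eigenvectors` satisfies A @ v = lambda * v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

# The determinant equals the product of the eigenvalues.
det = np.linalg.det(A)
print(eigenvalues, det)
```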
===========================================
Probability & Statistics:
- probability:
definition,
combinatorics, subsets, C(n,m) = n!/(m!*(n-m)!)
Venn Diagrams,
Conditional Probability, Bayes Theorem,
random variable, probability distribution, expected value,
variance, standard deviation,
discrete and continuous cases,
PDF (Probability Density Function) vs CDF (Cumulative Distribution Function),
Binomial Distribution,
Poisson and Exponential Distributions,
Uniform Distribution,
Central Limit Theorem and Normal (Gaussian) Distribution
exp(-x^2/2)
- statistics:
Sample mean (average), Sample Standard Deviation (why /(N-1) ?), Median
Linear Regression - draw line through points,
OLS = Ordinary Least Squares (minimizing quadratic error),
R2 = Coefficient of Determination
Confidence interval, z-score
not needed: Student’s t-distribution, Chi-squared distribution, F-distribution, Gamma distribution
Stochastic Processes, Time Series Analysis,
Random Walk, Brownian motion, Diffusion,
Poisson Process/distribution, exponential distribution
white noise, gaussian noise
Markov Process
Monte Carlo method, MCMC (Markov Chain Monte Carlo)
Correlation function, Autocorrelation
Fourier Analysis, filtering to reduce noise
Extracting Signal from Noise by synchronization (S/N improves ~sqrt(N)),
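The sqrt(N) claim above can be checked with a short simulation. This is only a sketch under simple assumptions: a constant signal, independent Gaussian noise, and arbitrary values for signal, sigma and the number of trials. Averaging N noisy repeats should shrink the noise by roughly sqrt(N).

```python
import numpy as np

rng = np.random.default_rng(0)
signal = 1.0      # constant "true" value (arbitrary)
sigma = 2.0       # noise standard deviation (arbitrary)
trials = 2000     # how many independent averages we measure

def noise_of_average(n_repeats):
    """Standard deviation of the mean of n_repeats noisy measurements."""
    samples = signal + sigma * rng.standard_normal((trials, n_repeats))
    # ddof=1 gives the sample standard deviation (division by N-1).
    return samples.mean(axis=1).std(ddof=1)

noise_1 = noise_of_average(1)
noise_100 = noise_of_average(100)

# Expected to be close to sqrt(100) = 10.
ratio = noise_1 / noise_100
print(noise_1, noise_100, ratio)
```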
===========================================
You need to learn the meaning of these words:
Data Science (DS) - use computer to process data from different sources
(CSV files, Databases), apply statistics, make graphs
Machine Learning (ML) - subset of DS to extract patterns from data
to do predictions.
ML Example - Linear Regression (draw a straight line through points)
Input data: array of points data = [(x0,y0), (x1,y1), ...]
model function:
def linear_regression(x, params):
    a, b = params   # slope and intercept
    y = a*x + b
    return y
training of the model: find two numbers (slope a and intercept b)
which give the best fit between the model and the data
(minimize the error)
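A minimal sketch of the "training" step for this example, using the closed-form Ordinary Least Squares solution. The four data points and the helper name predict are made up for illustration.

```python
import numpy as np

# Hypothetical data points (x, y), roughly on the line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]
x = np.array([p[0] for p in data])
y = np.array([p[1] for p in data])

# Closed-form OLS: slope = cov(x, y) / var(x), intercept from the means.
dx = x - x.mean()
dy = y - y.mean()
a = (dx * dy).sum() / (dx ** 2).sum()
b = y.mean() - a * x.mean()

def predict(x_new, a, b):
    """Model function: straight line with slope a and intercept b."""
    return a * x_new + b

# Mean squared error of the fitted line on the training points.
mse = ((predict(x, a, b) - y) ** 2).mean()
print(a, b, mse)
```

np.polyfit(x, y, 1) returns the same slope and intercept; the explicit formula is shown only to make the minimization visible.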
ML - many types of models (linear regression and logistic regression,
support vector machines, K-nearest neighbors, K-means clustering,
decision trees (RandomForest, XGBoost, etc.), Neural Networks, etc.)
Deep Learning (DL) - ML implemented using multi-layered structures (Networks)
Artificial Intelligence (AI) - DL algorithm trained to perform functions
which are usually associated only with humans
(vision, speech comprehension, autonomous driving, etc.)
data cleaning, scaling, normalization
synthetic data augmentation
highly-unbalanced data (minority & majority class, imputing data in minority class)
sparse matrix, sparse matrix data representation (Yale format)
- https://en.wikipedia.org/wiki/Sparse_matrix
feature engineering (extraction/preparation)
Dimensionality Reduction:
PCA (Principal Component Analysis)
LDA (Linear Discriminant Analysis)
statistical method to find a linear combination of features to achieve separation of two or more classes.
Used for dimensionality reduction before classification. Similar to PCA (Principal Component Analysis).
Note: LDA also stands for Latent Dirichlet Allocation - a generative probabilistic model (to find topics in texts).
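A rough sketch of PCA-style dimensionality reduction using plain numpy SVD. The synthetic 2-D data (points scattered along the direction (1, 2)) is an assumption made only for illustration; libraries such as scikit-learn provide ready-made PCA classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data lying close to the line y = 2x (illustrative only).
t = rng.standard_normal(200)
X = np.column_stack([t, 2 * t + 0.1 * rng.standard_normal(200)])

# 1. Center the data.
Xc = X - X.mean(axis=0)

# 2. SVD: rows of Vt are the principal directions, S the singular values.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Fraction of variance explained by each component.
explained = S ** 2 / (S ** 2).sum()

# 4. Project onto the first principal component: 2 features -> 1 feature.
X_1d = Xc @ Vt[0]

print(Vt[0], explained)
```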
Regression
regress vs progress, simplify (for example, from 100 (x,y) pairs to 2 numbers (slope, intercept))
Linear regression
OLS = Ordinary Least Squares
Logistic regression
classification:
1 var to 1 binary, multi-var to 1 binary, or
multi-var to several classes (multinomial)
We model the log-odds (logit)
as a linear combination of some predictors:
logit(p) = log(p/(1-p)) = b0 + b1*x1 + b2*x2 + . . . + bN*xN
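To turn the linear combination back into a probability, invert the logit with the sigmoid function. A minimal sketch follows; the function names and the coefficient values are hypothetical, chosen only to illustrate the formula.

```python
import math

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_proba(x, b):
    """p = sigmoid(b0 + b1*x1 + ... + bN*xN); b[0] is the intercept."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return sigmoid(z)

# Hypothetical coefficients b0, b1, b2 and one input point (x1, x2).
b = [-1.0, 0.8, 0.5]
p = predict_proba([2.0, 1.0], b)
label = 1 if p >= 0.5 else 0   # threshold at 0.5 for binary classification
print(p, label)
```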
Propensity
Bayesian theorem / approach
K Nearest Neighbors (KNN) - tuning "K"
K-means clustering
SVM = Support Vector Machines
Decision Tree
Ensemble Methods, bagging, boosting, Random Forest, XGBoost
softmax
training and test data, overfitting
Regularization (ridge regression, LASSO)
dropout, adding noise
bias-variance tradeoff
Determining feature importance
Supervised vs unsupervised Machine Learning
Stochastic Gradient Descent
NLP = Natural Language Processing
sentiment analysis
text as a 'bag of words'
TF-IDF = Term frequency–inverse document frequency
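A bare-bones sketch of TF-IDF on a toy "bag of words" corpus. The three documents are made up, and this is only one common variant (tf = count / document length, idf = log(N / df)); libraries such as scikit-learn use smoothed versions of the formula.

```python
import math
from collections import Counter

# Toy corpus: each document is a bag (list) of words.
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran fast".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's words equal to term."""
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    """Inverse document frequency; assumes term occurs in at least one doc."""
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" appears in every document, so its weight is zero;
# "dog" appears in only one document, so it gets a high weight there.
print(tfidf("the", docs[0], docs), tfidf("dog", docs[1], docs))
```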
---
Classifier
Outlier/anomaly Detection
ROC curve = Receiver Operating Characteristic curve
True Positive Rate vs False Positive Rate (TPR vs FPR)
Precision P = TP/(TP+FP)
Recall = TPR
F1 score = 2/(1/TPR + 1/P)
confusion matrix = actual (0,1) vs predicted (0,1)
------------+--------------+---------------
            | predicted No | predicted Yes
------------+--------------+---------------
Actual No   | TN=50        | FP=10
Actual Yes  | FN=5         | TP=100
------------+--------------+---------------
where
TP, TN, FP, FN = True/False Positive/Negative
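The usual classifier metrics, computed from the confusion matrix above. This sketch uses the standard definitions: recall = TPR = TP/(TP+FN), FPR = FP/(FP+TN), precision = TP/(TP+FP), and F1 as the harmonic mean of precision and recall.

```python
# Counts taken from the confusion matrix above.
TN, FP, FN, TP = 50, 10, 5, 100

recall = TP / (TP + FN)          # True Positive Rate (TPR)
fpr = FP / (FP + TN)             # False Positive Rate (FPR)
precision = TP / (TP + FP)       # share of predicted "Yes" that are correct
accuracy = (TP + TN) / (TP + TN + FP + FN)

# F1 score: harmonic mean of precision and recall.
f1 = 2 / (1 / recall + 1 / precision)

print(recall, fpr, precision, accuracy, f1)
```

For these counts precision is about 0.91, recall about 0.95 and F1 about 0.93.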
ML/DL libraries and tools:
Scikit-learn (sklearn) - http://scikit-learn.org/stable/
TF = TensorFlow - https://www.tensorflow.org/
PyTorch - https://pytorch.org/
Keras - https://keras.io/
XGBoost - https://xgboost.ai/
MXNet (Apache DL library) - https://mxnet.apache.org/
fastText - https://fasttext.cc/
NLTK - Natural Language Toolkit - https://www.nltk.org/
CNTK = Microsoft Cognitive Toolkit - https://cntk.ai/
H2O - https://www.h2o.ai/ - parallel training and execution on cloud
SageMaker - https://aws.amazon.com/sagemaker/ - on Amazon Cloud
Google AI - https://cloud.google.com/products/ai/ - Cloud AutoML, Cloud Machine Learning Engine, Cloud TPUs, BigQuery ML, ...
Microsoft Azure ML - https://azure.microsoft.com/en-us/overview/machine-learning/
---
Daily time-series - detecting seasonality with Fourier Transforms
removing trend
Moving-Window Averages
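A sketch of the three steps above on a synthetic daily series. The series itself (linear trend + weekly sine wave + Gaussian noise) and all its parameters are assumptions made for illustration; real data will be messier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily series: trend + weekly seasonality + noise (illustrative).
n_days = 364
t = np.arange(n_days)
series = 0.05 * t + 3.0 * np.sin(2 * np.pi * t / 7) + rng.standard_normal(n_days)

# 1. Remove the trend by subtracting a fitted straight line.
slope, intercept = np.polyfit(t, series, 1)
detrended = series - (slope * t + intercept)

# 2. Fourier transform: the strongest non-zero frequency reveals the season.
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(n_days, d=1.0)      # cycles per day
peak = spectrum[1:].argmax() + 1            # skip the zero-frequency bin
period_days = 1 / freqs[peak]

# 3. Moving-window average over one week smooths out the weekly cycle.
window = 7
smoothed = np.convolve(series, np.ones(window) / window, mode="valid")

print(period_days, len(smoothed))
```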
Recommendation Engine
Collaborative Filtering
Neural Networks (NN)
perceptron, multilayer perceptron
The XOR problem, hidden layers, non-linearity, ReLU
Feed-Forward Network
nodes, connections, weights, biases, activations
learning as optimization problem
cost function, objective function, loss function, regret
back-propagation, SGD (Stochastic Gradient Descent)
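The XOR problem in a few lines: a single perceptron cannot represent XOR, but one hidden layer with ReLU can. The weights below are set by hand for illustration (a trained network would find its own values via back-propagation).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked weights (not learned): 2 inputs -> 2 hidden units -> 1 output.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])       # input-to-hidden weights
b1 = np.array([0.0, -1.0])        # hidden biases
w2 = np.array([1.0, -2.0])        # hidden-to-output weights
b2 = 0.0                          # output bias

def forward(x):
    """Feed-forward pass: linear -> ReLU -> linear."""
    h = relu(x @ W1 + b1)
    return h @ w2 + b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = forward(X)
print(outputs)   # [0. 1. 1. 0.]
```

Without the ReLU the two layers would collapse into one linear map, which cannot separate XOR; the non-linearity is what makes the hidden layer useful.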
Boltzmann Machine
RBM (Restricted Boltzmann Machine, 2006),
Deep Belief Network
CNN - Convolutional Neural Network
layers: convolutional, pooling, fully-connected
Short overview and comparison of CNNs:
- https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5
- LeNet-5 (1998 - by Yann LeCun, 60K parameters),
MNIST database - handwritten digits (60K training, 10K testing)
ImageNet - 14 Mln images, 1000 classes
- AlexNet (2012 - Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever, 60 Mln parameters)
- VGGNet (2014, 138 Mln parameters)
- GoogLeNet (2014, inception blocks, 22 layers, 4 Mln parameters)
- ResNet (2015 - Residual Neural Network, 152 layers, 25 Mln parameters, Microsoft Research)
- Google AutoML, NASNet architecture (2017 - https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html )
YOLO - You Only Look Once - real time object detection
Image segmentation (semantic segmentation)
U-Net: Convolutional Networks for Biomedical Image Segmentation
AutoEncoder
Variational AutoEncoder
seq2seq, word2vec, embeddings (king - man + woman ~= queen)
RNN - Recurrent NN
BRNN - Bi-directional RNN
Exploding/Vanishing Gradient problem
LSTM - Long Short Term Memory (1997 - Sepp Hochreiter and Jürgen Schmidhuber)
GRU - Gated Recurrent Unit (2014, Univ. of Montreal, Canada) - simpler than LSTM
Attention (2014) - https://arxiv.org/abs/1409.0473
Attention Is All You Need (2017, Transformer) - https://arxiv.org/abs/1706.03762
Read this:
- https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/
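A minimal numpy sketch of attention, in the scaled dot-product form used by the Transformer paper (the 2014 paper uses a different, additive scoring function). The matrix shapes and random inputs are arbitrary, and no learned projections or masking are included.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V, weights            # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # 2 queries of dimension 4
K = rng.standard_normal((3, 4))   # 3 keys of dimension 4
V = rng.standard_normal((3, 5))   # 3 values of dimension 5
context, weights = attention(Q, K, V)
print(context.shape, weights.sum(axis=1))
```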
Google Translate:
- https://arxiv.org/pdf/1609.08144v2.pdf - 31 authors
deep LSTM network, 8 encoder and 8 decoder layers
parallelism - decreasing training time
using attention and residual connections
attention mechanism - connects the bottom layer of the decoder to the top layer of the encoder
low-precision arithmetic for inference computations
Chatbots:
(Natural Language Comprehension => Understanding Intent => Action)
Amazon Connect + Amazon Lex
Google Dialogflow
regularization - overfitting & dropout
Changing learning rate dynamically, Adam optimizer
batch-processing, minibatch, batch-size, SGD
Cross Entropy
Softmax classification (uses same formula, but different meaning)
KL-divergence (Kullback–Leibler divergence, also called relative entropy)
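A short numeric sketch tying softmax, cross entropy and KL-divergence together. The distribution p and the logits are arbitrary example values. It checks the identity cross_entropy(p, q) = entropy(p) + KL(p || q).

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Expected surprise when the truth is p but the model predicts q."""
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """Relative entropy KL(p || q); zero only when p equals q."""
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])              # "true" distribution (example)
q = softmax(np.array([2.0, 1.0, 0.1]))     # model prediction from logits

print(cross_entropy(p, q), entropy(p), kl_divergence(p, q))
```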
GANs (Generative Adversarial Networks)
Generative Adversarial Examples
Reinforcement Learning, Deep Reinforcement Learning
Agent, Policy, Reward, Regret
AlphaGo, DeepMind
multi-armed bandit problem - a fixed limited set of resources
must be allocated between competing (alternative) choices
in a way that maximizes their expected gain,
when each choice's properties are only partially known
at the time of allocation, and may become better understood
as time passes or by allocating resources to the choice.
The name comes from imagining a gambler at a row of
slot machines (sometimes known as "one-armed bandits").
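One simple strategy for the multi-armed bandit problem is epsilon-greedy: mostly play the arm with the best estimated payoff, but explore a random arm with small probability epsilon. A sketch follows; the payoff probabilities, epsilon, step count and the function name run_bandit are illustrative assumptions.

```python
import random

def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy play on Bernoulli arms with hidden payoff probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    values = [0.0] * n_arms      # running average reward per arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda i: values[i]) # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the average reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, values, total

# Three slot machines; the agent does not know these probabilities.
counts, values, total = run_bandit([0.2, 0.5, 0.8])
print(counts, values, total)
```

Over time most pulls should go to the arm with the highest payoff, while the occasional exploration keeps the estimates for the other arms up to date.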
===========================================
Good courses about Deep Learning (DL) and ML:
- https://www.coursera.org/specializations/deep-learning
- https://www.coursera.org/specializations/machine-learning-tensorflow-gcp
- http://www.fast.ai
- https://www.udacity.com/course/deep-learning--ud730
etc.
About Coursera:
- https://www.ted.com/talks/daphne_koller_what_we_re_learning_from_online_education
================================================================
deeplearning.ai - 5 courses - all videos on youtube:
courses 1,2,3 of 5 - 98 videos:
- https://www.youtube.com/watch?v=7PiK4wtfvbA&list=PLBAGcD3siRDguyYYzhVwZ3tLvOyyG5k6K
course 1: video 1-41
course 2: video 41-70
course 3: video 71-98
course 4 of 5 - 43 videos
Convolutional Neural Network (CNN)
- https://www.youtube.com/watch?v=Z91YCMvxdo0&list=PLBAGcD3siRDjBU8sKRk0zX9pMz9qeVxud
course 5 of 5 - 33 videos
Recurrent Neural Networks (RNN)
- https://www.youtube.com/watch?v=5Vl-bK7tfD8&list=PLBAGcD3siRDittPwQDGIIAWkjz-RucAc7
fast.ai videos on youtube (parts 1,2 - 14 videos):
- https://www.youtube.com/watch?v=IPBSB1HLNLo&list=PLCdvEQLhYkYmKTKWTrH7bHtQ1CsKZaQBl
================================================================
You can also read this book:
- http://www.deeplearningbook.org
or find youtube videos where people discuss chapters of this book.
The book is comprehensive - but difficult to read.
You will need to do a lot of internet browsing to clarify things.
Audio interviews:
(TWiML&AI) podcast
- https://twimlai.com
For Russian speaking - good channel on Youtube
- https://www.youtube.com/watch?v=MYp3OwkiJAs
- https://www.youtube.com/channel/UCQj_dwbIydi588xrfjWSL5g/videos
Google's Tensor Processing Units (TPUs):
- https://www.wired.com/2017/05/google-rattles-tech-world-new-ai-chip/
Here are some links related to ML & AI
Nice 2-year-old tutorial with pictures (NIPS 2015):
- http://www.iro.umontreal.ca/%7Ebengioy/talks/DL-Tutorial-NIPS2015.pdf
New online publication:
- http://distill.pub
Christopher Olah has a great blog with
very clear explanations of DL concepts
- http://colah.github.io/
- https://github.com/colah/
Nice short online book about DL:
- http://neuralnetworksanddeeplearning.com/index.html
Good 1-hr lecture by Yann LeCun:
- https://www.youtube.com/watch?v=IbjF5VjniVE
Stanford - 15 lectures ( CS231n ) Fei-Fei Li & Andrej Karpathy & Justin Johnson
- https://www.youtube.com/channel/UC2__PIf36huAgKFumlOIs6A
Oxford - Deep Learning lectures - Nando de Freitas
- https://www.youtube.com/user/ProfNandoDF/videos
Hinton lectures (Neural Networks for Machine Learning)
- https://www.youtube.com/user/colinmcd94/videos
Ian Goodfellow PhD Defense Presentation
- https://www.youtube.com/watch?v=ckoD_bE8Bhs
GANs - short 5min video by Siraj Raval (he has lots of videos)
- https://www.youtube.com/watch?v=deyOX6Mt_As
The Great A.I. Awakening - by Gideon Lewis-Kraus, Dec. 14, 2016
- https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
Google Translate research paper (Sep 2016)
- https://arxiv.org/pdf/1609.08144v2.pdf
Andrew Ng: Artificial Intelligence is the New Electricity
- https://www.youtube.com/watch?v=21EiKfQYZXc
Andrew Ng - The State of Artificial Intelligence (Dec 15, 2017)
- https://www.youtube.com/watch?v=NKpuX_yzdYs
Follow on Facebook:
- Yann LeCun
- Adversarial Training
- Deep AI
- Deep Learning Patterns, Methodology and Strategy
- Montreal.AI
- …
Newsletters, meetups, courses:
- https://opendatascience.com/
- http://DataScienceWeekly.org
- NYC-Machine-Learning-list@meetup.com
- NYC Artificial Intelligence & Deep Learning@meetup.com
- http://machinelearningmastery.com
- http://byteacademy.co/all-courses/data-science-mini-courses/
- https://www.coursera.org/courses/?languages=en&query=deep+learning
- https://www.udacity.com/course/deep-learning--ud730
- https://www.youtube.com/watch?v=MYp3OwkiJAs - (in Russian)
- ...
TensorFlow is an open source software library
Written by Google Brain Team (C++, Python)
Available for Linux, Mac, Windows
Deep Learning Frameworks Compared (5:10):
- https://www.youtube.com/watch?v=MDP9FfsNx60
TFLearn – a beginner's wrapper around TensorFlow:
- http://tflearn.org