All are welcome, including post-high-school students.
Familiarity with a programming language (Python, etc.). Strong general math skills.
Gain understanding of fundamental concepts. Acquire skills to apply relevant theories and models in practical situations to solve data problems.
1) Introduction to Data Science
AI: Artificial Intelligence
Data Scientist
Big Data
Languages
Trends
Tools
Lab: Introduction to IPython
2) Models
Machine Learning
Modeling principles
Under-fitting, Over-fitting
Lab: Model Complexity
3) Data Science Process
Data Types
Data Manipulations (Cleaning, ETL, Reduction, Transformation, etc.)
Encodings
Lab: Data Analysis
4) Visualizations
Data Presentation
Dishonest Charts
Lab: Visualizations
5) Probability & Statistics
Famous Probability Problems
Statistical Formulas
Markov Models
Hidden Markov Models
Monte Carlo Methods
Lab: Probability & Statistics
6) Statistical Distributions / Models
Statistical Models
Distributions (Bernoulli, Binomial, Poisson, etc.)
Normal Distribution
Beta Distribution
Student’s t-Distribution
Sampling Methods
Bias-Variance Trade-off
Statistical Significance
Confidence Interval
z-test
Student’s t-test
(Statistical) Power
Lab: Statistical Distributions
7) Linear Models
Linear Regression
Lasso Regression
L1 & L2 Regularization
Logistic Regression
Lab: Linear Models
8) Dimensionality Reduction
Curse of Dimensionality
PCA: Principal Component Analysis
SVD: Singular Value Decomposition
t-SNE: t-Distributed Stochastic Neighbor Embedding
Lab: Dimensionality Reduction
9) Supervised Learning
Perceptron
kNN: k-Nearest Neighbors
SVM: Support Vector Machines
Multi-Class Classification
Decision Tree (Entropy, Impurity, Information Gain, etc.)
Mutual Information
MIC: Maximal Information Coefficient
Importance, Relevance and Error Measures
- Confusion Matrix, ROC Curve, AUC, Precision & Recall, F-score, tf-idf, etc.
A/B Testing
Lab: Supervised Learning
10) Time Series Analysis
Lab: Time Series Analysis
11) Bayesian Methods
Bayes' Theorem
Bayesian Reasoning
Naive Bayes Classifier
Multi-Armed Bandit
Lab: Bayesian Methods
12) Unsupervised Learning
k-Means Clustering
Expectation Maximization
Lab: k-Means Clustering
13) Ensemble Methods
Bootstrapping
Bagging
Boosting
Random Forest
Unbalanced Classes
Lab: Ensemble Methods
14) Interesting Laws, Paradoxes and Models
Benford’s Law
Power Law
Class Size Paradox
Small World Models
Community Detection Algorithms
Lab: Interesting Laws, Paradoxes and Models
15) Deep Learning
Loss/Error Functions
SGD: Stochastic Gradient Descent
FFN: Feed Forward Neural Networks
CNN: Convolutional Neural Networks
RNN: Recurrent Neural Networks
LSTM: Long Short-Term Memory
Auto-Encoders
GAN: Generative Adversarial Networks
Data Augmentation
Dropout Regularization
Transfer Learning
TensorFlow
Keras
Lab: Deep Learning
16) Advanced Topics
Recommender Systems
Market Basket Analysis
Natural Language Processing (NLP)
Reinforcement Learning (RL)
Algorithm Complexity
Big Data
MapReduce
Cloud (IBM Bluemix, Microsoft Azure, AWS, Google Cloud, etc.)
Lab: Hadoop, Spark