# Data Science

The Data Science course consists of two semester-long classes (Fall and Spring). Each class introduces the relevant theoretical concepts and provides hands-on assignments, including programming projects.

Audience

All are welcome, including post-high-school students.

Eligibility

Familiarity with a programming language (e.g., Python) and strong general math skills.

Goal

Gain an understanding of the fundamental concepts, and acquire the skills to apply relevant theories and models to solve practical data problems.

The classes meet weekly over two semester-long courses, taken in the Fall and Spring terms. Each course runs for 12 weeks, with 2 hours of class time per week. The full curriculum is covered only when both courses are taken. Two weeks are devoted to exams, during which the solutions are also reviewed for practice.

The topics are as follows:

1) Introduction to Data Science

AI: Artificial Intelligence

Data Scientist

Big Data

Languages

Trends

Tools

Lab: Introduction to IPython

2) Models

Machine Learning

Modeling principles

Under-fitting, over-fitting

Lab: Model Complexity

3) Data Science Process

Data Types

Data Manipulations (Cleaning, ETL, Reduction, Transformation, etc.)

Encodings

Lab: Data Analysis

4) Visualizations

Data Presentation

Dishonest Charts

Lab: Visualizations

5) Probability & Statistics

Famous Probability Problems

Statistical Formulas

Markov Models

Hidden Markov Models

Monte Carlo Methods

Lab: Probability & Statistics

6) Statistical Distributions / Models

Statistical Models

Distributions (Bernoulli, Binomial, Poisson, etc.)

Normal Distribution

Beta Distribution

Student’s t-Distribution

Sampling Methods

Statistical Significance

Confidence Interval

z-test

Student’s t-test

(Statistical) Power

Lab: Statistical Distributions

7) Linear Models

Linear Regression

Lasso Regression

L1 & L2 Regularization

Logistic Regression

Lab: Linear Models

8) Dimensionality Reduction

Curse of Dimensionality

PCA: Principal Component Analysis

SVD: Singular Value Decomposition

t-SNE: t-Distributed Stochastic Neighbor Embedding

Lab: Dimensionality Reduction

9) Supervised Learning

Perceptron

kNN: k-Nearest Neighbors

SVM: Support Vector Machines

Multi-Class Classification

Decision Tree (Entropy, Impurity, Information Gain, etc.)

Mutual Information

MIC: Maximal Information Coefficient

Importance, Relevance and Error Measures

- Confusion Matrix, ROC Curve, AUC, Precision & Recall, F-score, tf-idf, etc.

A/B Testing

Lab: Supervised Learning

10) Time Series Analysis

Lab: Time Series Analysis

11) Bayesian Methods

Bayes' Theorem

Bayesian Reasoning

Naive Bayes Classifier

Multi-Armed Bandit

Lab: Bayesian Methods

12) Unsupervised Learning

k-Means Clustering

Expectation Maximization

Lab: k-Means Clustering

13) Ensemble Methods

Bootstrapping

Bagging

Boosting

Random Forest

Unbalanced Classes

Lab: Ensemble Methods

14) Interesting Laws, Paradoxes and Models

Benford’s Law

Power Law

Small World Models

Community Detection Algorithms

Lab: Interesting Laws, Paradoxes and Models

15) Deep Learning

Loss/Error Functions

FFN: Feed Forward Neural Networks

CNN: Convolutional Neural Networks

RNN: Recurrent Neural Networks

LSTM: Long Short-Term Memory

Auto-Encoders

Data Augmentation

Dropout Regularization

Transfer Learning

TensorFlow

Keras

Lab: Deep Learning

Recommender Systems

Natural Language Processing (NLP)

Reinforcement Learning (RL)

Algorithm Complexity

Big Data

MapReduce

Cloud (IBM Bluemix, Microsoft Azure, AWS, Google Cloud, etc.)