AMS 380, Data Mining

Catalog Description

This course will teach the basic ingredients of classical and contemporary statistical data mining methods, including dimension reduction, model selection, pattern recognition, and predictive modeling using traditional general linear models and generalized linear models, as well as modern statistical learning methods such as decision trees, random forests, and neural networks. We will also teach how to run these procedures in the programming language Python.


Prerequisite: AMS 210 or MAT 211; and AMS 311

3 credits
Offered initially spring 2021; thereafter, spring, summer and fall.

Course Materials for Fall 2024:

Required:

"An Introduction to Statistical Learning (with Applications in Python)" by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani; Springer Publishing, 1st printing July 5, 2023; ISBN13: 9783031387463.  This comprehensive resource is essential for understanding statistical learning techniques and their implementation in Python. It is available at https://www.statlearning.com.

"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" (3rd edition) by Aurélien Géron; published by O'Reilly on November 8, 2022; ISBN: 978-1098125974. This book is highly recommended for students interested in practical machine learning projects using popular Python libraries. It serves as a companion for applying concepts learned in class.


Recommended Materials:  

"Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares" by Stephen Boyd and Lieven Vandenberghe, Cambridge University Press, 1st edition, Published 2018; ISBN: 978-1316518960.  This textbook provides a solid foundation in applied linear algebra, crucial for machine learning applications. The material, along with slides and video lectures, can be found here: https://web.stanford.edu/~boyd/vmls/.

 

SYLLABUS

  1. Some basic statistical tests
  2. Linear regression and classic variable selection
  3. Regularized linear regression
  4. General linear model
  5. Cluster analysis
  6. Principal Component Analysis
  7. Statistical Resampling methods
  8. Random Forests
  9. Neural Networks

 

Learning Outcomes for AMS 380, Data Mining:

1) Demonstrate understanding of classical and contemporary data mining methods including:
     * Dimension reduction;
     * Variable selection;
     * Pattern recognition.

2) Demonstrate understanding of predictive modeling using:
     * Traditional linear models;
     * Generalized linear models.

3) Demonstrate understanding of modern statistical learning models, including:
     * Classification and regression trees;
     * Random forests;
     * Neural networks.

4) Demonstrate mastery of using these statistical procedures with the programming language:
     * Python.