AMS 598 | Applied Mathematics & Statistics

AMS 598, Big Data Analysis

This course introduces the application of supercomputing to statistical data analysis, particularly for big data. Implementations of various statistical methodologies within a parallel computing framework are demonstrated throughout the lectures. The course covers (1) parallel computing basics, including the architecture of interconnection networks, communication methodologies, algorithms, and performance measurement, and (2) their applications to modern data mining techniques, including variable selection/dimension reduction, linear/logistic regression, tree-based classification methods, kernel-based methods, non-linear statistical models, and model inference/resampling methods.

Prerequisites: AMS 507, AMS 580, and AMS 597
3 credits, ABCF grading

Text:

"Applied Parallel Computing", by Yuefan Deng; 2012; World Scientific Publishing Company; ISBN: 9789814307604 (recommended/optional)

"The Elements of Statistical Learning: Data Mining, Inference, and Prediction", by Trevor Hastie, Robert Tibshirani, and Jerome Friedman; Second Edition; 2011; Springer Series in Statistics; ISBN: 9780387848570 (hardcover) (recommended/optional)

"Mining of Massive Datasets", by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman; 2nd edition, 2014; Cambridge University Press; ISBN: 9781107077232 (recommended/optional)

Offered every Fall semester

Learning Outcomes:

Demonstrate knowledge of parallel computing basics:

Node architecture, central processing units, and accelerators;
Distributed and shared memory

Demonstrate skills with software architecture and R:

Communication patterns and protocols;
Process creation and management;
MapReduce framework;
Hadoop in R

Demonstrate mastery of basic tools for big data analysis:

Linear regression
Logistic regression
Dimension reduction

Demonstrate understanding of advanced methods for big data analysis:

Classification and regression trees
Random forest
Gradient boosting
Support vector machine
Neural network

Demonstrate understanding of model selection and performance evaluation:

Best subset; forward selection; backward selection
Cross-validation
Bootstrap

Graduate Courses

AMS 500
AMS 501
AMS 502
AMS 503
AMS 506
AMS 507
AMS 510
AMS 511
AMS 512
AMS 513
AMS 514
AMS 515
AMS 516
AMS 517
AMS 518
AMS 519
AMS 520
AMS 522
AMS 523
AMS 526
AMS 527
AMS 528
AMS 530
AMS 531
AMS 532
AMS 533
AMS 534
AMS 535
AMS 536
AMS 537
AMS 539
AMS 540
AMS 542
AMS 544
AMS 545
AMS 546
AMS 547
AMS 548
AMS 549
AMS 550
AMS 552
AMS 553
AMS 554
AMS 555
AMS 556
AMS 559
AMS 560
AMS 561
AMS 562
AMS 565
AMS 566
AMS 569
AMS 570
AMS 571
AMS 572
AMS 573
AMS 575
AMS 577
AMS 578
AMS 580
AMS 582
AMS 583
AMS 585
AMS 586
AMS 587
AMS 588
AMS 589
AMS 591
AMS 593
AMS 595
AMS 596
AMS 597
AMS 598
AMS 599
AMS 600
AMS 601
AMS 603
AMS 676
AMS 683
AMS 691
AMS 698
AMS 699
AMS 700
AMS 701
AMS 800
Plagiarism & integrity of science slides of Richard Clark