
Mining Massive Data Sets

Date: Saturday, September 12, 2015 to Saturday, October 31, 2015

About the Course

We introduce participants to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general.  The rest of the course is devoted to algorithms for extracting models and information from large datasets.  Participants will learn how Google's PageRank algorithm models the importance of Web pages, as well as some of the many extensions that have been used for a variety of purposes.  We'll cover locality-sensitive hashing, a bit of magic that lets you find similar items in a set so large that you cannot possibly compare every pair.  When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model it, but standard approaches do not scale well; we'll discuss efficient alternatives.  Many other large-scale algorithms are covered as well, as outlined in the course syllabus.
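To make the locality-sensitive hashing idea above concrete, here is a minimal Python sketch (not course material) of one common variant: MinHash signatures combined with banding. It assumes character 5-shingles, CRC32 as a stable shingle id, and random linear hash functions modulo a large prime. The helper names (`shingles`, `make_hashes`, `signature`, `lsh_candidates`) and the parameters (100 hash functions split into 25 bands of 4 rows) are illustrative choices only. Documents whose signatures agree on every row of at least one band become candidate pairs, so only those pairs need an exact comparison.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit shingle id


def shingles(text: str, k: int = 5) -> set[int]:
    """Character k-shingles, each mapped to a stable 32-bit id via CRC32.

    Assumes len(text) >= k; shorter texts yield an empty set.
    """
    return {zlib.crc32(text[i:i + k].encode()) for i in range(len(text) - k + 1)}


def jaccard(a: set[int], b: set[int]) -> float:
    """Exact Jaccard similarity |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def make_hashes(n: int, seed: int = 0) -> list[tuple[int, int]]:
    """n random hash functions h(x) = (a*x + b) mod PRIME, given as (a, b) pairs."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def signature(ids: set[int], hashes: list[tuple[int, int]]) -> list[int]:
    """MinHash signature: for each hash function, the minimum hash value over the set."""
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]


def lsh_candidates(sigs: dict[str, list[int]], bands: int, rows: int) -> set[tuple[str, str]]:
    """Banding: documents whose signatures match on all rows of some band become candidates."""
    candidates: set[tuple[str, str]] = set()
    for band in range(bands):
        buckets: dict[tuple[int, ...], list[str]] = defaultdict(list)
        for doc, sig in sigs.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(doc)
        for docs in buckets.values():
            candidates.update(combinations(sorted(docs), 2))
    return candidates


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumped over the lazy dog",
        "c": "lorem ipsum dolor sit amet consectetur adipiscing",
    }
    hashes = make_hashes(100)  # 25 bands x 4 rows
    sigs = {name: signature(shingles(text), hashes) for name, text in docs.items()}
    print(lsh_candidates(sigs, bands=25, rows=4))
```

With `b` bands of `r` rows, a pair with Jaccard similarity `s` becomes a candidate with probability 1 - (1 - s^r)^b. Increasing `r` reduces false positives, and increasing `b` reduces false negatives. In this example, the near-duplicate pair ("a", "b") is very likely to be reported, while "c" shares no shingles with the other two documents and so never collides with them.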

Course Syllabus

Week 1:
MapReduce
Link Analysis -- PageRank

Week 2:
Locality-Sensitive Hashing -- Basics + Applications
Distance Measures
Nearest Neighbors
Frequent Itemsets

Week 3:
Data Stream Mining
Analysis of Large Graphs

Week 4:
Recommender Systems
Dimensionality Reduction

Week 5:
Clustering
Computational Advertising

Week 6:
Support-Vector Machines
Decision Trees
MapReduce Algorithms

Week 7:
More About Link Analysis -- Topic-specific PageRank, Link Spam
More About Locality-Sensitive Hashing

Recommended Background

A course in database systems is recommended, as is a basic course on algorithms and data structures.  You should also understand mathematics up to multivariable calculus and linear algebra.

Suggested Readings

There is a free book, "Mining of Massive Datasets," by Leskovec, Rajaraman, and Ullman (who, by coincidence, are the instructors for this course :-).  You can download it at http://www.mmds.org/; hardcopies can be purchased from Cambridge University Press.

Course Format

There will be about 2 hours of video to watch each week, broken into short segments.  There will be automated homework assignments for each week, and a final exam.

FAQ

  • Will I get a Statement of Accomplishment after completing this class?

    Yes. Participants who successfully complete the class will receive a Statement of Accomplishment signed by the instructors.  A level designated "distinction" will also be offered.

Instructors

Jure Leskovec, Stanford University

Anand Rajaraman, Stanford University

Jeff Ullman, Stanford University
