Mining Massive Data Sets


Stanford School of Engineering



Data has become central to business decisions, strategy and behavior in recent years. Predictive analytics, data mining and machine learning give us new methods for analyzing massive data sets. Companies value individuals who can understand and manipulate large data sets to produce informative results.

Key topics in mining massive data sets range from handling huge document databases and infinite streams of data to mining large social networks and web graphs. The course emphasizes MapReduce as a tool for creating parallel algorithms that can process very large amounts of data.
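For readers new to the MapReduce model mentioned above, the following is a minimal single-machine sketch of its three stages (map, shuffle, reduce) applied to word counting. It is an illustration only and is not taken from the course materials; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real framework such as Hadoop or Spark would distribute each stage across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    # Each document is mapped independently, which is what makes the
    # map stage easy to parallelize in a real cluster.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key is reduced independently of every other key.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data streams"]))
```

Because every document is mapped independently and every key is reduced independently, both stages can in principle run in parallel, which is the property that lets the model scale to very large inputs.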

Practical hands-on experience will include designing algorithms for analyzing very large amounts of data and learning existing data mining and machine learning algorithms. Case studies will provide first-hand insight into how big data problems and their solutions allow companies like Google to succeed in the market.


At least one: Computer Organization & Systems (CS107) or Introduction to Databases (CS145) or equivalent


At least one: Intro to Probability for Computer Scientists (CS109) or Theory of Probability (STATS116) or equivalent

Topics include

  • Big data systems like Hadoop, Spark and Hive
  • Link analysis such as PageRank, spam detection and hubs-and-authorities
  • Similarity search such as locality-sensitive hashing and random hyperplanes
  • Stream data processing
  • Algorithms for large-scale mining
  • Large-scale machine learning 
  • Submodular function optimization
  • Computational advertising

Course Availability

The course schedule is displayed for planning purposes – courses can be modified, changed, or cancelled. Course availability will be considered finalized on the first day of open enrollment. For quarterly enrollment dates, please refer to our graduate education section.

Pre-register Now

Dates: March 29 - June 4, 2021
Units: 3.00-4.00
Instructors: Jure Leskovec
Delivery Option:
For Credit $4,056.00-$5,408.00
Notes: Pre-registration Dates: February 1, 2021 at 9:00am to March 12, 2021 at 5:00pm

Computer Science Department Requirement
Students taking graduate courses in Computer Science must enroll for the maximum number of units and maintain a B or better in each course in order to continue taking courses under the Non Degree Option.

Pre-registration for this course will secure your enrollment request and ensure timely processing of your application for potential course approval. Please note: course enrollment will be confirmed after March 19, 2021; after completing your pre-registration, no further action is required on your part.


This course may not currently be available to learners in some states and territories.