Mining Massive Data Sets Hadoop Lab


Stanford School of Engineering



Build a solid foundation in data mining with this lab course, which goes deeper into Hadoop, the MapReduce framework introduced in the first part of Mining Massive Data Sets (CS246). Hadoop is covered in depth to give students a more complete understanding of the platform and its role in data mining and machine learning. This is a partner course to CS246 and does not include additional assignments.

What you will learn

  • Implement data mining algorithms discussed in CS246 using Hadoop
  • Implement and debug complex MapReduce jobs in Hadoop
  • Use some of the tools in the Hadoop ecosystem for data mining and machine learning


Prerequisites

Enrollment in CS246, and Computer Organization & Systems (Stanford course CS107) or equivalent.

Topics include

  • Hadoop
  • MapReduce
  • Hive
  • Cloudera ML/Oryx
  • Mahout
  • TF-IDF
  • Pig, Sqoop, Oozie, HBase and Impala

Course Availability

The course schedule is displayed for planning purposes – courses can be modified, changed, or cancelled. Course availability will be considered finalized on the first day of open enrollment. For quarterly enrollment dates, please refer to our graduate education section.
