# Statistics and Data Science

## ABOUT THIS COURSE

## REQUIREMENTS

## COURSE STAFF

### Jure Leskovec

### Anand Rajaraman

### Jeffrey D. Ullman

## FREQUENTLY ASKED QUESTIONS

### Do I need to buy a textbook?

### How much work is expected?

### Will statements of accomplishment be offered?

## Statistical Learning is now self-paced!

## About This Course

## Prerequisites

## Course Staff

### Trevor Hastie

### Rob Tibshirani

## Course Production Team

## Frequently Asked Questions

### Do I need to buy a textbook?

### Are R and RStudio available for free?

### How many hours of effort are expected per week?

### Will I receive a statement of accomplishment?

**Course Description**

## ABOUT STATISTICAL REASONING

## THE CONTENT

## FREQUENTLY ASKED QUESTIONS

### How much time will it take to complete this course?

### Does this course require any software?

### Does this course offer a Statement of Accomplishment?

## ABOUT PROBABILITY AND STATISTICS

## THE CONTENT

## REQUIREMENTS

## FREQUENTLY ASKED QUESTIONS

### How much time will it take to complete this course?

### Does this course require any software?

### Does this course offer a Statement of Accomplishment?

## ABOUT THIS COURSE

## PREREQUISITES

## COURSE STAFF

### Trevor Hastie

### Rob Tibshirani

## COURSE PRODUCTION TEAM

## FREQUENTLY ASKED QUESTIONS

### Do I need to buy a textbook?

### Are R and RStudio available for free?

### How many hours of effort are expected per week?

### Will I receive a statement of accomplishment?

## Overview

## Topics Include

## Grading

## Instructors

## Units

## Prerequisites

### Tuition & Fees

## Certificates and Degrees

## Overview

## Topics Include

## Instructors

## Units

## Grading

## Prerequisites

### Tuition & Fees

## Pages

## Overview

## Learn How To

## Instructors

### Questions

### Course Preview

## Tuition

## Certificates and Degrees

Now Open!

*Application and fee apply.*

The company that has the most paying customers wins. But how do you get the word out, drive demand for your products and services, and generate sales? Today good marketing involves a clear strategy to reach the target audience, execute appropriate tactics, and measure results. In this course, you will master the fundamentals of outbound and inbound marketing and explore the myriad of options available in today’s world of traditional and social media. Learn how to apply your skills to create a robust and innovative marketing strategy for a new product or a new company.

- Combine traditional, social and mobile media to drive viral demand
- Virality does not just happen, though it may look that way. It generally takes months or years of careful planning and experimentation. Learn how to use product design, outbound and inbound marketing to drive viral demand for a business-to-consumer product. Learn how marketing today requires a thorough understanding of the target market and a multitude of traditional, social and innovative marketing programs.

- Leverage outbound demand generation
- Outbound marketing is what most people think of when they think of marketing. It is the act of “buying” a prospect's attention or seeking them out. Learn how marketers provide air cover through effective PR and buzz marketing, as well as the basics of driving action that results in people buying something.

- Tap inbound demand
- Learn what inbound marketing is all about, how it got started, and what is fundamentally different from the more traditional world of outbound. Explore the new tools marketers now have in hand and are learning how to use every day.

- Use core demand generation principles and guidelines
- Create and use a messaging platform for optimal public relations and buzz marketing.

- Donna Novitsky
*CEO*,*Yiftee*
- Lynda Kate Smith
*Lecturer*,*Management Science and Engineering*

Please contact us at 650.741.1630 or stanford-innovation@stanford.edu

Watch a brief overview of the Creating Demand course

- $995 per online course
- $75 one-time document fee

Date:

Tuesday, October 11, 2016 to Tuesday, December 13, 2016

The course is based on the text Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, who by coincidence are also the instructors for the course. The book is published by Cambridge University Press, but by arrangement with the publisher, you can download a free copy at http://www.mmds.org/. The material in this online course closely matches the content of the Stanford course CS246.

The major topics covered include: MapReduce systems and algorithms, Locality-sensitive hashing, Algorithms for data streams, PageRank and Web-link analysis, Frequent itemset analysis, Clustering, Computational advertising, Recommendation systems, Social-network graphs, Dimensionality reduction, and Machine-learning algorithms.

The course is intended for graduate students and advanced undergraduates in Computer Science. At a minimum, you should have had courses in Data structures, Algorithms, Database systems, Linear algebra, Multivariable calculus, and Statistics.

Jure is an associate professor of computer science at Stanford. His research area is mining of large social and information networks. He is the author of the Stanford Network Analysis Platform, a general-purpose network analysis and graph mining library. For more information, see his Home Page.

Anand is a serial entrepreneur, venture capitalist, and academic, based in Silicon Valley. He founded two successful startups, Junglee (acquired by Amazon) and Kosmix (acquired by Walmart). At Amazon, he was co-inventor of Mechanical Turk. Currently, he is a founding partner of Milliways Labs, an early-stage venture-capital firm. For more information, see his Blog, called "Datawocky".

Jeff Ullman is a retired professor of Computer Science at Stanford. His Home Page offers additional information about the instructor.

No. The course follows the text *Mining of Massive Datasets* by Jure Leskovec, Anand Rajaraman, and Jeff Ullman. It is published by Cambridge University Press, but by permission of the publishers, you can download a free copy Here.

The amount of work will vary, depending on your background and the ease with which you follow mathematical and algorithmic ideas. However, 10 hours per week is a good guess.

Yes. You need to get 50% of the marks (half for homework, half for the final). An SoA with Distinction requires 80% of the marks.
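The thresholds above can be sketched in a few lines of Python. This is only an illustration, not the course's actual grading code: the function name is hypothetical, scores are assumed to be percentages from 0 to 100, and both thresholds are assumed to be inclusive, which the answer above does not specify.

```python
def statement_of_accomplishment(homework_pct: float, final_pct: float) -> str:
    """Classify a result using the 50% / 80% thresholds stated in the FAQ.

    Assumptions (not confirmed by the course page): inputs are percentages
    in [0, 100] and a score exactly on a threshold qualifies.
    """
    for value in (homework_pct, final_pct):
        if not 0 <= value <= 100:
            raise ValueError("scores must be percentages between 0 and 100")

    # Half of the marks come from homework, half from the final.
    overall = 0.5 * homework_pct + 0.5 * final_pct

    if overall >= 80:
        return "SoA with Distinction"
    if overall >= 50:
        return "SoA"
    return "no SoA"


if __name__ == "__main__":
    print(statement_of_accomplishment(60, 40))
    print(statement_of_accomplishment(90, 80))
```

Because the two components are weighted equally, a weak homework score can be offset by a strong final, and vice versa.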

**About this course**

**Who is this class for:**

## Taught by:

Learn how to model social and economic networks and their impact on human behavior. How do networks form, why do they exhibit certain patterns, and how does their structure impact diffusion, learning, and other behaviors? We will bring together models and techniques from economics, sociology, math, physics, statistics and computer science to answer these questions. The course begins with some empirical background on social and economic networks, and an overview of concepts used to describe and measure networks. Next, we will cover a set of models of how networks form, including random network models as well as strategic formation models, and some hybrids. We will then discuss a series of models of how networks impact behavior, including contagion, diffusion, learning, and peer influences. You can find a more detailed syllabus here: http://web.stanford.edu/~jacksonm/Networks-Online-Syllabus.pdf. You can find a short introductory video here: http://web.stanford.edu/~jacksonm/Intro_Networks.mp4.

**This course starts every four weeks. The next session begins October 10.**

The course is aimed at people interested in researching social and economic networks, but should be accessible to advanced undergraduates and other people who have some prerequisites in mathematics and statistics. For example, it will be assumed that students are comfortable with basic concepts from linear algebra (e.g., matrix multiplication), probability theory (e.g., probability distributions, expected values, Bayes' rule), and statistics (e.g., hypothesis testing). Beyond those concepts, the course is self-contained.

Matthew O. Jackson, Professor, Economics

Date:

Tuesday, June 28, 2016

The active course run for Statistical Learning has ended, but the course is now available in a self-paced mode. You are welcome to join the course and work through the material and exercises at your own pace. When you have completed the exercises with a score of 50% or higher, you can generate your Statement of Accomplishment from within the course.

The course will remain available for an extended period of time. We anticipate the content will be available until at least August 2, 2017. You will be notified by email of any changes to content availability beforehand.

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

This is not a math-heavy class, so we try to describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter.

The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). The pdf for this book is available for free on the book website.

First courses in statistics, linear algebra, and computing.

Trevor Hastie is the John A. Overdeck Professor of Statistics at Stanford University. Hastie is known for his research in applied statistics, particularly in the fields of data mining, bioinformatics and machine learning. He has published four books and over 180 research articles in these areas. Prior to joining Stanford University in 1994, Hastie worked at AT&T Bell Laboratories for 9 years, where he helped develop the statistical modeling environment popular in the R computing system. He received his B.S. in statistics from Rhodes University in 1976, his M.S. from the University of Cape Town in 1979, and his Ph.D. from Stanford in 1984. Professor Hastie is an elected fellow of the Institute of Mathematical Statistics, the American Statistical Association, the International Statistics Institute, the South African Statistical Association and the Royal Statistical Society. He has received a number of awards and honors, including the Myrto Lefkopolous award from Harvard in 1994, the Parzen Prize for Innovation in 2014, and the Distinguished Rhodes University Alumni award in 2015.

Robert Tibshirani is a Professor in the Departments of Health Research and Policy, and Statistics at Stanford University. In his work he has made important contributions to the analysis of complex datasets, most recently in genomics and proteomics. His best-known contribution is the Lasso, which uses L1 penalization in regression and related problems. He has co-authored over 200 papers and three books. Professor Tibshirani co-authored the first study that linked cell phone usage with car accidents, a widely cited article that has played a role in the introduction of legislation that restricts the use of phones while driving. He is one of the most widely cited authors in the entire mathematical sciences field. Professor Tibshirani is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics and the Royal Society of Canada. He won the prestigious COPSS Presidents' Award in 1996, the NSERC Steacie award in 1997 and was elected to the National Academy of Sciences in 2012.

Will Fithian and Sam Gross produced and formatted the quiz questions and review questions. Daniela Witten helped present some of the material in Chapter 5. Wes Choy managed the video production. Greg Maximov filmed and edited most of the course videos, as well as the interviews and group recordings. Greg Bruhns, Monica Diaz and Marc Sanders assisted with Open edX.

No, a free online version of An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013) is available from that website. Springer has agreed to this, so no need to worry about copyright. Of course you may not distribute printed versions of this pdf file.

Yes. You get R for free from http://cran.us.r-project.org/. Typically it installs with a click. You get RStudio from http://www.rstudio.com/, also for free, and a similarly easy install.

We anticipate it will take approximately 3-5 hours per week to go through the materials and exercises in each section.

Yes, if you complete the course, and achieve a passing grade of 50% on the quizzes, you can generate a Statement of Accomplishment from within the course. If you get 90% or higher, your statement will be "with distinction".

Date:

Saturday, March 28, 2015

**This course is offered through Stanford Continuing Studies.**

More and more people are starting to tap into the barely touched opportunities of data. Supporting marketing campaigns with more market data, understanding and preventing product failures with real-time measures, retaining customers with detailed behavior monitoring, or fighting fraud with real-time analysis of hundreds of millions of transactions are among the many examples that demonstrate how pervasive data has become across all lines of business. After years of buzz and mixed results, data technology, management techniques, and processes have gained maturity. Data is now more readily accessible to everyone. In this online course, students will learn how to engage with data and discover concrete and actionable business intelligence techniques to gain immediate control of data and deliver accurate insights, manage change to drive project acceptance, and design lean and sustainable processes. The course will also include detailed case studies and feature expert guest speakers to provide invaluable and fascinating field experience.

*Application and fee apply.*

Date:

Monday, February 1, 2016

This course is self-paced and is provided free of charge. There are no due dates, and students are welcome to work through as much or as little of the material as they wish. There is no instructor involved, and no credit, Statement of Accomplishment, or any type of verification or certification of completion is given. The course is simply here for people who want to learn more about Statistics.

The Statistical Reasoning course contains four main units that have several sections within each unit.

**Exploratory Data Analysis:** This unit is organized into two sections – Examining Distributions and Examining Relationships. The general approach is to provide students with a framework that will help them choose the appropriate descriptive methods in various data analysis situations.

**Producing Data:** This unit is organized into two sections – Sampling and Designing Studies.

**Probability:** This course contains a streamlined version of probability that forgoes the classical treatment of probability in favor of an empirical approach using relative frequency. This course includes only those concepts that are necessary to support a conceptual understanding of the role of probability in inference. For the full, classical treatment of Probability, students may see the OLI Probability and Statistics course.

**Inference:** This unit introduces students to the logic as well as the technical side of the main forms of inference: point estimation, interval estimation and hypothesis testing. The unit covers inferential methods for the population mean and population proportion, inferential methods for comparing the means of two groups and of more than two groups (ANOVA), the Chi-Square test for independence and linear regression. The unit reinforces the framework that the students were introduced to in the Exploratory Data Analysis unit for choosing the appropriate (in this case, inferential) method in various data analysis scenarios.

Throughout the course there are many interactive elements. These include: simulations, “walk-throughs” that integrate voice and graphics to explain an example of a procedure or a difficult concept, and, most prominently, interactive activities in which students practice problem solving, with hints and immediate and targeted feedback.

The course is built around a series of carefully devised learning objectives that are independently assessed.

**Note about Probability and Statistics vs. Statistical Reasoning:** One of the main differences between the courses is the path through probability. **Probability and Statistics** includes the classical treatment of probability. **Statistical Reasoning** places less emphasis on probability than does the Probability and Statistics course and takes an empirical approach.

REQUIREMENTS

Knowledge of basic algebra.

This course is designed to be equivalent to one semester of a college statistics course.

No.

No.

Date:

Monday, February 1, 2016

This course is self-paced and is provided free of charge. There are no due dates, and course participants are welcome to work through as much or as little of the material as they wish. There is no instructor involved, and no credit, Statement of Accomplishment, or any type of verification or certification of completion is given. The course is simply here for people who want to learn more about Statistics.

The Probability and Statistics course contains four main units that have several sections within each unit.

**Exploratory Data Analysis:** This unit is organized into two sections – Examining Distributions and Examining Relationships. The general approach is to provide learners with a framework that will help them choose the appropriate descriptive methods in various data analysis situations.

**Producing Data:** This unit is organized into two sections – Sampling and Designing Studies.

**Probability:** In this course the unit is a classical treatment of probability and includes basic probability principles, finding probability of events, conditional probability, discrete random variables (including the Binomial distribution) and continuous random variables (with emphasis on the normal distribution). The probability unit culminates in a discussion of sampling distributions that is grounded in simulation. For a streamlined version of probability that forgoes the classical treatment of probability in favor of an empirical approach using relative frequency, course participants may see the OLI Statistical Reasoning course.

**Inference:** This unit introduces learners to the logic as well as the technical side of the main forms of inference: point estimation, interval estimation and hypothesis testing. The unit covers inferential methods for the population mean and population proportion, inferential methods for comparing the means of two groups and of more than two groups (ANOVA), the Chi-Square test for independence and linear regression. The unit reinforces the framework that the course participants were introduced to in the Exploratory Data Analysis unit for choosing the appropriate (in this case, inferential) method in various data analysis scenarios.

Throughout the course there are many interactive elements. These include: simulations, “walk-throughs” that integrate voice and graphics to explain an example of a procedure or a difficult concept, and, most prominently, interactive activities in which course participants practice problem solving, with hints and immediate and targeted feedback.

The course is built around a series of carefully devised learning objectives that are independently assessed.

**Note about Probability and Statistics vs. Statistical Reasoning:** One of the main differences between the courses is the path through probability. **Probability and Statistics** includes the classical treatment of probability. **Statistical Reasoning** places less emphasis on probability than does the Probability and Statistics course and takes an empirical approach.

Knowledge of basic algebra.

This course is designed to be equivalent to one semester of a college statistics course.

No.

No.

Date:

Tuesday, January 12, 2016

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

This is not a math-heavy class, so we try to describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter.

The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). The pdf for this book is available for free on the book website.

First courses in statistics, linear algebra, and computing.

Trevor Hastie is the John A. Overdeck Professor of Statistics at Stanford University. Hastie is known for his research in applied statistics, particularly in the fields of data mining, bioinformatics and machine learning. He has published four books and over 180 research articles in these areas. Prior to joining Stanford University in 1994, Hastie worked at AT&T Bell Laboratories for 9 years, where he helped develop the statistical modeling environment popular in the R computing system. He received his B.S. in statistics from Rhodes University in 1976, his M.S. from the University of Cape Town in 1979, and his Ph.D. from Stanford in 1984. Professor Hastie is an elected fellow of the Institute of Mathematical Statistics, the American Statistical Association, the International Statistics Institute, the South African Statistical Association and the Royal Statistical Society. He has received a number of awards and honors, including the Myrto Lefkopolous award from Harvard in 1994, the Parzen Prize for Innovation in 2014, and the Distinguished Rhodes University Alumni award in 2015.

Robert Tibshirani is a Professor in the Departments of Health Research and Policy, and Statistics at Stanford University. In his work he has made important contributions to the analysis of complex datasets, most recently in genomics and proteomics. His best-known contribution is the Lasso, which uses L1 penalization in regression and related problems. He has co-authored over 200 papers and three books. Professor Tibshirani co-authored the first study that linked cell phone usage with car accidents, a widely cited article that has played a role in the introduction of legislation that restricts the use of phones while driving. He is one of the most widely cited authors in the entire mathematical sciences field. Professor Tibshirani is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics and the Royal Society of Canada. He won the prestigious COPSS Presidents' Award in 1996, the NSERC Steacie award in 1997 and was elected to the National Academy of Sciences in 2012.

Will Fithian and Sam Gross produced and formatted the quiz questions and review questions. Daniela Witten helped present some of the material in Chapter 5. Wes Choy managed the video production. Greg Maximov filmed and edited most of the course videos, as well as the interviews and group recordings. Greg Bruhns, Monica Diaz and Marc Sanders assisted with Open edX.

No, a free online version of An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013) is available from that website. Springer has agreed to this, so no need to worry about copyright. Of course you may not distribute printed versions of this pdf file.

Yes. You get R for free from http://cran.us.r-project.org/. Typically it installs with a click. You get RStudio from http://www.rstudio.com/, also for free, and a similarly easy install.

We anticipate it will take approximately 3-5 hours per week to go through the materials and exercises.

Yes, if you complete the course and achieve a passing grade of 50% on the quizzes. If you get 90% or higher, your statement will be "with distinction".

Date:

Monday, January 4, 2016 to Wednesday, March 16, 2016

Cryptography is an indispensable tool for protecting information in computer systems. This introduction to the basic theory and practice of cryptographic techniques used in computer security will explore the inner workings of cryptographic primitives and how to use them correctly.

- Encryption (single and double key)
- Pseudo-random bit generation
- Authentication
- Electronic commerce (anonymous cash, micropayments)
- Key management, PKI, zero-knowledge protocols

There will be three written homework assignments and two programming projects. Final placement in the class will be determined by the following formula:

0.35 H + 0.35 P + 0.3 F

where:

- H is your average score on the written homework assignments.
- P is the weighted average grade on the two programming projects.
- F is your final exam score.
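As a worked illustration of the formula above, here is a minimal Python sketch. The function names are hypothetical, and because the course page does not state how the two programming projects are weighted against each other, the project weights in the example are an assumption chosen purely for demonstration.

```python
from typing import Sequence


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of scores; the weights need not sum to 1."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def final_placement(h: float, p: float, f: float) -> float:
    """Combine the three components as 0.35 H + 0.35 P + 0.3 F."""
    return 0.35 * h + 0.35 * p + 0.3 * f


if __name__ == "__main__":
    # Hypothetical scores on a 0-100 scale.
    homework_avg = 80.0
    # Equal project weights are an assumption, not taken from the course page.
    project_avg = weighted_average([90.0, 70.0], [1.0, 1.0])
    final_exam = 90.0
    print(round(final_placement(homework_avg, project_avg, final_exam), 2))
```

With these hypothetical inputs the homework and project components each contribute about 28 points and the final exam about 27, for a total of roughly 83.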

- Dan Boneh
*Professor*,*Computer Science and Electrical Engineering*

3.0

The course is self-contained; however, a basic understanding of probability theory and modular arithmetic will be helpful. The course is intended for advanced undergraduates and master's students.

For course tuition, reduced tuition (SCPD member companies and United States Armed Forces), and fees, please click Tuition & Fees.

Date:

Monday, January 4, 2016 to Friday, March 18, 2016

Examine the application of probability in the computer science field and how it is used in the analysis of algorithms. Learn how probability theory has become a powerful computing tool and what current trends are causing the need for probabilistic analysis. Acquire an important understanding about randomness and its influence on the computing decisions made every day.

- Counting and combinatorics
- Conditional probability
- Distributions
- Point estimation
- Limit theorems

- Mehran Sahami
*Associate Professor*,*Computer Science*

3.0 - 5.0

Students enrolling under the non-degree option are required to take the course for 5.0 units.

- Problem Sets: 45%
- Midterm: 20%
- Final: 35%

Mathematical Foundations of Computing (Stanford Course: CS103), and Programming Abstractions (Stanford Course: CS106B) or Accelerated Programming Abstractions (Stanford Course: CS106X), and Linear Algebra and Differential Calculus (Stanford Course: MATH51) or equivalent.

For course tuition, reduced tuition (SCPD member companies and United States Armed Forces), and fees, please click Tuition & Fees.