
Statistics and Data Science

Monday, January 4, 2016 to Wednesday, March 16, 2016

Now Open! (Fee Applies.)


New techniques have emerged for both predictive and descriptive learning that help us make sense of vast and complex data sets. The particular focus of this course will be on regression and classification methods as tools for facilitating machine learning. In-class problem solving and discussion sessions will be used and computing will be done in R.


Topics Include

  • Introduction to supervised learning
  • Resampling, cross-validation and the bootstrap
  • Model selection and regularization methods
  • Tree-based methods, random forests and boosting
  • Support-vector machines
  • Nonlinear methods and generalized additive models
  • Principal components and clustering


Prerequisites: first courses in statistics and/or probability, linear algebra, and computer programming.


Friday, September 18, 2015 to Saturday, November 21, 2015

About the Course

Social networks pervade our social and economic lives. They play a central role in the transmission of information about job opportunities and are critical to the trade of many goods and services. They are important in determining which products we buy, which languages we speak, how we vote, as well as whether or not we decide to become criminals, how much education we obtain, and our likelihood of succeeding professionally. The countless ways in which network structures affect our well-being make it critical to understand how social network structures impact behavior, which network structures are likely to emerge in a society, and why we organize ourselves as we do. This course provides an overview and synthesis of research on social and economic networks, drawing on studies by sociologists, economists, computer scientists, physicists, and mathematicians.

The course begins with some empirical background on social and economic networks, and an overview of concepts used to describe and measure networks. Next, we will cover a set of models of how networks form, including random network models as well as strategic formation models, and some hybrids. We will then discuss a series of models of how networks impact behavior, including contagion, diffusion, learning, and peer influences.

Course Syllabus

  • Week 1: Introduction, Empirical Background and Definitions
Examples of Social Networks and their Impact, Definitions, Measures and Properties: Degrees, Diameters, Small Worlds, Weak and Strong Ties, Degree Distributions

  • Week 2: Background, Definitions, and Measures Continued
Homophily, Dynamics, Centrality Measures: Degree, Betweenness, Closeness, Eigenvector, and Katz-Bonacich, Erdős and Rényi Random Networks: Thresholds and Phase Transitions

  • Week 3: Random Networks 
Poisson Random Networks, Exponential Random Graph Models, Growing Random Networks, Preferential Attachment and Power Laws, Hybrid models of Network Formation

  • Week 4: Strategic Network Formation
Game Theoretic Modeling of Network Formation, The Connections Model, The Conflict between Incentives and Efficiency, Dynamics, Directed Networks, Hybrid Models of Choice and Chance

  • Week 5: Diffusion on Networks
Empirical Background, The Bass Model, Random Network Models of Contagion, The SIS model, Fitting a Simulated Model to Data

  • Week 6: Learning on Networks
Bayesian Learning on Networks, The DeGroot Model of Learning on a Network, Convergence of Beliefs, The Wisdom of Crowds, How Influence Depends on Network Position

  • Week 7: Games on Networks
Network Games, Peer Influences: Strategic Complements and Substitutes, The Relation between Network Structure and Behavior, A Linear Quadratic Game, Repeated Interactions and Network Structures

Recommended Background

The course has some basic prerequisites in mathematics and statistics. For example, it will be assumed that students are comfortable with basic concepts from linear algebra (e.g., matrix multiplication), probability theory (e.g., probability distributions, expected values, Bayes' rule), and statistics (e.g., hypothesis testing), as well as some light calculus (e.g., differentiation and integration). Beyond those concepts, the course will be self-contained.

Suggested Readings

The course is self-contained: all the definitions and concepts you need to solve the problem sets and final are contained in the video lectures. Much of the material for the course is covered in a text: Matthew O. Jackson, Social and Economic Networks, Princeton University Press (pages for the book are available from Princeton University Press and Amazon). The text is optional. Additional background readings, including research articles and several surveys on some of the topics covered in the course, can be found on my web page.

Course Format

The course will run for seven weeks, plus two weeks for the final exam. Each week there will be video lectures available, as well as a standalone problem set and occasional data exercises. There will be a final exam at the end of the course for those who wish to earn a course certificate.


Will I get a Statement of Accomplishment after completing this class?

Yes. Students who successfully complete the class (above 70 percent correct on the problem sets and final exam) will receive a Statement of Accomplishment signed by the instructor, and those earning above 90 percent on the problem sets and final will earn one with distinction.


Monday, June 22, 2015 to Friday, August 28, 2015

This course is offered through Stanford Continuing Studies.

Course Description

More and more people are starting to tap into the barely touched opportunities of data. Supporting marketing campaigns with more market data, understanding and preventing product failures with real-time measures, retaining customers with detailed behavior monitoring, or fighting fraud with real-time analysis of hundreds of millions of transactions are among the many examples that demonstrate how pervasive data has become across all lines of business. After years of buzz and mixed results, data technology, management techniques, and processes have gained maturity. Data is now more readily accessible to everyone. In this online course, students will learn how to engage with data and discover concrete and actionable business intelligence techniques to gain immediate control of data and deliver accurate insights, manage change to drive project acceptance, and design lean and sustainable processes. The course will also include detailed case studies and feature expert guest speakers to provide invaluable and fascinating field experience. 

Application and fee apply.



Monday, April 18, 2016

Accepting Applications: November 25, 2015 – April 11, 2016

Course Starts Online: April 18, 2016

Come to Stanford: May 31-June 3, 2016

Fee and Application.

This course is offered through Worldview Stanford. Worldview Stanford is an innovative Stanford University initiative that creates interdisciplinary learning experiences for professionals to prepare them for the strategic challenges ahead. 


What's driving big data? We increasingly live our social, economic, and intellectual lives in the digital realm, enabled by new tools and technologies. These activities generate massive data sets, which in turn refine the tools. How will this co-evolution of technology and data reshape society more broadly? 

Creating new knowledge and value: Big data changes what can be known about the world, transforming science, industries, and culture. It reveals solutions to social problems and allows products and services to be even more targeted. Where will big data create the greatest sources of new understanding and value? 

Shifting power, security, and privacy: The promise of big data is accompanied by perils—in terms of control, privacy, security, reputation, and social and economic disruption. How will we manage these tradeoffs individually and in business, government, and civil society? 


Learn from a variety of sources and Stanford experts, including: 

Lucy Bernholz, philanthropy, technology, and policy scholar at the Center on Philanthropy and Civil Society 

Sharad Goel, computational scientist studying politics, media, and social networks 

Margaret Levi, political scientist specializing in governance, trust, and legitimacy 

Jennifer Granick, attorney and director of Civil Liberties at the Stanford Center for Internet and Society 

Michal Kosinski, psychologist and computational scientist studying online and organizational behavior at Stanford Graduate School of Business 

John Mitchell, computer scientist, cybersecurity expert, and Vice Provost of Teaching and Learning




Tuesday, January 21, 2014

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

This is not a math-heavy class, so we try to describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter.

The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). As of January 5, 2014, the PDF for this book will be available for free, with the consent of the publisher, on the book website.

Trevor Hastie
Rob Tibshirani
