STA 141A Fundamentals of Statistical Data Science

Units: 4

Format:
Lecture: 3 hours
Discussion: 1 hour

Catalog Description:
Introduction to computing for data analysis, visualization, and simulation using a high-level language (e.g., R). Computational reasoning, computationally intensive statistical methods, and reading tabular and non-standard data.

Prerequisite: Course 10 or course 13 or course 32 or course 100; course 108 or course 106

Goals:
Students become proficient in data manipulation and exploratory data analysis, and in finding and conveying features of interest. They learn to map mathematical descriptions of statistical procedures to code, to decompose a problem into sub-tasks, and to create reusable functions. They develop the ability to transform complex data in text form into data structures amenable to analysis. They learn how and why to simulate random processes, and are introduced to statistical methods they do not see in other courses.

Summary of course contents:
This course provides an introduction to statistical computing and data manipulation. It enables students, often with little or no background in computer programming, to work with raw data and introduces them to computational reasoning and problem solving for data analysis and statistics. The high-level themes and topics include doing exploratory data analysis, visualizing data graphically, reading and transforming data in complex formats, and performing simulations, all of which are essential skills for students working with data. This course provides the foundations and practical skills for other statistical methods courses that make use of computing, and for subsequent statistical computing courses. Additionally, some statistical methods not taught in other courses are introduced in this course. The course teaches students to map an overall statistical task into computer code and to conduct basic data analyses.

Restrictions:
Not open for credit to students who have taken course 141 or course 242.

Illustrative reading:

  • R in a Nutshell, Adler.
  • The Art of R Programming, Matloff.
  • R Graphics, Murrell.
  • R Graphics Cookbook, Chang.
  • ggplot2: Elegant Graphics for Data Analysis, Wickham.

GE3:
None

Potential Overlap:
This course overlaps significantly with the existing course 141, which this course will replace. Course 242 is a more advanced statistical computing course that covers more material. ECS 145 involves R programming; however, the focus of that course is very different, concentrating on more fundamental computer science tasks and on comparing high-level scripting languages. R is used in many courses across campus. This course teaches the fundamentals of R in greater depth, which these other courses intentionally do not do. Furthermore, the combination of topics covered in this course (computational fundamentals, exploratory data analysis and visualization, and simulation) is unique to this course.

History:
First offered Fall 2016. Replacement for course STA 141.