618 - Data Manipulation and Analysis
This course helps students get started with their own data harvesting, processing, aggregation, and analysis. Data analysis is crucial to evaluating and designing solutions and applications, as well as to understanding users' information needs and use. In many cases the data we need is distributed online across many webpages, stored in a database, or available in a large text file. Often these data (e.g., web server logs) are too large to obtain or process manually. Instead, we need an automated way of gathering, parsing, and summarizing the data before we can do more advanced analysis. Students will therefore learn to use Python and its modules to accomplish these tasks in a 'quick and easy' yet useful and repeatable way. Next, students will learn techniques of exploratory data analysis, using scripting, text parsing, structured query language, regular expressions, graphing, and clustering methods to explore data. R modules will be used for these exploratory tasks. Students will be able to make sense of and see patterns in otherwise intractable quantities of data.
- SI 504, Servers and the Shell
- ([Preceded or accompanied by SI 507] or SI 507 waiver or SI 508) and (SI 544 or SI 544 waiver or BIOSTAT 501 or BIOSTAT 521 or BIOSTAT 601); (C- or better) or Graduate standing in Applied Statistics, Data Science, or Engineering