Matteo Riondato

COSC-254 Data Mining (Spring 2019)

Course info

Times & Location: MW 2–3.20pm, Science Center E110

Website: http://rionda.to/courses/cosc-254-s19/; Moodle for assignments and the forum

Prerequisites: COSC-211 Data Structures

Instructor: Matteo Riondato (he/his, please call me "Matteo")
Contact: mriondato@amherst.edu (only for confidential messages that cannot go to the forum; please put [COSC254] at the start of your subject line)
Office Hours: M 3.30–5.30pm, Science Center C214. Please reserve a 15-minute slot by 4pm the day before (Sunday).

TA: Alexander Einarsson
Office Hours: Th 3–5.00pm, Science Center E210.

Description

This course is an introduction to data mining, the area of computer science that deals with the development of efficient algorithms for extracting information from data. We will:
  • talk about the key tasks in the analysis of transactional datasets, time series, and graphs, and the most efficient algorithms to solve them;
  • learn about parallel/distributed systems to perform the analysis of massive datasets;
  • use interactive notebooks and large-scale systems to evaluate algorithms and analyze data.

Syllabus

Most of the information you need is available in the syllabus. For anything else, please ask on the Moodle forum or, if it is confidential, email Matteo (please put [COSC254] at the start of your subject line).

Schedule & Diary

For past dates, the listed topics are those actually covered. For future dates, they are the planned topics and are subject to change. In the readings, MMD denotes the book Mining of Massive Datasets, and DMT denotes Data Mining: The Textbook.

  • Lecture of 2/20: Data Streams (1st part) Slides
  • Lecture of 2/18: Eclat algorithm (Slides), Compressing Patterns (Slides). Readings: N/A
  • Due to the network outage, both HW02 and HW03 are due on Wed 2/20 at 2pm.
  • Lecture of 2/13: Association Rules, Apriori algorithm Slides. Readings: MMD 6.2.5, DMT 4.4.1, 4.4.2
    HW03 is out! Due 2/20 at 1.59pm.
  • Lecture of 2/11: Intro to Association Rules Slides. Readings: MMD 6.1.3, DMT 4.3
  • Lecture of 2/6: Communication costs, Intro to Pattern Mining Slides. Readings: MMD 2.5, 6.1.1, 6.1.2, 6.2.1, 6.2.3, 6.2.4; DMT 4.1, 4.2
    HW02 is out! Due 2/13 at 1.59pm.
  • Lecture of 2/4: Matrix-by-Vector Multiplication in Hadoop. Readings: MMD 2.3
  • Lecture of 1/30: MapReduce & Hadoop Slides. Readings: MMD 2.1, 2.2.
    HW01 is out! Due 2/6 at 1.59pm.
  • Lecture of 1/28: What is Data Mining? Slides. Readings: MMD Ch. 1, DMT Ch. 1.
    HW00 is out! Due 1/30 at 1.59pm.

Future classes

  • Week of 2/25: More on Data Streams
  • Week of 3/4: Triangle Counting
  • Week of 3/11: Spring break
  • Week of 3/18: Outlier Detection
  • Week of 3/25: Link Analysis
  • Week of 4/1: Centrality Measures
  • Week of 4/8: Community Detection
  • Week of 4/15: Clustering
  • Week of 4/22: Hypothesis Testing
  • Week of 4/29: TBA