Master of Science in Management & Systems
Data Warehousing and Data Mining

MASY1-GC3510

Professor: Sam Sultan [sam.sultan@nyu.edu]
Class website: [oit2.sps.nyu.edu/~sultans/dwdm] (or) [samsultan.com/dwdm]
Course Days: Tuesdays & Thursdays
Course Hours: 6:00pm - 9:00pm

Announcement(s):
10/2/2018 - No class on Tues 10/9. Last class (final exam) is on Tues 10/16.
10/2/2018 - Final project is due midnight Thursday 10/18 (not Saturday 10/20 as stated in class)

COURSE DESCRIPTION:

The course addresses the concepts, skills, methodologies, and models of data warehousing. It covers proper techniques for designing data warehouses for various business domains, as well as concepts for potential uses of the data warehouse and other data repositories in mining opportunities.


COURSE LEARNING GOALS:

1. Course Objectives:

In today's organization, the data warehouse is the center of the information systems' knowledge repository. Data warehousing supports informational processing by providing a solid platform of integrated, historical data from which to perform enterprise-wide data analysis. This helps improve profit and guide strategic decision making.

Data mining is a recent advancement in data analysis. It exploits the knowledge held in enterprise data warehouses and other data stores by examining the data to reveal untapped patterns that suggest ways to improve product quality, customer satisfaction and retention, and profit potential.

This course will cover the concepts and methodologies of both data warehousing and data mining.

       The focus of the course will be on the following topics:

2. Student Learning Outcomes:


COURSE REQUIREMENTS AND POLICIES:

See [Requirements and Policies]


BOOKS:

Required Reading & Materials -

Recommended Reading & Materials -

GRADE ASSIGNMENT AND EVALUATION

Contributing factors for determining your course grade include:

Details of Assignment and Evaluation. Grades are FINAL.
Please do not negotiate for a better grade. I will not provide any "make-up" or "extra credit" assignment to compensate for a low grade. If you expect to receive an "A" at the end of the semester, then I expect you to study hard, to attend all sessions (unless you notify me in advance), to participate in all classes, to turn in your homework on time, and to keep up with the class reading material. If you see yourself falling behind, do not hesitate to ask for help. Asking early helps you stay current with the class and improves your chances of a good grade on your work.


NYU SPS Grading Scale: http://sps.nyu.edu/academics/academic-policies-and-procedures/graduate-academic-policies-and-procedures.html#Grades

Please Note: The Office of the University Registrar maintains individual records of students enrolled in NYU and is the only department authorized to record an official grade. Final grades are reported on NYU-Albert.
For more information: http://www.sps.nyu.edu/academics/academic-policies-and-procedures/graduate-academic-policies-and-procedures.html

To Receive Your Final Grade at the end of the semester, follow these steps:

  1. Log in to Albert using your NetID at: https://admin.portal.nyu.edu/psp/paprod/EMPLOYEE/EMPL/h/?tab
  2. Click on "Student Center"
  3. Within your Student Center, in the "academics" section, click on the dropdown: "other academic"
  4. From the dropdown, select "grades"
  5. For complete instructions, see http://www.sps.nyu.edu/academics/noncredit-offerings/academic-noncredit-policies-and-procedures.html#Obtaining_Grades


COURSE OUTLINE:

DATE SESSION TOPIC[s] COVERED
 
[Week 1] 1
  • Introduction to Data Warehousing
  • Relationship of Data Mining and Data Warehousing
  • What is a Data Warehouse?
  • Data Warehousing ROI
  • DSS - Decision Support Systems
  • Operational vs. Analytical Systems
  • Evolution of DSS and Data Warehousing
  • OLTP - Online Transaction Processing
  • Characteristics of a Data Warehouse
  • What is a Data Mart? Creating a Data Mart
  • Data Comparison Chart
  • OLAP - Online Analytical Processing
  • Reading: Chapter 1 (both The Data Warehouse Toolkit and Building the Data Warehouse);
    skim through the Glossary (The Data Warehouse Lifecycle Toolkit)
     
[Week 1] 2
  • Planning & Building the Data Warehouse
  • Sponsorship and Cost Justification
  • Project Prerequisites
  • Barriers, Challenges and Risks
  • Preparing for Implementation
  • Developing the Data Warehouse
  • SDLC Methodologies - Waterfall vs. RUP Approach
  • Planning & Project Management
  • Analysis
  • Logical & Physical Design
  • Implementation and Deployment
  • Operations
  • Reading: Chapter 1, 2 (The Data Warehouse Lifecycle Toolkit)
     
[Week 2] 3
  • Data Warehouse Design
  • Drivers for Multi-Dimensional Analysis
  • Limitations of Relational Models
  • The Data Cube
  • What is Dimensional Modeling?
  • Advantages of Dimensional Models
  • Logical and Physical Design
  • Data Normalization
  • Benefits and Drawbacks of Data Normalization
  • De-Normalizing of Data
  • Characteristics of a Data Warehouse
  • Subject Oriented, Integrated, Time Variant, Non-Volatile
  • The Star Schema
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
     
[Week 2] 4
  • Data Warehouse Schemas
  • Dimensions and Dimension Tables
  • Facts and Fact Tables
  • The Star Schema
  • The Snowflake Schema
  • Degenerate and Junk Dimensions
  • The Data Warehouse Bus Architecture
  • Conformed Dimensions and Standard Facts
  • Data Granularity
  • Changing Dimensions
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
     
[Week 3] 5
  • Components of a Data Warehouse
  • Source Systems, Staging Area, Presentation, Access Tools
  • Building the Data Matrix
  • The Four-Step Process
  • Multiple Fact Tables in a single Data Mart
  • Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts
  • Fact and Dimension Table Detail
  • Identifying Source for each Fact & Dimension
  • Mapping from Source to Target
  • Reading: Chapter 7, 4 (The Data Warehouse Lifecycle Toolkit)
     
[Week 3] 6
  • The ETL Process
  • Extracting the Data into the Staging Area
  • The Challenge of Extracting from Disparate Platforms
  • Full vs. Incremental Extracts
  • Detecting Changes to Data
  • Transforming the Data
  • Complexity of Data Integration
  • Dealing with Missing & Dirty Data
  • Data Transformation Tasks
  • Loading the Data
  • Timing and Job Control of Data Loads
  • Reading: Chapter 9 (The Data Warehouse Lifecycle Toolkit)
     
[Week 4] 7
  • Midterm Exam

  • Aggregating Data
  • Goals and Risks of Data Aggregation
  • Deciding What to Aggregate
  • Data Sparsity
  • Design Requirements for Aggregates
  • The Problem with Aggregates
  • Aggregate Navigators
  • Reading: Chapter 8, pp. 353-357 (The Data Warehouse Lifecycle Toolkit)

[Week 4] 8a
  • Selecting the Business Subject
  • Declaring the Grain
  • Choosing the Dimensions
  • Identifying the Facts
  • Avoiding Null Keys
  • Retail Market Basket Analysis
  • Additive and Semi-Additive Facts
  • The Value Chain Integrated Inventory Model
  • Order Management Data Marts
  • Date and Other Dimension Role Playing
  • Allocation to Lower Level Facts
  • Profit and Loss Data Marts
  • Reading: Chapter 2, 3, 5 (The Data Warehouse Toolkit)
     
8b
  • CRM Overview
  • Customer Dimension
  • Demographic Dimension Outriggers
  • Date Dimension Outriggers
  • Large Changing Customer Dimension
  • Mini-Dimensions
  • Commercial Customer Hierarchies
  • Fixed vs. Variable Level Hierarchies
  • General Ledger Accounting
  • OLAP role in G/L and Chart of Accounts
  • Time Stamped Employee Dimensions
  • Reading: Chapter 6, 7, 8 (The Data Warehouse Toolkit)
     
[Week 5] 9
  • Clickstream Data Warehouses & Analytics
  • Overview of Web Based Interaction
  • Challenges of Tracking Data
  • Creating Persistent State on the Web
  • Techniques for Tracking States
  • Working with Cookies
  • User Registration
  • Web Server Log Files
  • Online Advertising
  • Online Page Tracking and Analytics
  • User Dimension and Page Hits Facts
  • Reading: Chapter 15 (The Data Warehouse Toolkit)
     
[Week 5] 10
  • Data Mining
  • What is Data Mining Good For?
  • Statistics, Artificial Intelligence & Machine Learning
  • Data Mining Examples and Tools
  • Connection between Data Mining and Data Warehousing
  • Retrospective Reporting vs. Predictive
  • Data Mining Applications
  • Data Mining vs. Statistics vs. OLAP
  • Data Mining Statistical Techniques (Sampling, Regression & Decision Trees)
  • Clustering, Segmentation and Nearest Neighbor Techniques
  • Keys to Commercial Success of Data Mining
  • Reading: Online
     
[Week 6] 11
  • Data Mining Techniques
  • Hands-on Presentation and Lab
  • Classification, Regression, Similarity Matching, Co-occurrence Grouping
  • Predictive Modeling
  • Clustering/Segmentation
  • Data Mining and Statistics Terminologies
  • Supervised vs. Unsupervised
  • Tree Induction
  • Entropy and Information Gain
  • Reading: Online
     
[Week 6] 12
  • Final Exam
  • Final Project Due


All contents © Sam Sultan.
NYU SPS Master's Degree Program web site
For more information, send e-mail to: sam.sultan@nyu.edu