
ENCE 688P: Data Mining and Machine Learning for the Built Environment


Mark Austin,
Department of Civil and Environmental Engineering,
University of Maryland, College Park.

CLASS

Notes from Class: [ Spring 2022 ]

PROJECTS

Project Abstracts: [ Fall 2020 ] [ Fall 2021 ] [ Spring 2022 ]

GOALS

This course is a hands-on introduction to tools and techniques for data mining and machine learning analysis of data from the built environment. Students will learn how to write and compile simple software programs in Python and Java (no prior experience required), and then work step-by-step toward the design of software solutions using objects, data structures and software patterns. Students will be introduced to a spectrum of software tools, basic methods, and algorithms for data mining (e.g., Weka) and machine learning analysis of data (e.g., DL4J, TensorFlow, Keras). The semester will conclude with completion and presentation of a project.


SPRING SEMESTER, 2022

The topics will be as follows:

Part 1: Data and Information Management in the Built Environment (2 weeks)

  • Learning about the Built Environment
    Topic: Modern Infrastructure Systems and Near-Term Challenges.
    Topic: Engineering Sensor Systems.
    Topic: Large-scale Urban and Global Sensing.
  • Opportunities for AI, Data Mining and Machine Learning
    Topic: AI in the 1980s, 1990s, 2000s, and the post-2010 era.
    Topic: Recent Advances in Data Mining/AI/Machine Learning.
    Topic: Cyber-Physical and Digital Twin Systems.

Part 2: Object-Oriented Software Development (3 weeks)

  • Introduction to Engineering Software Development
    Topic: Evolution of computer languages over the past 50 years.
    Topic: Low- and high-level languages
    Topic: Scripting languages versus compiled languages
  • Getting Started with Python
    Topic: Programming (data types, expressions, assignments, branching constructs, loops),
    Topic: Arrays and dictionaries
    Topic: Data visualization (charts)
    Topic: Python Productivity Tools: Google Colab, Apache Zeppelin
  • Getting Started with Java
    Topic: Writing and Compiling a Simple Python/Java Program,
    Topic: Programming (data types, expressions, assignments, branching constructs, loops, methods),
    Topic: Single- and multi-dimensional arrays,
    Topic: Java Productivity Tools: Ant
  • Object-based Modeling with Python and Java
    Topic: Classes, objects, association and inheritance relationships.
    Topic: Abstract classes and interfaces.
    Topic: Guidelines for class and package design.

Part 3: Collections, Basic Data Structures + Algorithms (2 weeks)

  • Java Collections Framework
    Topic: Collection, Set and Map Interfaces
    Topic: Working with ArrayLists, Sets and HashMaps.
    Topic: Basic Algorithms for Searching and Sorting
  • Python Collections Module
    Topic: Basic Collections: lists, sets, tuples, dictionaries.
    Topic: Working with lists, sets, tuples and dictionaries.

Part 4: Hands-On Data Mining and Machine Learning (5 weeks)

  • Working with Real-World Datasets
    Topic: Urban, Government and Geographic Data Portals
    Topic: Data Analysis with Pandas
  • Data Mining Tools and Techniques
    Topic: Basic Methods (Classification, Association, Clustering).
    Topic: Synthesis of Decision Trees and Rules
    Topic: Data Mining with Weka (Applications)
  • Introduction to Machine Learning Techniques and Tools
    Topic: Machine Learning Capabilities
    Topic: Taxonomy of Machine Learning Problems
    Topic: Types of Machine Learning Systems
    Topic: Urban Applications of Machine Learning
  • Introduction to Neural Networks
    Topic: Perceptron Models (Building Block of Machine Learning)
    Topic: Activation Functions, Loss Functions, Metrics of Evaluation
    Topic: Training Neural Networks (Backpropagation and Optimization Algorithms)
  • Multilayer Neural Networks
    Topic: Multilayer Network Architectures and Capabilities
    Topic: Neural Networks with One Hidden Layer
    Topic: Neural Networks with Two Hidden Layers
  • Hands-On Machine Learning
    Topic: Software Setup (TensorFlow, Keras, Jupyter, Anaconda, Deep Learning 4 Java)
    Topic: Tensors and TensorFlow Graphs
    Topic: Working with TensorFlow and Keras
    Topic: Working with DL4J (Deep Learning 4 Java)

Part 5: Additional Topics (Class Interest and Time Permitting)

  • Software Design Patterns
    Topic: Design Patterns in Architecture and City Planning.
    Topic: Software Design Patterns (e.g., builder, composite hierarchy, visitor, model-view-controller).
  • Advanced Machine Learning
    Topic: AutoEncoder Neural Networks (new for 2021/2022)
    Topic: Recurrent Neural Networks (new for 2021/2022)

Students will complete individual homework assignments, and work in small teams on a data mining/machine learning project.

ENCE 688P ONLINE

Here's what online means:

  • There will be no in-person contact.

  • For each lecture I will post the "lecture content" (pdf) and a "recorded video" (zoom video) to the notes from class page.

  • I will also post handouts and links to interesting web sites on the notes from class page.

  • I will hold online office hours (zoom) where we can review the material and I can answer your questions.

  • E-mail submission of homework and project work.
    Please see the detailed instructions (for naming of files) below.


Office Hours / Synchronous Class Sessions

  • Mark Austin. Office hours and synchronous class sessions will be held Monday and Wednesday, 5-6 pm.
    Join Zoom Meeting: https://umd.zoom.us/j/6517468335

    Even if you just want to drop-in to catch up, that'll be fine too!
    If 5-6 pm doesn't work for you, send an e-mail (austin "at" umd.edu) and we will work something out.


Submission of Homework and Project Work

  • Homework will be posted on the notes from class web page.
    Please submit your homework as a zip file, sent either as an e-mail attachment or via Dropbox.
    Also, please indicate the class and the purpose of the e-mail in the subject heading, e.g.,
        ENCE688P: Homework 2 ...
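
The packaging step above can be scripted. What follows is only a minimal sketch, assuming the homework files sit in a single folder. The folder name homework2 and the helper names zip_homework and subject_line are hypothetical, chosen for illustration; any file-naming instructions given for the class take precedence.

```python
import zipfile
from pathlib import Path


def zip_homework(source_dir, archive_path):
    """Pack every file under source_dir into a zip archive and return its path.

    Assumes archive_path lies outside source_dir, so the archive does not
    try to include itself.
    """
    source = Path(source_dir)
    archive = Path(archive_path)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent of the homework folder,
                # so the archive unpacks into one top-level folder.
                zf.write(path, path.relative_to(source.parent).as_posix())
    return archive


def subject_line(homework_number):
    """Build an e-mail subject heading in the style requested above."""
    return f"ENCE688P: Homework {homework_number}"
```

For example, zip_homework("homework2", "homework2.zip") run from the folder that contains homework2 would write homework2.zip alongside it, and subject_line(2) gives the subject heading to use in the e-mail.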
    


Class Text and Resources

  • Text not required, but there will be lots of class handouts on data mining and machine learning.
  • There is a great data mining text:

    • Witten I.H., Frank E., Hall M.A., and Pal C.J., Data Mining: Practical Machine Learning Tools and Techniques,
      Fourth Edition, Morgan Kaufmann, 2017.

    which you might consider getting -- well worth the money!

  • Java and Python software will be distributed via Dropbox.


Course Assessment and Exam Schedule

Course assessment will be as follows:

  • Homework (30%).
  • Mid-semester project proposal and presentation (20%).
  • End-of-semester project/report in data mining/machine learning (50%).

Note.

  • No mid-term exams. No final exam. Let's focus on making fabulous projects!
  • Accommodation for students with disabilities will be made.
  • At the end of the semester, please participate in the evaluation of courses through CourseEvalUM.
    Your feedback is confidential and an important means of improving the course in future semesters.

Download Python and Java

  • Download Python 3.X. It seems that Apple ships its Macs with Python 2.X pre-installed.
    But for the purposes of this class I am going to assume you have Python 3.7 (or Python 3.8) installed.
    This detail matters because Python 3 is not backwards compatible with Python 2 (Strike 1 against Python).
  • Download Java. I have Java 11 on my laptop, but you are
    certainly welcome to download a more recent version.
  • Download Apache Ant. Apache Ant is a Java library and command-line tool that
    manages the compilation of Java programs and execution of programs and test cases. Extremely useful.
  • Click here to download the Eclipse IDE.


  • If your computer is a Mac think about downloading Homebrew and then using brew to
    automatically download and install the Ant packages on your machine.

    Note (Oct, 2020). I just installed homebrew on my iMac (home) running Catalina (10.15) and, sadly, Apple has not
    made this process easy. Here's the bottom line to get things working:

    Homebrew uses something called the "command line tools", which is an addition to Xcode.
    So before you can install Homebrew, you need to download and install the command line tools.
    I downloaded the dmg file for the command line tools from the Apple developers web site --
    just create an account, it's free. Run the installation program and the tools will be put
    in /Library/Developer/CommandLineTools/

    Then, to install Homebrew, cut-and-paste the command listed on the Homebrew page into a terminal window
    running the bash shell. The installation only takes a few minutes and you are good to go.
    For example, to install ant, simply type: brew install ant at the prompt in a terminal window.
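
Once the downloads are in place, it can help to confirm which Python interpreter is actually running, since a Mac may still pick up an older pre-installed version. The sketch below is one hedged way to check against the Python 3.7 floor assumed in the download list. The names MINIMUM_VERSION and meets_minimum are hypothetical and not part of any class software.

```python
import sys

# Assumed floor, taken from the Python 3.7 / 3.8 note in the download list.
MINIMUM_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) pair is at least the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough for the class examples.")
    else:
        print(f"Python {current} is too old; install Python 3.7 or later.")
```

Save the block to a file and run it with the same command you plan to use for class work (for example python3), so that the check reports on the interpreter you will actually be using.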


Python and Java Programming Resources



AI and ML Software

  • Artificial Intelligence: A Modern Approach (AIMA). Code website on Github.
  • Apache OpenNLP: A Machine Learning-based Toolkit for the processing of Natural Language Text.
    Written completely in Java. Source code: github.



Real-World Datasets


Working with Real-World Data


Data Science, Data Mining, Neural Networks





Digital Twins


Algorithms and Software for Anomaly Detection


Time Series

  • Darts: Time Series Made Easy in Python.


Big Data Algorithms + Tutorials (useful techniques that are beyond this course)


Miscellaneous Real-World Applications and Resources

Developed in August 2020 by Mark Austin
Copyright © 2020-2022, Department of Civil and Environmental Engineering, University of Maryland