Natural Language Processing and Learning-based Methods for Enhancing Automation, Alignment, and Interpretability in Model-driven Planning, Monitoring and Control of Construction Projects
Advisor: Professor Mani Golparvar-Fard
Abstract
Today, model-driven planning and control still faces several challenges that hinder its widespread adoption across various types of construction projects. First, the planning phase requires a high level of manual effort, particularly for mapping elements in a building information model (BIM) to schedule activities. This effort stems from the varying maturity of BIMs and the varying level of detail of activities across the levels of a schedule. Second, schedule activities must be coordinated and communicated with the trades so that everyone knows, each day, who does what work and in which location. This day-to-day coordination and communication requires superintendents to create and use short-term look-ahead plans. However, disconnects between long-term master schedules and short-term look-ahead plans complicate project coordination, particularly monthly progress updates and payment application reviews. Third, synchronizing planned and actual progress during progress tracking is challenging because the granularity of the daily construction reports submitted by the trades often does not match the rules of credit used to track physical and non-physical progress. Although reality capture has become common practice for progress tracking, interpreting the collected visual data remains a manual and subjective process in the absence of computer vision-based progress tracking. Together, these challenges break the feedback loop needed to use field data to update progress frequently and reliably through project controls and to communicate the updated schedule accordingly.
To address these challenges, this PhD dissertation introduces several domain-specific natural language processing and learning-based methods that enhance automation, alignment, and interpretability across project planning and control workflows. First, UniformatBridge, a new transformer-based natural language processing model, automatically labels the activities in a project schedule with their ASTM Uniformat classification. The model introduces construction sequencing tokens into the BERT architecture to capture logistically constrained predecessor and successor activities. UniformatBridge enables the automated creation of 4D BIMs and consistently links the semantic segmentation of reality capture data to schedule or payment application structures, with or without a BIM. The experimental data consist of manually labeled master schedules from ten commercial building projects, totaling 35,998 activity sequence tuples.
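As a rough illustration of what a sequencing-aware input could look like, the sketch below turns a schedule's successor links into (predecessor, activity, successor) tuples and serializes each tuple into one text string that a BERT tokenizer could consume. The marker tokens `[PRED]`, `[SUCC]`, and `[NONE]`, the helper names `sequence_tuples` and `to_model_text`, and the choice to pair every predecessor with every successor are hypothetical; the abstract does not specify UniformatBridge's token vocabulary or how its activity sequence tuples were formed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical marker tokens; UniformatBridge's actual sequencing tokens are not given in the abstract.
PRED_TOKEN = "[PRED]"
SUCC_TOKEN = "[SUCC]"
NONE_TOKEN = "[NONE]"

SequenceTuple = Tuple[Optional[str], str, Optional[str]]


def sequence_tuples(successors: Dict[str, List[str]]) -> List[SequenceTuple]:
    """Build (predecessor, activity, successor) tuples from activity -> successor links.

    Every activity must appear as a key. An activity with no predecessor or no
    successor is paired with None on that side (assumption for this sketch).
    """
    # Invert the successor links to find each activity's predecessors.
    predecessors: Dict[str, List[str]] = {name: [] for name in successors}
    for name, succs in successors.items():
        for succ in succs:
            predecessors[succ].append(name)

    tuples: List[SequenceTuple] = []
    for name in successors:
        # Pair every predecessor with every successor (hypothetical pairing rule).
        for pred in predecessors[name] or [None]:
            for succ in successors[name] or [None]:
                tuples.append((pred, name, succ))
    return tuples


def to_model_text(t: SequenceTuple) -> str:
    """Serialize a tuple as 'activity [PRED] predecessor [SUCC] successor'."""
    pred, name, succ = t
    return f"{name} {PRED_TOKEN} {pred or NONE_TOKEN} {SUCC_TOKEN} {succ or NONE_TOKEN}"


if __name__ == "__main__":
    # Toy three-activity schedule (illustrative data, not from the dissertation).
    schedule = {
        "Pour Slab L2": ["Frame Walls L2"],
        "Frame Walls L2": ["Hang Drywall L2"],
        "Hang Drywall L2": [],
    }
    for t in sequence_tuples(schedule):
        print(to_model_text(t))
    # Pour Slab L2 [PRED] [NONE] [SUCC] Frame Walls L2
    # Frame Walls L2 [PRED] Pour Slab L2 [SUCC] Hang Drywall L2
    # Hang Drywall L2 [PRED] Frame Walls L2 [SUCC] [NONE]
```

With Hugging Face Transformers, one common way to give such markers their own embeddings is to register them with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` and then call `model.resize_token_embeddings(len(tokenizer))` before fine-tuning a sequence classifier over the Uniformat labels.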
To further align short-term plans with long-term schedules, the dissertation presents ExpertScheduler, a domain-specific language model that automatically generates a list of potential look-ahead planning tasks directly from master schedule activities. ExpertScheduler embeds a Mixture-of-Experts (MoE) mechanism in a Transformer encoder-decoder architecture and is trained with curriculum learning, which mimics human learning by progressing from easier to harder examples. Evaluated on data from four real-world projects, ExpertScheduler outperforms retrieval-augmented generation (RAG) frameworks built on large language models (LLMs) and achieves state-of-the-art performance relative to both fine-tuned and RAG-enhanced LLMs. The model helps superintendents generate look-ahead tasks that are consistent with the master schedule while letting them easily adjust those tasks as needed. In addition, VisualSiteDiary, a Vision Transformer-based image captioning model, generates human-readable captions directly from visual logs to support automated daily reporting and improve image retrieval. The model incorporates pseudo-region features and diverse captioning styles and was validated on a dataset of 6,406 training images with 12,812 captions and 1,409 validation images with 2,818 captions (two captions per image). The dataset includes many realistic yet challenging cases commonly observed on commercial building projects.
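To make the Mixture-of-Experts idea behind ExpertScheduler more concrete, the sketch below shows a generic top-k gated MoE layer applied to a single token vector: a linear gate scores every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights renormalized over the selected experts. This is the standard sparse MoE pattern, not ExpertScheduler's published configuration; the number of experts, the value of k, the expert functions, and the names `linear_gate` and `moe_forward` are assumptions for illustration, and plain Python lists stand in for tensors.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]  # assumed to preserve dimension, like a Transformer FFN


def linear_gate(gate_rows: Sequence[Vector]) -> Callable[[Vector], Vector]:
    """Return a gate that scores each expert by a dot product with the input."""
    def score(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in gate_rows]
    return score


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate: Callable[[Vector], Vector],
                experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    scores = gate(x)
    # Indices of the top_k experts, highest gate score first.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the chosen experts only, so the mixing weights sum to 1.
    weights = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # unselected experts are never evaluated (sparse compute)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def scale(c: float) -> Expert:
    """Toy expert that multiplies its input by a constant (illustrative only)."""
    return lambda v: [c * vi for vi in v]


if __name__ == "__main__":
    experts = [scale(1.0), scale(2.0), scale(3.0)]
    gate = linear_gate([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    # Gate scores for [1.0, 1.0] are [0, 1, 2], so top-1 routing picks the third expert.
    print(moe_forward([1.0, 1.0], gate, experts, top_k=1))  # → [3.0, 3.0]
```

Routing each token to only k experts is what lets MoE models add capacity without a proportional increase in per-token compute. A production layer would operate on batched tensors inside the encoder and decoder blocks and usually adds an auxiliary load-balancing loss so that tokens do not collapse onto a few experts; this sketch omits both.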
The findings of this end-to-end project management research unlock new capabilities for learning and modeling construction planning and control knowledge directly from project data rather than from experience-driven heuristics. By formalizing scheduling practices, interpreting multimodal project data, and aligning planning levels, the presented methods enable more proactive forecasting, delay mitigation, and risk management throughout the project lifecycle. These contributions mark a step toward transforming construction planning from static, expert-driven processes into dynamic, data-informed systems that can adapt and evolve with the realities of the jobsite.