Title: Achieving Robust Neuro-Symbolic Reasoning in High-Impact Domains through Knowledge Graphs and Large Language Models

Date/Time: Monday, January 22nd at 4:10 p.m. in Barnard 108

Speaker: Mayank Kejriwal

Abstract: While large language models (LLMs) like ChatGPT have ushered in a new era of opportunities and challenges, and their performance has been impressive, there have been concerns about their responsible use, especially in high-impact domains such as healthcare and crisis response. Recently, neuro-symbolic AI has emerged as a subfield of AI that aims to bridge the representational differences between neural and symbolic approaches so that LLMs can be applied more responsibly. This talk will describe my group's research in defining and designing neuro-symbolic AI for solving complex problems. Drawing on real case studies and application areas, I will argue that a judicious combination of neural reasoning and symbolic techniques can help us design systems that are more explainable, robust, and consequently, trustworthy.

Brief Bio: Mayank Kejriwal is a research assistant professor in the Department of Industrial & Systems Engineering at the University of Southern California, and a research team leader in the USC Information Sciences Institute (ISI). Prior to joining USC, he received his PhD in computer science from the University of Texas at Austin. He is the director of the Artificial Intelligence and Complex Systems group, and is also affiliated with the Center on Knowledge Graphs and the AI4Health initiative at ISI. His research has been funded through multi-million-dollar grants by the US Defense Advanced Research Projects Agency (DARPA), corporations, and philanthropic foundations. His research has been published in almost a hundred peer-reviewed venues and featured in multiple press outlets, including The Guardian, The World Economic Forum, Popular Science, BBC, CNN Indonesia, and many others. He is the author of four books, including an MIT Press textbook on knowledge graphs that has been republished in several languages.


Beyond Modeling: Contextualizing Data and Improving Patient Representations in the Context of Learning Health Systems

Date/Time: Monday, January 29th at 4:10 p.m. in Barnard 108

Speaker: Keith Feldman

Abstract: In line with the values of P4 (predictive, preventive, participatory, and personalized) medicine, healthcare today continues to provide increasingly individualized care for each patient. While this form of individualized care has been shown to improve outcomes, there exists a fundamental conflict between completely personalized medicine and the success of machine learning and statistical tools that excel at extracting meaningful patterns from large repositories of data. There is a need to reframe the expectations of computational tools away from simply synthesizing increasingly large bodies of data, and to develop techniques that can draw insight from the increasingly diverse body of data we collect through routine care. In this talk I illustrate how these techniques can be leveraged to improve representations of patient data and create a more complete view of a given individual’s clinical state over time, as well as how contextualization of such information can not only aid current clinical processes, but advance them.

Brief Bio: Dr. Feldman is a computer scientist by training who has spent his graduate, postdoctoral, and early faculty career developing a portfolio of research in the area of computational health, applying machine learning and data science techniques to problems across the healthcare domain. Tied to the potential of these techniques to capture variability between patient conditions and outcomes, his work has identified patient subtypes, evidence-based risk measures, and treatment patterns tied to the quality and effectiveness of care. Conducted in close collaboration with multidisciplinary teams, this work is fundamentally motivated by the notion of augmentation, not automation: rather than utilizing computation to replicate healthcare decisions, he aims to augment the existing skillsets of those engaged in healthcare, broadly seeking to determine what information, if available, would improve decisions relevant to their role. His work is funded by the NIH, AHA, Frontiers CTSI, and generous philanthropic gifts.


Title: Knowledge-centric Machine Learning on Graphs

Date/Time: Friday, February 2nd at 4:10 p.m. in Roberts 209

Speaker: Yijun Tian

Abstract: Due to the surge of data and computational capacities in recent years, researchers in artificial intelligence and machine learning (ML) have focused on collecting high-quality data (i.e., data-centric) and developing complex model architectures (i.e., model-centric). However, these two paradigms come with inherent limitations, such as intensive labor demands for data annotation and specialized expertise for model refinement. Consequently, there emerges a need for a new paradigm: knowledge-centric. This paradigm seeks to leverage knowledge (important and useful information) to facilitate effective and efficient machine learning. By anchoring on knowledge, there is a reduced reliance on massive labeled data and intricate model architectures. Graphs, one of the most common and effective data types for representing structured and relational systems, have attracted tremendous attention from academia and industry. My research focuses on developing a knowledge-centric learning framework to model graphs, with the ultimate goal of impacting various research areas and benefiting real-world applications. In this talk, I will describe how I design knowledge-centric ML algorithms to obtain and leverage valuable knowledge from multiple places, including 1) learning knowledge from data, 2) distilling knowledge from models, and 3) encoding knowledge from external sources.
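As background on the second source mentioned, distilling knowledge from models, one standard formulation (Hinton et al.'s, not necessarily the speaker's) trains a small student to match a large teacher's temperature-softened output distribution. A minimal pure-Python sketch, with invented logits for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax: higher T reveals more of the
    'dark knowledge' carried in the teacher's small logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between teacher soft targets and student
    predictions, scaled by T^2 as in the standard formulation."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    ce = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return temperature ** 2 * ce

# A student whose logits mirror the teacher's incurs a lower loss
# than one that disagrees.
teacher = [3.0, 1.0, 0.2]
close = distillation_loss([2.9, 1.1, 0.3], teacher)
far = distillation_loss([0.2, 1.0, 3.0], teacher)
```

Minimizing this loss lets the student inherit the teacher's knowledge without needing the massive labeled data the teacher was trained on.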

Biography: Yijun Tian is a Ph.D. candidate in Computer Science and Engineering at the University of Notre Dame. His research interests lie in machine learning, data science, and network science. His research aims to empower machines with knowledge to positively influence real-world applications, health, and sciences. His work appears in venues such as AAAI, ICLR, and IJCAI, and has been recognized with oral and spotlight paper honors.


The Language of Discovery: Listening to the Whispers of Scientific and Health Data with Data Analytics and NLP

Date/Time: Monday, February 5th at 4:10 p.m. in Barnard 108

Speaker: Prashanti Manda

Abstract: Artificial intelligence (AI), machine learning, and natural language processing (NLP) are revolutionizing healthcare research and clinical practice. In this talk, I will discuss how advanced AI algorithms, such as deep learning and unsupervised learning, can sift through massive EHR datasets, uncovering hidden patterns and correlations that traditional analysis methods miss. Beyond EHRs, the talk will showcase the power of NLP in mining insights from the treasure trove of scientific literature. We will investigate techniques like information retrieval to automatically glean knowledge from medical publications. By bridging the gap between EHR analysis and scientific knowledge extraction, this talk will demonstrate how AI and NLP can transform healthcare research and clinical practice.

Brief Bio: Dr. Manda earned her Ph.D. in Computer Science from Mississippi State University in 2012. An NSF CAREER Award winner, she is currently an Associate Professor in the Department of Informatics and Analytics at UNC Greensboro.


Geometric Modeling and Physics-Informed Machine Learning for Computer Vision Applications

Date/Time: Monday, February 12th at 4:10 p.m. in Barnard 108

Speaker: Diego Patiño

Abstract: Our world is inherently geometric because it is composed of three-dimensional objects that exist in space and have physical dimensions. We use geometry to represent these objects' properties and relationships, such as angles, distances, and shapes. Moreover, objects (and quantities) in our world follow physics laws that determine their interaction and allow us to estimate their present and future state. Geometric computer vision and physics-informed machine learning are two powerful tools that are increasingly getting attention because of their applications in various fields of research and industry, such as medical imaging, autonomous vehicles, and 3D reconstruction. This talk discusses research examples incorporating prior knowledge about the geometrical and physical constraints inherent to the 3D world into state-of-the-art computer vision and machine learning pipelines. We will show how geometric computer vision enables the analysis and understanding of complex 3D structures and environments, while physics-informed machine learning provides insight into the underlying physical phenomena to drive the machine learning models into a better representation of complex systems.

Brief Bio: Diego Patiño is a Post-doctoral Fellow in the Department of Electrical and Computer Engineering at Drexel University, working with Professor David K. Han. Before joining Drexel, he was a Post-Doctoral Researcher within the GRASP Lab at the University of Pennsylvania, working under the supervision of Kostas Daniilidis. Diego Patiño received his B.S., M.S., and Ph.D. degrees in Computer Engineering from the National University of Colombia, in 2010, 2012, and 2020, respectively. He was a visiting researcher at the University of Wisconsin-Madison and later at the University of Pennsylvania. His research interests revolve around machine learning, physics-informed machine learning, and geometric approaches to computer vision with applications in areas such as robotics and medical imaging, among others. More specifically, his research focuses on 3D vision, symmetry detection, 3D reconstruction, graph neural networks, robot perception, and reinforcement learning applied to problems in science and engineering.


Data Imputation Framework for Time Series Data with Large Missing Data Gaps and Extreme Events

Date/Time: Friday, February 16th at 4:10 p.m. in Roberts 209

Speaker: Rui Wu

Abstract: This presentation is about how to estimate missing values within time series data. This can be very challenging if a dataset has large missing data gaps and includes extreme events, i.e., rare events that can have important impacts. Missing data is a common issue with time series data across domains including environmental monitoring, structural health monitoring, bioinformatics, and other Internet of Things (IoT) applications. Missing data gaps can occur for various reasons, such as damaged sensors, loss of power, and problems with data storage or transmission. Most existing machine learning models cannot be applied directly if historical data has missing values. To tackle the missing data issue, the data records are usually removed or estimated. However, when the missing data gap is very large (e.g., a continuous gap spanning 30% of a parameter's values), removing data records with missing values can break temporal information, and data imputation for continuous missing gaps can be very challenging. Another challenge for data imputation problems is extreme events, such as hurricanes and stock market crashes. These events do not happen very often but can have huge impacts on data patterns and increase the difficulty of missing data estimation. To address these challenges, this presentation introduces a novel data imputation framework that includes reshape and extreme event classification preprocessing steps, as well as machine learning models to learn temporal connections between observed and missing values. The experimental results demonstrate that the proposed framework outperforms cutting-edge methods in terms of accuracy. Therefore, this framework can provide a more effective solution for imputing missing data in time series datasets with large missing data gaps and extreme events.
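The difficulty the abstract describes can be made concrete: a naive baseline such as linear interpolation flattens out any extreme event that falls inside a long continuous gap. A self-contained sketch (the series and the "extreme event" spike are invented for illustration; this is a baseline the talk's framework improves upon, not the talk's method):

```python
def linear_interpolate(series):
    """Fill None gaps by straight-line interpolation between the
    nearest observed neighbors (endpoints assumed observed)."""
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            j = i
            while filled[j] is None:   # find the right edge of the gap
                j += 1
            left, right = filled[i - 1], filled[j]
            span = j - (i - 1)
            for k in range(i, j):      # straight line across the gap
                frac = (k - (i - 1)) / span
                filled[k] = left + frac * (right - left)
            i = j
        else:
            i += 1
    return filled

# Ground truth with an extreme spike (e.g., a storm surge) at t=5,
# then a continuous 40% gap that covers it.
truth    = [1.0, 1.1, 1.2, 1.1, 1.0, 9.0, 1.0, 1.1, 1.0, 1.1]
observed = [1.0, 1.1, 1.2, None, None, None, None, 1.1, 1.0, 1.1]
imputed = linear_interpolate(observed)
spike_error = abs(imputed[5] - truth[5])   # interpolation misses the event badly
```

The interpolated value at the spike is near 1.14 while the truth is 9.0, which is why the framework's extreme event classification step matters.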

Brief Bio: Rui Wu received a Bachelor's degree in Computer Science and Technology from Jilin University, China, in 2013. He then pursued his Master's and Ph.D. degrees in Computer Science and Engineering at the University of Nevada, Reno, completing them in 2015 and 2018, respectively. Currently, Rui works as an assistant professor in the Department of Computer Science at East Carolina University, collaborating with geological and hydrological scientists to protect ecological systems. His primary research interests lie in machine learning and data visualization using AR/VR devices. Dr. Wu has actively contributed to several NSF- and NIH-funded projects, serving as both a Principal Investigator (PI) and Co-PI.


Harnessing Deep Neural Networks for Early Warning of Harmful Algal Blooms

Date/Time: Tuesday, February 20th at 4:10 p.m. in Roberts Hall 210

Speaker: Neda Nazemi

Abstract: The growing frequency, intensity, and complexity of climate-induced natural hazards call for innovative risk management methodologies. Advances in Artificial Intelligence (AI), especially in machine learning and deep learning, have revolutionized the analysis of big data, leading to the development of sophisticated predictive models. These models are essential for identifying the drivers of environmental challenges and delivering accurate forecasts, playing a key role in risk communication to policymakers. This facilitates the establishment of early warning systems and decision support tools, promoting proactive decision-making and the formulation of adaptive strategies to enhance community resilience against increasing threats. This presentation emphasizes the use of deep neural networks, specifically one-dimensional Convolutional Neural Networks (1D-CNNs), in addressing the challenge of harmful algal blooms (HABs) — a critical global water environmental issue. Due to the abrupt and difficult-to-control nature of HABs, traditional mechanistic and statistical models fall short in providing timely forecasts. I will explore the application of deep learning for generating accurate, potentially real-time forecasts of chlorophyll-a levels, serving as indicators of algal blooms in aquatic systems, thereby aiding in the pursuit of sustainable environmental management.
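The 1D-CNNs mentioned above slide learned kernels along a time series; their core operation is a one-dimensional "valid" convolution (implemented as cross-correlation, as in most deep learning libraries). A pure-Python sketch with an invented chlorophyll-a series and a hand-picked kernel (a trained network would learn its kernels from data):

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D cross-correlation, the core op of a 1D-CNN layer:
    output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(xs):
    """Standard rectified-linear activation applied elementwise."""
    return [max(0.0, x) for x in xs]

# A difference kernel responds to rapid rises, the kind of abrupt
# chlorophyll-a increase that precedes a bloom (values invented).
chlorophyll = [2.0, 2.1, 2.0, 2.2, 5.5, 9.0, 8.8]
rise_detector = [-1.0, 0.0, 1.0]   # approximates the local rate of change
activations = relu(conv1d(chlorophyll, rise_detector))
```

The largest activation lines up with the steepest rise in the series, which is the kind of feature an early-warning model exploits.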

Brief Bio: Dr. Nazemi has been an Assistant Teaching and Research Professor at the Gianforte School of Computing since 2022. She earned her Ph.D. in Systems and Information Engineering from the University of Virginia in 2023. Specializing in multidisciplinary research, Dr. Nazemi focuses on the application of machine learning, data analytics, and AI methods to tackle global environmental challenges. Her research includes developing AI-enhanced frameworks for modeling and managing natural disasters. These frameworks are designed to integrate multi-source environmental monitoring, AI/ML-driven early warning systems, and adaptive decision support systems. Her innovative approach leverages advanced sensing technologies, machine learning, and data analytics to improve the management and planning of environmental, infrastructure, and natural resources.


Date/Time: Monday, 26th February at 4:10 p.m. in Barnard Hall 108

Speaker: Iflaah Salman

Abstract: I begin my talk by presenting how my industrial experience shaped my research combining software testing, cognitive psychology, and organisational factors, along with its major highlights. I also present the role of machine learning in studying human emotions and personality, considering its benefits for software engineering. Furthermore, I discuss certain important methodological aspects of experimentation in software engineering. Finally, I relate my research expertise to potential collaboration and joint growth with MSU.

Brief Bio: Iflaah Salman, PhD (University of Oulu), is a Post-doctoral Researcher at the School of Engineering Science, Lappeenranta-Lahti University of Technology (LUT), Finland. Dr. Salman started her professional career in the software industry working as a software developer and a quality assurance engineer. Her research focuses on empirical software engineering, software testing, human factors (cognitive biases, emotions, personality), artificial intelligence for software engineering and organisational factors. Dr. Salman has published her work in top-tier software engineering venues like IEEE Transactions on Software Engineering and Empirical Software Engineering. She is a supporter of open data and open science.


Title: Detecting the Human Sense of Familiarity using Eye Gaze

Date/Time: Friday, March 22nd at 4:10pm in Barnard 126

Speaker: Iliana Castillon

Abstract: Understanding internal cognitive states is crucial not only in the realm of human perception but also in enhancing interactions with artificial intelligence. One such state is the experience of familiarity, a fundamental aspect of human perception that often manifests as an intuitive recognition of faces or places. Automatically identifying cognitive experiences paves the way for more nuanced human-AI interaction. While other work has shown the feasibility of automatically identifying internal cognitive states like mind wandering using eye gaze features, the automatic detection of familiarity remains largely unexplored. In this work, we employed a paradigm from cognitive psychology to induce feelings of familiarity. Then, we trained machine learning models to automatically detect familiarity from eye gaze measurements, both in experiments with traditional computer use (e.g., an eye tracker attached to a monitor) and in virtual reality settings, in a participant-independent manner. Familiarity was detected with Cohen's kappa values, a measure of accuracy corrected for random guessing, of 0.22 and 0.21, respectively. This work showcases the feasibility of automatically identifying feelings of familiarity and opens the door to exploring automated familiarity detection in other contexts, such as students engaged with a learning task while interacting with an intelligent tutoring system.

Bio: Iliana Castillon is a Research Assistant at Colorado State University in the Human-Centered AI Lab. Her research interests broadly cover machine learning and affective computing applications, with a specific focus on machine learning methods for identifying internal mental states, e.g., familiarity and mind wandering, with data modalities such as eye gaze and computer vision.
After graduating with a Bachelor’s Degree in Computer Science in 2022, with a concentration in Human-Centered Computing, she successfully defended her Master’s Thesis in Computer Science and will officially graduate in May 2024. Her thesis, “Using Eye Gaze to Detect Familiarity,” is the first to show that machine learning can be used to automatically identify feelings of familiarity. She won 3rd place in the ACM graduate research competition at TAPIA 2023, has published two papers at peer-reviewed conferences, and gave a talk at the Society for Computation in Psychology in 2023.
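The Cohen's kappa statistic reported in the abstract above corrects raw agreement for the agreement expected by chance. A minimal pure-Python computation (the familiar/unfamiliar labels below are invented, not from the study):

```python
from collections import Counter

def cohens_kappa(truth, pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is the agreement expected by chance from the
    marginal label frequencies."""
    n = len(truth)
    p_o = sum(t == p for t, p in zip(truth, pred)) / n
    t_counts, p_counts = Counter(truth), Counter(pred)
    p_e = sum(t_counts[c] * p_counts[c] for c in t_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Invented labels (1 = familiar, 0 = unfamiliar): a detector that is
# right on 8 of 10 trials.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(truth, pred)   # 0.6 here; the talk reports 0.22 and 0.21
```

A kappa of 0 means chance-level performance and 1 means perfect agreement, so the reported 0.22 indicates detection above chance on a hard problem.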


Title: New Results on the I/O Complexity for Some Numerical Linear Algebra Kernels and Their Applications

Date/Time: Monday, April 22nd at 4:10pm in Barnard 108

Speaker: Julien Langou

Abstract: When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role that strongly impacts performance and energy consumption. The objective of I/O complexity analysis is to compute, for a given program, its minimal I/O requirement among all valid schedules. We consider a sequential execution model with two memories: an infinite one, and a small one of size S on which the computations retrieve and produce data. The I/O is the number of reads and writes between the two memories. Within this model, we review various Numerical Linear Algebra kernels of increasing complexity, from matrix-matrix multiplication to LU factorization, symmetric rank-k update, Cholesky factorization, Modified Gram-Schmidt, and Householder QR factorization. We will show practical applications of these results as well.

Bio: Julien Langou is a Professor in the Department of Mathematical and Statistical Sciences at the University of Colorado Denver. In 1999, he obtained an MS in Propulsion Engineering from the French engineering school Supaéro (Toulouse) and, in 2003, a PhD in Applied Mathematics from the National Polytechnic Institute of Toulouse. From 2003 to 2006, he was a Research Scientist at the University of Tennessee (Knoxville) in the Department of Computer Science. He has been a faculty member at the University of Colorado Denver since Fall 2006. His research area is Numerical Linear Algebra, where he designs new algorithms that are more efficient, more stable, or more parallelizable. He has been particularly active in the development of publicly available and successful software libraries such as LAPACK.
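As background on the style of result the abstract describes (this classical bound is due to Hong and Kung, 1981, and is stated here for context, not as a result of the talk): for classical n-by-n matrix-matrix multiplication with a fast memory of size S, every valid schedule must move at least on the order of n^3 / sqrt(S) words between the two memories:

```latex
% Hong--Kung I/O lower bound for classical n x n matrix multiplication
% with a fast memory of size S:
Q(n, S) \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{S}}\right)
```

Intuitively, with only S words resident, each batch of S I/O operations can support at most on the order of S^{3/2} multiply-adds, so the n^3 total operations force the bound above; blocked algorithms attain it, which is why the kernels in the talk are analyzed in this model.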


Title: Reliable and Expressive Probabilistic Modeling of Large, Complex Datasets

Date/Time: Monday, April 29th at 4:10pm in Barnard 108

Speaker: Mike Wojnowicz

Abstract: My research program develops interpretable models that blend the strengths of statistics and machine learning. The driving question is: "Can we develop interpretable probabilistic models that are expressive, and which can be trained on complex datasets without human intervention in a fast, lightweight, and reliable manner?" Solutions are critical for fields like cybersecurity, finance, and healthcare, which require flexible, trustworthy models for complex data delivered by multiple real-time sensors. I will highlight several projects, including scaling Bayesian categorical regressions to learn user-specific temporal usage patterns across thousands of computer processes, finding influential samples within millions of portable executable files characterized with 100,000 features, and identifying anomalous spatial segments from a chromosome with 250 million base pairs. I also describe a vision for future work, with applications to the NSF-funded SMART FIRES (Sensors, Machine Learning, and Artificial Intelligence in Real Time Fire Science) project.

Bio: Mike Wojnowicz is a research associate in the Dept. of Biostatistics at Harvard University. Previously, he was a postdoctoral researcher in the Dept. of Computer Science at Tufts University, and a Data Scientist at Tufts' Data Intensive Studies Center. He was also a Distinguished Data Scientist at Cylance, a cybersecurity company acquired for $1.4 billion. At Cylance, Mike developed statistical machine learning models for detecting malicious computer files and anomalous user activity, leading to 10 patents (5 first-author). Mike's Ph.D. is from Cornell University, where his work in Cognitive Science led to the Dallenbach Fellowship for Research Excellence, the Cognitive Science Dissertation Proposal Award, and the Cognitive Science Graduate Research Award. Mike also has master's degrees in mathematics (University of Washington) and statistics (University of California at Irvine). Mike's research interests include scalable Bayesian inference, spatiotemporal models, and applications of measure-theoretic probability.


Title: Welcome Seminar

Date/Time: Monday, August 26th at 4:10pm in Barnard 108

Speaker: John Paxton

Abstract: An opportunity to (1) meet students, staff and faculty, (2) learn what's new for the upcoming academic year and (3) ask questions.


Title: Introduction to MSU Computing Resources (RCI)

Date/Time: Monday, September 9th at 4:10pm in Barnard 108

Speaker: Nitasha Fazal, Alex Salois

Abstract: Members of Research Cyberinfrastructure (RCI) at MSU will present on the computing and data storage resources available to all researchers—including students—at the university. RCI is responsible for the Tempest high-performance computing (HPC) research cluster, which is the largest supercomputer in Montana. A demo of Tempest will be shown.


Title: Humans vs The Computer Interfaces: The Challenge of Separating Deepfakes/Bots from People

Date/Time: Monday, September 16th at 4:10pm in Barnard 108

Speaker: Patrick Traynor

Abstract: The staggering advances in machine learning over the past decade have made it easy for computers to hear and sound just like us. While there are many positive applications of such technology, it is already having substantial impacts on fraud detection, political discussions, and more. In this talk, I’ll cover some of our efforts to characterize fundamental differences between the ways that humans and machines “speak”, challenges in how we evaluate defenses in this space, and why it’s currently so hard to deploy such defenses in the real world. My talk will pull techniques from fluid mechanics, psychoacoustics, classical information security, and user studies, but is designed to be accessible to a general computer science audience.

Bio: Patrick Traynor is the John and Mary Lou Dasburg Preeminent Chair in Engineering and a Professor in the Department of Computer and Information Science and Engineering (CISE) at the University of Florida. His research has uncovered critical vulnerabilities in cellular networks, developed techniques to find credit card skimmers that have been adopted by law enforcement, and created robust approaches to detecting and combating Caller-ID scams. He received a CAREER Award from the National Science Foundation in 2010, was named a Sloan Fellow in 2014, a Fellow of the Center for Financial Inclusion at Accion in 2016, and a Kavli Fellow in 2017. Professor Traynor earned his Ph.D. and M.S. in Computer Science and Engineering from the Pennsylvania State University in 2008 and 2004, respectively, and his B.S. in Computer Science from the University of Richmond in 2002. He is also a co-founder of Pindrop Security, CryptoDrop, and Skim Reaper.


Title: Climate Change, Health, and Data Science

Date/Time: Monday, September 23rd at 4:10pm in Barnard 108

Speaker: Cascade Tuholske

Abstract: Understanding how climate change is impacting human health and well-being requires assimilating increasingly large amounts of complex geospatial data. Recent advances in open-source interactive computing, like JupyterHub, have reduced barriers and enabled more researchers to leverage high-performance computing (HPC) clusters to process such data. However, many demographic and socioeconomic datasets do not easily integrate with climate and weather data. Moreover, many social scientists do not receive the requisite training to process such datasets on HPC clusters. Similarly, climate scientists do not often understand the many caveats of human data. Issues ranging from disparate coordinate reference systems, spatial and temporal resolutions, and edge effects to memory management, CPU optimization, data artifacts, lack of metadata, and data privacy must be anticipated. Here, I will discuss lessons learned from my efforts to integrate such human and environmental datasets, including my evolution from a largely self-taught data scientist to my current use of HPC clusters—namely Tempest—today. I will highlight the development of recent high-resolution global climate projections and how we are using those projections to understand current and future impacts on human health and well-being, both here in Montana and around the world. In sharing these insights, I am eager to engage in a dialogue with computer scientists to explore together how we can improve our approaches and drive meaningful impact in these critical areas of research.

Bio: Cascade Tuholske is Assistant Professor of Human-Environment Geography in the Department of Earth Sciences at Montana State University. Broadly, his research examines interactions between climate change, demography, human health, and food systems. To this end, he and his team develop novel geospatial algorithms to integrate and analyze large volumes of climate, environmental, and demographic datasets on high-performance clusters like Tempest. An example of this is a set of new high-resolution (5 km) global climate projections for 2030 and 2050 that downscale the standard 100-250 km CMIP6 model outputs, recently published in Scientific Data. His research is currently funded by NASA, the Department of Defense, the National Institutes of Health, the Wellcome Trust, and Microsoft. Prior to joining Montana State, he worked as a Postdoctoral Research Scientist at the Center for Earth Science Information Network (CIESIN), which is a part of the Climate School at Columbia University, and he holds a PhD and an MA in Geography from the University of California, Santa Barbara.


Title: General Purpose Computing with a Graphics Processor Unit Using CUDA

Date/Time: Monday, October 14th at 4:10pm in Barnard 108

Speaker: Peter Zeno

Abstract: The advent of general-purpose computing on a graphics processor unit (GPGPU) was a significant milestone for the massive acceleration of scientific computing software. Nvidia's CUDA-enabled hardware changes to its graphics processors, starting in 2006, have allowed processing acceleration to keep growing for certain applications, particularly AI and ML, despite the physical limits on continued miniaturization of transistor sizes in semiconductor chip fabrication. What is the CUDA-enabled GPU architecture, and how does it differ from a typical CPU's? What are the anatomy of a CUDA C program, thread- and block-level indexing, the memory hierarchy, and the key performance considerations? These questions and more will be answered in this talk.
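The thread- and block-level indexing the abstract mentions follows one standard formula: each CUDA thread computes a global index as blockIdx.x * blockDim.x + threadIdx.x and guards it against the array bound. A pure-Python simulation of that mapping (not actual CUDA code; the doubling "kernel" is an invented example):

```python
def gpu_map(data, kernel_fn, block_dim=4):
    """Simulate a 1D CUDA launch: a grid of blocks, each with
    block_dim threads; every (block, thread) pair computes one global
    index and skips work when the index runs past the array."""
    n = len(data)
    grid_dim = (n + block_dim - 1) // block_dim   # ceil(n / block_dim) blocks
    out = [None] * n
    for block_idx in range(grid_dim):             # on a GPU these all run in parallel
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx   # the standard global index
            if i < n:                                 # bounds guard, as in real kernels
                out[i] = kernel_fn(data[i])
    return out

# "Launch" a kernel that doubles each element: 7 elements with 4
# threads per block needs 2 blocks, and the last thread does nothing.
result = gpu_map([1, 2, 3, 4, 5, 6, 7], lambda x: 2 * x)
```

The sequential loops here stand in for what the GPU executes concurrently; the index arithmetic and the bounds guard are the parts that carry over directly to CUDA C.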

Bio: Dr. Zeno is currently a freelance consultant and the founder of ZenoRobotics, LLC. His main areas of interest lie in autonomous mobile robotic systems, high-performance computing as related to AI and ML, FPGA design and verification, and neuroscience, to name a few. Dr. Zeno worked as an electrical engineer in industry for over 30 years. He has worked for the Department of Defense, Raytheon, Lockheed Martin, MIT Lincoln Laboratory, and Synopsys. He recently worked with Carnegie Mellon University in Pittsburgh, where he assisted with AMR testing for the DARPA Subterranean Challenge.


Title: Multidisciplinary Operations Research Applied to People and Things

Date/Time: Monday, November 4th at 4:10pm in Barnard 108

Speaker: Shane Hall

Abstract: This presentation describes specific problems solved through multidisciplinary operations research applied to people and things. The solution of these problems requires a variety of operations research methodologies, with mathematical optimization as the unifying theme.

One problem seeks to improve child immunization rates (people).  Vaccination against infectious disease is hailed as one of the great achievements in public health. However, the United States Recommended Childhood Immunization Schedule has become increasingly complex as it is expanded to cover additional diseases. Moreover, biotechnology advances have allowed vaccine manufacturers to create combination vaccines that immunize against several diseases in a single injection. All these factors create many immunization choices (each with a different cost) for public health policy makers, pediatricians, and parents/guardians (each with a different perspective). The General Vaccine Formulary Selection Problem (GVFSP) is introduced to model general childhood immunization schedules that can be used to illuminate these choices by selecting a vaccine formulary that minimizes the cost of fully immunizing a child.
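At its core, the formulary selection described above is a covering problem: choose a set of (possibly combination) vaccines whose covered diseases jointly satisfy the schedule at minimum cost. A brute-force sketch on invented data (the real GVFSP is formulated as an integer program; the vaccines, antigens, and costs below are illustrative only):

```python
from itertools import combinations

def cheapest_formulary(vaccines, required):
    """Exhaustively search subsets of vaccines for the cheapest one
    covering every required disease (fine only for tiny instances;
    the general problem is solved with integer programming)."""
    best_cost, best_choice = float("inf"), None
    names = list(vaccines)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            covered = set().union(*(vaccines[v]["covers"] for v in subset))
            cost = sum(vaccines[v]["cost"] for v in subset)
            if required <= covered and cost < best_cost:
                best_cost, best_choice = cost, set(subset)
    return best_cost, best_choice

# Invented example: one combination vaccine vs. three monovalent ones.
vaccines = {
    "DTaP combo": {"covers": {"diphtheria", "tetanus", "pertussis"}, "cost": 30.0},
    "D only":     {"covers": {"diphtheria"}, "cost": 12.0},
    "T only":     {"covers": {"tetanus"}, "cost": 12.0},
    "P only":     {"covers": {"pertussis"}, "cost": 12.0},
}
cost, choice = cheapest_formulary(vaccines, {"diphtheria", "tetanus", "pertussis"})
```

Here the combination vaccine (30.0) beats three separate injections (36.0), illustrating why combination vaccines complicate the choice for policy makers, pediatricians, and parents.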

Another problem examines risk within a complex system of interdependent systems (things).  The United States Air Force (AF) is modeled as a network of diverse and interdependent core capabilities. These capabilities divide the AF budget into mutually exclusive and collectively exhaustive bins.  The quantification and modeling of the vulnerabilities and dependencies shared across the AF core capabilities have led to a better understanding of what factors drive risk across the AF enterprise and its overall contribution to the United States’ ability to win wars. This risk-focused cognitive decision framework is called the Comprehensive Core Capability Risk Assessment Framework (C3RAF) and provides key insights on strategic investment decisions to AF senior leaders. 

The presentation will also cover additional recent research problems that explore the use of metaheuristic optimization coupled with simulation and machine learning (ML) models to perform prescriptive analytics and assist with the verification, validation, assurance, and trust of ML models.

Bio: Dr. Shane N. Hall is an Assistant Professor of Management in the Jake Jabs College of Business and Entrepreneurship at Montana State University (MSU).  He is an expert in linear, integer, and combinatorial optimization models and methods.  His expertise extends to approximation and metaheuristic solution methods.  Dr. Hall has applied this expertise across a broad set of real-world military and healthcare problems.  He earned his B.S. in Mathematics from Brigham Young University-Provo, M.S. in Operations Research from the Air Force Institute of Technology (AFIT), and his Ph.D. in Industrial Engineering from the University of Illinois at Urbana-Champaign.  He also holds Department of Defense acquisition professional certifications in systems engineering, science and technology management, and project management.  Dr. Hall is the author of several peer-reviewed journal articles, conference papers, and book chapters.  He is an active member of INFORMS (Institute for Operations Research and the Management Sciences), MORS (Military Operations Research Society), and IISE (Institute of Industrial and Systems Engineers).

Prior to joining the faculty at MSU, Dr. Hall was a Principal Analyst at OptTek Systems, Inc. in Boulder, Colorado.  He was the principal investigator for several competitive Small Business Innovation Research/Small Business Technology Transfer (SBIR/STTR) research and development efforts supporting organizations across the United States Department of Defense.  Before OptTek, Dr. Hall served over twenty years as an active-duty officer and operations research analyst in the United States Air Force.

Title: Grammar-Based Compression for Pangenome Storage and Analysis

Date/Time: Monday, December 2nd at 4:10pm in Barnard 108

Speaker: Alan Cleary

Abstract: Innovations in data storage, sensing technology, and Artificial Intelligence have resulted in a data deluge across a variety of industries, creating a need for improved data processing systems. As a result, the once niche research topic of "computation over compressed data" has begun to receive much attention in the literature. While some of the recent developments in this area have been truly impressive, the results have been largely theoretical, with implementations typically being more demonstrative than practical.

The quintessential example of a domain that would benefit from practical methods for performing computation over compressed data is Microbiology, which has sustained exponential growth in DNA sequencing for more than a decade. Not only is there a deluge of newly sequenced genomes to be analyzed, but there is also a need to analyze collections of genomes for a particular species. These pangenomes have already proven to be a powerful research tool but are challenging to analyze due to their size, which continues to grow. For example, large sequencing projects like the UK Biobank and the Human Pangenome Reference Consortium have already generated pangenomes of unprecedented size and will continue to sequence more genomes. For these reasons, the need for computational techniques to analyze pangenomes at scale is widespread and immediate.

In this talk, I will discuss a specific approach to computation over compressed data called grammar-based compression and show that it is particularly well-suited to pangenome analysis. I will then outline the requirements necessary to make grammar-based compression practical for pangenome analysis at scale and review the progress we have made toward meeting these requirements. I will conclude with a discussion of our current and future research in this area and of how we may extend our techniques to other domains and data types.
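Grammar-based compression represents a string as a small context-free grammar that generates exactly that string; repeated substrings (abundant in pangenomes) become shared grammar rules. One classic algorithm in this family is Re-Pair, sketched below in a deliberately naive form (practical implementations, including any discussed in the talk, are far more engineered):

```python
from collections import Counter

def repair_compress(text):
    """Naive Re-Pair: repeatedly replace the most frequent adjacent
    pair of symbols with a fresh nonterminal, building a grammar."""
    seq = list(text)
    rules = {}            # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:     # no pair repeats; grammar is final
            break
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules     # start sequence plus grammar rules

def expand(symbol, rules):
    """Recursively expand a symbol back to its original substring."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol

text = "TTAGGATTAGGA" * 4          # highly repetitive, like pangenome data
start, rules = repair_compress(text)
restored = "".join(expand(s, rules) for s in start)
```

The point of "computation over compressed data" is that analyses can operate on the start sequence and rules directly, without ever expanding back to the full text.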

Bio: Dr. Alan M. Cleary is a Computational Research Scientist and Senior Software Engineer at the National Center for Genome Resources—a nonprofit research institute located in Santa Fe, NM. In this role, Dr. Cleary works on a variety of projects across academia, government, and industry. Dr. Cleary's expertise is in applied algorithms and scientific software engineering for Computational Biology. His research interests include discrete algorithms and succinct data structures for (graphical) pangenomics; using Data Mining, Machine Learning, and Artificial Intelligence to perform analyses of pangenomic and multiomic data; and implementing software to perform these analyses at scale. Dr. Cleary earned his Ph.D. in Computer Science from Montana State University in 2018.


Seminars from 2023.