Computer Science Dept., 357 EPS Building
Montana State University
Bozeman, MT 59717
Tel: (406) 994-4780
Department Head: John Paxton
Presented by: Liessman Sturlaugson
The continuous time Bayesian network (CTBN) enables reasoning about complex systems in continuous time by representing a system as a factored, finite-state, continuous-time Markov process. The dynamics of the CTBN are described by each node's conditional intensity matrices, determined by the states of the parents in the network. As the CTBN is a relatively new model, many extensions that have been defined with respect to Bayesian networks (BNs) have not yet been extended to CTBNs. This thesis presents five novel extensions to CTBN modeling and inference.
First, we prove several complexity results specific to CTBNs. It is known that exact inference in CTBNs is NP-hard due to the use of a BN as the initial distribution. We prove that exact inference in CTBNs remains NP-hard even when the initial states are given, and prove that approximate inference in CTBNs, as with BNs, is also NP-hard. Second, we formalize performance functions for the CTBN and show how they can be factored in the same way as the network, even when the performance functions are defined with respect to interactions between multiple nodes. Performance functions extend the model, allowing it to represent complex, user-specified functions of the behaviors of the system. Third, we present a novel method for node marginalization called "node isolation" that approximates a set of conditional intensity matrices with a single unconditional intensity matrix. The method outperforms previous node marginalization techniques in all of our experiments by better describing the long-term behavior of the marginalized nodes. Fourth, using the node isolation method, we show how methods for sensitivity analysis of Markov processes can be applied to the CTBN while exploiting the conditional independence structure of the network. This enables efficient sensitivity analysis of our CTBN performance functions. Fifth, we formalize both uncertain and negative types of evidence in the context of CTBNs and extend existing inference algorithms to support all combinations of evidence types. We show that these extensions make the CTBN more powerful, versatile, and applicable to real-world domains.
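For readers unfamiliar with CTBNs, the dynamics described above can be made concrete with a small sketch. This is our own illustration, not code from the thesis; the two-node network, the rates, and all names are invented for the example.

```python
import numpy as np

# Hypothetical example: a binary CTBN node X with a binary parent U.
# X's dynamics are given by one conditional intensity matrix (CIM) per
# parent state; rows sum to zero, and off-diagonal entries are the
# intensities (rates) of transitioning between X's states.
cims = {
    0: np.array([[-0.5,  0.5],    # U = 0: X leaves state 0 at rate 0.5
                 [ 1.0, -1.0]]),
    1: np.array([[-2.0,  2.0],    # U = 1: X switches states faster
                 [ 3.0, -3.0]]),
}

def sample_sojourn(rng, x, u):
    """Time X remains in state x given parent state u: exponentially
    distributed with rate -Q[x, x], after which X jumps state."""
    q = cims[u]
    rate = -q[x, x]
    return rng.exponential(1.0 / rate)

rng = np.random.default_rng(0)
t = sample_sojourn(rng, x=0, u=1)  # expected sojourn time is 1/2.0 = 0.5
```

Node isolation, in these terms, replaces the set of CIMs `{cims[0], cims[1]}` with a single unconditional intensity matrix approximating X's long-run behavior.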
Presented by: Nathan Fortier
While Bayesian networks provide a useful tool for reasoning under uncertainty, learning the structure of these networks and performing inference over them are NP-hard. We propose several heuristic algorithms to address the problems of inference, structure learning, and parameter estimation in Bayesian networks. The proposed algorithms are based on Overlapping Swarm Intelligence, a modification of particle swarm optimization in which a problem is broken into overlapping subproblems and a swarm is assigned to each subproblem. We describe how the problems of inference, structure learning, and parameter estimation can be broken into subproblems, and provide communication and competition mechanisms that allow swarms to share information about learned solutions.
For the problems of full and partial abductive inference, a swarm is assigned to each relevant node in the network. Each swarm learns the relevant state assignments associated with the Markov blanket for its corresponding node. Swarms with overlapping Markov blankets compete for inclusion in the final solution.
For the problem of structure learning, a swarm is associated with each node in the network. Each swarm learns the parents and children of its associated node. Swarms that learn conflicting substructures compete for inclusion in the final network structure. In our approach to parameter estimation, a swarm is associated with each node in the network that corresponds to either a latent variable or a child of a latent variable. Each node's corresponding swarm learns the parameters associated with that node's Markov blanket. Swarms with overlapping Markov blankets compete for inclusion in the final parameter set.
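The Markov-blanket decomposition underlying these algorithms can be sketched in a few lines. This is our own minimal illustration, not the authors' implementation; the four-node network and variable names are invented for the example.

```python
# Hypothetical 4-node network: A -> C <- B, C -> D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}

def markov_blanket(node):
    """Parents, children, and children's other parents of `node`."""
    children = [n for n, ps in parents.items() if node in ps]
    spouses = [p for c in children for p in parents[c] if p != node]
    return set(parents[node]) | set(children) | set(spouses)

# One swarm per node; each swarm optimizes only over the variables in
# its node's Markov blanket (plus the node itself).
swarms = {n: markov_blanket(n) | {n} for n in parents}

# The swarms for A and B overlap on C: both propose an assignment for C,
# so a competition mechanism must decide which value enters the solution.
```

Because blankets overlap (here, `swarms["A"]` and `swarms["B"]` share `C`), the communication and competition mechanisms described above are what reconcile the swarms' conflicting proposals.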
Presented by: Derek Reimanis
Replications play a pivotal role in empirical software engineering, and although significant progress has been made in terms of vernacular and classification, the majority of this corpus refers to formal experiments. Herein, we present a replication case study in which structural information about a system is used in conjunction with bug-related change frequencies to measure and predict architecture quality. We identified dependencies between components that change together even though they belong to different architectural modules and, as a consequence, are more prone to bugs. We validated these dependencies by presenting our results back to the developers. The developers did not identify any of these dependencies as unexpected, but rather considered them architectural necessities. This replication study adds to the knowledge base of CLIO (a tool that detects architectural degradations) by incorporating a new programming language (C++) and by externally replicating a previous case study on a separate commercial code base.
Presented by: Melissa Dale
The purpose of this research is to study the effects of code changes that violate a design pattern's intended role on the quality of a project. We use technical debt as an overarching surrogate measure of quality. Technical debt is a metaphor, borrowed from the financial domain, that describes the potential cost of refactoring a software system to agreed-upon coding and design standards. Previous research by Izurieta and Bieman defined violations in the context of design patterns as grime. Because technical debt can ultimately lead to the downfall of a project, it is important to understand if and how grime may contribute to a system's technical debt.
To investigate this problem, we have developed a grime injector to model grime growth on Java projects. We use SonarQube's technical debt software to compare the technical debt scores of six different types of modular grime previously defined by Schanz and Izurieta. These six types can be classified along three major dimensions: strength, scope, and direction.
We find that the strength dimension is the most important contributor to design quality, and that temporary grime results in higher technical debt scores than persistent grime. This knowledge will help developers make design decisions that manage a project's technical debt.
Defending Data from Digital Disasters: Engineering Next Generation Systems for Emerging Problems in Data Science
Presented by: Eric Rozier
Of the data that exists in the world, 90% was created in the last two years. Last year over 2,837 exabytes of data were produced, an increase of 230% over 2010. By next year this total is expected to reach 8,591 exabytes, and 40,026 exabytes by 2020. Our ability to create data has already exceeded our ability to store it, with data production exceeding storage capacity for the first time in 2007. Our ability to analyze data has also lagged behind the deluge of digital information, with estimates putting the percentage of data analyzed at less than 1%, while an estimated 23% of data created would be useful if analyzed. Reliability, security, privacy, and confidentiality needs are outpacing our abilities as well, with only 19% of data protected. For these reasons we need systems that are capable not only of storing the raw data, but of doing so in a trustworthy manner while enabling state-of-the-art analytics.
In this talk we will explore problems in data science applications to medicine, climate science, natural history, and geography, and outline the reliability, availability, security, and analytics challenges to data in these domains. We will present novel, intelligent systems designed to combat these issues by using machine learning to apply a unique software-defined approach to data center provisioning, with dynamic architectures and on-the-fly reconfigurable middleware layers that address emergent problems in complex systems. Specifically, we will address data dependence relationships and the threat they pose to long-term archival stores and curation, as well as techniques to protect them using the novel theoretical constructs of second-class data and shadow syndromes. We will discuss the growing problem presented by the exponential explosion of both system and scientific metadata, and illustrate a novel approach to metadata prediction, sorting, and storage that allows systems to better scale to meet growing data needs. We will explore problems in cloud access to private records, illustrating the pitfalls of trusting provider claims with real-world audits conducted by our lab that successfully extracted synthetic patient data through inadvertent side-channels, and demonstrate novel search techniques that allow regular-expression search over encrypted data while placing no trust in the cloud provider, ensuring zero information leakage through side-channels. Finally, we will conclude by discussing future work in systems engineering for Big Data, outlining current challenges and future pitfalls of next-generation systems for data science.
Dr. Eric Rozier is an Assistant Professor of Electrical and Computer Engineering, head of the Trustworthy Systems Engineering Laboratory, and director of the Fortinet Security Laboratory at the University of Miami in Coral Gables, Florida. His research focuses on the intersection of problems in systems engineering with Big Data, cloud computing, and issues of reliability, performability, availability, security, and privacy. Prior to joining Miami, Dr. Rozier served as a research scientist at NASA Langley Research Center and the National Center for Supercomputing Applications, and as a Fellow at IBM Almaden Research Center. His work in Big Data and systems engineering has been the subject of numerous awards, including being named a Frontiers of Engineering Education Faculty Member by the National Academy of Engineering in 2013 and an Eric & Wendy Schmidt Data Science for Social Good Faculty Fellow at the University of Chicago for Summer 2014.
Dr. Rozier completed his PhD in Computer Science at the University of Illinois at Urbana-Champaign, where he served as an IBM Doctoral Fellow and worked on the reliability and fault tolerance of the Blue Waters supercomputer with the Information Trust Institute. Dr. Rozier is a long-time member of the IEEE and ACM, and a member of the AIAA Intelligent Systems Technical Committee, where he serves on the Publications and the Professional Development, Education, and Outreach subcommittees.
Presented by: Shamim Hafiz
Contemporary software engineering is significantly aided by the creation of models. Therefore, early evaluation of the quality of a software system can be done by assessing the corresponding models. Often, design and development teams compromise the quality of an implementation to release a product early, or simply as a result of "bad" practice. The undermining of software by such compromises is referred to as Technical Debt, which needs to be addressed through refactoring. This paper presents a survey of qualitative and quantitative analyses of Technical Debt by exploring proposed frameworks and case studies. The paper also summarizes model-driven software refactoring and quality assessment of object-oriented models. Further, the author proposes scope for future work in developing frameworks to assess the quality of models in terms of Technical Debt.
Advances in Linear Temporal Logic Translation: Ensuring the Safety of Safety-Critical Aeronautics Systems
Presented by: Kristin Y. Rozier
Formal verification techniques are growing increasingly vital for the development of safety-critical software and hardware. Techniques such as requirements-based design and model checking have been successfully used to verify systems for air traffic control, airplane separation assurance, autopilots, logic designs, medical devices, and other functions that ensure human safety. Formal behavioral specifications written early in the system-design process and communicated across all design phases increase the efficiency, consistency, and quality of the system under development. We argue that to prevent introducing design or verification errors, it is crucial to test specifications for satisfiability. These specifications can then be used to ensure system safety, from design-time to run-time.
In 2007, we established Linear Temporal Logic (LTL) satisfiability checking as a sanity check: each system requirement, its negation, and the set of all requirements should be checked for satisfiability before being utilized for other tasks, such as property-based system design or system verification via model checking. Our extensive experimental evaluation proved that the symbolic approach for LTL satisfiability checking is superior. However, the performance of the symbolic approach critically depends on the encoding of the formula. Since 1994, there had been essentially no new progress in encoding LTL formulas for this type of analysis. We introduced a set of 30 encodings, demonstrating that a portfolio approach utilizing these encodings translates to significant, sometimes exponential, improvement over the standard encoding for symbolic LTL satisfiability checking. We highlight major impacts of this work in aeronautics. We use these formal verification techniques to ensure there are no potentially catastrophic design flaws remaining in the design of the next Air Traffic Control system before the next stage of production. Also, our run-time monitoring of LTL safety specifications can enable a fire-fighting Unmanned Aerial System to fly!
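The satisfiability sanity check described above can be illustrated in miniature. Real LTL satisfiability checking requires a symbolic model checker; as a stand-in, this sketch (entirely our own, with invented requirement names) applies the same check — a requirement is suspect if either it or its negation is unsatisfiable — to plain propositional formulas via brute-force enumeration.

```python
from itertools import product

def satisfiable(formula, variables):
    """Brute-force SAT: `formula` is a predicate over a dict of truth values."""
    return any(formula(dict(zip(variables, vals)))
               for vals in product([False, True], repeat=len(variables)))

def sanity_check(phi, variables):
    """The sanity check: flag requirements that are contradictory (phi
    unsatisfiable) or vacuously valid (not-phi unsatisfiable)."""
    if not satisfiable(phi, variables):
        return "contradictory: no system can satisfy this requirement"
    if not satisfiable(lambda v: not phi(v), variables):
        return "valid: requirement holds trivially and constrains nothing"
    return "ok"

# Invented requirement: "if the alarm sounds, the brake engages."
# It is both satisfiable and falsifiable, so it passes the check.
req = lambda v: (not v["alarm"]) or v["brake"]
print(sanity_check(req, ["alarm", "brake"]))  # -> ok
```

For temporal formulas the same protocol applies, but satisfiability is decided symbolically (e.g. via BDD-based encodings), which is where the choice among encodings matters.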
Dr. Kristin Y. Rozier holds a position as a Research Computer Scientist in the Intelligent Systems Division of NASA Ames Research Center and a courtesy appointment at Rice University. She earned a Ph.D. from Rice University in 2012 and B.S. and M.S. degrees from The College of William and Mary in 2000 and 2001, all in theoretical computer science. Dr. Rozier's research focuses on automated techniques for the formal specification, validation, and verification of safety critical systems. Her primary research interests include: design-time checking of system logic and system requirements; specification debugging techniques and theory; and safety and security analysis. Her applications of computer science theory in the aeronautics domain earned her the American Helicopter Society's Howard Hughes Award, the American Institute of Aeronautics and Astronautics Intelligent Systems Distinguished Service Award, and the Women in Aerospace Inaugural Initiative-Inspiration-Impact Award. She has also earned the Lockheed Martin Space Operations Lightning Award, the NASA Group Achievement Award, and Senior Membership to IEEE, AIAA, and SWE. Dr. Rozier serves on the AIAA Intelligent Systems Technical Committee, where she chairs both the Publications and the Professional Development, Education, and Outreach (PDEO) subcommittees. She has served on the NASA Formal Methods Symposium Steering Committee since working to found that conference in 2008 and is serving as PC chair for the second time this year.
Presented by: Margareta Ackerman
Clustering is a central unsupervised learning task with a wide variety of applications. However, in spite of its popularity, it lacks a unified theoretical foundation. Recently, there has been work aimed at developing such a theory. We discuss recent advances in clustering theory, starting with results on clustering axioms. We will then discuss a new framework for addressing one of the most prominent problems in the field: the selection of a clustering algorithm for a specific task. The framework rests on the identification of central properties capturing the input-output behaviour of clustering paradigms. We present several results in this direction, including a characterization of linkage-based clustering methods.
Dr. Margareta Ackerman is currently a Postdoctoral Fellow at UC San Diego; she received her PhD in Computer Science from the University of Waterloo under the supervision of Professor Shai Ben-David. Her research interests span Machine Learning, Information Retrieval, Game Theory, Automata Theory, and Bioinformatics. The focus of her work is developing theoretical foundations of clustering that are independent of any specific algorithm or objective function. Her work provides a consistent set of axioms for clustering and a theoretical study of clusterability (NIPS '08 and AISTATS '09). Her recent work focuses on providing guidelines for selecting clustering algorithms based on their input-output behaviour (AISTATS '13, AAAI '12, IJCAI '11, NIPS '10, COLT '10).
Presented by: Pascal Hitzler
Despite numerous applications in specific scenarios, the use of ontologies for data organization, management, and integration is severely limited when faced with high volumes of heterogeneous data. Traditional ontology-based approaches using large, monolithic ontologies suffer from the drawbacks of strong ontological commitments, which force perspectives on the data that may be at odds with the underlying intentions and perspectives of the data providers. In this presentation, we discuss ways forward in ontology modeling and use for high-volume heterogeneous data. In particular, we discuss the importance of combining data analytics with knowledge representation, and the use of ontology design patterns for flexible data organization and integration, including a current use case in oceanography.
Pascal Hitzler is Associate Professor at the Department of Computer Science and Engineering at Wright State University in Dayton, Ohio, U.S.A. From 2004 to 2009, he was Akademischer Rat at the Institute for Applied Informatics and Formal Description Methods (AIFB) at the University of Karlsruhe in Germany, and from 2001 to 2004 he was a postdoctoral researcher at the Artificial Intelligence Institute at TU Dresden in Germany. In 2001 he obtained a PhD in Mathematics from the National University of Ireland, University College Cork, and in 1998 a Diplom (Master equivalent) in Mathematics from the University of Tübingen in Germany. His research record lists over 250 publications in such diverse areas as semantic web, neural-symbolic integration, knowledge representation and reasoning, machine learning, denotational semantics, and set-theoretic topology. He is Editor-in-Chief of the Semantic Web journal by IOS Press, and of the IOS Press book series Studies on the Semantic Web. He is co-author of the W3C Recommendation OWL 2 Primer, and of the book Foundations of Semantic Web Technologies (CRC Press, 2010), which was named one of seven Outstanding Academic Titles 2010 in Information and Computer Science by the American Library Association's Choice Magazine and has been translated into German and Chinese. He is on the editorial board of several journals and book series and on the steering committee of the RR conference series, and he frequently acts as conference chair in various functions, including General Chair (RR2012), Program Chair (AIMSA2014, ODBASE2011, RR2010), Track Chair (ESWC2013, ESWC2011, ISWC2010), Workshop Chair (K-Cap2013), and Sponsor Chair (ISWC2013, RR2009, ESWC2009). For more information, see http://www.pascal-hitzler.de.
Presented by: Bowen Hui
Software development has historically adopted a "one-size-fits-all" approach in which applications are designed with a single target user group in mind, rather than tailoring the software features to the needs of specific users. The ability to customize software has become increasingly important as users are faced with larger, more complex software. To tackle this problem, my work adopts an intelligent agent's perspective where the system views this software customization problem as a decision-theoretic planning problem under uncertainty about the user. In my dissertation, I proposed a methodological framework for developing intelligent software interaction and assistance. Using this framework, I will highlight the interdisciplinary nature of the problem and present details of a case study to illustrate different aspects of AI and HCI involved. Current projects leveraging these ideas in the areas of education and digital youth will also be presented.
Bowen is an instructor in Computer Science at the University of British Columbia and runs her own software company, Beyond the Cube Ltd. Her main research interest is intelligent user interfaces, with emphasis on probabilistic user modeling, computational linguistics, and online educational tools. She received her PhD from the University of Toronto in 2011.