Emerging Needs for Informatics in Environmental Science and an Agronomic Case Study

Date/Time: Monday, November 30, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Rob Payn and Bruce Maxwell, Land Resources and Environmental Science, Montana State University

Abstract: Ecosystem and environmental scientific analyses are rapidly becoming more data-intensive due to shifts in the scope of research questions and a decrease in the cost of on-site monitoring equipment. The environmental science community is thus faced with the need to incorporate informatics skill sets in order to enable a more general ability within the community to perform data-intensive scientific workflows and analyses. The ontology of the emerging field of “ecoinformatics” must incorporate a diverse set of data descriptors and data collection methods, requiring a flexible and extensible workflow management tool to manage both the data and the metadata that provide context within a given environmental system. We present the general state of informatics in environmental science and the general nature of inferential and predictive workflows in the typical environmental scientific analysis. We further present a case study based on the emerging need for more efficient and sustainable agricultural practices. Within a given producing field, spatially variable application of fertilizer, herbicides, and harvest provides a promising pathway to maximizing yield and profit from the agronomic system, while also minimizing long-term environmental damage to the system that supports that yield. This management strategy is frequently called “precision agriculture.” The optimization workflow necessary to support this strategy is inherently data-intensive, and development of a decision and workflow management system to handle the data flow is being funded in the Montana University System by the state legislature. We present the background research developed in MSU Land Resources and Environmental Sciences laboratories to find yield models that are effective in this effort, and we present an overview of the activities and collaborations with the Computer Science department toward developing an automated workflow for optimizing Montana agronomic systems for sustained yield.

Machine Learning in Software Engineering

Date/Time: Monday, November 23, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Upulee Kanewala, Computer Science, Montana State University

Abstract: Machine learning deals with the issue of how to build programs that improve their performance at some task through experience. Machine learning algorithms have proven to be of great practical value in a variety of application domains. Not surprisingly, the field of software engineering turns out to be an area where many software development and maintenance tasks could be formulated as learning problems and approached in terms of learning algorithms. This talk will discuss applying machine learning in software engineering. First we provide the characteristics and applicability of some frequently utilized machine learning algorithms. Then we summarize and analyze the existing work and discuss some general issues in this area.

Video recording available: https://www.youtube.com/watch?v=vy_TmY4VI5Q

Modeling Landscape Conservation of Greater Sage Grouse in Relation to Oil and Gas Development in Montana, North Dakota, South Dakota, and Wyoming

Date/Time: Monday, November 16, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Rick Sojda, Computer Science, Montana State University

Abstract: The effects of oil and gas development on the conservation of greater sage grouse concern wildlife managers, but the effects of development are difficult to ascertain, a situation typical where cause-effect relationships are complex, multivariate, and involve landscape perspectives. Understanding the potential effects of development on grouse first requires predicting where development is expected to occur at a landscape level. I describe gathering “reasonable foreseeable development” spatial data from the USDI’s Bureau of Land Management that were available for Montana, North Dakota, South Dakota, and Wyoming. These data were disparate across the study area, and I chronicle the GIS processes I used to standardize the data across mapping units and establish consistent, quantitative categories. Maps will be displayed of the number of wells per township as projected in the BLM data. These data are then shown as overlays with the priority areas for conservation for greater sage grouse. Using Bayesian belief network methods, I began to model the relative spatial risk to greater sage grouse from oil and gas development based on the published literature. Risk analyses from site-specific studies were linked to a conceptual model of the annual life cycle events of grouse. Using the density of the predicted number of wells, I present a conceptual model of the regional-scale view of where the effects of development are expected to occur. Finally, I delineate the constraints to representing this in a spatial model using GIS.

Columnar Store Architecture for Specialized Genomic Big Data Warehousing and Analytics

Date/Time: Monday, November 2, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Gabe Rudy, Golden Helix

Abstract: Starting with an overview of database and big-data solutions for the different use cases of Online Transactional Processing (OLTP), Data Warehousing and Analytical Processing (OLAP), and the NoSQL mixed bag of big-data needs, we will dive into the read-optimized requirements of data warehousing and big-data scientific workflows. Golden Helix builds tools for researchers and clinicians to work on genome-scale datasets on their own hardware, and we will present the specialized columnar-store data backend built to power their genomic application suite. Column-store architectures have won out as the leading design for data warehousing solutions, but adopting one does not require forgoing the analytical power and convenience of a SQL interface. In fact, we will review the implementation details of a PostgreSQL foreign data wrapper we wrote for this customized column-store file format, which powers Golden Helix's genomic data warehousing solution built to scale to hundreds of millions of unique genomic variant sites for thousands of exomes and genomes.
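As a toy illustration of the underlying trade-off (not Golden Helix's actual implementation), an analytical aggregate over one field of a variant table touches far less data when values are stored contiguously by column than when each full record must be visited:

```python
# Hypothetical variant records, first in row-oriented form.
rows = [
    {"chrom": "1", "pos": 101, "depth": 30},
    {"chrom": "1", "pos": 205, "depth": 12},
    {"chrom": "2", "pos": 77,  "depth": 45},
]

# Column-oriented form: one contiguous list per field.
columns = {
    "chrom": [r["chrom"] for r in rows],
    "pos":   [r["pos"] for r in rows],
    "depth": [r["depth"] for r in rows],
}

# The row store must visit every record to aggregate one field;
# the column store scans a single array.
mean_depth_rows = sum(r["depth"] for r in rows) / len(rows)
mean_depth_cols = sum(columns["depth"]) / len(columns["depth"])
assert mean_depth_rows == mean_depth_cols == 29.0
```

A read-optimized warehouse generalizes this layout with compression and on-disk paging per column, which is what makes OLAP-style scans cheap relative to OLTP-style record lookups.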

Video recording available: https://www.youtube.com/watch?v=U32XDDe6lCc

An Analysis of Existing Contributions to Continuous Time Bayesian Networks (PhD Qualifier)

Date/Time: Monday, October 26, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Logan Perreault, Montana State University

Abstract: Continuous time Bayesian networks (CTBNs) are a relatively new model for representing discrete-state systems that evolve in continuous time. Here we provide a survey of the literature and identify the major contributions that have been made to CTBNs. In addition, we identify deficiencies and open questions that have not yet been addressed in the field. We use the results of this survey to suggest potential research avenues that extend the forefront of the CTBN literature.

Video recording available: https://www.youtube.com/watch?v=NwS8lrCVRZM

Disposable Infrastructure: Developing Cloud Infrastructure as a Utility

Date/Time: Monday, October 19, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: James Hirmas, US Geological Survey

Abstract: As cloud computing becomes conventional, organizations will be required to transition from traditional rigid systems engineering, networking, application development, and architectures to a more flexible cloud on-demand model. Instead of capacity planning for compute, networking, and storage 2-5 years into the future, the expectation will be that cloud architectures can dynamically right-scale to specific demand per hour or, in some cases, per minute. Solutions will need to be architected to decouple hard dependencies among compute, network, applications, and storage so that each layer can scale independently and/or transition seamlessly to other cloud vendors without affecting other layers. The evolution of IT infrastructure toward disposable cloud infrastructure will also have a dramatic impact on security operations, financial management, the development lifecycle, and overall operational management. In this seminar, we will discuss the principles of disposable cloud architectures, cloud operational management, and real-world use cases.

Video recording available: https://www.youtube.com/watch?v=tEkPg1WNwS8

An Introduction to Persistent Homology

Date/Time: Monday, October 5, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Brittany Fasy, Computer Science, Montana State University

Abstract: Persistent homology is a widely used tool in Topological Data Analysis that encodes multi-scale topological information as a multi-set of points in the plane, called a persistence diagram. The method involves tracking the birth and death of topological features as one varies a tuning parameter. Features with short lifetimes are informally considered to be "topological noise," and those with a long lifetime are considered to be "topological signal." To formally distinguish signal from noise, we bring some statistical ideas to persistent homology in order to derive confidence sets for persistence diagrams.
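For intuition, 0-dimensional persistence can be computed for a 1D function with a simple union-find sweep: components of the sublevel set are born at local minima and die when they merge at a higher value. The sketch below is a minimal illustration of that birth-death bookkeeping, not the full machinery needed for general complexes or higher-dimensional features:

```python
def persistence_0d(values):
    """0-dimensional sublevel-set persistence of a 1D sequence.

    Returns (birth, death) pairs: a component is born at a local minimum
    and dies when it merges with an older (lower-born) component.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent, comp_min = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:                      # sweep from lowest to highest value
        parent[i], comp_min[i] = i, i
        for j in (i - 1, i + 1):         # check already-processed neighbors
            if j in parent:
                a, b = find(i), find(j)
                if a != b:
                    # the component with the higher minimum dies ("elder rule")
                    if values[comp_min[a]] < values[comp_min[b]]:
                        a, b = b, a
                    pairs.append((values[comp_min[a]], values[i]))
                    parent[a] = b
    root = find(order[0])                # the global minimum's component never dies
    pairs.append((values[comp_min[root]], float("inf")))
    return sorted(p for p in pairs if p[0] < p[1])  # drop zero-persistence pairs
```

For `[0, 3, 1, 4]` the minima at heights 0 and 1 merge at height 3, so the diagram is the long-lived pair (0, ∞) plus the shorter pair (1, 3); the latter is exactly the kind of feature one would test for being "signal" versus "noise."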

Database for Dynamics: A New Approach to Model Gene Regulatory Networks

Date/Time: Monday, September 28, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Tomas Gedeon, Mathematical Sciences, Montana State University

Abstract: Experimental data on gene regulation is mostly qualitative, where the only information available about pairwise interactions is the presence of either up- or down-regulation. Quantitative data is often subject to large uncertainty and is mostly in terms of fold differences. Given these realities, it is very difficult to make reliable predictions using mathematical models. The current approach of choosing reasonable parameter values and a few initial conditions, and then making predictions based on the resulting solutions, severely subsamples both the parameter and phase space. This approach does not produce provable and reliable predictions.

We present a new approach that uses continuous time Boolean networks as a platform for qualitative studies of gene regulation. We compute a Database for Dynamics, which rigorously approximates global dynamics over the entire parameter space. The results obtained by this method provably capture the dynamics at a predetermined spatial scale.
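The Database for Dynamics machinery is far more involved, but the flavor of qualitative dynamics can be conveyed with a plain synchronous Boolean network: enumerate states, follow the update map, and record the attractors (fixed points and cycles). The toggle-switch network below is a standard textbook example, not a model from the talk:

```python
from itertools import product

def attractors(update, n):
    """Attractors (fixed points and cycles) of a synchronous Boolean
    network on n genes, found by exhaustively following every state."""
    found = set()
    for start in product((0, 1), repeat=n):
        seen, s = [], start
        while s not in seen:
            seen.append(s)
            s = update(s)
        cycle = seen[seen.index(s):]      # the periodic part of the orbit
        k = cycle.index(min(cycle))       # canonical rotation of the cycle
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found

# Mutual repression ("toggle switch"): each gene represses the other.
toggle = lambda s: (1 - s[1], 1 - s[0])
```

`attractors(toggle, 2)` yields the two fixed points (0, 1) and (1, 0), which model the switch's bistability, plus the synchronous oscillation between (0, 0) and (1, 1).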

Video recording available: https://www.youtube.com/watch?v=G7ARktJQT14

Ecological Remote Sensing: Confronting the Challenges of Autonomous Data Collection

Date/Time: Monday, September 21, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Robb Diehl, US Geological Survey

Abstract: Not long ago, ecological data was collected by biologists tromping around in the field, pencil and paper in hand.  These observations are still critical to the discipline, but a new era in remote sensing is automating the collection of ecological data, dramatically expanding the research productivity of the individual investigator.  Increasingly, field personnel are being replaced by durable, autonomous hardware that runs continuously and gathers data with high accuracy.  But these advances bring new challenges.  By replacing humans with autonomous sensors, we create enormous software challenges; in effect, we have moved intelligent data processing from the front-end to the back-end of the data collection process.  Our ability to effectively post-process data still lags our ability to gather it, but this has created new and challenging opportunities to advance scientific computing.  I give examples and consider the wide-ranging opportunities for productive collaboration between computer scientists and ecologists.

Video recording available: https://www.youtube.com/watch?v=l19zkAw0n2w


The Effectiveness of Software Development Instruction Through the Software Factory Method for High School Students

Date/Time: Monday, August 31, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108


Presenters:

  • Jessica Jorgenson, James Jacobs, Zach Hansen — Bozeman High School
  • Mike Trenk, MacKenzie O'Bleness — Montana State University

Abstract: Teaching software development in environments that mimic industry practices is essential for teaching applicable real-world development skills. In addition, these kinds of delivery-based projects engage students in meaningful design work that encourages clear, sustainable code. The Software Factory has provided such projects and environments to students at MSU for the past year. This project aimed to explore the effectiveness of such instruction for high school students with limited programming experience. Three students from Bozeman High School were selected to work in a team with two undergraduates with the goal of creating an Android application. In the process, these students were exposed to Java, sorting algorithms, version control, and software development practices in an industry setting. We will discuss the challenges and rewards of this teaching method and the Software Factory for students so early in their computing education.

Welcome Seminar

Date/Time: Monday, August 24, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: John Paxton, Computer Science, Montana State University

Abstract: This seminar will provide new and continuing graduate students with useful information about the Computer Science Department. It will also provide an opportunity for graduate students to meet one another, the CS faculty, and the CS staff.

Learning Spectral Filters for Single- and Multi-Label Classification of Musical Instruments (PhD Defense)

Date/Time: Tuesday, July 28, 2015 from 9:00 a.m. - 10:00 a.m.

Location: EPS 126

Presenter: Patrick Donnelly, Montana State University

Abstract: Musical instrument recognition is an important research task in the area of music information retrieval. While many studies have explored the recognition of individual musical instruments in isolation, the field has only recently begun to explore the more difficult multi-label classification problem of identifying the musical instruments present in polyphonic mixtures. This dissertation presents a novel method for feature extraction in multi-label instrument classification and makes important contributions to the domain of instrument classification and to the general research area of multi-label classification.

In this work, we consider the largest collection of instrument samples to date in the musical instrument classification literature. We examine 13 musical instruments common to four datasets, including the first use of a dataset in this research domain. We consider multiple performers, multiple dynamic levels, and all possible musical pitches within the range of the instruments.

To the area of multi-label classification, we introduce a binary-relevance feature extraction scheme to couple with the common binary-relevance classification paradigm. This approach allows consideration of a unique feature space for each binary classifier, allowing selection of features unique to each class label. We present a data-driven approach to learning areas of spectral prominence for each instrument and use these locations to guide our binary-relevance feature extraction. We use this approach to estimate source separation of our polyphonic mixtures.
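A schematic of binary relevance with per-label feature subsets (a toy stand-in, not the dissertation's actual spectral features or classifiers): each label gets its own binary classifier trained only on the features selected for that label, here a simple nearest-centroid rule:

```python
class BinaryRelevance:
    """One independent binary classifier per label, each trained on its
    own feature subset; the per-label 'classifier' is a nearest-centroid
    rule, standing in for the real learners."""

    def __init__(self, feature_sets):
        self.feature_sets = feature_sets  # feature_sets[j]: columns for label j
        self.centroids = []

    def fit(self, X, Y):
        for j, feats in enumerate(self.feature_sets):
            pos = [[x[f] for f in feats] for x, y in zip(X, Y) if y[j] == 1]
            neg = [[x[f] for f in feats] for x, y in zip(X, Y) if y[j] == 0]
            mean = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
            self.centroids.append((mean(pos), mean(neg)))
        return self

    def predict(self, x):
        labels = []
        for (pos_c, neg_c), feats in zip(self.centroids, self.feature_sets):
            v = [x[f] for f in feats]
            dist = lambda c: sum((a - b) ** 2 for a, b in zip(v, c))
            labels.append(1 if dist(pos_c) < dist(neg_c) else 0)
        return labels
```

The point of the per-label `feature_sets` argument is the one made above: unlike plain binary relevance, each label's classifier sees a feature space tailored to that label.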

We contribute the largest study of single- and multi-label classification in the musical instrument literature and demonstrate that our results track with or improve upon the results of comparable approaches in the literature. In our solo instrument classification experiments, we provide the seminal use of Bayesian classifiers in the domain, introduce the grid-augmented topology for naïve Bayes, and demonstrate the utility of conditional dependencies between frequency- and time-based features for the instrument classification problem. For multi-label instrument classification, we explore the question of dataset bias for polyphonic test sets derived from the monophonic training sets in a cross-validation study controlled for dataset independence. Additionally, we present the most comprehensive cross-dataset study in the instrument classification literature and demonstrate the generalizability of our approach.

Furthermore, we consider the difficulty of the multi-label problem with regards to label density and cardinality and present experiments with a reduced label set, comparable to many studies in the literature, and demonstrate the efficacy of our system on this easier problem. We provide a comprehensive set of multi-label evaluation measures with the goal of aligning the instrument classification literature with the standard evaluation practices of the general multi-label community.

Bounding Rationality by Computational Complexity

Date/Time: Monday, May 4, 2015 from 4:10 p.m. - 5:00 p.m.

Location: Byker Auditorium, Chemistry and Biochemistry Building

Presenter: Lance Fortnow, Georgia Institute of Technology


Abstract: Traditional microeconomic theory treats individuals and institutions as completely understanding the consequences of their decisions given the information they have available. These assumptions may not be valid, as we might have to solve hard computational problems to optimize our choices. What happens if we restrict the computational power of economic agents?

There has been some work in economics treating computation as a fixed cost or simply considering the size of a program. This talk will explore a new direction bringing the rich tools of computational complexity into economic models, a tricky prospect where even basic concepts like "input size" are not well defined.

We show how to incorporate computational complexity into a number of economic models including game theory, prediction markets, forecast testing, preference revelation and awareness.

This talk will not assume any background in either economics or computational complexity.

End of Year Celebration and Awards Ceremony

Date/Time: Monday, April 20, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108


We will reflect on some of our accomplishments from the 2014-2015 academic year.  Awards will be given to recognize achievement of graduate students (e.g. Outstanding Ph.D. Researcher, Outstanding GTA) and faculty (e.g. Researcher of the Year).  Light snacks and refreshments will be served.

Reactive Game Engine Programming for STEM Outreach

Date/Time: Monday, February 23, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Alan Cleary, Montana State University


Abstract: Science, Technology, Engineering, and Mathematics (STEM) are pervasive in our society. For this reason it is important that we incorporate STEM topics in our education system. The crux of the problem is how to make these topics accessible to younger students in an engaging manner. We present our experiences using a novel programming style, reactive programming, to deliver a summer camp for students in grades 8 through 12. This software uses a declarative programming approach to allow students without a background in computing to explore a wide variety of subject material within a 3D virtual environment, including computer science, mathematics, physics, and art. This work is based on PyFRP, a reactive programming library written in Python. We describe our camp experience and provide examples of how this style of programming supports a wide variety of educational activities.


Predicting Metamorphic Relations for Testing Scientific Software: A Machine Learning Approach Using Graph Kernels

Date/Time: Monday, February 2, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Upulee Kanewala, Colorado State University


Abstract: Comprehensive, automated software testing requires an oracle to check whether the output produced by a test case matches the expected behavior of the program. But the challenges in creating suitable oracles limit the ability to perform automated testing of some programs, including scientific software. Metamorphic testing is a method for automating the testing process for programs without test oracles. This technique operates by checking whether the program behaves according to a certain set of properties called metamorphic relations. A metamorphic relation is a relationship between multiple input and output pairs of the program. Unfortunately, finding the appropriate metamorphic relations required for use in metamorphic testing remains a labor-intensive task, which is generally performed by a domain expert or a programmer. This talk describes MRpred: an automated technique for predicting metamorphic relations for a given program. MRpred applies a machine learning-based approach that uses graph kernels to create predictive models. MRpred achieves high prediction accuracy, and the predicted metamorphic relations are highly effective in identifying faults in scientific programs.
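To make the idea concrete (an illustrative relation, not one of MRpred's predictions): even without an oracle for the exact value of sin(x) at an arbitrary x, the identity sin(x) = sin(π − x) lets us test an implementation by comparing two related outputs:

```python
import math
import random

def metamorphic_test_sin(fn=math.sin, trials=100, tol=1e-9):
    """Metamorphic test: without knowing the expected value of fn(x),
    check the relation fn(x) == fn(pi - x) on random inputs."""
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if abs(fn(x) - fn(math.pi - x)) > tol:
            return False  # relation violated: a fault is likely
    return True
```

A correct implementation satisfies the relation on every trial, while a buggy stand-in such as `lambda x: x` is caught immediately, which is the whole appeal of metamorphic testing for oracle-less programs.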

Topological Data Analysis and Road Network Comparison

Date/Time: Friday, January 30, 2015 from 4:10 p.m. - 5:00 p.m.

Location: Roberts 301

Presenter: Brittany Fasy, Tulane University


Abstract: Vast amounts of data are routinely collected, and analyzing them effectively has become a central challenge we face across science and engineering. Topological data analysis (TDA) is a field that has recently emerged in order to tackle this challenge. This talk will focus on the problem of comparing two road networks (for example, to detect where and by how much a road network has changed over the course of a year). Surprisingly, only recently have distance measures between embedded graphs (representing road networks) been studied. We will see how one of the tools from TDA, namely persistent homology, can be used to define a local distance measure between two graphs. Persistent homology describes the homology (in particular, the number of connected components and loops) of a data set at different scales. An example to keep in mind is impressionistic paintings: at one scale, all that is seen are brush strokes; at a larger scale, the brush strokes blur together to form the subject of the painting. The (local) persistent homology distance measure is one of the first theoretically justified approaches to road network comparison. This talk should be accessible to both students and faculty.

How to Use CS to Become a Nuclear Physicist: Applying Computational Geometry to Reactor Physics

Date/Time: Friday, January 30, 2015 from 1:30 p.m. - 2:30 p.m.

Location: CS Conference Room

Presenter: David Millman, Lead Cloud Developer at ProductionPro


Abstract: Simulating a nuclear reactor is challenging. Often, it involves many computers working together to solve a very complex differential equation. While many methods exist for solving the equation, the most accurate are Monte Carlo (MC) methods. MC methods use Constructive Solid Geometry (CSG) to model a complex domain with high fidelity. Recent efforts to include feedback effects (e.g., depletion, thermal, xenon) have forced MC methods to calculate volumes and tight bounding boxes of spatial regions quickly and accurately.

In this talk, I describe a framework for approximating (to a user-specified tolerance) volumes and bounding boxes of regions given their equivalent CSG definition. The framework relies on domain decomposition to recursively subdivide regions until they can be computed analytically or approximated. While the framework is general enough to handle any valid region, it was optimized for models used in criticality safety and reactor analysis. For bounding boxes, this is the first algorithm that has strong enough accuracy guarantees yet is fast enough for use within a production-level nuclear reactor code. For volume calculations, numerical experiments show that the framework is over 500x faster and two orders of magnitude more accurate than the standard stochastic volume calculation algorithms currently in use.

Virtual Reality and Cybersickness

Date/Time: Monday, January 26, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Lisa Rebenitsch, Michigan State University


Abstract: Often, familiarity with virtual reality is due to movies and shows such as Star Trek, The Matrix, Sword Art Online, and Iron Man's holograms. Real world virtual reality exists beyond flight simulators, and Oculus Rift has increased interest in the field. However, virtual reality implementations differ from television versions. Most rely strictly on sight, with some including sounds and, more rarely, touch. Two common paradigms in virtual reality are projection screen systems such as CAVEs and visor display systems such as Oculus Rift. Applications for virtual reality include medical training, military training, museums, collaboration, design, and entertainment.

One safety issue inhibiting the use of virtual reality is cybersickness, the feeling of motion sickness-like symptoms in virtual environments. For example, three-dimensional and shaky-camera movies have produced reports of "movie theater sickness." The likelihood and severity of these symptoms increase in virtual environments. The source of the issue is unclear, with research in the field posing over forty potential factors, which fall into three categories: individual, hardware, and software. Prior attempts at predicting cybersickness include the Cybersickness Dose Value (CSDV) and Kolasinski's linear model. The CSDV correlates well with cybersickness but only includes software factors. Kolasinski's model explains 34% of the variance and excludes individual factors. Cybersickness is highly individual, with a resistant population upwards of 50%. New models that use individual characteristics and include the effect of resistance are needed. Statistical and modeling methods called zero-inflated models are examined for better comparison of factors and prediction of cybersickness.
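For readers unfamiliar with the term, a zero-inflated model mixes a point mass at zero (here, a hypothetical fully resistant subject who reports no symptoms regardless of exposure) with an ordinary count distribution for everyone else. A minimal zero-inflated Poisson sketch, purely illustrative of the model family named above:

```python
import math

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: with probability pi_zero an observation is a
    structural zero (a resistant subject); otherwise the symptom count k
    follows an ordinary Poisson(lam) distribution."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson
```

With, say, half the population resistant (`pi_zero = 0.5`), zeros occur far more often than a plain Poisson would predict, which is exactly the excess-zero pattern such models are designed to separate from genuinely low symptom rates.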

Privacy in Social Computing and Mobile Networking

Date/Time: Friday, January 23, 2015 from 4:10 p.m. - 5:00 p.m.

Location: Roberts 301

Presenter: Na Li, Northwest Missouri State University


Abstract: With the development of web and wireless technologies and mobile devices, more and more people are conducting their daily activities online. These activities generate large amounts of data, including sensitive information that people are not willing to share with others. The disclosure of users’ privacy has therefore become a serious concern. This talk will focus on preserving users’ privacy in social media and mobile networks. Specifically, two projects will be introduced: one designs a privacy-aware friend search engine in Online Social Networks (OSNs), and the other preserves users’ relationship privacy when OSN operators share data with third parties. Additionally, this talk will briefly discuss the problems of privacy disclosure in mobile networks.

Classification of Musical Instruments

Date/Time: Wednesday, January 21, 2015 from 4:10 p.m. - 5:00 p.m.

Location: EPS 108

Presenter: Patrick Donnelly, Montana State University


Abstract: Musical instrument classification is an important task in the area of Music Information Retrieval. While there have been many approaches to recognizing individual instruments, the majority are not extensible to the more complex case of identifying the musical instruments present in polyphonic mixtures. We present a data-driven clustering technique for learning regions of spectral prominence in an instrument's timbre, exploiting these regions as spectral filters in the feature extraction stage of a binary relevance classification task. We demonstrate the approach over several large datasets consisting of multiple articulations, dynamics, and performers, validating the approach across datasets and with several classifiers. Lastly, we discuss ongoing work on extending these spectral filters for source separation estimation in the identification of instruments present in polyphonic mixtures.

Seminars from 2014.