University of Michigan School of Information
MADS team designs a new data visualization to help save Great Lakes
Thursday, 10/14/2021
Applying data science methodology to the issue of invasive and introduced species in the Great Lakes, three UMSI Master of Applied Data Science (MADS) grads have designed a new data visualization to display species impacting or being impacted by nonindigenous species.
This data visualization, termed the “ripple plot,” will aid the National Oceanic and Atmospheric Administration (NOAA) in its mission to research ecosystem dynamics. The MADS team of Toby Kemp, Ermias Bizuwork and Robert Bowman also substantially increased the usable information in NOAA's Great Lakes Aquatic Nonindigenous Species Information System (GLANSIS). In addition to guiding new research, their innovations will help support the ecosystems of Great Lakes waterways for future generations.
A local, high-stakes information challenge for capstone
The current understanding of many freshwater aquatic species’ links to other components of the ecosystems, and the benefits they provide to society, is limited. In many instances, scientists have virtually no data, according to NOAA.
At the Great Lakes Environmental Research Lab (GLERL), the GLANSIS database is a "one-stop shop" for information about nonindigenous species in the Great Lakes, says GLANSIS research associate El Lower. The database is staffed primarily by Michigan Sea Grant researchers.
"The GLANSIS team collects and synthesizes data about 180+ introduced and invasive species from peer-reviewed literature and verified reports from experts throughout the entire Great Lakes basin," Lower says. "Updates to the database are then published in technical memoranda on an annual basis."
In lecturer Elle O’Brien’s MADS capstone course, one project team partnered with GLERL to analyze and visualize the lab’s records on invasive species.
Toby, a Michigan-based data scientist for Tokyo’s Shiga International Patent Office, sensed an information challenge after contacting GLERL to see what the team could help with.
The GLANSIS team wanted someone to go through large technical reports, written before the team built the impact component of the database, and parse the data into a tabular format that could be uploaded.
“The challenge we set for ourselves was to organize their information and give it to them in a way that’s easy to understand,” Toby says.
Unlocking research
Toby met Robert, a professional commodities trader, in an introductory MADS course. The two connected over Slack and paired with Michigan’s Department of Natural Resources for a class project. Later they linked with fellow MADS student Ermias, an Atlanta, Georgia-based product manager for The Home Depot, over shared interests in data science and the great outdoors.
The team used an algorithm to process 3,300 pages of technical reports and created 3,100 structured research findings that will be added to GLANSIS, increasing the available entries by over 750%.
Toby, Ermias and Robert overcame data challenges to process more than six years of reports in two months.
“The first challenge is scraping the technical memoranda because they are not machine-generated,” Toby says. “There are inconsistencies in the way the data is presented, which makes things much more complicated.”
With a need to extract and identify impacted species in GLERL’s database, the team specifically ran into problems with ontology, or the classification of existing things into different orders.
“In biology, there’s five different levels present in the naming conventions of the species — family, phylum, et cetera, in addition to the common names that are used in literature,” Toby says. “Trying to combine all of these together expanded the food network from 1,600 relationships to 2 million relationships, which is a little bit more data than we wanted to deal with.”
The team turned to MADS lecturer Winston Featherly-Bean, who offered this counsel: Consult with domain experts and heed their advice.
“After we discussed it among the group and talked with NOAA staff, we learned that when they get down to very small species like protozoa, they don’t normally talk about the individual species name,” Toby says. “But when you’re talking about a larger fish, for example, lake trout, they do tend to use just the one single species. By conforming to that ontology, we simplified our network.”
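The simplification Toby describes can be pictured as mapping every name found in the reports to the single level that domain experts actually use, then dropping the duplicate relationships that result. The Python sketch below is only an illustration under stated assumptions: the `CANONICAL` table, the placeholder names such as `invader_x` and `protozoan_species_1`, and the `normalize_edges` helper are hypothetical, not the team's code or GLANSIS data.

```python
# Hypothetical mapping from names found in reports to the level experts use.
# Small organisms collapse to a group; larger fish keep one species-level name.
CANONICAL = {
    "Salvelinus namaycush": "lake trout",  # scientific name -> common name
    "lake trout": "lake trout",
    "protozoan_species_1": "protozoa",     # placeholder species names
    "protozoan_species_2": "protozoa",
}

# Hypothetical raw relationships: (impacting species, impacted species).
RAW_EDGES = [
    ("invader_x", "protozoan_species_1"),
    ("invader_x", "protozoan_species_2"),
    ("invader_x", "Salvelinus namaycush"),
    ("invader_x", "lake trout"),
]

def normalize_edges(edges, canonical):
    """Rewrite each pair to canonical names and drop the resulting duplicates."""
    normalized = set()
    for source, target in edges:
        normalized.add((canonical.get(source, source),
                        canonical.get(target, target)))
    return sorted(normalized)

edges = normalize_edges(RAW_EDGES, CANONICAL)
print(len(RAW_EDGES), "->", len(edges))
```

In this toy example four raw relationships collapse to two, which is the same effect, at a much smaller scale, as conforming the food network to a single agreed naming level.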
Mapping the ripple effect
The GLANSIS team also needed help with the way the database was organized.
“You can look up an invasive species and what native species it impacts. But it’s not currently possible to do the opposite lookup,” Toby says. “So you can’t look up a native species or a current species in the lake and see what species impact it. We decided to set up this reverse lookup, and of course we wanted to use our MADS skills to take it a step further.”
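A reverse lookup of this kind amounts to indexing the same impact records in both directions. The following is a minimal Python sketch, assuming the records can be reduced to (impacting species, impacted species) pairs; the `IMPACTS` sample, the placeholder names and the `build_lookups` helper are hypothetical and do not reflect the actual GLANSIS schema.

```python
from collections import defaultdict

# Hypothetical sample records: (impacting species, impacted species).
# Placeholder names only; these are not GLANSIS data.
IMPACTS = [
    ("invasive_a", "native_x"),
    ("invasive_b", "native_x"),
    ("invasive_b", "native_y"),
]

def build_lookups(pairs):
    """Return (forward, reverse) indexes built from impact pairs."""
    forward = defaultdict(set)  # impacting species -> species it impacts
    reverse = defaultdict(set)  # impacted species -> species that impact it
    for impacting, impacted in pairs:
        forward[impacting].add(impacted)
        reverse[impacted].add(impacting)
    return forward, reverse

forward, reverse = build_lookups(IMPACTS)
print(sorted(forward["invasive_b"]))  # what invasive_b impacts
print(sorted(reverse["native_x"]))    # what impacts native_x
```

Building both indexes from one list of pairs keeps the two directions consistent: any relationship visible from the invasive species is also visible from the species it affects.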
The team used a PostgreSQL database and tools like React, JSX and JavaScript to create an interactive river plot showing the existence and potential spread of a selected invasive species across Great Lakes waterways.
“Mapping visualizations are notoriously difficult to display on web pages without some sort of engineering twist that you put on it, so we worked through those challenges,” Ermias says.
He says that throughout his MADS experience, the importance of good data visualizations has been one of his greatest takeaways.
“You have to visualize the data in a way that people are going to understand, that's actually useful,” Ermias says. “That's one of the things we've targeted here, trying to make things as easy to understand for as broad an audience as possible.”
That target motivated the team to create their ripple plot, a network visualization of concentric circles that maps the effects of invasive species on native species and vice versa. This means that users can look up an invasive species in the Great Lakes, like killer shrimp, and see its effect on native organisms two levels higher on the same food web, like lake trout. Users can also look up a species native to the Great Lakes, like protozoans, and see what invasive species impact it. According to GLERL, this kind of information has never been available before.
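One way to think about the concentric circles is as rings of increasing distance from the selected species in the impact network. The article does not describe the team's layout algorithm, so the Python sketch below is an assumption: it uses a breadth-first search to place each species on the ring matching its shortest number of steps from the starting species. The `EDGES` graph, the placeholder names and the `ripple_rings` function are hypothetical.

```python
from collections import deque

# Hypothetical directed impact edges: species -> species it affects.
EDGES = {
    "species_a": ["species_b", "species_c"],
    "species_b": ["species_d"],
    "species_c": ["species_d"],
    "species_d": [],
}

def ripple_rings(edges, start, max_depth=2):
    """Group species into rings by shortest hop distance from `start`."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the outermost ring
        for neighbor in edges.get(node, []):
            if neighbor not in depth:  # first visit gives the shortest distance
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    rings = [[] for _ in range(max_depth + 1)]
    for species, d in depth.items():
        rings[d].append(species)
    return [sorted(ring) for ring in rings]

rings = ripple_rings(EDGES, "species_a")
for level, ring in enumerate(rings):
    print(level, ring)
```

Here ring 0 holds the selected species, ring 1 the species it affects directly, and ring 2 the species affected two levels away, mirroring the two-level example in the text.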
A guide for future biological research
Toby says the biggest impact of their project is that it enables researchers to see the information they have — and what information they don’t.
“Our research is a bit too fundamental right now to be able to say anything specific in itself, but hopefully it can be used by GLERL to guide and focus new research,” Toby says.
The team’s innovations show promise for areas outside the Great Lakes region as well.
“We’re focused on Michigan now, but I think some of these tools and ideas can be expanded to other wildlife organizations,” Ermias says. “We’re really excited to see if some of this work that we’re doing has an impact not only in Michigan, but across the country.”
Toby, Ermias and Robert, who were part of the MADS program’s first graduating class in August 2021, are helping preserve Great Lakes ecosystems for future generations.
Robert says, “As sportsmen, conservationists and data scientists we seek to maintain the beauty and vitality of Great Lakes waterways for our children and grandchildren.”
View Toby, Ermias and Robert’s final project here.