Machine learning in the water industry

The past few years have seen a huge surge in interest in artificial intelligence (AI), driven by several factors. Two of the biggest are the sharp increase in available data and in the computing power needed to crunch it. Machine learning researchers have also combined existing techniques and algorithms into new methods that can make use of these emerging resources.

It is difficult to avoid articles about Facebook’s and Google’s use of artificial intelligence. Magazines and newspapers are flooded with buzzwords and with prominent people warning about the dangers of AI and how Skynet is going to take over the world, just as Mr Cameron predicted.

But once we move past the hype and look at what is actually happening, we see that the methods being used are the same as, or very similar to, methods that have been used in statistics to make predictions from data for over 50 years. These techniques are anything but scary; they are important tools for extracting valuable insights from the large and unwieldy amounts of data we are now capable of generating.

Established examples in the water and wastewater industry include predicting water supply and demand in cities, investigating potential outbreaks spread through water supply systems, and assessing the environmental impacts of wastewater treatment and disposal. More recently, researchers have focused on estimating the effluent quality of wastewater treatment plants by training prediction models on the large amounts of data the plants generate. That volume of data is only going to increase as new and more advanced sensors become prevalent.

Another interesting aspect is the possibility of learning new water and wastewater treatment strategies from these machine learning algorithms. The classic example is TD-Gammon, a self-learning backgammon program that eventually came to rival the best human players and even changed the way people played the game, because it introduced concepts and winning strategies that had not been thought of before. The method combined reinforcement learning with a neural network: the network learns to evaluate states of the game from the rewards that eventually follow, and the action strategy comes from choosing the moves that lead to the best-rated states.

In TD-Gammon’s case the environment is the game of backgammon, but Google’s DeepMind recently used the same basic concept to beat top players at Go, a more difficult game (for AI at least). Humans are already learning from the tactics that machines use in board games. Could we also learn better ways to treat and distribute clean water from a machine?
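The temporal-difference idea behind TD-Gammon can be illustrated on a far smaller problem. The sketch below is a toy under stated assumptions, not the TD-Gammon program itself: it swaps the neural network for a lookup table and backgammon for a five-state random walk, where stepping off the right-hand end counts as a win (reward 1) and stepping off the left-hand end as a loss (reward 0). The function name `td0_random_walk` and all parameter values are chosen purely for illustration.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.02, seed=0):
    """Estimate the chance of winning from each state with TD(0).

    Toy stand-in for a board game: the walker starts in the middle
    and moves left or right at random until it leaves the board.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial guess for every state

    for _ in range(episodes):
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:              # fell off the left end: loss
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:    # fell off the right end: win
                reward, next_value, done = 1.0, 0.0, True
            else:                           # game continues, no reward yet
                reward, next_value, done = 0.0, values[next_state], False

            # TD(0) update: nudge the current estimate towards the reward
            # plus the estimate of the state we actually landed in.
            values[state] += alpha * (reward + next_value - values[state])

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    for i, v in enumerate(td0_random_walk()):
        print(f"state {i}: estimated win probability {v:.2f}")
```

After enough episodes the estimates should settle near the true win probabilities of 1/6, 2/6, and so on up to 5/6. The table has learned which states are good purely from the outcomes of its own games, which is the same principle, at a vastly smaller scale, that lets a game-playing program rate board positions.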

The advantage of using old arcade and board games for machine learning is that the environments are easily definable and have strict boundaries and rules. The machine can learn by making hundreds of thousands of errors in simulations before having to take on a real human. Water and wastewater treatment is anything but easily definable! We also can’t let a machine expel millions of liters of untreated wastewater into our rivers and streams for the next hundred years until it learns how to treat the wastewater properly.

The solution is to produce a simulation of the treatment plant that is accurate enough for a machine to train itself on. However, this is itself a very difficult problem because of the multitude of microorganisms that can come into play and the constantly changing composition of wastewaters. Benchmark simulation models exist for testing control strategies, but even these standardised models require a fair amount of parameter calibration and variable initialisation to obtain a decent representation of the plant being tested. Perhaps the new information becoming available about the microorganisms present in these treatment facilities can be used to produce more accurate models that rely less on the specialised ‘research lab only’ measurements and approximations required by the current generation of models. Maybe the next generation of models will simulate the entire ecology of the plant, right down to the cellular level. Imagine what a machine could learn from that!
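As a deliberately crude illustration of training against a simulation rather than a real plant, the sketch below lets tabular Q-learning experiment on a made-up tank model. Everything in it is hypothetical: the five pollutant levels, the rule that inflow adds one level per step while aeration removes two, the energy cost, the discharge limit and the penalty are invented for the example and bear no relation to any benchmark simulation model.

```python
import random

N_LEVELS = 5             # pollutant level in the tank, 0 (clean) to 4 (assumed)
LIMIT = 3                # levels at or above this breach the limit (assumed)
ENERGY_COST = 0.3        # cost of running aeration for one step (assumed)
VIOLATION_PENALTY = 1.0  # cost of breaching the limit for one step (assumed)


def simulate_step(level, aerate):
    """Hypothetical plant: inflow adds one level, aeration removes two."""
    next_level = level + 1 - (2 if aerate else 0)
    next_level = max(0, min(N_LEVELS - 1, next_level))
    reward = -(ENERGY_COST if aerate else 0.0)
    if next_level >= LIMIT:
        reward -= VIOLATION_PENALTY
    return next_level, reward


def train_q_table(steps=20000, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error inside the simulator."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_LEVELS)]  # q[level][action]
    level = 0
    for _ in range(steps):
        # Explore with uniformly random actions. Q-learning is off-policy,
        # so it can still learn the values of the greedy strategy.
        action = rng.randrange(2)
        next_level, reward = simulate_step(level, bool(action))
        target = reward + gamma * max(q[next_level])
        q[level][action] += alpha * (target - q[level][action])
        level = next_level
    return q


def greedy_policy(q):
    """Return 1 (aerate) or 0 (do not aerate) for each pollutant level."""
    return [int(row[1] > row[0]) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train_q_table()))
```

With these invented numbers the learned strategy is to leave aeration off at levels 0 and 1 and switch it on from level 2 upwards, which is just enough to stay under the limit at the lowest energy cost. The simulated machine breaches the limit thousands of times while learning, and none of it ever reaches a river. How useful the result is depends entirely on how well the simulator matches the real plant.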