Portable Git

A few months back I posted an article about setting up a portable Python environment. Just as important as having a nice environment to work in is having all your work properly backed up for when things go wrong!

I use Git to keep track of all the changes I make to programming projects, journal articles and conference papers. I am also using it to back up and keep tabs on my final dissertation. This is really useful, not just for when things go wrong, but also for going back to previous versions of, say, a paper, where some text you later deleted is still there and can be recovered.

Setting up a portable version of Git is super easy. Just download the portable version for your operating system at this website, put it on your USB stick and use it whenever you want. The trickier part is actually creating a Git repository and keeping your work synced.

GUI environments exist for maintaining Git repositories on your computer, but I actually think it is easier to do everything from the command line in this case. If you are working on these projects by yourself you can keep things relatively simple and only need to know a few basic commands. First run the ‘git-cmd.exe’ file (if you are running Windows) and then use the ‘cd’ command to get to the directory you want to back up.

git init

Run this command in the folder you want to keep track of; it can contain code, pictures, documents or whatever else you want backed up. You now have an empty Git repository on your computer. To add all the files in that folder you need to run:

git add .

Now all the files are ‘staged’, or ready for committing. Next type:

git commit -m "message telling me what this commit is"

This command ‘commits’ the ‘staged’ files in the current folder to the current ‘branch’. What this basically means is that whatever was in the folder when you typed ‘git add .’ is now backed up as a saved point in time that you can retrieve and go back to later.
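The three steps above can be sketched end to end. This is a toy run in a throwaway directory (the file name and commit message are made up for illustration), so it will not touch any of your real projects; the ‘-c user.name/-c user.email’ options just supply an identity in case you have not configured one yet.

```shell
set -e
dir=$(mktemp -d)        # a throwaway directory for this demo
cd "$dir"

git init                # create an empty repository here
echo "draft text" > notes.txt   # a stand-in for your real files
git add .               # stage everything in the folder
git -c user.name="Me" -c user.email="me@example.com" \
    commit -m "message telling me what this commit is"

git log --oneline       # shows the commit you just made
```

Running ‘git log’ afterwards is a quick way to confirm the save point really exists.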

Typically you will want to save your files on a remote server (in the ‘Cloud’) so they can be retrieved from wherever you are. If you are at a university you should have access to a private GitLab account. GitHub can also be used for free if you are willing to share your work with everyone; otherwise you have to pay a small fee to keep it private. BitBucket is another hosting service that offers free accounts.

Once you have created an online repository for the project you are backing up, you need to tell the Git program on your computer where the online repository is. You should have been given a URL when you created your online project, something like ‘https://user@gitlab.com/username/Project.git’. Once you have this you can link it up with the repository on your computer by typing:

git remote add origin https://user@gitlab.com/username/Project.git

where you should replace the above URL with your real one.
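A quick sketch of registering a remote, again in a throwaway repository and with the same placeholder URL as above (substitute your real one). ‘git remote -v’ is a handy check that the link was recorded correctly.

```shell
set -e
dir=$(mktemp -d)        # throwaway repository for the demo
cd "$dir"
git init
git remote add origin https://user@gitlab.com/username/Project.git
git remote -v           # lists the remote you just registered
```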

Once you have this all set up, all you need to do is type:

git push -u origin master

That will send all your files over to your online repository and keep them safe. In future, whenever you want to save the current state of your folder just type these commands:

git add .
git commit -m "something about the current save"
git push

That is about all you need to know for 90% of the time. You can do the same thing on another computer and use the following command to retrieve the last saved state:

git pull
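The whole push/pull round trip can be sketched without a network at all: a local ‘bare’ repository stands in for the online server, and two cloned folders stand in for your two computers. Everything here (directory names, file contents, commit messages) is made up for illustration.

```shell
set -e
base=$(mktemp -d)
git init --bare "$base/server.git"      # stand-in for GitLab/GitHub

# "first computer": create a repository, commit, and push
mkdir "$base/laptop"
cd "$base/laptop"
git init
git checkout -b master 2>/dev/null || true   # make sure the branch is called master
echo "chapter one" > thesis.txt
git add .
git -c user.name="Me" -c user.email="me@example.com" \
    commit -m "start of my thesis"
git remote add origin "$base/server.git"
git push -u origin master

# point the server's default branch at master, then fetch on the
# "second computer" -- the backed-up file comes down with the clone
git -C "$base/server.git" symbolic-ref HEAD refs/heads/master
git clone "$base/server.git" "$base/desktop"
cat "$base/desktop/thesis.txt"
```

On a real second computer you would ‘git clone’ the repository once and then just ‘git pull’ to pick up later saves.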

That’s it!

One other thing that can happen: you change one or two unimportant things on the other computer, then when you try to pull the last save, Git tells you that the two versions of the repository no longer match. If that happens you can force the folder on your computer to match the online repository using these commands:

git fetch --all
git reset --hard origin/master

That will wipe whatever changes you made on your local computer and replace them with the online version, so be careful!
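Here is what that destruction looks like in a toy repository. Since this demo has no remote, ‘git reset --hard HEAD’ stands in for ‘git reset --hard origin/master’; the effect on your local changes is the same. Run things like this only in a throwaway directory, because the scribbled edit really is gone afterwards.

```shell
set -e
dir=$(mktemp -d)        # throwaway repository for the demo
cd "$dir"
git init
echo "good version" > paper.txt
git add .
git -c user.name="Me" -c user.email="me@example.com" \
    commit -m "last good save"

echo "accidental scribbles" > paper.txt   # an unwanted local change
git reset --hard HEAD                     # throw the change away
cat paper.txt                             # back to "good version"
```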

You can start making things more complicated with branches representing different version histories of the same repository, but if you are just using Git to keep basic track of your work these commands are all you will need.

======== CHEAT SHEET =========
---- Start a new repository ----
git init
git remote add origin <url>

---- Save current state ----
git add .
git commit -m "message about save"

---- Upload repository to online server ----
git push -u origin master   # First time
git push   # Any time afterwards

---- Get repository from online server ----
git pull

---- Overwrite local repository with online version ----
git fetch --all
git reset --hard origin/master


I hope this helps some people get their heads around using Git. It is really useful even for small projects and I don’t think it needs to be super complicated to use.





Machine learning in the water industry

The past few years have seen a huge surge in interest in artificial intelligence (AI). A number of factors have contributed to this: two big ones are the large increase in available data and the computing power that can crunch it. Researchers in machine learning have also combined existing techniques and algorithms into new methods that can exploit these emerging resources.

It is difficult to avoid the articles on Facebook’s and Google’s use of artificial intelligence. Magazines and newspapers are flooded with buzzwords and with important people talking about the dangers of AI and how Skynet is going to take over the world, just as Mr Cameron predicted.

But once we move past the hype and look at what is actually happening, we see that the methods being used are the same as, or very similar to, methods that statisticians have used for over 50 years to make predictions from data. These techniques are anything but scary; they are actually very important tools for extracting valuable insights from the large and unwieldy amounts of data we are currently capable of generating.

Obvious existing examples in the water and wastewater industry include prediction of water supply and demand in cities, investigations into potential outbreaks through water supply systems, and environmental impacts of wastewater treatment and disposal. More recently, researchers have been focusing on the potential to estimate the effluent quality of wastewater treatment plants by using the large amounts of data these plants generate to train prediction models. And the data available is only going to increase as new and advanced sensors become more prevalent.

Another interesting aspect is the possibility of learning new water and wastewater treatment strategies from these machine learning algorithms. The classic example is TD-Gammon, a self-learning algorithm that was eventually able to beat even the best players and changed the way people played the game, introducing concepts and winning strategies that had not been thought of before. The method combined reinforcement learning with a neural network, which learns an action strategy given certain state variables and potential rewards.


In this case the environment is the game of Backgammon, but Google’s DeepMind have used the same basic concept to recently beat top players in the (for AI at least) more difficult game of Go. Humans are already learning from the tactics these machines use in board games; could we also learn better ways to treat and distribute clean water from a machine?

The advantage of using old arcade and board games for machine learning is that the environments are easily definable and have strict boundaries and rules. The machine can learn by making hundreds of thousands of errors in simulations before having to take on a real human. Water and wastewater treatment is anything but easily definable! We also can’t let a machine expel millions of liters of untreated wastewater into our rivers and streams for the next hundred years until it learns how to treat the wastewater properly.

The solution is to produce an accurate enough simulation of the treatment plant that a machine can train itself on. However, this is in itself a very difficult problem due to the multitudes of microorganisms that can come into play and the constantly changing composition of wastewaters. Benchmark simulation models exist for testing control strategies, but even these standardised models require a fair amount of parameter calibration and variable initialisation to obtain a decent representation of the plant to be tested. Perhaps new information becoming available about the microorganisms present in these treatment facilities can be used to produce more accurate models, with fewer of the specialised ‘research lab only’ measurements and approximations required for the current generation of models. Maybe the next generation of models will simulate the entire ecology of the plant right down to the cellular level? Imagine what a machine could learn from that!