This was my first EuroPython conference and I had high expectations because I had heard a lot of good things about it. I must say that overall it didn’t let me down. I learned several new things and met a lot of new people. So let’s dive straight into the most important lessons.
On Tuesday I attended the “Effective Python for High-Performance Parallel Computing” training session by Michael McKerns. This was by far my favorite training session and I learned a lot from it. Before Michael started with code examples and code analysis, he emphasized two things:
- Do not assume what you hear/read/think. Time it and measure it.
- Stupid code is fast! Intelligent code is slow!
At this point I knew that the session was going to be amazing. He gave us a GitHub link (https://github.com/mmckerns/tuthpc) where all the examples with profiler results are located. He stressed that we shouldn’t believe him and that we should test them ourselves (lesson #1).
I strongly suggest cloning his GitHub repo (https://github.com/mmckerns/tuthpc) and testing those examples yourself. Here are my quick notes (TL;DR):
- always compile regular expressions
- use local variables (`true = True`, `local = GLOBAL`)
- if you know how many elements your list will hold, create it with None elements and then fill it (`L = [None] * N`)
- when you need to insert items at index 0 of a list, append them and reverse the list afterwards (each insert at index 0 is O(n), each append is O(1))
- use built-in functions, use built-in functions, use built-in functions!!! (they are implemented in C)
- when extending a list use `.extend()` and not `+`
- searching in a set (hash table) is a lot faster than searching in a list (O(1) vs O(n))
- constructing a set is much slower than constructing a list, so you usually don’t want to transform a list into a set just to search in it, because the whole thing will be slower. But again, you should test it
- `+=` on a mutable object such as a list doesn’t create a new instance of the object, so use it in loops
- a list comprehension is better than a generator. A for loop is better than a generator and sometimes also better than a list comprehension (you should test it!)
- importing is expensive (e.g. importing numpy takes about 0.1 sec)
- switching between Python arrays and numpy arrays is very expensive
- if you start writing intelligent and complex code, you should stop and rethink whether there is a more stupid way of achieving your goal (see lesson #2)
- optimize the code you want to run in parallel. This is more important than just running it in parallel.
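In the spirit of lesson #1, here is a minimal sketch of how two of these notes could be timed with the standard `timeit` module. It is not taken from Michael's repository: the function names, the `time_it` helper and the size `N` are hypothetical, chosen only for illustration, and the numbers will differ between machines and Python versions.

```python
import timeit

N = 5_000  # hypothetical problem size, chosen only for illustration


def front_insert(n):
    """Build [n-1, ..., 0] by inserting every item at index 0."""
    out = []
    for i in range(n):
        out.insert(0, i)
    return out


def append_then_reverse(n):
    """Build the same list by appending, then reversing once at the end."""
    out = []
    for i in range(n):
        out.append(i)
    out.reverse()
    return out


def grow_by_append(n):
    """Fill a list that starts empty and grows one item at a time."""
    out = []
    for i in range(n):
        out.append(i * 2)
    return out


def fill_preallocated(n):
    """Fill a list that was created with n None elements up front."""
    out = [None] * n
    for i in range(n):
        out[i] = i * 2
    return out


def time_it(func, *args, repeat=3, number=5):
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for func in (front_insert, append_then_reverse,
                 grow_by_append, fill_preallocated):
        print(f"{func.__name__:22s} {time_it(func, N):.6f} s per call")
```

Each pair of functions returns the same list, so only the timing differs. Try several values of `N` before drawing conclusions.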
Threading and multiprocessing:
- you should always measure whether and when threading/multiprocessing is actually faster. If you are using simple functions it will probably be slower
- in parallel computing you need to catch and log errors
- in parallel computing you always want your functions to return a value
- in parallel computing you never want your code to “die”. Always try to return a reasonable default value even if an exception is raised. Slightly wrong is better than not getting an answer!
- when using threading/multiprocessing use `.map()` and if you don’t care about the order use `.imap_unordered()`. It is the fastest because it returns the first available value.
- if you have a stop condition use `.imap_unordered()`
- be aware of random module problems. The random seed gets copied to all processes, so they all produce the same sequence and the result is “random doesn’t work”. You need to create a random_seed function and ensure that each process is in a different random state.
- is there a general rule for when to use threads and when to use multiprocessing? Use threads if you have light jobs (i.e. they execute in 0-1 sec)
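The notes above can be combined in one small sketch. It is only an illustration under stated assumptions, not code from the training: `risky_job`, `safe_job`, `run`, the logger name and the NaN default are all hypothetical. It uses `multiprocessing.pool.ThreadPool` so it runs anywhere without pickling concerns; `multiprocessing.Pool` offers the same `.map()` and `.imap_unordered()` methods. Instead of a `random_seed` function, this sketch gives every task its own seed and its own `random.Random` instance, which is one possible way to keep the random states distinct.

```python
import logging
import math
import random
from multiprocessing.pool import ThreadPool

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("workers")  # hypothetical logger name

DEFAULT = math.nan  # hypothetical "reasonable default" for failed tasks


def risky_job(task):
    """Hypothetical job: draw one random number, fail on one bad input."""
    task_id, seed = task
    rng = random.Random(seed)  # private generator, distinct state per task
    if task_id == 3:
        raise ValueError("bad input")
    return task_id, rng.random()


def safe_job(task):
    """Never let a worker die: log the error and return a default value."""
    try:
        return risky_job(task)
    except Exception as exc:
        log.error("task %r failed: %s", task, exc)
        return task[0], DEFAULT


def run(n_tasks=8, workers=4, stop_at=None):
    """Collect results as they arrive; optionally stop after stop_at results."""
    tasks = [(i, 1000 + i) for i in range(n_tasks)]  # (task id, seed) pairs
    results = {}
    with ThreadPool(workers) as pool:
        for task_id, value in pool.imap_unordered(safe_job, tasks):
            results[task_id] = value
            if stop_at is not None and len(results) >= stop_at:
                break  # stop condition met, no need to wait for the rest
    return results


if __name__ == "__main__":
    print(run())
    print(run(stop_at=2))
```

Because results come back in completion order, each result carries its task id so it can be matched to its input afterwards.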
Another interesting talk was about code review (“Another pair of eyes: Reviewing code well” by Adam Dangoor). He pointed out that one of the most important things about the process of reviewing code is sharing knowledge. When you review other people’s code you learn a lot, especially if you take your time and try to really understand what the author was trying to achieve. It is also recommended to always say something nice about the code, especially when reviewing the code of a junior developer. And when you think that the code you are reviewing has a bug, write a test that proves it.
EuroPython 2016 was a really amazing experience that every Python developer/scientist should have. I’m really looking forward to EuroPython 2017!