I recently spent a day working on the performance of a Python function and learned a bit about Pandas and NumPy array indexing. The function is iterative, looping over data and updating some row weights until it meets convergence criteria. I tried to do as much processing as I could before the loops, but some indexing (and of course arithmetic) had to stay inside the loops.
When I looked at profiles of the function almost all of the time was being spent doing indexing on Pandas Series objects. A quick investigation shows that indexing Series objects is quite slow compared to NumPy arrays. First, some setup: Continue reading “Performance of Pandas Series vs NumPy Arrays”