Decode

This week I was in a job interview and one of my interview questions turned into a pretty interesting puzzle. I was asked the following question:

Assume a character encoding a = 1, b = 2, c = 3, … , z = 26. Given a string of numbers (e.g. '123456'), figure out how many different ways that string could be decoded.

I didn’t arrive a complete solution in the few minutes I had during the interview but I kept thinking about it afterwards because I couldn’t shake the feeling there was really a math problem here, not just a programming problem.

The difficult part of this challenge is figuring out how many decodings there are when there are overlapping possible groupings of numbers. For example, the string '111' can be decoded as ['1', '1', '1']['11', '1'], or ['1', '11']. I never figured out a formula for calculating the number of decodings but by manually figuring out the number of decodings for the strings '1''11''111''1111''11111''111111', etc. I noticed that the number of decodings follow the Fibonacci sequence starting with 1, 2, 3, 5, 8…! Unfortunately I don’t have a good explanation for why that is the case. (Though it makes gut level sense in the way the Fibonacci sequence is a sum of everything that has come before.) Please leave a comment if you can explain this!

So, my strategy for solving this problem follows these basic steps:

  1. Break the given string down into substrings such that each substring is one digit, '10' or '20', or cannot be broken down further because it has an unknown number of possible decodings. These substrings can be considered independently for the purposes of this problem.
    • For example, '956' becomes ['9', '5', '6'] and '121956' becomes ['1219', '5', '6'].
  2. For each substring figure out the number of decodings. For single digits, '10', and '20' this is just 1. For other substrings this is the Nth value in the Fibonacci series 1, 2, 3, 5, 8… where N is the length of the substring.
  3. Multiply together all of the number of decodings for the individual substrings to get the number of of decodings for the original string of numbers.

You can see my code for this solution here: http://nbviewer.ipython.org/3831343/ (raw notebook).

Decode

SnakeViz 0.1 – A Python Profile Viewer

When profiling code it can be helpful to have a visualization of the profile to really show what areas of your program are taking the most time. The great RunSnakeRun application has been filling this roll for a while now but some of my colleagues have been turned off by the need to install wxPython. The web browsers we use every day are quickly becoming great visualization platforms so I thought it would be a nice project to make a Python profile viewer that works in the browser. Today, co-developer Erik and I are happy to announce the first official release of SnakeViz.

Example SnakeVis visualization.
An example SnakeViz visualization.

Continue reading “SnakeViz 0.1 – A Python Profile Viewer”

SnakeViz 0.1 – A Python Profile Viewer

What’s In My Stack?

This is a response to a post on the Software Carpentry blog: What’s In Your Stack?

My development setup is pretty simple:

  • Sublime Text 2 with a few plugins
  • git, GitHub, and bitbucket
  • Subversion and Trac (only at work)
  • IPython notebook
  • NumPy, SciPy, and matplotlib (mostly at work)
  • Chrome for debugging web stuff
  • bash and the usual shell utilities

It should go without saying that I’m using all of this to work on Python. And a little  JavaScript comes in for web apps.

What’s in your stack?

What’s In My Stack?

Function Timeout in Python

In some code I’m working on I want to put a 10 second timeout on a function call. If the function doesn’t return in 10 seconds I just want to terminate the function and return None. Here’s the code I came up with using multiprocessing.Pool:

import multiprocessing as mp

pool = mp.Pool(1, maxtasksperchild=1)
result = pool.apply_async(function, (arg1,))
pool.close()

try:
    s = result.get(10)
except mp.TimeoutError:
    pool.terminate()
    return None

This seems to work but I’m curious if there’s a standard Python pattern for this sort of thing?

Function Timeout in Python

IPython Notebook Viewer

The nice folks over at IPython have put together a nice viewer for IPython notebooks stored as GitHub gists: nbviewer.ipython.org/. Just enter a gist URL or number. It doesn’t actually store notebooks, just renders notebooks stored as gists.

I’ve had some notebooks as gists for a while and now I can make pretty links to them:

IPython Notebook Viewer

Upgrading to Mountain Lion

I just upgraded to Mountain Lion and here are some of my notes related to scientific Python installs.

If you’ve already got a working installation it will likely continue to work after upgrading. Regardless of whether you plan to keep using an existing install or make a new one you’ll probably want to do the following two things:

Xcode

You’ll need a Mountain Lion compatible set of command line tools and development libraries so reinstall Xcode or the Command Line Tools for Xcode. I personally prefer the later now that I use Sublime Text 2 for code editing. If you install Xcode make sure to install the command line tools from the preferences.

X11

ML doesn’t come with X11 installed so install XQuartz from http://xquartz.macosforge.org/ if you anticipate needing X11.

Python Libraries

  • NumPy compiles without issue.
  • SciPy is waiting on a small change before it will compile on Mountain Lion. I expect that to go in soon and they may release SciPy 0.10.2 to make it Mountain Lion compatible. Even if they don’t you’ll be able to install from the GitHub repository once the change is made.
  • matplotlib must be installed from the GitHub repository at this time.
Upgrading to Mountain Lion

brewer2mpl – colorbrewer2 and Python

A friend recently tweeted: “Who knows of a python code that intelligently picks colors based on the number needed for a plot?” Phil is plotting several datasets on one plot and wanted an easy way to get a good set of colors to use. I mentioned colorbrewer2.org, which has terrific color maps, but they are locked up in a Flash application. So I had the idea to make a Python package bringing the colorbrewer2.org color maps into Python and connecting them to matplotlib.

There is an existing Python package called colorbrewer but it’s undocumented, doesn’t have its code in a public repository, and is missing some features I wanted. (And it was updated for the first time in 3 years on the day I wrote this post.) It also turns out that the colorbrewer2.org color maps are already in matplotlib (all the capitalized maps in this example) but that doesn’t give access the original colors as defined by colorbrewer2.org. And so was born brewer2mpl.

colorbrewer2.org has three types of color maps: sequential, diverging, and qualitative (for categorical data). Each color map is defined several times with different numbers of colors, anywhere from three to 12.

brewer2mpl packages the colorbrewer2.org color maps and gives access to them as 0-255 RGB triplets, 0-1 RGB triplets, hex strings, and matplotlib color maps. It contains functions for listing color maps by type and number of defined colors. Now Phil can do the following to get some good colors to use in matplotlib plots:

import brewer2mpl
bmap = brewer2mpl.get_map('Set1', 'qualitative', 5)
colors = bmap.mpl_colors

For more information:

This was my first time building and releasing my own package. It was fun and a great chance to learn about the simplest aspects of releasing Python packages. (Nothing fancy going on here.) Hope I get the chance to do more!

brewer2mpl – colorbrewer2 and Python

SciPy 2012

Last week was the SciPy 2012 conference in Austin, TX. I’ve been for the last three years and it’s a good event for the tutorials on scientific Python packages, to see what crazy fields people are using Python in (brain-robotics interaction!), to hack on your favorite project, and to meet the maintainers of some of the packages you use every day. I recommend it especially to people who are new to scientific Python as you’ll learn a lot your first year. All of the tutorials and talks from SciPy are recorded and should be up on the web soon. You can also check out the #scipy2012 stream on Twitter.

Tutorials

This year I went to tutorials on scikits-learn, HDF5 with PyTables, pandas, and IPython:

scikits-learn

scikits-learn is the package for machine learning in Python. I haven’t had occasion to use it but if I ever need to do ML scikits-learn will be the first tool I reach for. The documentation is excellent and the interfaces seem simple and logical. Jake VanderPlas’ excellent tutorial is online at http://astroml.github.com/sklearn_tutorial/.

HDF5 with Pytables

HDF5 is a data format that can handle pretty much any kind of data, and when paired with PyTables it seems to be especially useful for efficiently working with very large datasets. One nice feature of HDF is that you can put many different arbitrary datasets in one file and attach separate metadata to every node of the file.

pandas

pandas has been skyrocketing in popularity lately and for good reason. If you’re working with time series or tabular data pandas is probably the tool you should be using. pandas has extremely slick indexing and slicing, handling of missing data, and intelligent data alignment. The DataFrame object supports SQL-like merging and joining. Time handling is really amazing. pandas supports time zones and daylight saving time. It has really rich support for different periods and frequencies. If you’re working with time series or tabular data you should really give pandas a try.

Ipython

I’ve written about IPython before. The notebook is the big star these days and they demoed some really cool features, including making interactive plots with JavaScript. All of their tutorial notebooks are available on GitHub and I recommend you check them out if you’re curious about all the crazy stuff the notebook can do.

Talks

All of the talks are listed on the SciPy website. Many of the talks tend to be very domain specific but the keynotes are always interesting. This year John  Hunter talked about the growth of matplotlib and Joshua Bloom talked about how he has used Python throughout his astronomy career, from machine learning to remotely controlling telescopes.

The theme of SciPy 2012 seemed to be high powered computing. This has been the case in years past, but while in previous years the focus has been on the cloud or GPUs, this year seemed to be about serial optimization using fancy compilers, especially LLVM. Numba and Julia seemed to be stars of the conference in this respect. (Julia isn’t actually related to Python, but the language syntax does look a lot like Python. People seemed especially excited about the prospect of calling Julia from Python much as you can call C from Python. Anything to not have to work in C…)

One thing that caught my eye during the lightning talks was pipe-o-matic, a Python tool for making data pipelines. It’s still coming together but I could see it being useful. Pipelines are defined in simple text files and the executables and their versions are fully specified. This means you can archive/version control your pipelines along with your data to track which versions of which programs were used to make the data. pipe-o-matic supposedly has full support for resuming or restarting pipelines after errors. It looks simple and lightweight.

Astronomy Mini-Symposium

This year there were topical mini-symposia on geophysics, astronomy, meteorology, and bioinformatics. The astronomy talks included AstroPy, astroML for machine learning in astronomy, and the yt visualization toolkit for astronomical simulations. yt makes some seriously beautiful visualizations.

It was great to have the topical symposium and see what others are doing with Python in astronomy, I hope these sessions keep showing up at SciPy and other places. Maybe AAS should have some Python sessions?

Sprints

This was my first year participating in the sprints, which are basically hack days where project developers will organize groups of programmers to tackle certain issues in their projects. There were groups working on scikits-learn, scikits-image, AstroPy, matplotlib, IPython and more. I ended up working on my own project, brewer2mpl, which I will discuss in another blog post.

Other Thoughts

Each time I’ve been to SciPy over the last three years I’ve been struck by how few women are there (~2% of attendees). Maybe I shouldn’t be shocked to see so few women at a conference on scientific programming, but it’s really pathetic, and it would be nice to see the organizers make an effort to recruit a more diverse group of attendees. I think a significant fraction of the women at SciPy 2012 were astronomers, keep up the good work astronomy!

SciPy 2012

Profiling Python

Profiling a program is a way to take a detailed look at the execution time of individual pieces of the program. When you want to optimize code it’s important to use a profiler first so that you know where your optimization effort would be most beneficial.

I recently demonstrated profiling Python code using the built-in tools, line_profiler, and IPython’s “magic” hooks into those tools. RunSnakeRun also made an appearance.

As usual I used the IPython notebook and you can get the .ipynb file and the PDF, or see it directly.

More info on using the built-in profiler and pstats modules is available via the Python Module of the Week: http://www.doughellmann.com/PyMOTW/profile/index.html.

Profiling Python