Announcing ipythonblocks.org

Way back…

About a year ago, inspired by Greg Wilson, I wrote ipythonblocks as a fun way for students (and anyone else!) to practice writing Python with immediate, step-by-step, visual feedback about what their code is doing. When I’ve taught using ipythonblocks it has always been a hit: people love making things they can see. And after making things, people love to share them.

Sometime last year Tracy Teal suggested I make a site where students could post their work from ipythonblocks, share it, and even grab the work of others to remix. Today I’m happy to announce that that site is live: ipythonblocks.org.

How it works

With the latest release of ipythonblocks students can use post_to_web and from_web methods to interact with ipythonblocks.org. post_to_web can include code cells from the notebook so the creation process can be shared, not just the final result. from_web can pull a grid from ipythonblocks.org for a student to remix locally. See this notebook for a demonstration.

Thank you

There are many people to thank for helping to make ipythonblocks.org possible. Thanks to Tracy Teal for the original idea, thanks to Rackspace and Jesse Noller for providing hosting, and thanks to Kyle Kelley for helping with ops and deployment. Most of all, thanks to my family for putting up with me working at a startup and taking on projects.

Broadcasting IPython Notebooks

A useful feature of the IPython Notebook is that you can set the server to broadcast so that others on your local network can see the server and your notebooks. This is especially nice for teachers: students can load your notebooks as you work, copy text out of them, and see them in their entirety instead of just what you have on screen. Here’s the outline of what to do:

  1. Create an IPython profile with a password for the Notebook server.
  2. Figure out your IP address on the local network.
  3. Launch IPython in broadcast + read-only mode using your new profile.
  4. Have your students navigate to your Notebook server.
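
For step 2, one way to find your address without leaving Python is sketched below. This is only an illustration using the standard library, not part of IPython: the helper name local_ip_address is made up for this example, and the target address 10.255.255.255 is an arbitrary private address that is never actually contacted.

```python
import socket


def local_ip_address():
    """Best-effort guess at this machine's IPv4 address on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only asks the OS
        # which local interface it would use to reach the target address.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network at all): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


print(local_ip_address())
```

If the machine has several network interfaces the result may not be the one your students can reach, so it is worth checking it against your operating system’s network settings.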


Data Provenance with GitPython

Data Provenance

When running scientific software it is considered a best practice to automatically record the versions of all the software you use. This practice is sometimes referred to as recording the provenance of the results and helps make your analysis more reproducible. Almost all software libraries will have a version number that you can somehow access from your own software. For example, NumPy’s version number is recorded in the variable numpy.__version__ and most Python packages will have something similar. Python’s version is in the variable sys.version (and, alternatively, sys.version_info).
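
As a rough sketch of what recording these versions might look like, the snippet below gathers them into a dictionary that could be saved next to your results. The function name collect_versions is hypothetical, and it uses the standard library’s importlib.metadata (Python 3.8 and later) rather than each package’s __version__ attribute, so it expects distribution names as installed (for example "numpy").

```python
import platform
import sys
from importlib import metadata


def collect_versions(package_names):
    """Return a dict of version strings for Python and the named packages."""
    versions = {
        "python": platform.python_version(),
        "python_full": sys.version,
    }
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


print(collect_versions(["numpy"]))
```

Packages that are not installed are recorded as None instead of raising an error, so the same call can be reused across environments.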

However, a lot of personal or lab software doesn’t have a version number. The software might change so fast and be modified by so many people that manually incremented version numbers aren’t very practical. There’s still hope in this situation, though, if the software is under version control. (Your software is under version control, isn’t it?) In Subversion the keyword properties feature is often used to record provenance. There isn’t a compatible feature in Git, but for Python software in Git repositories we can engineer a provenance solution using the GitPython package.

Returning to Previous States with Git

When you make a commit in Git the state of the repository is recorded and given a label based on a hash of the commit data. We can use the commit hash to return to any recorded state of the repository using the “git checkout” command. This means that if you know the commit hash of your software when you created a certain set of results, you can always set your software back to that state to reproduce the same results. Very handy!

Recording the Commit Hash

When you import a Python module, code at the global level of the module is actually executed. This is often used to set global variables within the module, which is what we’ll do here. GitPython lets us interact with Git repos from Python and one thing we can do is query a repo to get the commit hash of the current “HEAD”. (HEAD is a label in Git pointing to the commit the repository currently has checked out, which is normally the latest commit on the current branch.)

What we can do with that is make it so that when our software modules are imported they set a global variable containing the commit hash of their HEAD at the time the software was run. That hash can then be inserted into data products as a record of the software version used to create them. Here’s some code that gets and stores the hash of the HEAD of a repo:

from git import Repo
MODULE_HASH = Repo('/path/to/repo/').head.commit.hexsha

If the module we’re importing is actually inside a Git repo we can use a bit of Python magic to get the HEAD hash without manually listing the path to the repo:

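import os.path
from git import Repo
# Directory containing this module file.
module_dir = os.path.abspath(os.path.dirname(__file__))
# Search upward so this also works when the module sits in a subdirectory of the repo.
MODULE_HASH = Repo(module_dir, search_parent_directories=True).head.commit.hexsha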

(__file__ is a global variable Python automatically sets in imported modules.)
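
To give an idea of what inserting the hash into a data product could look like, here is a minimal sketch that writes results to a JSON file together with the commit hash. The function name save_with_provenance and the layout of the JSON record are assumptions made for this example; the hash is passed in as an argument so the snippet runs without GitPython.

```python
import json
from datetime import datetime, timezone


def save_with_provenance(results, path, module_hash):
    """Write results to a JSON file alongside the commit hash that produced them."""
    record = {
        "provenance": {
            # Commit hash of the software, e.g. the MODULE_HASH set at import time.
            "commit": module_hash,
            # UTC timestamp of when the file was written.
            "created": datetime.now(timezone.utc).isoformat(),
        },
        "results": results,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return record
```

In an analysis script this would be called as something like save_with_provenance({"mean": 1.5}, "results.json", MODULE_HASH). The hash only describes committed code, so results produced with uncommitted changes in the working tree would not be fully reproducible from it.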

Versioned Data

Some data formats, especially those that are text based, can be easily stored in version control. If you can put your data in a Git repo then the same strategy as above can be used to get and store the HEAD commit of the data repo when you run your analysis, allowing you to reproduce both your software and data states during later runs. If your data does not easily fit into Git it’s still a good idea to record a unique identifier for the dataset, but you may need to develop that yourself (such as a simple list of all the data files that were used as inputs).
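
For data that lives outside Git, one possible home-grown identifier is sketched below. It goes a step beyond a plain list of file names by also recording a SHA-256 digest of each file, which is an addition for this example and not something the approach above requires. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_manifest(paths):
    """Map each input file path to its SHA-256 digest, in sorted path order."""
    return {str(p): file_sha256(p) for p in sorted(Path(p) for p in paths)}


def dataset_id(manifest):
    """Combine the per-file digests into one identifier for the whole dataset."""
    digest = hashlib.sha256()
    for name, file_hash in sorted(manifest.items()):
        digest.update(f"{name}:{file_hash}\n".encode("utf-8"))
    return digest.hexdigest()
```

The manifest can be stored with the results in the same way as the commit hash, and the single dataset_id value is handy when you only want to check whether the inputs changed between runs. Because file paths are part of the identifier, moving the data to a different directory changes it.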

Teaching with ipythonblocks at UW

I’ve got a blog post up over on the Software Carpentry blog about trying out ipythonblocks in the classroom for the first time. Summary: it was a hit! The students really got a lot out of being able to immediately see the result of their code. We also did a lot of “what do you think this will do?”, which I think helped get the students thinking a bit more computationally. Some of the more advanced students even struck off on their own making their own designs instead of just sitting there bored.

I’m really looking forward to using ipythonblocks again at my next boot camps in May, and I hope others get some use out of it in the meantime!

Teaching with the IPython Notebook

For a few months now I’ve been using the IPython Notebook as my primary teaching tool for Python topics. Within Software Carpentry we’re also switching over to using the Notebook for both in-person bootcamps and our online repository of material. Ethan White and I put together a post on this topic on the Software Carpentry blog and now Titus Brown has blogged with his own thoughts. We’ve put in a PyCon proposal for a panel on this topic in 2013.

The IPython developers have to be given a huge amount of credit for putting together the Notebook and the rest of IPython. The Notebook especially is quite a feat: a top-notch research/engineering/teaching tool all in one. And they aren’t resting on their laurels; they have a ton of ideas in mind for the Notebook in the future, including a slide-show mode. I’m definitely looking forward to seeing what they’ve got!

As with many open source projects, the IPython developers struggle to find the time and funding to write their software. If any open source project is helping with your job or your research you can easily help by citing the software in your papers and in public on social media or blogs. This gives the developers more ammunition the next time they’re writing grants, so please make your support known!