Announcing ipythonblocks.org

Way back…

About a year ago, inspired by Greg Wilson, I wrote ipythonblocks as a fun way for students (and anyone else!) to practice writing Python with immediate, step-by-step, visual feedback about what their code is doing. When I’ve taught using ipythonblocks it has always been a hit—people love making things they can see. And after making things people love to share them.

Sometime last year Tracy Teal suggested I make a site where students could post their work from ipythonblocks, share it, and even grab the work of others to remix. Today I’m happy to announce that that site is live: ipythonblocks.org.

How it works

With the latest release of ipythonblocks students can use post_to_web and from_web methods to interact with ipythonblocks.org. post_to_web can include code cells from the notebook so the creation process can be shared, not just the final result. from_web can pull a grid from ipythonblocks.org for a student to remix locally. See this notebook for a demonstration.

Thank you

There are many people to thank for helping to make ipythonblocks.org possible. Thanks to Tracy Teal for the original idea, thanks to Rackspace and Jesse Noller for providing hosting, and thanks to Kyle Kelley for helping with ops and deployment.  Most of all, thanks to my family for putting up with me working at a startup and taking on projects.

Broadcasting IPython Notebooks

A useful feature of the IPython Notebook is that you can set the server to broadcast so that others on your local network can see the server and your notebooks. This is especially nice as a teacher so that students can load your notebooks as you work, copy text out of them, and see them in their entirety instead of just what you have on screen. Here’s the outline of what to do, with detailed instructions below:

  1. Create an IPython profile with a password for the Notebook server.
  2. Figure out your IP address on the local network.
  3. Launch IPython in broadcast + read-only mode using your new profile.
  4. Have your students navigate to your Notebook server.
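
For step 2, one way to find your address on the local network from Python is sketched below. This is only an illustration: the function name is made up for this example, the target address is arbitrary, and the result can differ on machines with several network interfaces.

```python
import socket


def local_ip_address():
    """Return this machine's address on the local network, or 127.0.0.1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only asks the OS which
        # local interface it would use to reach the (arbitrary) address below.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


print(local_ip_address())
```

Students would then point their browsers at that address plus the port the Notebook server reports when it starts.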

You—Yes, You—Should Go to SciPy 2013

The SciPy 2013 conference is coming up on June 24-29 in Austin, Texas and you should go. Here are some good reasons:

You’ll learn something. There are beginning, intermediate, and advanced tutorial tracks this year, so it would be pretty much impossible for you not to learn something at those. I’ll even be there with Katy Huff teaching a tutorial on version control and testing. Even if you can’t make it to the tutorials there will be lots of great talks and BOF sessions.

There are domain specific mini-symposia. If your field is represented you can go for a concentrated dose of relevant talks and to meet other Python users in your field. Here are the specific domains this year:

  • Astronomy & astrophysics
  • Bio-informatics
  • GIS – Geospatial Data Analysis
  • Medical imaging
  • Meteorology, climatology, and atmospheric and oceanic science

It’ll be fun! The scientific Python community is chock full of really nice people. Even if you’re new and just learning how to use Python you’ll meet people who are eager to talk and make you feel welcome. (If you find this is not the case, email me or tweet me and I will see if I can help.)

Diversity at SciPy

I’ve been going to SciPy since 2010 and every year the attendees and speakers have been disappointingly white and male. Last year Andy Terrel and I chided the conference organizers about this and it looks like this year the organizers (who include Andy) are actually trying to do something about diversity: there is a Diversity Statement, a Code of Conduct, and PyLadies will be there as a community sponsor.

If you’re not sure about SciPy because you’re worried you won’t fit in or won’t be welcome I want to be the first to tell you that you don’t need to worry and that you should come. Everyone who comes to SciPy has agreed to abide by the Code of Conduct and the conference organizers are there to help if you experience any problems. SciPy is a conference for everyone and having a more diverse community is good for all of us.

Data Provenance with GitPython

Data Provenance

When running scientific software it is considered a best practice to automatically record the versions of all the software you use. This practice is sometimes referred to as recording the provenance of the results and helps make your analysis more reproducible. Almost all software libraries will have a version number that you can somehow access from your own software. For example, NumPy’s version number is recorded in the variable numpy.__version__ and most Python packages will have something similar. Python’s version is in the variable sys.version (and, alternatively, sys.version_info).
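
As a rough sketch of what automatic recording might look like, the snippet below gathers the Python version and the versions of a few named packages into a dictionary that could be saved with your results. The function name is hypothetical, and it relies on importlib.metadata from the standard library (Python 3.8 and later) rather than each package's __version__ attribute.

```python
import sys
from importlib import metadata


def collect_versions(packages):
    """Map 'python' and each named package to its version string (or None)."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


print(collect_versions(["numpy", "scipy"]))
```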

However, a lot of personal or lab software doesn’t have a version number. The software might change so fast and be modified by so many people that manually incremented version numbers aren’t very practical. There’s still hope in this situation, though, if the software is under version control. (Your software is under version control, isn’t it?) In Subversion the keyword properties feature is often used to record provenance. There isn’t a compatible feature in Git, but for Python software in Git repositories we can engineer a provenance solution using the GitPython package.

Returning to Previous States with Git

When you make a commit in Git the state of the repository is recorded and given a label based on a hash of the commit data. We can use the commit hash to return to any recorded state of the repository using the “git checkout” command. This means that if you know the commit hash of your software when you created a certain set of results, you can always set your software back to that state to reproduce the same results. Very handy!

Recording the Commit Hash

When you import a Python module, code at the global level of the module is actually executed. This is often used to set global variables within the module, which is what we’ll do here. GitPython lets us interact with Git repos from Python and one thing we can do is query a repo to get the commit hash of the current “HEAD”. (HEAD is a label in Git pointing to the latest commit of whatever state the repository is currently in.)

What we can do with that is make it so that when our software modules are imported they set a global variable containing the commit hash of their HEAD at the time the software was run. That hash can then be inserted into data products as a record of the software version used to create them. Here’s some code that gets and stores the hash of the HEAD of a repo:

from git import Repo
MODULE_HASH = Repo('/path/to/repo/').head.commit.hexsha

If the module we’re importing is actually inside a Git repo we can use a bit of Python magic to get the HEAD hash without manually listing the path to the repo:

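import os.path
from git import Repo
# Directory containing this module; it may be a subdirectory of the repo.
repo_dir = os.path.abspath(os.path.dirname(__file__))
# search_parent_directories lets GitPython walk up to the repository root.
MODULE_HASH = Repo(repo_dir, search_parent_directories=True).head.commit.hexsha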

(__file__ is a global variable Python automatically sets in imported modules.)
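
Once the hash is available, inserting it into a data product can be as simple as writing it next to the results. Here is a minimal sketch using JSON. The function names, the "software_commit" key, and the placeholder hash are all made up for illustration; in practice you would pass in the MODULE_HASH set at import time.

```python
import json
import os
import tempfile


def save_results(path, results, module_hash):
    """Write results to a JSON file alongside the software commit hash."""
    record = {"software_commit": module_hash, "results": results}
    with open(path, "w") as f:
        json.dump(record, f, indent=2)


def read_software_commit(path):
    """Return the commit hash stored in a file written by save_results."""
    with open(path) as f:
        return json.load(f)["software_commit"]


# Placeholder hash for illustration only.
example_hash = "0123456789abcdef0123456789abcdef01234567"
out_path = os.path.join(tempfile.mkdtemp(), "results.json")
save_results(out_path, {"mean": 1.5}, example_hash)
print(read_software_commit(out_path))
```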

Versioned Data

Some data formats, especially those that are text based, can be easily stored in version control. If you can put your data in a Git repo then the same strategy as above can be used to get and store the HEAD commit of the data repo when you run your analysis, allowing you to reproduce both your software and data states during later runs. If your data does not easily fit into Git it’s still a good idea to record a unique identifier for the dataset, but you may need to develop that yourself (such as a simple list of all the data files that were used as inputs).
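
For data that lives outside Git, one possible home-grown identifier is a hash over the input files. The sketch below is only one option among many (the function name is hypothetical): it combines each file's name and contents into a single SHA-256 digest, sorted so the order in which files are listed doesn't matter.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(paths):
    """Return a SHA-256 hex digest covering the names and contents of files."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


# Example call with hypothetical file names:
# fingerprint = dataset_fingerprint(["obs1.csv", "obs2.csv"])
```

Reading whole files into memory is fine for small inputs; very large files would call for reading in chunks instead.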

Install Scientific Python on Mac OS X

These instructions detail how I install the scientific Python stack on my Mac. You can always check the Install Python page for other installation options.

I’m running the latest OS X Mountain Lion (10.8) but I think these instructions should work back to Snow Leopard (10.6). These instructions differ from my previous set primarily in that I now use Homebrew to install NumPy, SciPy, and matplotlib. I do this because Homebrew makes it easier to compile these with non-standard options that work around an issue with SciPy on OS X.

I’ll show how I install Python and the basic scientific Python stack:

  • NumPy
  • SciPy
  • matplotlib
  • IPython
  • pandas

If you need other libraries they can most likely be installed via pip and any dependencies can probably be installed via Homebrew.

Command Line Tools

The first order of business is to install the Apple command line tools. These include important things like development headers, gcc, and git. Head over to developer.apple.com/downloads, register for a free account, and download (then install) the latest “Command Line Tools for Xcode” for your version of OS X.

If you’ve already installed Xcode on Lion or Mountain Lion then you can install the command line tools from the preferences. If you’ve installed Xcode on Snow Leopard then you already have the command line tools.

Homebrew

Homebrew is my favorite package manager for OS X. It builds packages from source, intelligently re-uses libraries that are already part of OS X, and encourages best practices like installing Python packages with pip.

To install Homebrew paste the following in a terminal:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

The brew command and any executables it installs will go in the directory /usr/local/bin, so you want to make sure that goes at the front of your system’s PATH. As long as you’re at it, you can also add the directory where Python scripts get installed. Add the following line to your .profile, .bash_profile, or .bashrc file:

export PATH=/usr/local/bin:/usr/local/share/python:$PATH

At this point you should close your terminal and open a new one so that this PATH setting is in effect for the rest of the installation.
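
If you want to double check the ordering from the new terminal, a small Python sketch like this one can help. The helper name is made up for this example; it just reports whether one directory comes before another in a PATH-style string.

```python
import os


def comes_first(path_value, preferred, other):
    """True if `preferred` appears before `other` in a PATH-style string."""
    entries = [entry.rstrip("/") for entry in path_value.split(os.pathsep)]
    if preferred not in entries:
        return False
    if other not in entries:
        return True
    return entries.index(preferred) < entries.index(other)


# Check the PATH of the current process.
print(comes_first(os.environ.get("PATH", ""), "/usr/local/bin", "/usr/bin"))
```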

Python

Now you can use brew to install Python:

brew install python

Afterwards you should be able to run the commands

which python
which pip

and see

/usr/local/bin/python
/usr/local/bin/pip

for each, respectively. (It’s also possible to install Python 3 using Homebrew: brew install python3.)

NumPy

It is possible to use pip to install NumPy, but I use a Homebrew recipe so I avoid some problems with SciPy. The recipe isn’t included in stock Homebrew, though; it requires “tapping” two other sources of Homebrew formulae:

brew tap homebrew/science
brew tap samueljohn/python

You can learn more about these at their respective GitHub repositories, homebrew/homebrew-science and samueljohn/homebrew-python.

With those repos tapped you can almost install NumPy, but first you’ll have to use pip to install nose:

pip install nose

I compile NumPy against OpenBLAS to avoid a SciPy issue. Compiling OpenBLAS requires gfortran, which you can get via Homebrew:

brew install gfortran
brew install numpy --with-openblas

SciPy

And then you’re ready for SciPy:

brew install scipy --with-openblas

matplotlib

matplotlib generally installs just fine via pip but the custom Homebrew formula takes care of installing optional dependencies too:

brew install matplotlib

IPython

You’ll want Notebook support with IPython and that requires some extra dependencies, including ZeroMQ via brew:

brew install zeromq
pip install jinja2
pip install tornado
pip install pyzmq
pip install ipython

pandas

Pandas should install via pip:

pip install pandas

Testing It Out

The most basic test you can do to make sure everything worked is to open up an IPython session and type in the following:

import numpy
import scipy
import matplotlib
import pandas

If there are no errors then you’re ready to get started! Congratulations and enjoy!
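
If you’d rather see every problem at once instead of stopping at the first failed import, something like the following sketch works too. The helper name is hypothetical; it only checks that each module can be imported, not that it works correctly.

```python
import importlib


def check_imports(names):
    """Try to import each module; return {name: error message or None}."""
    problems = {}
    for name in names:
        try:
            importlib.import_module(name)
            problems[name] = None
        except ImportError as exc:
            problems[name] = str(exc)
    return problems


results = check_imports(["numpy", "scipy", "matplotlib", "pandas"])
for name, error in results.items():
    print(name, "OK" if error is None else "FAILED: " + error)
```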

PyCon 2013 Review

PyCon 2013 was my first PyCon and it was, bar none, the best conference I’ve ever been to. And it wasn’t just the free Raspberry Pi or the Wreck-it-Ralph swag from Disney or the fact that I stood next to Guido for a minute during the poster session. No, PyCon is just good people. The Python community is diverse and accepting, and I can’t list all the awesome, kind people I met there.

There were, unfortunately, disappointments, but what other tech conference has a sold-out full-day education summit, or raises $10k for a community group, or raises $6k for cancer research and the John Hunter Memorial fund with a 5k fun run? And PyCon attendees were 20% women! It’s amazing to have been a part of a conference where community, generosity, and outreach were put front and center. I tried to do my small part by giving people directions during the tutorials.

Anyway, on to the specifics of what I did:

Tutorials

The first tutorial I went to was called “A beginner’s introduction to Pydata: how to build a minimal recommendation engine”. The intent of the tutorial was to introduce NumPy and pandas. I was hoping to learn some pandas-fu but I found the material poorly organized and didn’t feel like I was getting a good idea of why/when to use particular pandas features. The video for this one doesn’t seem to be up yet.

The second tutorial I went to was called “Bayesian statistics made simple” and this one was awesome! I was comfortable with Bayesian stats beforehand but a refresher never hurts and the instructor (Allen Downey) gave terrific explanations. He had a little Bayesian stats library for us to use in the programmatic examples, which was fun. (Though I had to re-compile NumPy and SciPy to get it to work. It used the one little corner of SciPy that’s often broken on Macs.) If you’re interested in learning more Downey is working on a new book called Think Bayes that you can read for free, Fernando Perez has posted his notebook from the course, and you can watch the video.

Education Summit

The PyCon Education Summit brought together educators from all kinds of backgrounds from K-12 teachers to those teaching adults. I went due to my interest as an instructor for Software Carpentry. Most of the discussion focused on teaching Python/computation in long-form courses to people who have zero programming experience.

I didn’t take much concrete away from the summit, but I was impressed with the sheer level of energy going into the Python/education nexus. There are many people out there experimenting with Python in education and developing lessons that use Python. There are also a lot of user groups around the country (like the Boston Python Workshop) that are actively working to bring new people into the Python world. Many people do this in their spare time! That’s the kind of community devotion I love about Python.

I gave a five minute lightning talk at the summit that was part preview of my PyCon talk and part showing off ipythonblocks. The Notebooks for that are at nbviewer.ipython.org/5165758.

Talks

The first and most important thing you should know about the talks is that they were all recorded and the videos are online. There were about a million concurrent talk sessions and I’m still catching up on all the great stuff I missed. I highly recommend starting with the opening/closing statements from Jesse Noller and the Raspberry Pi keynote from Eben Upton.

I think there were standing ovations during each of those. And then there were the great regular talks I saw in person:

  • Python: A Toy Language by Dave Beazley
    • Do not miss a chance to see Dave Beazley talk. You will be thoroughly entertained and leave wondering why you do such boring things with your code. Here Beazley talks about using Python to control a hobby CNC mill.
  • How the Internet works by Jessica McKellar
    • Learn about the underlying structure and protocol of the web!
  • Awesome Big Data Algorithms by Titus Brown
    • Titus gives a great introduction to some algorithms and data structures that help deal with Data of Unusual Size. Also check out his blog post on the talk with links to his notebooks.
  • Who’s there? – Home Automation with Arduino/Raspberry Pi by Rupa Dachere
    • Rupa tells us how she built an automated front door camera. This talk was standing room only!
  • What Teachers Really Need from Us by Selena Deckelmann
    • Selena relates her experience getting to know teachers and how we as developers can best help them.

My Talk

I gave a talk titled “Teaching with the IPython Notebook” that focused on how the IPython Notebook can help students learning Python. (Primarily by simplifying their interface to Python.) It seemed to go well and I’m really glad I did it! The video is up and my presentation notebook is at nbviewer.ipython.org/5165431.

Posters

I stopped by Simeon Franklin’s poster about making Python more beginner friendly and I was really impressed with the level of interest surrounding the topic. Even Guido was there seriously engaged in this discussion. With engagement of this magnitude at that level I think we’ll see people putting serious effort into making Python more user friendly right out of the box, which will be wonderful.

Observations

As Wes McKinney noted on Twitter, there were two things everywhere at PyCon this year: the IPython Notebook and Raspberry Pis. It seemed like every other talk and tutorial was using the Notebook, and it’s no surprise: the Notebook is fantastic for presenting code plus supporting material and then sharing it. It’s a major boon to Python.

Everyone at the conference (plus some kids who came for free tutorials) left with a Raspberry Pi. These amazing little computers enable all kinds of projects, often attached to an Arduino for talking to hardware. In Eben Upton’s keynote I learned that the “Pi” in “Raspberry Pi” is for Python since much of the system is built on Python. The site raspberry.io has been set up as a community of projects that use RPis but I’m sure Googling turns up a ton more. A small, cheap, low powered, easy to program computer just has so many possibilities! I haven’t had a chance to start hacking on mine yet but I’m looking forward to it!

Thanks

A big thanks to STScI for sending me. Thanks to Greg Wilson for suggesting the talk idea and thanks to Titus Brown and Ethan White for looking over my proposal.

Heading to PyCon

Today I’m flying out to Santa Clara for PyCon! This will be my first one and I’m really excited!

Tutorials

On Wednesday I’m taking a couple of data oriented tutorials: Introduction to Pydata and Bayesian statistics made simple. I already know quite a bit on those topics but I’m looking forward to learning more about pandas and PyTables, and it can’t hurt to brush up on my statistics.

Education Summit

On Thursday I’m taking part in the first ever Python Education Summit. There should be a lot to learn about teaching with Python and I’m curious to see what is done in venues other than Software Carpentry.

Talks

I haven’t yet decided on all of the talks I want to see but Daniel Greenfield’s PyCon guide has given me some good ideas. My own talk on Teaching with the IPython Notebook is on Saturday at 1:55 PM. The talks immediately after mine look interesting; the first is on building scientific applications with Python and the second is about Numba, a tool for speeding up numeric calculations.

Job Fair

There are a ton of companies signed up for the PyCon Job Fair, and I’ll definitely be there with my resume!

Say Hi!

I’m looking forward to meeting a lot of people at PyCon! If you would like to meet up at the conference drop me a line on App.net, twitter, Google+, or by email. See you there!

Some git Aliases

These are a few git aliases I’ve made recently to make my life a little easier. The first two are for displaying the log, and the rest for merging. See the Git wiki for more on how to add aliases.

Logs

I pretty much always want the one-line log, so this is the basic version of that:

ls = log --oneline --decorate

--decorate shows any labels on commits, such as branch names. Sometimes I want the above but with the graph view turned on:

lg = log --oneline --decorate --graph

Merging

Merges in git fall into two basic categories: fast-forward merges in which the branch label is simply updated to a new commit (but no new commit is made), and all other merges in which a new commit is made with multiple parents.
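
To make the distinction concrete, here is a toy sketch in Python (not how git itself is implemented): a fast-forward is possible only when the current branch tip is an ancestor of the commit being merged in. The commit names and the parents dictionary are invented for the example.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def can_fast_forward(parents, branch_tip, other_tip):
    """A fast-forward works only if our tip is an ancestor of the other tip."""
    return is_ancestor(parents, branch_tip, other_tip)


# Toy history: a <- b <- c on one branch, a <- d on another.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
print(can_fast_forward(parents, "a", "c"))  # True: the label just moves from a to c
print(can_fast_forward(parents, "d", "c"))  # False: histories diverged, a merge commit is needed
```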

By default the merge command will attempt to do a fast-forward merge. If that won’t work it will move to some other “real” merge strategy. I think the difference between fast-forward and the other merges is sufficient that they shouldn’t happen with the same command so I’ve set up aliases to separate them. First the alias for doing a fast-forward. This will fail if a fast-forward is not possible:

ff = merge --ff-only

And then an alias for forcing a non-fast-forward merge even in situations where a fast-forward would be possible:

mrg = merge --no-ff

(This is the sort of merge GitHub does when you merge a pull request.)

Pulling

The pull command is really fetch + merge so it takes most of the same options. When I do a pull I generally only want it to succeed if it’s possible to fast-forward my local branch. If git must do a real merge something is probably wrong, so I have a fast-forward-only pull alias:

ffpull = pull --ff-only

Teaching with ipythonblocks at UW

I’ve got a blog post up over on the Software Carpentry blog about trying out ipythonblocks in the classroom for the first time. Summary: it was a hit! The students really got a lot out of being able to immediately see the result of their code. We also did a lot of “what do you think this will do?”, which I think helped get the students thinking a bit more computationally. Some of the more advanced students even struck off on their own making their own designs instead of just sitting there bored.

I’m really looking forward to using ipythonblocks again at my next boot camps in May, and I hope others get some use out of it in the meantime!

A Styled HTML Document from Markdown

There are many, many command line converters for turning Markdown into HTML, but for the most part these make HTML fragments, not full documents with CSS styling. That’s fine most of the time (e.g. when I’m writing blog posts), but sometimes I want a full, pretty document so I can print it out (typically for presentation notes).

To fill this hole I put together a small script that converts Markdown and wraps the HTML result in a template that includes Bootstrap CSS. I set the fonts to sans-serif and monospace so that they are taken from the defaults for your browser, making it easier to use your favorite fonts.

The script requires the Python libraries Python Markdown, mdx_smartypants (a Python-Markdown extension), and Jinja2.

#!/usr/bin/env python
"""Convert a Markdown file into a complete, Bootstrap-styled HTML document."""
import argparse
import sys

import jinja2
import markdown

# HTML wrapper; {{content}} is replaced with the converted Markdown.
TEMPLATE = """<!DOCTYPE html>
<html>
<head>
    <link href="http://netdna.bootstrapcdn.com/twitter-bootstrap/2.3.0/css/bootstrap-combined.min.css" rel="stylesheet">
    <style>
        body {
            font-family: sans-serif;
        }
        code, pre {
            font-family: monospace;
        }
        h1 code,
        h2 code,
        h3 code,
        h4 code,
        h5 code,
        h6 code {
            font-size: inherit;
        }
    </style>
</head>
<body>
<div class="container">
{{content}}
</div>
</body>
</html>
"""


def parse_args(args=None):
    d = 'Make a complete, styled HTML document from a Markdown file.'
    parser = argparse.ArgumentParser(description=d)
    parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?',
                        default=sys.stdin,
                        help='File to convert. Defaults to stdin.')
    parser.add_argument('-o', '--out', type=argparse.FileType('w'),
                        default=sys.stdout,
                        help='Output file name. Defaults to stdout.')
    return parser.parse_args(args)


def main(args=None):
    args = parse_args(args)
    md = args.mdfile.read()
    # Convert the Markdown to an HTML fragment, then wrap it in the template.
    extensions = ['extra', 'smarty']
    html = markdown.markdown(md, extensions=extensions, output_format='html5')
    doc = jinja2.Template(TEMPLATE).render(content=html)
    args.out.write(doc)


if __name__ == '__main__':
    sys.exit(main())