Resources for Learning Python

Yesterday I asked my followers on Twitter for their advice on the best resources for people learning programming and Python:

You can see their responses on Twitter and below.

Of those, I think Think Python and How to Think Like a Computer Scientist are especially targeted at people who are brand new to programming in any language.

These are some of the resources I learned from back when I picked up Python, though I should note that I already knew some programming at the time:

Thanks to everyone who responded!


More Commits via the GitHub API

I wrote a bit ago about making commits via the GitHub API. That post outlined making changes in two simplified situations: making changes to a single file and making updates to two existing files at the root of the repository. Here I show a more general solution that allows arbitrary changes anywhere in the repo.

I want to be able to specify a repo and branch and say "here are the contents of files that have changed or been created and here are the names of files that have been deleted, please take all that and this message and make a new commit for me." Because the GitHub API is so rudimentary when it comes to making commits, that ends up being a many-stepped process. It’s mostly the same steps repeated many times, though, so it’s not a nightmare to code up. At a high level the process goes like this:

  • Get the current repo state from GitHub
    • This is the names and hashes of all the files and directories, but not the actual file contents.
  • Construct a local, malleable representation of the repo
  • Modify the local representation according to the given updates, creations, and deletions
  • Walk through the modified local "repo" and upload new/changed files and directories to GitHub
    • This must be done from the bottom up because a change at the low level means every directory above that level will need to be changed.
  • Make a new commit pointed at the new root tree (I’ll explain trees soon.)
  • Update the working branch to point to the new commit
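The middle steps of that list can be sketched in plain Python with no network calls. This is only an illustration under simplifying assumptions: the local "repo" is a flat mapping of paths to contents rather than the names-and-hashes structure described above, and the function names `apply_changes` and `dirty_directories` are hypothetical, not taken from the notebook.

```python
from pathlib import PurePosixPath


def apply_changes(files, changed, deleted):
    """Return a new {path: content} mapping with updates and deletions applied."""
    result = dict(files)
    result.update(changed)
    for path in deleted:
        result.pop(path, None)
    return result


def dirty_directories(changed, deleted):
    """List every directory that needs a new tree, deepest first.

    The empty string stands for the root of the repo.
    """
    dirty = set()
    for path in list(changed) + list(deleted):
        # Every ancestor directory of a touched file has to be rebuilt.
        for parent in PurePosixPath(path).parents:
            dirty.add('' if str(parent) == '.' else str(parent))
    # Deepest first, so children are uploaded before their parents.
    return sorted(dirty, key=lambda d: (-len(PurePosixPath(d).parts), d))
```

Sorting by depth is what gives the bottom-up order: a directory is only rebuilt after everything beneath it has been handled, and the root always comes last.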

This blog post is readable as an IPython Notebook at http://nbviewer.ipython.org/gist/jiffyclub/10809459. I’ve also reproduced the notebook below.


Using Conda Environments and the Fish Shell

I recently started over with a fresh development environment and decided to try something new: I’m using Python 3 via miniconda. The first real hiccup I’ve run into is that conda’s environment activation/deactivation scheme only works in bash or zsh. I use fish. There is an open PR to get fish support for conda but in the meantime I hacked something together to help me out.

"Activating" a conda environment does a couple of things:

  • Puts the environment’s "bin" directory at the front of the PATH environment variable.
  • Sets a CONDA_DEFAULT_ENV environment variable that tells conda in which environment to do things when none is specified.
  • Adds the environment name to the prompt à la virtualenv.

Deactivating the environment resets everything to its pre-activation state. The fish functions I put together work like this:

~ > type python
python is /Users/---/miniconda3/bin/python
~ > condactivate env-name
(env-name) ~ > type python
python is /Users/---/miniconda3/envs/env-name/bin/python
(env-name) ~ > deactivate
~ > type python
python is /Users/---/miniconda3/bin/python

Here’s the text of the functions:

function condalist -d 'List conda environments.'
    for dir in (ls $HOME/miniconda3/envs)
        echo $dir
    end
end

function condactivate -d 'Activate a conda environment' -a cenv
    if test -z $cenv
        echo 'Usage: condactivate <env name>'
        return 1
    end

    # condabin will be the path to the bin directory
    # in the specified conda environment
    set condabin $HOME/miniconda3/envs/$cenv/bin

    # check whether the condabin directory actually exists and
    # exit the function with an error status if it does not
    if not test -d $condabin
        echo 'Environment not found.'
        return 1
    end

    # deactivate an existing conda environment if there is one
    if set -q __CONDA_ENV_ACTIVE
        deactivate
    end

    # save the current path
    set -xg DEFAULT_PATH $PATH

    # put the condabin directory at the front of the PATH
    set -xg PATH $condabin $PATH

    # this is an undocumented environmental variable that influences
    # how conda behaves when you don't specify an environment for it.
    # https://github.com/conda/conda/issues/473
    set -xg CONDA_DEFAULT_ENV $cenv

    # set up the prompt so it has the env name in it
    functions -e __original_fish_prompt
    functions -c fish_prompt __original_fish_prompt
    function fish_prompt
        set_color blue
        echo -n '('$CONDA_DEFAULT_ENV') '
        set_color normal
        __original_fish_prompt
    end

    # flag for whether a conda environment has been set
    set -xg __CONDA_ENV_ACTIVE 'true'
end

function deactivate -d 'Deactivate a conda environment'
    if set -q __CONDA_ENV_ACTIVE
        # set PATH back to its default before activating the conda env
        set -xg PATH $DEFAULT_PATH
        set -e DEFAULT_PATH

        # unset this so that conda behaves according to its default behavior
        set -e CONDA_DEFAULT_ENV

        # reset to the original prompt
        functions -e fish_prompt
        functions -c __original_fish_prompt fish_prompt
        functions -e __original_fish_prompt
        set -e __CONDA_ENV_ACTIVE
    end
end

# aliases so condactivate and deactivate can have shorter names
function ca -d 'Activate a conda environment'
    condactivate $argv
end

function cda -d 'Deactivate a conda environment'
    deactivate $argv
end

# complete conda environment names when activating
complete -c condactivate -xA -a "(condalist)"
complete -c ca -xA -a "(condalist)"

Or you can download it from https://gist.github.com/jiffyclub/9679788.

To use these, add them to the ~/.config/fish/ directory and source them from the end of the ~/.config/fish/config.fish file:

source $HOME/.config/fish/conda.fish

Making Commits via the GitHub API

For fun I’ve been learning a bit about the GitHub API. Using the API it’s possible to do just about everything you can do on GitHub itself, from commenting on PRs to adding commits to a repo. Here I’m going to show how to add commits to a repo on GitHub. A notebook demonstrating things with code is linked at the end of this post, but you may want to read the post first for the high-level view.

Choosing a Client Library

The GitHub API is an HTTP interface, so you can talk to it via any tool that speaks HTTP, including things like curl. To make programming with the API simpler there are a number of libraries that allow you to communicate with GitHub via means native to whatever language you’re using. I’m using Python and I went with the github3.py library based on its Python 3 compatibility, active development, and good documentation.

Making Commits

The repository API is the gateway for doing anything to a repo. In github3.py this corresponds to the repository module.

Modifying a Single File

The special case of making a commit affecting a single file is much simpler than affecting multiple files. Creating, updating, and deleting a file can be done via a single API call once you have enough information to specify what you want done.
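As a rough illustration of what "enough information" means, here is a sketch that builds the raw HTTP request for GitHub's repository contents endpoint without sending it. It is not the github3.py interface used in the notebook; the helper name `contents_request` and its parameters are hypothetical, and the payload fields follow my reading of the contents API docs.

```python
import base64


def contents_request(owner, repo, path, message,
                     content=None, sha=None, branch=None):
    """Build (method, url_path, payload) for a single-file commit.

    content: new file text, or None to delete the file.
    sha: blob SHA of the existing file; needed for updates and deletions.
    """
    url_path = f'/repos/{owner}/{repo}/contents/{path}'
    payload = {'message': message}
    if branch is not None:
        payload['branch'] = branch
    if sha is not None:
        payload['sha'] = sha
    if content is None:
        if sha is None:
            raise ValueError('deleting a file requires its current blob SHA')
        return 'DELETE', url_path, payload
    # File contents travel base64-encoded in the JSON body.
    payload['content'] = base64.b64encode(content.encode('utf-8')).decode('ascii')
    return 'PUT', url_path, payload
```

Creating a file needs only a path, a message, and the content. Updating or deleting also needs the SHA of the blob being replaced, which is why you usually fetch the file's current state first.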

Modifying Multiple Files

Making a commit affecting multiple files requires making multiple API calls and some understanding of Git’s internal data store. That’s because to change multiple files you have to add all the changes to the repo one at a time before making a commit. The process is outlined in full in the API docs about Git data.
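To make the sequence concrete, here is a sketch that lists the calls in order without sending anything. The function name `multi_file_commit_plan` is hypothetical, values in angle brackets are placeholders for SHAs returned by earlier calls, and the endpoint paths reflect my reading of the Git data docs, so check them against the current documentation before relying on them.

```python
def multi_file_commit_plan(owner, repo, branch, message, files):
    """Return the ordered API calls for committing several files at once.

    files maps repo paths to new text content.
    """
    base = f'/repos/{owner}/{repo}/git'
    plan = [
        # 1. Find the commit the branch points at, then that commit's tree.
        ('GET', f'{base}/ref/heads/{branch}', None),
        ('GET', f'{base}/commits/<head commit sha>', None),
    ]
    tree_entries = []
    for path, text in sorted(files.items()):
        # 2. Upload each file's contents as a blob.
        plan.append(('POST', f'{base}/blobs',
                     {'content': text, 'encoding': 'utf-8'}))
        tree_entries.append({'path': path, 'mode': '100644', 'type': 'blob',
                             'sha': f'<blob sha for {path}>'})
    plan += [
        # 3. Create a tree that layers the new blobs over the old tree.
        ('POST', f'{base}/trees',
         {'base_tree': '<head tree sha>', 'tree': tree_entries}),
        # 4. Create a commit pointing at the new tree.
        ('POST', f'{base}/commits',
         {'message': message, 'tree': '<new tree sha>',
          'parents': ['<head commit sha>']}),
        # 5. Move the branch to the new commit.
        ('PATCH', f'{base}/refs/heads/{branch}', {'sha': '<new commit sha>'}),
    ]
    return plan
```

Nothing is visible on the branch until the final call moves the reference, so a failure partway through leaves the branch untouched.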

I should note that I think deleting multiple files in a single commit requires a slightly different procedure, one I’ll cover in another post.


That’s the overview; look over the notebook for the code! http://nbviewer.ipython.org/gist/jiffyclub/9235955


Announcing ipythonblocks.org

Way back…

About a year ago, inspired by Greg Wilson, I wrote ipythonblocks as a fun way for students (and anyone else!) to practice writing Python with immediate, step-by-step, visual feedback about what their code is doing. When I’ve taught using ipythonblocks it has always been a hit—people love making things they can see. And after making things people love to share them.

Sometime last year Tracy Teal suggested I make a site where students could post their work from ipythonblocks, share it, and even grab the work of others to remix. Today I’m happy to announce that that site is live: ipythonblocks.org.

How it works

With the latest release of ipythonblocks students can use post_to_web and from_web methods to interact with ipythonblocks.org. post_to_web can include code cells from the notebook so the creation process can be shared, not just the final result. from_web can pull a grid from ipythonblocks.org for a student to remix locally. See this notebook for a demonstration.

Thank you

There are many people to thank for helping to make ipythonblocks.org possible. Thanks to Tracy Teal for the original idea, thanks to Rackspace and Jesse Noller for providing hosting, and thanks to Kyle Kelley for helping with ops and deployment.  Most of all, thanks to my family for putting up with me working at a startup and taking on projects.


Data Provenance with GitPython

Data Provenance

When running scientific software it is considered a best practice to automatically record the versions of all the software you use. This practice is sometimes referred to as recording the provenance of the results and helps make your analysis more reproducible. Almost all software libraries will have a version number that you can somehow access from your own software. For example, NumPy’s version number is recorded in the variable numpy.__version__ and most Python packages will have something similar. Python’s version is in the variable sys.version (and, alternatively, sys.version_info).
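A small sketch of gathering those version strings in one place might look like the following. The function name `software_versions` is hypothetical, and the fallback to 'unknown' is my own choice for packages that don't define __version__.

```python
import importlib
import platform
import sys


def software_versions(module_names):
    """Map 'python', 'platform', and each module name to a version string."""
    versions = {'python': sys.version, 'platform': platform.platform()}
    for name in module_names:
        module = importlib.import_module(name)
        # Not every package defines __version__, so fall back to 'unknown'.
        versions[name] = getattr(module, '__version__', 'unknown')
    return versions
```

Calling `software_versions(['numpy', 'scipy'])` at the start of an analysis gives a dictionary that can be saved next to the results.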

However, a lot of personal or lab software doesn’t have a version number. The software might change so fast and be modified by so many people that manually incremented version numbers aren’t very practical. There’s still hope in this situation, though, if the software is under version control. (Your software is under version control, isn’t it?) In Subversion the keyword properties feature is often used to record provenance. There isn’t a compatible feature in Git, but for Python software in Git repositories we can engineer a provenance solution using the GitPython package.

Returning to Previous States with Git

When you make a commit in Git the state of the repository is recorded and given a label based on a hash of the commit data. We can use the commit hash to return to any recorded state of the repository using the “git checkout” command. This means that if you know the commit hash of your software when you created a certain set of results, you can always set your software back to that state to reproduce the same results. Very handy!

Recording the Commit Hash

When you import a Python module, code at the global level of the module is actually executed. This is often used to set global variables within the module, which is what we’ll do here. GitPython lets us interact with Git repos from Python and one thing we can do is query a repo to get the commit hash of the current “HEAD”. (HEAD is a label in Git pointing to the latest commit of whatever state the repository is currently in.)

What we can do with that is make it so that when our software modules are imported they set a global variable containing the commit hash of their HEAD at the time the software was run. That hash can then be inserted into data products as a record of the software version used to create them. Here’s some code that gets and stores the hash of the HEAD of a repo:

from git import Repo
MODULE_HASH = Repo('/path/to/repo/').head.commit.hexsha

If the module we’re importing is actually inside a Git repo we can use a bit of Python magic to get the HEAD hash without manually listing the path to the repo:

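import os.path
from git import Repo
# directory containing this module
repo_dir = os.path.abspath(os.path.dirname(__file__))
# search_parent_directories lets this work when the module lives in a
# subdirectory of the repo rather than at its root
MODULE_HASH = Repo(repo_dir, search_parent_directories=True).head.commit.hexsha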

(__file__ is a global variable Python automatically sets in imported modules.)
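One way to insert the hash into a data product is to write it alongside the results. This is only a sketch: the function name `write_results`, the JSON format, and the key names are all hypothetical choices for illustration, and the hash is passed in as an argument so the example doesn't depend on a real repo.

```python
import json


def write_results(path, results, module_hash):
    """Write results as JSON with the software commit hash alongside them."""
    record = {
        'software_commit': module_hash,  # e.g. the MODULE_HASH set at import
        'results': results,
    }
    with open(path, 'w') as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record
```

Anyone reading the output file later can check out the recorded commit and rerun the same code.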

Versioned Data

Some data formats, especially those that are text based, can be easily stored in version control. If you can put your data in a Git repo then the same strategy as above can be used to get and store the HEAD commit of the data repo when you run your analysis, allowing you to reproduce both your software and data states during later runs. If your data does not easily fit into Git it’s still a good idea to record a unique identifier for the dataset, but you may need to develop that yourself (such as a simple list of all the data files that were used as inputs).
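For data that doesn't fit in Git, one possible identifier goes a step beyond a plain file list and hashes the file names together with their contents. This is an assumption on my part rather than an established convention, and the function name `dataset_fingerprint` is hypothetical.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(paths):
    """Return one SHA-1 hex digest summarizing a set of data files."""
    digest = hashlib.sha1()
    # Sort so the fingerprint doesn't depend on the order files are listed.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode('utf-8'))
        # Reads each file fully into memory; fine for a sketch,
        # but large files would want chunked reads.
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

The digest changes whenever any input file changes, so storing it with the results records which version of the data was used.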


Install Scientific Python on Mac OS X

These instructions detail how I install the scientific Python stack on my Mac. You can always check the Install Python page for other installation options.

I’m running the latest OS X Mountain Lion (10.8) but I think these instructions should work back to Snow Leopard (10.6). These instructions differ from my previous set primarily in that I now use Homebrew to install NumPy, SciPy, and matplotlib. I do this because Homebrew makes it easier to compile these with non-standard options that work around an issue with SciPy on OS X.

I’ll show how I install Python and the basic scientific Python stack:

If you need other libraries they can most likely be installed via pip and any dependencies can probably be installed via Homebrew.

Command Line Tools

The first order of business is to install the Apple command line tools. These include important things like development headers, gcc, and git. Head over to developer.apple.com/downloads, register for a free account, and download (then install) the latest “Command Line Tools for Xcode” for your version of OS X.

If you’ve already installed Xcode on Lion or Mountain Lion then you can install the command line tools from the preferences. If you’ve installed Xcode on Snow Leopard then you already have the command line tools.

Homebrew

Homebrew is my favorite package manager for OS X. It builds packages from source, intelligently re-uses libraries that are already part of OS X, and encourages best practices like installing Python packages with pip.

To install Homebrew paste the following in a terminal:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

The brew command and any executables it installs will go in the directory /usr/local/bin, so you want to make sure that goes at the front of your system’s PATH. As long as you’re at it, you can also add the directory where Python scripts get installed. Add the following line to your .profile, .bash_profile, or .bashrc file:

export PATH=/usr/local/bin:/usr/local/share/python:$PATH

At this point you should close your terminal and open a new one so that this PATH setting is in effect for the rest of the installation.

Python

Now you can use brew to install Python:

brew install python

Afterwards you should be able to run the commands

which python
which pip

and see

/usr/local/bin/python
/usr/local/bin/pip

for each, respectively. (It’s also possible to install Python 3 using Homebrew: brew install python3.)

NumPy

It is possible to use pip to install NumPy, but I use a Homebrew recipe so I avoid some problems with SciPy. The recipe isn’t included in stock Homebrew, though; it requires “tapping” two other sources of Homebrew formulae:

brew tap homebrew/science
brew tap samueljohn/python

You can learn more about these at their respective repositories:

With those repos tapped you can almost install NumPy, but first you’ll have to use pip to install nose:

pip install nose

I compile NumPy against OpenBLAS to avoid a SciPy issue. Compiling OpenBLAS requires gfortran, which you can get via Homebrew:

brew install gfortran
brew install numpy --with-openblas

SciPy

And then you’re ready for SciPy:

brew install scipy --with-openblas

matplotlib

matplotlib generally installs just fine via pip but the custom Homebrew formula takes care of installing optional dependencies too:

brew install matplotlib

IPython

You’ll want Notebook support with IPython and that requires some extra dependencies, including ZeroMQ via brew:

brew install zeromq
pip install jinja2
pip install tornado
pip install pyzmq
pip install ipython

pandas

Pandas should install via pip:

pip install pandas

Testing It Out

The most basic test you can do to make sure everything worked is to open up an IPython session and type in the following:

import numpy
import scipy
import matplotlib
import pandas

If there are no errors then you’re ready to get started! Congratulations and enjoy!


PyCon 2013 Review

PyCon 2013 was my first PyCon and it was, bar none, the best conference I’ve ever been to. And it wasn’t just the free Raspberry Pi or the Wreck-it-Ralph swag from Disney or the fact that I stood next to Guido for a minute during the poster session. No, PyCon is just good people. The Python community is diverse and accepting, and I can’t list all the awesome, kind people I met there.

There were, unfortunately, disappointments, but what other tech conference has a sold-out full-day education summit, or raises $10k for a community group, or raises $6k for cancer research and the John Hunter Memorial fund with a 5k fun run? And PyCon attendees were 20% women! It’s amazing to have been a part of a conference where community, generosity, and outreach were put front and center. I tried to do my small part by giving people directions during the tutorials.

Anyway, on to the specifics of what I did:

Tutorials

The first tutorial I went to was called “A beginner’s introduction to Pydata: how to build a minimal recommendation engine”. The intent of the tutorial was to introduce NumPy and pandas. I was hoping to learn some pandas-fu but I found the material poorly organized and didn’t feel like I was getting a good idea of why/when to use particular pandas features. The video for this one doesn’t seem to be up yet.

The second tutorial I went to was called “Bayesian statistics made simple” and this one was awesome! I was comfortable with Bayesian stats beforehand but a refresher never hurts and the instructor (Allen Downey) gave terrific explanations. He had a little Bayesian stats library for us to use in the programmatic examples, which was fun. (Though I had to re-compile NumPy and SciPy to get it to work. It used the one little corner of SciPy that’s often broken on Macs.) If you’re interested in learning more, Downey is working on a new book called Think Bayes that you can read for free, Fernando Perez has posted his notebook from the course, and you can watch the video.

Education Summit

The PyCon Education Summit brought together educators from all kinds of backgrounds, from K-12 teachers to those teaching adults. I went due to my interest as an instructor for Software Carpentry. Most of the discussion focused on teaching Python/computation in long-form courses to people who have zero programming experience.

I didn’t take much concrete away from the summit, but I was impressed with the sheer level of energy going into the Python/education nexus. There are many people out there experimenting with Python in education and developing lessons that use Python. There are also a lot of user groups around the country (like the Boston Python Workshop) that are actively working to bring new people into the Python world. Many people do this in their spare time! That’s the kind of community devotion I love about Python.

I gave a five minute lightning talk at the summit that was part a preview of my PyCon talk and part showing off ipythonblocks. The Notebooks for that are at nbviewer.ipython.org/5165758.

Talks

The first and most important thing you should know about the talks is that they were all recorded and the videos are online. There were about a million concurrent talk sessions and I’m still catching up on all the great stuff I missed. I highly recommend starting with the opening/closing statements from Jesse Noller and the Raspberry Pi keynote from Eben Upton:

I think there were standing ovations during each of those. And then there were the great regular talks I saw in person:

  • Python: A Toy Language by Dave Beazley
    • Do not miss a chance to see Dave Beazley talk. You will be thoroughly entertained and leave wondering why you do such boring things with your code. Here Beazley talks about using Python to control a hobby CNC mill.
  • How the Internet works by Jessica McKellar
    • Learn about the underlying structure and protocol of the web!
  • Awesome Big Data Algorithms by Titus Brown
    • Titus gives a great introduction to some algorithms and data structures that help deal with Data of Unusual Size. Also check out his blog post on the talk with links to his notebooks.
  • Who’s there? – Home Automation with Arduino/Raspberry Pi by Rupa Dachere
    • Rupa tells us how she built an automated front door camera. This talk was standing room only!
  • What Teachers Really Need from Us by Selena Deckelmann
    • Selena relates her experience getting to know teachers and how we as developers can best help them.

My Talk

I gave a talk titled “Teaching with the IPython Notebook” that focused on how the IPython Notebook can help students learning Python. (Primarily by simplifying their interface to Python.) It seemed to go well and I’m really glad I did it! The video is up and my presentation notebook is at nbviewer.ipython.org/5165431.

Posters

I stopped by Simeon Franklin’s poster about making Python more beginner friendly and I was really impressed with the level of interest surrounding the topic. Even Guido was there seriously engaged in this discussion. With engagement of this magnitude at that level I think we’ll see people putting serious effort into making Python more user friendly right out of the box, which will be wonderful.

Observations

As Wes McKinney noted on Twitter, there were two things everywhere at PyCon this year: the IPython Notebook and Raspberry Pis. It seemed like every other talk and tutorial was using the Notebook, and it’s no surprise: the Notebook is so fantastic for presenting code plus supporting material and then sharing it. It’s a major boon to Python.

Everyone at the conference (plus some kids who came for free tutorials) left with a Raspberry Pi. These amazing little computers enable all kinds of projects, often attached to an Arduino for talking to hardware. In Eben Upton’s keynote I learned that the “Pi” in “Raspberry Pi” is for Python since much of the system is built on Python. The site raspberry.io has been set up as a community of projects that use RPis but I’m sure Googling turns up a ton more. A small, cheap, low powered, easy to program computer just has so many possibilities! I haven’t had a chance to start hacking on mine yet but I’m looking forward to it!

Thanks

A big thanks to STScI for sending me. Thanks to Greg Wilson for suggesting the talk idea and thanks to Titus Brown and Ethan White for looking over my proposal.


A Styled HTML Document from Markdown

There are many, many command line converters for turning Markdown into HTML, but for the most part these make HTML fragments, not full documents with CSS styling. That’s fine most of the time (e.g. when I’m writing blog posts), but sometimes I want a full, pretty document so I can print it out (typically for presentation notes).

To fill this hole I put together a small script that converts Markdown and wraps the HTML result in a template that includes Bootstrap CSS. I set the fonts to sans-serif and monospace so that they are taken from the defaults for your browser, making it easier to use your favorite fonts.

The script requires the Python libraries Python Markdown, mdx_smartypants (a Python-Markdown extension), and Jinja2.

#!/usr/bin/env python

import argparse
import sys

import jinja2
import markdown

TEMPLATE = """<!DOCTYPE html>
<html>
<head>
    <link href="http://netdna.bootstrapcdn.com/twitter-bootstrap/2.3.0/css/bootstrap-combined.min.css" rel="stylesheet">
    <style>
        body {
            font-family: sans-serif;
        }
        code, pre {
            font-family: monospace;
        }
        h1 code,
        h2 code,
        h3 code,
        h4 code,
        h5 code,
        h6 code {
            font-size: inherit;
        }
    </style>
</head>
<body>
<div class="container">
{{content}}
</div>
</body>
</html>
"""


def parse_args(args=None):
    d = 'Make a complete, styled HTML document from a Markdown file.'
    parser = argparse.ArgumentParser(description=d)
    parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?',
                        default=sys.stdin,
                        help='File to convert. Defaults to stdin.')
    parser.add_argument('-o', '--out', type=argparse.FileType('w'),
                        default=sys.stdout,
                        help='Output file name. Defaults to stdout.')
    return parser.parse_args(args)


def main(args=None):
    args = parse_args(args)
    md = args.mdfile.read()
    extensions = ['extra', 'smarty']
    html = markdown.markdown(md, extensions=extensions, output_format='html5')
    doc = jinja2.Template(TEMPLATE).render(content=html)
    args.out.write(doc)


if __name__ == '__main__':
    sys.exit(main())

ipythonblocks – A Visual Tool for Practicing Python

Learning to program and learning the basics of control flow can be tricky business for novices. I wanted to make something that provided immediate, visual feedback to students as they practice things like for loops and if statements so they can see precisely what their code is (or isn’t) doing. So I wrote ipythonblocks.

The IPython Notebook makes it possible to display rich representations of Python objects using HTML (among other things). That allowed me to make a Python object whose representation in the Notebook is a colored table. Students can index into the table to change the color properties of individual table cells and then immediately display their changes.
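The mechanism behind this is that the Notebook looks for a `_repr_html_` method on an object and renders whatever HTML it returns. The toy class below illustrates the idea; it is not the ipythonblocks implementation, and the name `ColorGrid` and its interface are made up for this sketch.

```python
class ColorGrid:
    """A tiny grid of RGB colors that displays as an HTML table."""

    def __init__(self, rows, cols, fill=(0, 0, 0)):
        self._cells = [[fill for _ in range(cols)] for _ in range(rows)]

    def __setitem__(self, index, color):
        row, col = index
        self._cells[row][col] = tuple(color)

    def __getitem__(self, index):
        row, col = index
        return self._cells[row][col]

    def _repr_html_(self):
        # The Notebook calls this method, if present, to render the object.
        rows = []
        for row in self._cells:
            cells = ''.join(
                '<td style="width:20px;height:20px;'
                'background-color:rgb({},{},{})"></td>'.format(*color)
                for color in row)
            rows.append('<tr>' + cells + '</tr>')
        return '<table>' + ''.join(rows) + '</table>'
```

In a notebook cell, `grid = ColorGrid(3, 3)` followed by `grid[0, 2] = (255, 0, 0)` and then `grid` on the last line would show a black table with one red cell.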

With ipythonblocks instructors can give coding problems like ‘turn every block in the third column red’ or ‘turn every blue block green’ and by displaying their blocks students can see right away whether their code is having the desired effect.

Check out the demo notebook to see ipythonblocks in action.
