Resources for Learning Python

Yesterday I asked my followers on Twitter for their advice on the best resources for people learning programming and Python:

You can see their responses on Twitter and below.

Of those, I think Think Python and How to Think Like a Computer Scientist are especially targeted at people who are brand new to programming in any language.

These are some of the resources I learned from back when I picked up Python, though I should note that I already knew some programming at the time:

Thanks to everyone who responded!

More Commits via the GitHub API

I wrote a bit ago about making commits via the GitHub API. That post outlined making changes in two simplified situations: making changes to a single file and making updates to two existing files at the root of the repository. Here I show a more general solution that allows arbitrary changes anywhere in the repo.

I want to be able to specify a repo and branch and say "here are the contents of files that have changed or been created and here are the names of files that have been deleted, please take all that and this message and make a new commit for me." Because the GitHub API is so rudimentary when it comes to making commits, that will end up being a many-stepped process. It’s mostly the same steps repeated many times, though, so it’s not a nightmare to code up. At a high level the process goes like this:

  • Get the current repo state from GitHub
    • This is the names and hashes of all the files and directories, but not the actual file contents.
  • Construct a local, malleable representation of the repo
  • Modify the local representation according to the given updates, creations, and deletions
  • Walk through the modified local "repo" and upload new/changed files and directories to GitHub
    • This must be done from the bottom up because a change at the low level means every directory above that level will need to be changed.
  • Make a new commit pointed at the new root tree (I’ll explain trees soon.)
  • Update the working branch to point to the new commit
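The local part of that process (build a malleable representation, modify it, then work out what has to be re-uploaded from the bottom up) can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the code from the notebook: the function names are made up for this example, the repo state is assumed to arrive as a flat mapping of paths to hashes, and a changed file is simply marked with None to mean "still needs uploading."

```python
def build_tree(entries):
    """Turn a flat {path: sha} listing into nested dicts.

    Directories become dicts, files become their hash strings.
    """
    root = {}
    for path, sha in entries.items():
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = sha
    return root


def apply_changes(root, changed, deleted):
    """Modify the local tree in place and return the touched directories.

    `changed` maps paths to new contents, `deleted` is a list of paths.
    The root directory is represented by the empty string.
    """
    dirty = set()
    for path in list(changed) + list(deleted):
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        if path in changed:
            node[name] = None  # None marks a file whose content must be uploaded
        else:
            node.pop(name, None)
        # A change at any level means every directory above it changes too.
        for i in range(len(dirs) + 1):
            dirty.add("/".join(dirs[:i]))
    return dirty


def upload_order(dirty):
    """Order directories deepest first, with the root ('') last."""
    def depth(d):
        return d.count("/") + 1 if d else 0
    return sorted(dirty, key=lambda d: (-depth(d), d))
```

Uploading the directories in the order returned by upload_order means each parent is rebuilt only after the new hashes of its children are known.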

This blog post is readable as an IPython Notebook at http://nbviewer.ipython.org/gist/jiffyclub/10809459. I’ve also reproduced the notebook below.

Docker via Homebrew

Docker is a great tool for getting lightweight, isolated Linux environments. It uses technology that doesn’t work natively on Macs. Until now you’ve had to boot into a VM to install and use Docker, but it’s now a little easier than that.

As of Docker 0.8 it can be run on Macs thanks to a specially developed, lightweight VirtualBox VM. There are official instructions for installing Docker on Mac, but with Homebrew and cask it’s even easier.

Follow the instructions on the cask homepage to install it. Cask is an extension to Homebrew for installing Mac binary packages via the command line. Think things like Chrome or Steam. Or VirtualBox. Running Docker on Mac requires VirtualBox so if you don’t have it already:

brew cask install virtualbox

Then install Docker and the helper tool boot2docker:

brew install docker
brew install boot2docker

boot2docker takes care of the VM that Docker runs in. To get things started it needs to download the Docker VM and start a daemon that the docker command line tool will talk to:

boot2docker init
boot2docker up

The docker command line tool should now be able to talk to the daemon and if you run docker version you should see a report for both a server and a client. (Note: When I ran boot2docker up it told me that the default port the daemon uses was already taken. I had to specify a different port via the DOCKER_HOST environment variable, which I now set in my shell configuration.)

If everything has gone well to this point you should now be able to start up a Docker instance. This command will drop you into a bash shell in Ubuntu:

docker run -i -t ubuntu /bin/bash

Use ctrl-D to exit. I find this especially helpful for very quickly getting to a Linux command line from my Mac for testing this or that, like checking what versions of software are installed by apt-get.

Visit the Docker documentation to learn more about what you can do with Docker and how to do it.

Using Conda Environments and the Fish Shell

I recently started over with a fresh development environment and decided to try something new: I’m using Python 3 via miniconda. The first real hiccup I’ve run into is that conda’s environment activation/deactivation scheme only works in bash or zsh. I use fish. There is an open PR to get fish support for conda but in the meantime I hacked something together to help me out.

"Activating" a conda environment does a couple of things:

  • Puts the environment’s "bin" directory at the front of the PATH environment variable.
  • Sets a CONDA_DEFAULT_ENV environment variable that tells conda in which environment to do things when none is specified.
  • Adds the environment name to the prompt à la virtualenv.

Deactivating the environment resets everything to its pre-activation state. The fish functions I put together work like this:

~ > type python
python is /Users/---/miniconda3/bin/python
~ > condactivate env-name
(env-name) ~ > type python
python is /Users/---/miniconda3/envs/env-name/bin/python
(env-name) ~ > deactivate
~ > type python
python is /Users/---/miniconda3/bin/python

Here’s the text of the functions:


Or you can download it from https://gist.github.com/jiffyclub/9679788.

To use these, add them to the ~/.config/fish/ directory and source them from the end of the ~/.config/fish/config.fish file:

source $HOME/.config/fish/conda.fish
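
For readers who don't use fish, the activation and deactivation steps listed earlier boil down to a few environment changes. The Python sketch below is only an illustration of that bookkeeping, not a translation of the fish functions in the gist; the _OLD_PATH and PROMPT_PREFIX keys and the function names are hypothetical and chosen just for this example.

```python
import os


def activate(environ, env_name, envs_dir):
    """Return a copy of `environ` with the changes activation makes."""
    new = dict(environ)
    old_path = environ.get("PATH", "")
    new["_OLD_PATH"] = old_path  # hypothetical key, remembered for deactivation
    bin_dir = os.path.join(envs_dir, env_name, "bin")
    # Put the environment's bin directory at the front of PATH.
    new["PATH"] = bin_dir + os.pathsep + old_path
    # Tell conda which environment to use when none is specified.
    new["CONDA_DEFAULT_ENV"] = env_name
    # Stand-in for changing the prompt (hypothetical key).
    new["PROMPT_PREFIX"] = "(%s) " % env_name
    return new


def deactivate(environ):
    """Return a copy of `environ` restored to its pre-activation state."""
    new = dict(environ)
    new["PATH"] = new.pop("_OLD_PATH", new.get("PATH", ""))
    new.pop("CONDA_DEFAULT_ENV", None)
    new.pop("PROMPT_PREFIX", None)
    return new
```

Both functions work on a copy of the mapping, so the round trip activate then deactivate leaves the original environment unchanged.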

Making Commits via the GitHub API

For fun I’ve been learning a bit about the GitHub API. Using the API it’s possible to do just about everything you can do on GitHub itself, from commenting on PRs to adding commits to a repo. Here I’m going to show how to add commits to a repo on GitHub. A notebook demonstrating things with code is available here, but you may want to read this post first for the high level view.

Choosing a Client Library

The GitHub API is an HTTP interface so you can talk to it via any tool that speaks HTTP, including things like curl. To make programming with the API simpler there are a number of libraries that allow you to communicate with GitHub via means native to whatever language you’re using. I’m using Python and I went with the github3.py library based on its Python 3 compatibility, active development, and good documentation.

Making Commits

The repository API is the gateway for doing anything to a repo. In github3.py this corresponds to the repository module.

Modifying a Single File

The special case of making a commit affecting a single file is much simpler than affecting multiple files. Creating, updating, and deleting a file can be done via a single API call once you have enough information to specify what you want done.
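
As a rough illustration of that single call, here is a sketch that builds the request for GitHub's repository contents endpoint using only the standard library. It doesn't send anything and it isn't the github3.py code from the notebook; the contents_request helper and its argument names are made up for this example.

```python
import base64

API_ROOT = "https://api.github.com"


def contents_request(owner, repo, path, message,
                     content=None, sha=None, branch=None):
    """Describe the one request needed to create, update, or delete a file.

    content given, sha omitted -> create a new file
    content given, sha given   -> update (sha is the hash of the current file)
    content omitted, sha given -> delete
    Returns a (method, url, body) tuple; nothing is sent.
    """
    if content is None and sha is None:
        raise ValueError("need content to create or sha to delete")
    body = {"message": message}
    if content is not None:
        # The contents endpoint expects file contents Base64 encoded.
        encoded = base64.b64encode(content.encode("utf-8"))
        body["content"] = encoded.decode("ascii")
    if sha is not None:
        body["sha"] = sha
    if branch is not None:
        body["branch"] = branch
    method = "DELETE" if content is None else "PUT"
    url = "%s/repos/%s/%s/contents/%s" % (API_ROOT, owner, repo, path)
    return method, url, body
```

The "enough information" mentioned above is mostly the sha: updating or deleting requires the hash of the file as it currently exists in the repo, so it has to be looked up first.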

Modifying Multiple Files

Making a commit affecting multiple files requires making multiple API calls and some understanding of Git’s internal data store. That’s because to change multiple files you have to add all the changes to the repo one at a time before making a commit. The process is outlined in full in the API docs about Git data.

I should note that I think deleting multiple files in a single commit requires a slightly different procedure, one I’ll cover in another post.
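
The sequence of Git data API calls for adding or updating several files can be sketched as follows. This is a hedged outline rather than the notebook's github3.py code: commit_files is a made-up name, and api is assumed to be a function you supply that sends a request and returns the decoded JSON response. It covers only new and changed files, not deletions.

```python
def commit_files(api, owner, repo, branch, files, message):
    """Commit new or updated `files` ({path: text}) to `branch`.

    `api(method, path, body)` is a caller-supplied function (an assumption
    of this sketch) that performs the request and returns parsed JSON.
    Returns the hash of the new commit.
    """
    base = "/repos/%s/%s/git" % (owner, repo)

    # 1. Find the commit the branch points at and that commit's root tree.
    ref = api("GET", "%s/ref/heads/%s" % (base, branch), None)
    parent_sha = ref["object"]["sha"]
    parent = api("GET", "%s/commits/%s" % (base, parent_sha), None)
    base_tree = parent["tree"]["sha"]

    # 2. Add each file's contents to the repo as a blob, one at a time.
    entries = []
    for path, text in sorted(files.items()):
        blob = api("POST", base + "/blobs",
                   {"content": text, "encoding": "utf-8"})
        entries.append({"path": path, "mode": "100644",
                        "type": "blob", "sha": blob["sha"]})

    # 3. Create a new tree layered on top of the existing one.
    tree = api("POST", base + "/trees",
               {"base_tree": base_tree, "tree": entries})

    # 4. Create a commit pointing at the new tree.
    commit = api("POST", base + "/commits",
                 {"message": message, "tree": tree["sha"],
                  "parents": [parent_sha]})

    # 5. Move the branch to the new commit.
    api("PATCH", "%s/refs/heads/%s" % (base, branch), {"sha": commit["sha"]})
    return commit["sha"]
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP or GitHub client library, and makes the order of calls easy to check with a stand-in.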


That’s the overview. Look over the notebook for the code! http://nbviewer.ipython.org/gist/jiffyclub/9235955

Writing WordPress Posts in Markdown

Pen and Pants is hosted by WordPress, but I write my blog posts in my favorite text editor using Markdown. That way I have all the conveniences those afford and I can archive the posts in plain text on GitHub.

The tricky part is going from the .md files to some text I can paste into the input box in WordPress. I learned today that you can write posts in Markdown, but that still doesn’t work perfectly for me because WordPress treats new lines within blocks as hard breaks. (When writing posts I break all lines before 80 characters for more convenient editing and diffing. Keeping all those breaks literal doesn’t translate well to web pages.)

Today, thanks to Ethan White, I figured out that Pandoc can help. When I convert my Markdown to Markdown with the --no-wrap flag, Pandoc outputs each paragraph on a single line but otherwise gives me regular Markdown. (Pandoc 2.0 and later spell this option --wrap=none.) The command I use looks like this:

pandoc -f markdown -t markdown --no-wrap blog-post.md

I can take the output of that and paste it into WordPress’ text input box (after ticking the box to allow Markdown when writing posts).

Note that if you use fenced code blocks (as on GitHub) WordPress will convert them into its special source code widget. If instead you want something presented using only <pre><code> tags then use indentation to indicate it is pre-formatted text.

Tips for Mac Users

If you use Homebrew you can install Pandoc via the cask add-on:

brew cask install pandoc

To copy the output of pandoc straight to the clipboard you can use the pbcopy command:

pandoc -f markdown -t markdown --no-wrap blog-post.md | pbcopy

The Libraries of ipythonblocks.org

In this post I’ll describe the libraries used by ipythonblocks.org to turn requests into web pages and JSON to send back to users. In some future posts I’ll describe how it’s actually put on the internet. If you’re curious about the code you can see it on GitHub.

Back End

The back end consists of GET and POST REST endpoints for ipythonblocks to talk to and handlers for the site itself: main and about pages, a random grid redirect, and the individual grid views. In all there are about six handlers for all of ipythonblocks.org.

Framework

ipythonblocks.org is such a simple site that any lightweight framework could probably handle it. I went with Tornado mainly because I’ve used it before and I like the way applications are designed using Tornado. That it includes a template engine and a high performance web server are also pluses. If I’d not used Tornado, Flask and Jinja2 would have been my second choice.

Database

Choosing a database was something of an agonizing decision. You can choose from SQL, NoSQL, and key-value stores; and within each of those you have many more choices. I like the simplicity of working with schema-less databases like MongoDB, and I was very intrigued by RethinkDB, but in the interest of having a simple setup that allowed me to focus on developing app logic I ended up using sqlite. I use the dataset library to take care of some of the SQL overhead (like table creation) so that I can combine the simplicity of sqlite with a more NoSQL-like interface.

At some point I may want to move to another database, especially one running on a dedicated machine so that swapping the application server can be done without worrying about the database. When I get to that point I’ll probably take another look at RethinkDB and see if it’s ready for my application.

To avoid database lookups of recently visited pages I’m using memcached and talking to it from Python via the pylibmc library.
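
The general pattern here is to check the cache first and fall back to the database on a miss. The sketch below shows that idea only; it is not the site's code. DictCache is a hypothetical in-memory stand-in exposing get and set methods like a memcached client, and get_grid and load_from_db are made-up names.

```python
class DictCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_grid(grid_id, cache, load_from_db):
    """Return a grid, consulting the cache before the database."""
    key = "grid:%s" % grid_id
    grid = cache.get(key)
    if grid is None:
        # Cache miss: do the database lookup and remember the result.
        grid = load_from_db(grid_id)
        if grid is not None:
            cache.set(key, grid)
    return grid
```

With this arrangement a second request for the same grid is served from the cache and never reaches the database.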

Logging

Python’s built-in logging can certainly get the job done, but its interface has some rough edges I don’t like. Configuration can be painful for sophisticated cases and any kind of structured logging requires custom formatting. I think Twiggy is a much more “Pythonic” approach to logging with simpler configuration and built-in structured logging. ipythonblocks.org was my first time using Twiggy and I’d use it again. (Though it is unfortunately not Python 3 compatible at this time.)

Other

Requests to the POST endpoint are validated using jsonschema. This provides protection for the app against incorrectly configured requests and can be used as a kind of documentation on what requests should look like.

I use the hashids library to turn the integer SQL IDs of grid entries into short strings, as in http://ipythonblocks.org/zcezcM. This is a URL form people are familiar with and it allows the implementation of “secret” grid posts that have public URLs but are difficult to find unless someone gives you the URL.

Users of ipythonblocks can include code with their posted grids and I use Pygments to highlight the syntax of the code and format it for HTML. Pygments is decent enough to escape HTML included in the posted code so I don’t have to worry about that breaking the page rendering. The color scheme used is Base16 Chalk Light via https://github.com/idleberg/base16-pygments.

Finally, I use ipythonblocks itself to turn grid data into rendered HTML via the same methods used by the IPython Notebook.

Front End

The back end renders and delivers static HTML to browsers (or JSON to ipythonblocks) so there isn’t much fancy going on in the front end. I use CSS media queries to adjust the site margins for small screens, and on the front page I use Pure CSS grids to make a responsive three-column layout that collapses to a single column on small screens.

ipythonblocks.org uses the Source family of fonts from Adobe delivered by Google Fonts.