ipythonblocks.org Move: Part 4 – Application Updates

This is Part 4 in a series of blog posts describing my move of ipythonblocks.org from Rackspace to Heroku. In this post I’ll describe the updates I’ve made to the application layer of ipythonblocks.org. Other posts are:

  • Part 1: Introduction and Architecture
  • Part 2: Data Migration
  • Part 3: Database Interface Updates
  • Part 4: Application Updates

The application logic is largely unchanged in this update; the bulk of the changes are there to provide SQLAlchemy sessions for database access during requests. (See Part 3 for a discussion of the database interface layer of ipythonblocks.org.)

Application Overview

ipythonblocks.org is powered by Tornado, which combines an application framework, web server, and asynchronous features. On Heroku the application is started with the command

python -m app.app --port=$PORT --db_url=$DATABASE_URL

$PORT and $DATABASE_URL are environment variables provided by Heroku to specify, respectively, which port to listen on and where to find the Postgres database attached to the app. These are parsed from the command line by Tornado’s options module and made available globally on the tornado.options.options variable. Continue reading “ipythonblocks.org Move: Part 4 – Application Updates”
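
For reference, here’s a minimal sketch of how options like these are typically defined and read with Tornado’s options module. It’s illustrative rather than the site’s actual code, and the option names simply mirror the command above.

# Sketch only: define and parse --port and --db_url with tornado.options.
from tornado.options import define, options, parse_command_line

define('port', type=int, default=8888, help='port the server should listen on')
define('db_url', type=str, default=None, help='database connection URL')

parse_command_line()

# After parsing, values are available globally, e.g. options.port and options.db_url.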

ipythonblocks.org Move: Part 3 — Database Interface

This is Part 3 in a series of blog posts describing my move of ipythonblocks.org from Rackspace to Heroku. In this post I’ll describe the updates I’ve made to the database interface module of ipythonblocks.org. Other posts are:

  • Part 1: Introduction and Architecture
  • Part 2: Data Migration
  • Part 3: Database Interface Updates
  • Part 4: Application Updates

The big change to the database interface module was the switch from dataset to SQLAlchemy for database abstraction. This involved using the ORM models described in Part 2, removing the JSON de/serialization functions needed with SQLite, removing the use of memcached, and updating tests to use a Postgres database to match production. The full diff is here, but I’ll break down the important points below. Continue reading “ipythonblocks.org Move: Part 3 — Database Interface”
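
To give a flavor of the pattern (a sketch under my own assumptions, not the actual ipythonblocks.org code), SQLAlchemy access usually goes through an engine and a session factory roughly like this:

# Illustrative only: a typical SQLAlchemy session-scope helper.
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Hypothetical URL; the real app takes its URL from Heroku's DATABASE_URL.
engine = create_engine('postgresql://localhost/ipythonblocks')
Session = sessionmaker(bind=engine)

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()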

ipythonblocks.org Move: Part 2 – Data Migration

This is Part 2 in a series of blog posts describing my move of ipythonblocks.org from Rackspace to Heroku. In Part 1 of this series I described my motivation for the move and the broad changes I expect to make as part of the migration. In this post I’ll describe the grid data model and how I migrated the existing grid data from SQLite to Postgres. Other posts are:

  • Part 1: Introduction and Architecture
  • Part 2: Data Migration
  • Part 3: Database Interface Updates
  • Part 4: Application Updates

Continue reading “ipythonblocks.org Move: Part 2 – Data Migration”

ipythonblocks.org Move: Part 1

This is Part 1 in a series of blog posts describing my move of ipythonblocks.org from Rackspace to Heroku. In this first post I’ll describe the existing deployment and what I intend to change during the migration. Other posts are:

  • Part 1: Introduction and Architecture
  • Part 2: Data Migration
  • Part 3: Database Interface
  • Part 4: Application Updates

As a side project I maintain a Python library called ipythonblocks that displays colored grids in a Jupyter Notebook. (See also this intro blog post.) It can be useful for teaching or for a bit of fun art. I also maintain the website ipythonblocks.org that allows users to post their ipythonblocks grids on the internet to be shared. ipythonblocks.org has been hosted on Rackspace since I first launched it, but now I’m migrating the site to Heroku for easier maintenance and deployment. Continue reading “ipythonblocks.org Move: Part 1”

Palettable 3.0 Released

I’m happy to announce the release of Palettable version 3.0. Palettable is a Python library that packages a variety of color palettes for use with matplotlib or really anywhere. Here’s the full diff since the last release.

This release includes a number of new palettes, including sets from cmocean, matplotlib, and MyCarta.

The new cmocean, matplotlib, and MyCarta palettes are created from data that contains 256 color points per palette. By default, palettes are created with lengths of 3 to 20 colors, but you can request longer ones via the get_map function. For example, to get the matplotlib Viridis palette with 200 color points:

In [2]: palettable.matplotlib.get_map('Viridis_200')

You can find Palettable on the web at:

P.S.: Here’s a little demo notebook.

Python 3 Universally

Frustration

A Python user is starting a project and thinks to themselves, “Yay, new code! I can use Python 3 for this!”. They install the latest Anaconda for Py3 and get to work. A few days and hundreds of lines of code later, they find out that a particular library they need (maybe imposm.parser) only supports Python 2. Our well-intentioned user sighs, re-installs Anaconda for Py2, and carries on. Maybe next time (or maybe not). (This is a semi-autobiographical story.)

Elsewhere, a Python library maintainer is excited about Py3’s new asyncio module and could put it to immediate use but doesn’t want to alienate users who are stuck on Py2.


There are many valid reasons to be using Py2 today: a dated dependency, the inertia of existing code, not wanting to break a working setup, not knowing how/why to switch, and lack of time.

There are also many valid reasons for wanting to develop exclusively for Py3: access to new features, reduced support burden, simplified maintenance, wanting to get ahead of the 2020 end-of-support for Py2, and lack of time. These tensions have the potential to create much frustration in the Python community, but I think with some intentional effort on the part of Python developers and leaders it will all be fine.  Continue reading “Python 3 Universally”

C Extensions for Python 2 and 3

One of my upcoming tasks at work is converting Pandana to support both Python 2 and 3. The tricky bit is that Pandana has a C extension written in plain C using the Python 2 C-API, which is not compatible with Python 3.

It seems like the best way to have a C extension that supports both Python 2 and 3 is to not write the extension in C. These days there are a number of alternatives that allow you to write interfaces in Python or something like Python (Cython). I decided to make a sample project with some C functions to wrap so that I could try out CFFI, Cython, and the standard library ctypes module.
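
As a taste of the ctypes approach (a hypothetical sketch, not code from the repo), wrapping a C function with signature double add(double, double) from a shared library looks roughly like this:

import ctypes

# Load a hypothetical shared library and declare the function's signature.
lib = ctypes.CDLL('./libexample.so')
lib.add.argtypes = [ctypes.c_double, ctypes.c_double]
lib.add.restype = ctypes.c_double

print(lib.add(1.5, 2.5))  # prints 4.0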

You can find the project with examples of all three and a longer writeup at https://github.com/jiffyclub/cext23. Pull requests are welcome on the repo with further examples!

Introducing Palettable

I wrote brewer2mpl a couple years ago to help people use colorbrewer2 color palettes in Python. Since then it’s expanded to include palettes from Tableau and the whimsical Wes Anderson Palettes Tumblr; and there’s plenty of room for more palettes from other sources. To encompass the growing scope, brewer2mpl has been renamed to Palettable! (Thanks to Paul Ivanov for the name.)

The Palettable API has also been updated for the IPython age. All available palettes are now loaded at import and are available for your tab-completion pleasure. Need the YlGnBu palette with nine colors? That’s now available at palettable.colorbrewer.sequential.YlGnBu_9. Reversed palettes are also available with a _r suffix.
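
For instance, each palette object exposes its colors in a few forms (a quick illustration; attribute names as in current Palettable releases):

from palettable.colorbrewer.sequential import YlGnBu_9

YlGnBu_9.mpl_colors    # list of (r, g, b) tuples scaled 0-1 for matplotlib
YlGnBu_9.hex_colors    # the same colors as hex strings
YlGnBu_9.mpl_colormap  # a matplotlib colormap built from the palette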

I hope you find Palettable useful! You can find it on the web at:

P.S.: Here’s a little demo notebook.

State of Conda, Oct. 2014

I have been using Conda (via Miniconda) for managing my Python development environments and packages for close to a year now, so I thought I’d write up my thoughts so far for others.

Conda is both an environment manager (an alternative to virtualenv) and an installation tool (an alternative to pip). You can also use Conda to build your packages and distribute them via Binstar.

So, what does Conda do well and what needs improvement? Continue reading “State of Conda, Oct. 2014”

Testing With NumPy and Pandas

Testing Python results is often as straightforward as assert result == expected, especially with builtin types. But that doesn’t work with NumPy or Pandas data structures because using == with those doesn’t return True or False. Instead, == results in new arrays filled with boolean values. This is useful for boolean indexing, but leads to this error when testing:

In [2]: a = np.arange(10)

In [3]: b = np.arange(10)

In [4]: assert a == b
------------------------------------------------
ValueError     Traceback (most recent call last)
<ipython-input-4-6bf76ad3480a> in <module>()
----> 1 assert a == b

ValueError: The truth value of an array with more than one element is ambiguous.
            Use a.any() or a.all()

You can check whether all the elements in two arrays are equal using the .all() method:

In [5]: (a == b).all()
Out[5]: True

But that raises an error if the arrays are different sizes/shapes, and the result is an uninformative True or False when they are the same size. Luckily, NumPy has this situation covered.

Library Versions

For reference, these are the versions of NumPy and Pandas I’m currently using:

In [43]: np.version.version
Out[43]: '1.9.0'

In [44]: pd.version.version
Out[44]: '0.14.1'

Testing with NumPy

NumPy has an entire module devoted to testing support. I like to import it via import numpy.testing as npt in my tests. I’ll be focusing here on two functions, assert_array_equal and assert_allclose.

assert_array_equal

assert_array_equal raises an AssertionError when two arrays are not exactly equal. It can take anything array-like as inputs, including lists.

In [10]: npt.assert_array_equal([1, 2, 3], [1, 2, 3])

In [11]: npt.assert_array_equal([1, 2, 3], [1, 2, 3, 4, 5])
----------------------------------------------------
AssertionError     Traceback (most recent call last)
<truncated>

AssertionError:
Arrays are not equal

(shapes (3,), (5,) mismatch)
 x: array([1, 2, 3])
 y: array([1, 2, 3, 4, 5])

In [12]: npt.assert_array_equal([1, 2, 3], [99, 2, 3])
----------------------------------------------------
AssertionError     Traceback (most recent call last)
<truncated>

AssertionError:
Arrays are not equal

(mismatch 33.33333333333333%)
 x: array([1, 2, 3])
 y: array([99,  2,  3])

The examples show how you get somewhat descriptive output when the comparisons fail, including if the shapes are mismatched and what percentage of elements differ between the two arrays.

Similar functionality is available in the array_equal function, which returns True or False instead of raising an exception.
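
For instance, a quick illustration of the non-raising version:

np.array_equal([1, 2, 3], [1, 2, 3])   # True
np.array_equal([1, 2, 3], [99, 2, 3])  # False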

assert_allclose

assert_array_equal checks for exact equality. That’s fine for integer and boolean values, but it often fails with floating point values because of very slight differences in results calculated in different ways or on different computers. For comparing floating point values I use assert_allclose.

In [17]: npt.assert_array_equal([np.pi], [np.sqrt(np.pi) ** 2])
-------------------------------------------------------
AssertionError        Traceback (most recent call last)
<truncated>

AssertionError:
Arrays are not equal

(mismatch 100.0%)
 x: array([ 3.141593])
 y: array([ 3.141593])

In [18]: npt.assert_allclose([np.pi], [np.sqrt(np.pi) ** 2])

assert_allclose takes atol and rtol arguments for specifying the absolute and relative tolerance of the comparison. For the most part I leave these at their defaults: atol=0 and rtol=1e-07. That’s a small enough tolerance that I’m confident the numbers are quite close, but large enough to let floating point noise go through. Sometimes, though, it’s useful to choose custom tolerances. For example, I was once writing tests based on numbers I copied out of a paper. The numbers were provided to four decimal places, so in my tests I used npt.assert_allclose(result, expected, atol=0.0001). Choosing appropriate tolerances for testing with assert_allclose can be tricky depending on how accurate you expect your code to be. Unfortunately, I don’t have any great advice on that.
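
To illustrate how the tolerances interact (my own made-up numbers, not from the original tests):

# Passes: the difference is well within the default rtol of 1e-07.
npt.assert_allclose(1.0, 1.0 + 1e-9)

# Fails: the relative difference is about 1e-3, much larger than rtol.
# npt.assert_allclose(1.0, 1.001)  # raises AssertionError

# Passes with a looser absolute tolerance, e.g. for values quoted to four decimals.
npt.assert_allclose(1.0, 1.00005, atol=0.0001)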

assert_allclose also has a non-assertion version: allclose.

Notes

One very handy thing about assert_array_equal (and its scalar-friendly cousin assert_equal) is that it handles values like nan intelligently. Normally nan compared to anything else, even nan, results in False. That’s the official, expected behavior, but it does make testing harder. assert_array_equal handles this for you.

In [29]: (np.array([np.nan, 2, 3]) == np.array([np.nan, 2, 3])).all()
Out[29]: False

In [30]: npt.assert_array_equal([np.nan, 2, 3], [np.nan, 2, 3])

Note that array_equal and equal behave in the official manner and will always return False for comparisons to nan.
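
For example:

np.array_equal([np.nan, 2, 3], [np.nan, 2, 3])  # False, because nan != nan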

Testing with Pandas

Pandas also has a testing module, but it is apparently meant more for internal testing of Pandas itself than for Pandas users. There is no documentation page for it, but it’s still available and I use it in testing. I import it via import pandas.util.testing as pdt.

The three main things I use are assert_frame_equal, assert_series_equal, and assert_index_equal. assert_frame_equal and assert_series_equal take arguments that let you control whether the comparisons are exact or approximate, and whether to compare types in addition to value equality. By default they use an allclose-like comparison.

In [39]: s1 = pd.Series([1, 2, 3], dtype='int')

In [40]: s2 = pd.Series([1, 2, 3], dtype='float')

In [41]: pdt.assert_series_equal(s1, s2)
-------------------------------------------------------
AssertionError        Traceback (most recent call last)
<truncated>

AssertionError: attr is not equal [dtype]: dtype('int64') != dtype('float64')

In [42]: pdt.assert_series_equal(s1, s2, check_dtype=False)

assert_frame_equal is sensitive to the order of columns and rows in the tables. I’ve found this is not always what I want; sometimes it’s fine if the ordering changes as long as the same column names and index labels are present in both tables. I’ve made my own assert_frames_equal function for testing that case, sketched below.
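
Here’s a rough sketch of what such a function can look like (my illustration, not necessarily the exact implementation): sort both frames by index and by column labels, then hand them to assert_frame_equal.

import pandas.util.testing as pdt

def assert_frames_equal(actual, expected, **kwargs):
    """Compare DataFrames while ignoring the order of rows and columns."""
    actual = actual.sort_index(axis=0).sort_index(axis=1)
    expected = expected.sort_index(axis=0).sort_index(axis=1)
    pdt.assert_frame_equal(actual, expected, **kwargs)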


Just because you’re using complex data containers like arrays and DataFrames in your code doesn’t mean you can’t test it. NumPy and Pandas are themselves heavily tested and you can test your own code using the same utilities the NumPy and Pandas developers use. Happy testing!
