The Semantics of Python Import, part 4: Iterators - Elf M. Sternberg
The Semantics of Python Import, part 4: Iterators

Module Iterators, as defined in pkgutil.py, aren’t really part of the mess that has been imposed on us by PEP-302 and its follow-on attempts to rationalize the loading process, but they’re used by so many different libraries that when we talk about creating a new general class of importers, we have to talk about iterators.

Iterators, after all, are why I started down this project in the first place. It was Django’s inability to find heterogeneously defined modules that I set out to fix.

Iterators are defined in the pkgutil module; their entire purpose is, given some kind of reference to an archive, to be able to list the contents of that archive, and to recursively descend into that archive if it happens to be a tree-like structure.

When you call pkgutil.iter_modules(path, prefix), you get back an iterator over all the modules within that path or, if no path is supplied, within all the paths in sys.path. As I pointed out in my last post, the paths in sys.path aren’t necessarily paths on the filesystem or, if they are, they’re not necessarily directory paths. All that matters is that for each path, a path_hook exists that can return a Finder, and that Finder has a method for listing the contents of the path found.
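As a quick illustration of that call, here is a minimal sketch that builds a throwaway directory and asks pkgutil to list it. The helper name list_modules, the file names, and the "demo." prefix are all made up for the example; only pkgutil.iter_modules itself comes from the standard library.

```python
import pkgutil
import tempfile
from pathlib import Path


def list_modules(path, prefix=""):
    """Return sorted (name, is_package) pairs for modules directly under path."""
    # iter_modules takes a *list* of path entries and yields
    # (finder, name, is_package) triples.
    return sorted(
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules([str(path)], prefix)
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # One plain module and one package (a directory with __init__.py).
    (root / "alpha.py").write_text("VALUE = 1\n")
    (root / "beta").mkdir()
    (root / "beta" / "__init__.py").write_text("")

    found = list_modules(root, prefix="demo.")
    print(found)
```

The prefix is simply prepended to every name that comes back, which is how callers build dotted names while walking a package tree.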

In Python 2, pkgutil depends upon Finders (those things we said were attached to meta_path and path_hooks) to have a special function called iter_modules; if a Finder has one, that function is used to list the contents of the “path”.

In Python 3, the functools.singledispatch tool is used to differentiate between different Finders; once a Finder has been identified by path_hooks, singledispatch is used to find a corresponding resource iterator for that Finder. It doesn’t necessarily have to be a method on the Finder, although the default has a classmethod that is its finder.
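To make the dispatch idea concrete, here is a small sketch of the pattern using functools.singledispatch directly. It is a stand-in, not pkgutil’s own code: LegacyFinder, ArchiveFinder, and iter_finder_modules are hypothetical names invented for the example. The default implementation falls back to an iter_modules method if the Finder has one, while a registered function handles a Finder type that has no such method.

```python
from functools import singledispatch


class LegacyFinder:
    """Hypothetical finder that lists its own contents via a method."""

    def __init__(self, names):
        self.names = names

    def iter_modules(self, prefix=""):
        return [(prefix + name, False) for name in self.names]


class ArchiveFinder:
    """Hypothetical finder with no listing method of its own."""

    def __init__(self, contents):
        self.contents = contents  # maps module name -> is_package


@singledispatch
def iter_finder_modules(finder, prefix=""):
    # Default case: use the finder's own method when it exists.
    if not hasattr(finder, "iter_modules"):
        return []
    return finder.iter_modules(prefix)


@iter_finder_modules.register(ArchiveFinder)
def _iter_archive_modules(finder, prefix=""):
    # Registered case: the iterator lives outside the finder class.
    return [(prefix + name, ispkg) for name, ispkg in sorted(finder.contents.items())]


legacy = iter_finder_modules(LegacyFinder(["a", "b"]), "pkg.")
archive = iter_finder_modules(ArchiveFinder({"z": False, "y": True}))
unknown = iter_finder_modules(object())
```

Dispatch happens on the type of the first argument only, so the Finder class alone decides which iterator runs.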

An iterator is pretty straightforward; once you know the “path” (resource identifier) and the Finder for that path, you can call a function that checks for the presence of modules. In the case of FileFinder, that function is a combination of listdir, isfile, and isdir/isfile to check for dir/__init__ pairs indicating a submodule.
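A simplified sketch of that kind of directory scan might look like the following. This is not the actual pkgutil implementation; scan_for_modules and its suffix parameter are assumptions made for illustration, and the real code handles more cases (multiple suffixes, invalid identifiers, unreadable directories).

```python
import os
import tempfile


def scan_for_modules(path, prefix="", suffix=".py"):
    """Yield (name, is_package) for each module found directly under path."""
    init_name = "__init__" + suffix
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # A directory counts only if it holds an __init__ file.
            if os.path.isfile(os.path.join(full, init_name)):
                yield prefix + entry, True
        elif os.path.isfile(full) and entry.endswith(suffix) and entry != init_name:
            # A plain source file: strip the suffix to get the module name.
            yield prefix + entry[: -len(suffix)], False


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "single.py"), "w").close()
    open(os.path.join(tmp, "notes.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "package"))
    open(os.path.join(tmp, "package", "__init__.py"), "w").close()
    os.mkdir(os.path.join(tmp, "plain_dir"))  # no __init__, so it is skipped

    scanned = list(scan_for_modules(tmp))
```

Passing the suffix in as a parameter, rather than hard-coding it, is the part that matters for heterogeneous modules.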

For our purposes, of course, we had to provide a path_hook that eclipses the existing path_hook, and we had to provide a Finder that was more precisely ours than the inherited base FileFinder, so that single dispatch would find ours before it found FileFinder’s and still work correctly.
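The “more precisely ours” part can be sketched like this. CustomFinder and iter_finder_modules are hypothetical names, and the dispatcher is a stand-in rather than pkgutil’s own; the point is only that single dispatch prefers the registration for the most specific class, and that FileFinder.path_hook returns instances of whichever class it was called on.

```python
import tempfile
from functools import singledispatch
from importlib.machinery import FileFinder, SourceFileLoader, SOURCE_SUFFIXES


class CustomFinder(FileFinder):
    """Hypothetical FileFinder subclass standing in for our own finder."""


@singledispatch
def iter_finder_modules(finder, prefix=""):
    return "default iterator"


@iter_finder_modules.register(FileFinder)
def _iter_file_finder(finder, prefix=""):
    return "FileFinder iterator"


@iter_finder_modules.register(CustomFinder)
def _iter_custom_finder(finder, prefix=""):
    return "CustomFinder iterator"


loader_details = (SourceFileLoader, SOURCE_SUFFIXES)
# path_hook is a classmethod, so each hook builds its own class of finder.
custom_hook = CustomFinder.path_hook(loader_details)
base_hook = FileFinder.path_hook(loader_details)

with tempfile.TemporaryDirectory() as tmp:
    custom_choice = iter_finder_modules(custom_hook(tmp))
    base_choice = iter_finder_modules(base_hook(tmp))

# To eclipse the stock hook for real, one would presumably put custom_hook
# ahead of it (sys.path_hooks.insert(0, custom_hook)) and clear
# sys.path_importer_cache; that is left out here to avoid touching global state.
```

Because CustomFinder still is a FileFinder, anything that only knows about FileFinder keeps working, while the dispatcher picks the subclass registration first.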

There is one other module I have to worry about: modulefinder. It’s not used often, it’s not used by Django or any of the other major tools that I usually use, and it’s never been covered by Python Module of the Week. That doesn’t mean that its hard-coding of the ‘.py’ suffix isn’t problematic. I’m just not sure what to do about it at this point.
