Tuesday, May 7, 2013

Local PyPI Options

Having a central package repository has helped the Python community immensely through sharing reusable code. A few needs arise once you start depending on such a resource though, and a local setup may need to:
  1. make your installs resilient against Internet/PyPI issues,
  2. speed up your installs significantly (after the first one),
  3. prevent problems installing packages that are removed from distribution by the author,
  4. allow installation of packages from within a firewalled environment where the host performing the installation does not have Internet access, and
  5. allow hosting and installation of private packages.
All while imposing as little overhead on the package users as possible (i.e. maintenance of a system performing the above should be either low or none).

Searching for "PyPI" on the package repository is somewhat daunting (the page of results seems to go on forever). Having done a bit of a survey of the top hits, there seem to be only a few packages that are relevant to the above requirements (presented here in the order that the PyPI search ranks them):
  • Flask-Pypi-Proxy - A semi-proxy that supports private package upload. Its dependencies are quite hefty and it does not mirror packages locally.
  • pyramidpypi - "This is a very simple pypi-like server written with the pyramid web framework." Pyramid is a very hefty dependency for such a simple server and it only supports private package upload.
  • simplepypi - a very simple local repository allowing upload of packages and installation of them.
  • yopypi - is a "load balancer" which punts requests to a mirror automatically when the primary PyPI is unavailable.
  • djangopypi / djangopypi2 - are both PyPI servers acting as local repositories with the same user interface as the real thing. No proxying, though there is a manual tool infi.pypi_manager which may be used to mirror packages to a local djangopypi.
  • inupypi - appears to also be a "load balancer".
  • mypypi - another local PyPI server using Zope 3.
  • pypiserver - serves files out of local directories or redirects to the real server if not found. Handles upload of private packages. No proxying for missing packages, though it does have a facility for updating packages which are already in the local directories.
  • pyshop - another private repository implementation with access controls built in. It also performs caching proxy of packages not present locally. Hefty dependencies (Pyramid but also an SQL database).
  • spynepi - a proxying server with local storage which also handles local upload of private packages! In Twisted. Using "spyne" which is some RPC mechanism and I don't know what it's got to do with PyPI serving. Hefty dependencies.
  • chishop - another simple local repository with upload written in Django.
  • ClueReleaseManager - yet another local repository though with full meta-data support and what appears to be proxying of PyPI meta-data, but not files.
  • pyroxy - a proxying index server which can serve local files (but without local caching of proxied files).
  • scrambled - a very simple server of local files (point it at a directory and run).
  • devpi-server - a transparent caching proxy with local storage of the files accessed. Uses a redis database, which is an additional dependency that is a problem in my deployment scenario.
  • collective.eggproxy - implements caching proxy but has hefty dependencies. Also seems to be very fetch-happy, retrieving eggs I don't actually need.
A lot of the implementations above have a bunch of user controls built into them. And there's an awful lot of "simple PyPI in framework X" implementations. Most of the "proxy" solutions (save pyshop, devpi-server and collective.eggproxy) required manual download of the package files, or they just proxied their requests through to the Internet with no local file storage for speed/resilience. Those others had dependencies that prevented me easily installing them into my target environment.

So none of them fit the bill, and none appeared to be easily modifiable to do what I want. So, I wrote my own: proxypypi :-)

When proxypypi is asked about a package it doesn't know, it automatically goes off and fetches the file download list for the package, rewriting all references (PyPI and external) so they appear to be local. On request of one of those now-local package files it performs a background fetch of the file contents and serves up the new file data to the pip request (thus keeping that request alive despite its very short timeout duration).
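
The link rewriting step described above could look something like the following. This is only a rough sketch of the idea, not proxypypi's actual code: the `/files/` URL layout, the `rewrite_links` name and the regular-expression approach are all assumptions made for illustration.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical local URL layout; the real proxy's layout may well differ.
LOCAL_PREFIX = "/files/"

HREF_RE = re.compile(r'href="([^"]+)"')


def rewrite_links(html, base_url, package):
    """Rewrite every href in a package index page to a local path.

    Returns the rewritten HTML plus a mapping of local path -> original
    URL, so the proxy knows where to fetch a file once it is requested.
    """
    upstream = {}

    def replace(match):
        # Resolve relative links against the page they came from.
        original = urljoin(base_url, match.group(1))
        # Keep only the file name; query strings and fragments are dropped.
        filename = urlsplit(original).path.rsplit("/", 1)[-1]
        local = f"{LOCAL_PREFIX}{package}/{filename}"
        upstream[local] = original
        return f'href="{local}"'

    return HREF_RE.sub(replace, html), upstream
```

Keeping the local-to-upstream mapping around is what would let the server fetch the real file lazily, on the first request for the local path.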

Sunday, February 26, 2012

A couple of new modules for messing about with objects

blueprint - a neat tool/library that allows data objects to be used as bases for new data objects. And other cool stuff. "Think of it as prototypal inheritance for Python!" I see a lot of potential in video games with procedural content.
objectifier - in a similar (messing about with objects) vein, here we create objects from dictionaries.
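
To give a feel for the "objects from dictionaries" idea, here is a minimal standard-library sketch. It is not objectifier's API (the `to_object` helper is made up for this example) and it assumes every dictionary key is a valid Python identifier.

```python
from types import SimpleNamespace


def to_object(value):
    """Recursively convert dictionaries (and lists of them) to objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_object(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_object(item) for item in value]
    return value


player = to_object({"name": "hero", "stats": {"hp": 10, "items": [{"id": 1}]}})
print(player.stats.hp)  # attribute access instead of player["stats"]["hp"]
```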

Sunday, February 12, 2012

reStructuredText to ...

Convert reStructuredText files to...

rst2blogger - HTML to post on blogger.com blogs. Hmm, thanks Doug :-)
rst2hatena - posts for Hatena's Diary service.
rst2marsedit - HTML that can be used with the MarsEdit blogging tool.
rst2atom - skip the whole "blog" thing and send your rst directly to XML ATOM 1.0 feed readers.
blohg - blog posts (stored in Mercurial.)

rst2beamer - the Beamer LaTeX document class for presentations.
slides - the Beamer LaTeX document class for presentations.
rst2odp - odp files for OpenOffice Impress. Probably nicer than using the GUI ;-)
rst2slides - an HTML5 slideshow.
bruce - an interactive OpenGL slideshow.
StarScream - a DHTML slideshow.
landslide - an HTML slideshow.

handcrank - a static website.
soho - a static website.
flask-rst - a static website.
rest2web - a static website.
cyrax - a static website.

sphinx - your project's documentation. And then some*.

rst2pdf - PDF using ReportLab.
rst2texinfo - texinfo (the official documentation format of the GNU project.)
rst2xaml - XAML for WPF and Silverlight / Moonlight.
rstex - a more powerful version of the built-in LaTeX support docutils provides (inline math, equations, references, and raw latex.)
epubmaker - EPUB.

restxml - XML using XSLT. Oh, yes.

Sorry if your tool isn't on this list. I tried :-)



* there's a whole other blog post listing Sphinx extensions... later... :-)

Wednesday, February 8, 2012

A couple of useful tools

logging_tree - introspect and display the logger tree inside the standard library's "logging" package. This could be an invaluable tool to discover what's really going on in your application's logging - and in particular perhaps why logging isn't working how you think it should.
hgtools - adds support for Mercurial in setuptools, both for the basics like listing the files under revision control (so find_packages and include_package_data can do their work without needing explicit listings of files in MANIFEST.in) but also supporting pulling the version number from the repository tag so it doesn't have to be duplicated. The git equivalent appears to be setuptools-git (formerly known as gitlsfiles.)
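
The kind of logger introspection that logging_tree offers can be roughly approximated with the standard library alone. The sketch below is not logging_tree's implementation or output format; `logger_tree_lines` is a hypothetical helper, and it leans on `logging.Logger.manager.loggerDict`, which is a semi-private attribute of the logging package.

```python
import logging


def logger_tree_lines():
    """Return one line per known logger, indented by dotted-name depth."""
    root = logging.getLogger()
    lines = ["<root> level=" + logging.getLevelName(root.level)]
    # loggerDict maps dotted names to Logger or PlaceHolder objects.
    logger_dict = logging.Logger.manager.loggerDict
    for name in sorted(logger_dict):
        entry = logger_dict[name]
        indent = "  " * (name.count(".") + 1)
        if isinstance(entry, logging.Logger):
            detail = "level=%s handlers=%d propagate=%s" % (
                logging.getLevelName(entry.level),
                len(entry.handlers),
                entry.propagate,
            )
        else:
            # A name that only exists as the parent of a real logger.
            detail = "(placeholder)"
        lines.append("%s%s %s" % (indent, name, detail))
    return lines


logging.getLogger("app.db.pool").setLevel(logging.DEBUG)
print("\n".join(logger_tree_lines()))
```

A logger showing zero handlers with propagate turned off is a common reason for messages silently going nowhere.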

Sunday, February 5, 2012

A few more random bits

localshop - "really, really alpha" but promising local PyPI mirror / private repository. Yes, another one. This one might just be the one to meet my specific requirements though...

pytagcloud - is one to watch: make tag clouds as PNG images or HTML. Usage is a bit fiddly at the moment and I couldn't replicate the results they got. I think the key is having a good tag (interesting word) extractor. This bit of code might come in handy when experimenting with it:
import collections
import re

import bs4
import requests
from roundup.backends.indexer_common import STOPWORDS

# fetch the page and pull the text out of its main content div
html = requests.get('http://www.python.org/about/').text
text = bs4.BeautifulSoup(html, 'html.parser').find('div', id='content-body').get_text()

# count words longer than two characters that aren't stop words
# (the comparison is upper-cased to match STOPWORDS)
counts = collections.defaultdict(int)
for word in re.split(r'\W+', text):
    if word.upper() not in STOPWORDS and len(word) > 2:
        counts[word.lower()] += 1

# keep the 30 most frequent words as (word, count) tags
words = sorted((count, word) for word, count in counts.items())
tags = [(word, count) for count, word in words[-30:]]

from pytagcloud import make_tags, create_tag_image
create_tag_image(make_tags(tags), 'cloud.png')
Sadly it doesn't quite work for me. I suspect something might be up with my pygame/platform's TTF support. I also had to add a Font object cache to stop it blowing up on my system (git pull request submitted :-)

slumber - call RESTful web (HTTP) APIs from Python code. Supports JSON and YAML (with pyyaml installed) and is built on top of the awesome requests. While looking at slumber I picked up this tip for validating and pretty-printing JSON:
$ echo '{"json":"obj"}' | python -m json.tool
{
    "json": "obj"
}
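
The same validate-and-pretty-print trick works from inside a program using the standard json module. A small sketch (the `pretty` helper name is made up here, and json.tool itself may differ in details such as key ordering options):

```python
import json


def pretty(text):
    """Validate a JSON string and return it pretty-printed.

    Raises ValueError (json.JSONDecodeError) if the text is not valid JSON.
    """
    return json.dumps(json.loads(text), indent=4)


print(pretty('{"json":"obj"}'))
```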

Thursday, February 2, 2012

Another small sampler to finish the week

OMG it's beautifulsoup4 - BeautifulSoup for Python 3! Beware: this release involves API changes, amongst other things.
heightfield is a neat toy that generates 256x256 heightfields using particle deposition.
pager - "page output to the screen, read keys and get console dimensions."

Wednesday, February 1, 2012

Another sampler

I do love a module that has a nice, simple purpose and a direct, to the point name :-)

rfc6266 - parse and generate Content-Disposition headers as per RFC 6266
walkdir - making it easier to use os.walk() to walk directories with filtering, depth limiting, flattening and handling of symlink loops to boot!
times - "everything sucks about our mechanisms to represent absolute moments in time, but the least worst one of all is UTC." Indeed. In a style similar to the explicit bytes/unicode objects in Python 3 with encodings explicitly dealt with at input and output, this library encourages times to be UTC internally, with timezones only ever dealt with at input and output time.
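
The "UTC internally, timezones only at the edges" approach can be illustrated with plain datetime. This is a sketch of the idea rather than the times library's API: `to_utc` and `to_local` are hypothetical helpers, and fixed UTC offsets stand in for real named timezones to keep the example dependency-free.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_dt, utc_offset_hours):
    """Input boundary: attach the source offset, then convert to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local_dt.replace(tzinfo=tz).astimezone(timezone.utc)


def to_local(utc_dt, utc_offset_hours):
    """Output boundary: convert an aware UTC datetime for display."""
    return utc_dt.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# 09:30 on a wall clock at UTC+10 is stored as UTC...
stored = to_utc(datetime(2012, 2, 1, 9, 30), 10)
print(stored.isoformat())
# ...and only converted again when shown to someone at UTC-5.
print(to_local(stored, -5).isoformat())
```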