Archive for February, 2009

Section 92A: It's Not Over

Sunday, February 22nd, 2009

The blackout ends tomorrow, and the new Copyright law comes into effect next Saturday. Section 92A of the amended Copyright Act 1994 will probably remain, and it is what it is: it’s not a violation of rights, nor is it, in itself, a piece of legislation that “presumes guilt until proof of innocence.”

It’s a terrible clause that is both too prescriptive and too vague. It starts to say something but then it doesn’t say enough. I joined the blackout, not because I think the law is wrongheaded: it’s not. I’ve joined because the amended Act is poor legislation and the faults in section 92A should either be hammered out and the law made more specific, or the section should be completely removed. In its current form it’s a mess and it relies on ISPs to clean it up. But if those ISPs don’t get it right, copyright holders could test them in court and ruin them. It’s a court case that ISPs can’t avoid without being very, very careful: they need to keep copyright holders happy.

Rhetoric about innocence until proof of guilt may not be all that relevant. Copyright infringement is usually a civil offence, not a criminal one. It can become criminal if you engage in piracy: making copies and selling them, or setting up an enterprise that sells products based on the creative ideas you don’t hold under copyright. But a private citizen who makes copies for personal use, or downloads a TV show from a torrent will not be tried as a criminal under current law or under any changes in this Act. They are, however, violating terms of use, both those tacitly agreed when (and if) the product was purchased and those applied implicitly by the 1994 Copyright Act. That’s a civil case between a defendant and a plaintiff. At the risk of being didactic, a civil case between a private enterprise and an individual isn’t covered under New Zealand’s 1990 Bill of Rights Act. Our Bill of Rights, which guarantees things like a fair hearing and presumption of innocence, is designed to protect you in the case of a criminal accusation.

If a Copyright holder accused you of a crime such as piracy, now or under this new law, your day in court would be guaranteed and you would get your trial and all the rights that go along with it. The burden of proof is on the Crown and it has to prove beyond a reasonable doubt that you’re guilty of the crime you are accused of.

In a civil case, as in the case of an alleged copyright infringement, things are different. The burden of proof is on the plaintiff but they only have to prove that you are more likely to have infringed than not. You both argue your case before a judge and the judge decides for either the defendant or the plaintiff. If they win, you would normally pay damages, commensurate with the magnitude of the infringement. If you win, you might get court costs and some compensation. For argument’s sake, and as others more eloquent than I have pointed out, something like the court-ordered termination of an internet connection would be very, very unusual.

Despite the standard of proof being much lower, civil cases against copyright infringers have proven hard for copyright holders to win. Not only that, they’re costly, nasty exercises that result in redresses that are way out of step with the actual losses due to infringements because they have to be tackled one at a time and in depth; they cause huge public relations debacles; and they’re not actually achieving what a successful civil case should be achieving: making the result for a citizen so punitive that others will be deterred from following their example. Despite all of these trials and the closure of many P2P networks and Torrent trackers, file-sharing is on the rise.

So where does that leave us with section 92A?

92A Internet service provider must have policy for terminating accounts of repeat infringers

“(1) An Internet service provider must adopt and reasonably implement a policy that provides for termination, in appropriate circumstances, of the account with that Internet service provider of a repeat infringer.

“(2) In subsection (1), repeat infringer means a person who repeatedly infringes the copyright in a work by using 1 or more of the Internet services of the Internet service provider to do a restricted act without the consent of the copyright owner.

On the surface, the intent of this law isn’t actually that bad. It says that an ISP should have a policy that allows for the termination of accounts of repeat infringers, a repeat infringer being a person that repeatedly infringes copyright in “a work”.

This clause seems to say that if you upload a Madonna song to YouTube once, you’re an infringer. It might be saying that if you upload the same song three or four times, you are a repeat infringer. It’s not clear if you’re a repeat infringer if you infringe the copyright of many different works, and do this repeatedly, with each work being infringed once.

But okay, let’s say you’ve been running a file-sharing application, and your copy of Madonna’s Like a Virgin has been shared with many people. Or you’re using a torrent and you’re uploading to three or four different peers while downloading from three or four others. That probably constitutes repeated infringement in “a work”.

So do you get disconnected if a copyright holder catches you and accuses you? The answer is: it depends. It depends on a document that is being drafted right now: the Telecommunications Carriers’ Forum’s Draft Code of Practice, a template document that all of New Zealand’s ISPs will crib from in order to write their own Code of Practice. It depends on the Code of Practice that your own ISP adopts. It depends on what that document says about how a repeat infringer is identified, and defined, and the level of evidence required for the ISP to accept that a user has repeatedly infringed using the ISP’s service.

The copyright holders and ISPs have tussled over this document like dogs over a bone. The problem is, if the ISP adopts a Code of Practice that is too lenient on users, or too strict on the standard of evidence it requires over infringements, copyright holders will take them to court and they will fight over the words ‘in appropriate circumstances’. ISPs, particularly small ones, will want to avoid that situation at all costs. They have to overcompensate or risk their business.

So what’s in the Code? Here’s what the Draft Code of Practice says about repeated infringement:

To avoid doubt, an Infringer need not Infringe repeatedly with respect to the same category of work under the Act or with respect to the same Copyright Holder, to qualify as a Repeat Infringer.

The Draft Code of Practice is clarifying section 92A (2) and saying that actually, a repeat infringer is someone who repeatedly infringes copyrights in any works, in any categories, from any copyright holders. So you’re a repeat infringer if you share a Madonna song, and a Rihanna song, and an Animal Collective song, and you only share each once. The Draft Code, at the behest of entertainment industry interests, has interpreted the statute in the widest possible way.

The Draft Code of Practice also defines the levels of evidence required for someone to have made an infringement. They are:

11.1 A judgement of a Court (interim or final) finding Infringement under the Act;

11.2 A Copyright Holder Notice which complies with this Code;

11.3 such other evidence as that Party is prepared, in its sole discretion, to accept would be sufficient to satisfy a Court that an Infringement under the Act has taken place.

11.1 says that if a copyright holder took you to court, and won, that’s satisfactory evidence of an infringement. 11.3 says that if the ISP is satisfied that the evidence presented to them would be sufficient to satisfy a court, that’s satisfactory evidence of an infringement. So either you go to court, or the ISP makes the judgement that the evidence is satisfactory on its own merits.

But under 11.2, a copyright holder can issue a Copyright Holder Notice, which contains information about how a copyright was infringed, the method by which the infringement was detected, the time and date, and so on. The Draft Code of Practice says that this is acceptable evidence of an infringement. The Draft Code also says (in its current form) that a user can issue a Counter-Notice, disputing the allegation of infringement, but this is being debated, and copyright holders have proposed an alternative Counter-Notice procedure that puts the copyright holders in a position where they themselves decide the validity of a Counter-Notice.

With the inclusion of 11.2 and 11.3, we will have to deal with ISPs being in the position of an adjudicator. If the alternative Counter-Notice procedure is taken up, the copyright holders themselves are in the adjudication position. Copyright holders, who would be the plaintiffs in a civil case, get to play judge.

The alternative is that the ISP plays judge, but they and the entertainment industry have intermingled interests. ISPs are also their distributors. They fight over this but on other things they’re business partners. Would you be comfortable being the defendant in a court case if you knew the judge and the plaintiff were business partners? Would you really get the same consideration as you would get in a court or arbitration? I don’t believe you would. Users need the opportunity to state their case before an impartial judge in a proper court of law (or a Copyright Court, as suggested by the Creative Freedom Foundation): anything else is corruption waiting to happen. ISPs may trust themselves to be impartial, but that’s arrogance and hubris, and just not good enough.

The Draft Code of Practice, assuming all ISPs adopt it unchanged, will be the law in essence. We may lose this fight, and through our government’s inaction, section 92A may come into effect. That’s why it’s hugely important to write a submission to the Telecommunications Carriers’ Forum to let them know that they cannot accept any evidence short of a Court judgement finding infringement, or an agreed impartial body, such as a Copyright Court, finding infringement.

You can find more information at the Telecommunications Carriers’ Forum site. Please make a submission to submissions@tcf.org.nz: they close on Friday the 6th of March, in two weeks. This matters. Beyond that, we will have to vote with our wallets, by choosing the ISPs whose actual Code of Practice offers us the most protection from false allegations of copyright infringement.

Webstock ’09: Day 2: When Sterling Attacks

Saturday, February 21st, 2009

If Day 1’s topic was community, Day 2’s topic was data and hardware. Big, messy stacks of data, measured, unmeasured, captured by all the things around us and coming to the web. On Day 1, I took an average of one page of notes per speaker. On Day 2 it was closer to three pages. I remembered a couple of things I forgot to mention on Day 1. Jane McGonigal’s latest game is the Signtific Lab, a massively multiplayer thought experiment that was played throughout the conference, and Ze Frank’s youngme/nowme project is just plain cool.

Russell Brown kicked things off with a quick chat about s92a and why we should really be paying attention to the Draft Code of Practice (PDF): because what’s in there will actually be the implementation of the law. He then talked about the state of television, and how the biggest problem they have is the old-style distribution networks: it’s the reason why you’re not allowed to watch TV on Hulu in New Zealand (the Alec Baldwin superbowl ad is worth a watch), or why an American can’t watch TVNZ on Demand. That said, sites like NZ On Screen and Radio New Zealand are getting it right with no DRM and no geo-blocking. He believes traditional TV still has a role (after all, it’s infinitely scalable), but that the content has a long way to go if it wants to still be relevant: people (and particularly kids) just aren’t watching the style of programs they used to. Oh, and Media 7’s doing great, thanks very much: particularly via the web.

Derek Featherstone discussed accessibility, and was chiefly interested in the differences between standards and accessibility: meeting standards is no guarantee that your site is actually usable. Without some knowledge about how your users actually interact with your site, or without at least thinking about it, you run the risk of creating something that validates but nevertheless sucks. I’m a fan of the Cooper process, and I found Derek’s insights underlined a lot of Alan Cooper’s ideas. He talked about making maps and flash movies usable by embedding buttons instead of overlays and putting controls outside a YouTube clip that controls the clip via the YouTube API. He showed us a cool accessible crossword, and professed a lot of love for Ubiquity, a command line interface for the web, which, among other things, is an accessibility tool on steroids.

Annalee Newitz of Wired was awesome. She talked about science fiction metaphors and the vocabulary established in films and books like Tron and Neuromancer and The Matrix and Battlestar Galactica, and the fear that people have about computer interfaces that interface back. She discussed how we can surmount those fears through education and by being mindful of the resistances that people have, because when it comes to new and scary interfaces people have internal narratives that end in having their brains sucked out: even when it comes to things like Facebook. She talked about superheroes and super villains and how the internal development team of Google Android switched from a Cylon-styled booting screen to a more friendly R2D2-style brand when it was about to be introduced to the public. And she presented some cool work on gestural interfaces, wearable computers, exoskeletons and Brain Computer interfaces, referencing Iron Man and Batman Begins along the way. It was nerd heaven.

I was really inspired by Toby Segaran’s talk. As web developers we rely a lot on the relational database, and he took us through some of the technologies that are trying to improve on the database paradigms we’ve depended on for the last 35 years. He discussed the graph data model, which is the model of the semantic web: subject-predicate-object relationships, and how it is organic and allows for schema changes because it doesn’t really have a fixed schema. Storing data in this way with tools like Sesame and Exhibit allows us to use SPARQL to do efficient queries that in RDBMSes take dozens of SQL joins. If data is presented in this way on the web, through standards like RDF and others, it means we can get answers to queries that are not only accurate, but also smarter.
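
To make the graph data model a little more concrete, here is a minimal sketch in plain Python. It is not Sesame or SPARQL, just a toy: facts are stored as subject-predicate-object triples and queried by pattern, with `None` acting as a wildcard. The class name, the `match` method and the sample facts are all illustrative assumptions of mine, not anything shown in the talk.

```python
# A toy triple store: every fact is a (subject, predicate, object) tuple.
# All names and data here are hypothetical, for illustration only.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("webstock", "held_in", "wellington")
store.add("webstock", "has_speaker", "toby")
store.add("toby", "talks_about", "graph_data")
# A new kind of fact needs no schema change: just add another triple.
store.add("wellington", "located_in", "new_zealand")

speakers = store.match(subject="webstock", predicate="has_speaker")
```

A real triple store adds indexing and a query language on top, but the shape of the data is the same: there is no table definition to migrate when a new kind of relationship turns up.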

Matt Biddulph of Dopplr talked about hardware hacking, which is a quiet ambition of mine, and he showed us some cheap electronics that we can use to create cool stuff. He directed us to Make, and showed us the current standard microprocessors and wireless connectors in hardware hacking: the Arduino, the XBee. He believes that if you can’t open something, you don’t own it. He pointed us to some cool hardware projects: a pot plant that tweets when it needs water, the tea/no-tea teapot, both built out of cheap hardware components that were just wired together. He recommended Everyware and Making Things Talk as good texts about the craft and the culture.

Matt Jones, also of Dopplr, talked about the cities that we live in, and how the spaces that we inhabit are becoming augmented with information: he pointed to the early ideas of Archigram, Project Cybersyn and Sir Richard Rogers. The end result of this is that our environments are becoming spaces that are robot-readable: machines can navigate the world by the nascent data-ness of things. Riffing on Matt Biddulph’s earlier example of the twitter-powered pot plant, he pointed to James Chambers’ Has Needs, a pot plant that will put itself on a site like Craigslist if it’s neglected. But people are walking architecture, mobile data sources, and our interaction with cities gives us a whole swathe of psychogeographic information (cf. Guy Debord) that we can play with. He presented a lot of interesting ideas, the street as platform, the Sky Ear, the Helsinki laser cloud, situated software. How do we enable all of this? In the words of Eliel Saarinen: always design a thing in its next larger context.

Tom Coates of Fire Eagle was hilarious. He was a good follow-up to Matt, as he talked about personal informatics, and how we are beginning to see a world where people can (if they want to) record every minuscule detail about their lives and view the information as they wish, using personal sensors. FireEagle, as a geolocator, is a part of that, and he thinks we will see an explosion of services that manage other pieces of data (think last.fm, Twitter, Flickr, 23andme, Skydeck, Mint, Google Health, Nike+iPod, Google Power Meter … besides being tools to explore and store personal information, they’re data gatherers, each gathering a single slice of data). Mobile is key to this stuff because it’s the most ubiquitous sensor we have. And real-time really excites him. Technology like XMPP is enabling this, but Tom’s view is that it’s data that’s driving new products, not technology. As more data comes available, more products become possible. Aggregators crop up, like Socialthing and Friendfeed, and for devices, Pachube. (We had our own Webstock example of real-time data aggregation: the SuiteSpot.) Tom pointed out the difference between personal information and private information: although we might talk about what should be personal and what should be private, the smartest thing to do is to put that decision in the hands of users. And a big privacy policy or terms of use doesn’t help. Be up front.

Bruce Sterling was next, but I’ll get to him in a second.

Damian Conway, the bossy little schoolgirl of web design (his words), talked about designers and users, and how we tend to think about the differences. He compared the relationship to that of Eloi and Morlocks, and then to Elves and Orcs, but actually thought it was more helpful for us to think of ourselves as more akin to doctors. He proposed a Hippocratic Oath for web designers:

  1. To learn and share knowledge
  2. To always do your best work
  3. Don’t kill the client’s business
  4. Know your limitations, and get help if you need it
  5. Always have the best interests of your clients in mind (often the best interests of your clients are the interests of their clients)
  6. Be professional, keep your clients’ confidentiality.

He showed us some sites that have really got it wrong, told us most shopping carts suck (how much information do they really need to collect, six steps worth?), and was hilarious throughout. He also showed us a couple of pictures of some guys holding cats.

Bruce Sterling spoke before Damian, but I wanted to talk about him last because his presentation was something that stood alone. He was electrifying. He presented one slide, Tim O’Reilly’s Web 2.0 meme map, and tore it to shreds. It was a deconstruction and a wake-up call: he was trying to shake us out of the fog we were in. This isn’t permanent. We got here because creative people are attracted to places that lack rightness, and the web lacked a lot of rightness, and still does. Web 2.0 had a few useful ideas, but some of those ideas (the ideas that lacked technology) were just attitudes. It’s not a platform: you can’t build a platform on a web, just like you can’t build a castle on a cloud: it’s a fantasy. The whole thing rests on an economy that is in a state of collapse. Web 2.0 is a structure built from the bones and ashes of Web 1.0 technologies: it’s bricolage. JavaScript might be the glue that holds Web 2.0 together, but if a freighter runs over our cable across the bottom of the Pacific, that glue isn’t going to hold. Ubiquity is a long way off; we may have the concepts but power sources and bandwidth haven’t kept up: it’s an art scene, it’s a hack scene. Permanence is a fiction: the sites he linked to years ago are all gone, in 404-land. We can’t trust the so-called collective intelligence like we can’t trust the invisible hand of the market: it’s not our benefactor; it’s a force of nature, and it could turn and flatten us at any moment. With the economy in freefall, we’re about to see a Web with great big holes in it, the Transitional Web, a cyberstructure that is poverty-stricken but that we will use to keep the rain off. He encouraged us to grow up to the scale of things, see the culture for what it is, and to simply be, but be aware.

To some, he would have appeared as a ranting preacher, raving that the end is nigh. To me, it was a welcome shot of realism. We work in the best industry in the world right now, no doubt, but permanence is by no means guaranteed. We should enjoy it while it lasts, for however long that might be. Sterling’s talk was a great end to the conference (thankfully tempered by Damian Conway’s positivity), and it played on my mind as we walked out into the rain and the darkening day. Maybe it was just me but the party, awesome as it was, felt Gatsby-ish: beautiful geeks, industry celebrities, a red carpet, a band played by robots. I had a great time, but it felt of-a-time, and I wonder: what happens now?

Webstock ’09: Day 1

Thursday, February 19th, 2009

I had a great time at Webstock today. I am getting the sense that this year’s event is primarily about online communities: attracting them, keeping them, inspiring them and moderating them.  Last year was more about design and frameworks, so this was something quite new.

Jane McGonigal spoke about her work with AvantGame and the Institute for the Future, how games make for a happier humanity and the role of games in making futures, not just predicting them.  She talked about how we’re all creating games, and if we’re not creating them, our competitors will.  Good games are structures that make people happy, and help people feel and become more awesome.  How many websites do that? In what ways can they do that?  I’m heartened by Jane’s ambition to have a game developer win a Nobel Prize by 2032.   And Top Secret Dance Off (which featured on Close Up tonight) looks fun!

Nat Torkington of O’Reilly talked about the scientific method and the idea of a feedback loop (in science, in evolution, in the political process), how it applies to web and software design and development and how we don’t actually get anywhere without allowing failure to happen. If it happens in small, testable increments, failure is a tool for innovation.  But too small, and we risk losing any chance for serendipitous innovation.  He talked about how big companies suffer from the Innovator’s Dilemma: they’ve got too much to lose to innovate their most successful products.  This is exactly why they can be beaten by small companies.  But, on the side of corporates, it brought to mind an article I read in Idealog earlier in the week about the Black Room concept.

Derek Powazek spoke about the wisdom of crowds: what it’s good for, and how it can be woefully misused.  His insights about relying on selfishness and the double-edged sword of using game tropes – winning the game has to help the community, or you’re screwed – were worth the price of admission. He talked about the ways in which communities can be encouraged and the methods you can use to keep them flourishing.  He spoke about trolls – who they were, what motivated them, and what they wanted – and how to defuse them.  I really liked the idea of the cone of silence: this is where a logged-in user who is an identified troll sees their contributions within the stream of comments, but they are the only user who sees them.   Their attempts at baiting are ignored by those that can’t see them, and they eventually leave out of boredom.  Sites he mentioned that are worth a look: Favrd, Hot or Not!, and his own site, Fray.
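
The cone of silence is simple enough to sketch. This is my own minimal illustration of the idea in Python, not anything Derek showed: the comment records, the `silenced` set and the function name are all assumptions made up for the example.

```python
# Hypothetical comment records: (author, text). Not any real site's schema.

def visible_comments(comments, viewer, silenced):
    """Return the comments a given viewer should see.

    Comments by silenced authors are hidden from everyone except
    their own author, who still sees them in the stream as normal.
    """
    return [
        (author, text)
        for author, text in comments
        if author not in silenced or author == viewer
    ]


comments = [
    ("alice", "Great post!"),
    ("troll", "You are all wrong."),
    ("bob", "Thanks for the write-up."),
]
silenced = {"troll"}

troll_view = visible_comments(comments, "troll", silenced)
alice_view = visible_comments(comments, "alice", silenced)
```

The important design point is that the troll's own view is unchanged, so nothing tells them they have been silenced.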

It was heartening to see Meg Pickard from the Guardian UK speak about community participation as something that fundamentally changes a content producer’s offer: it’s not about a content producer putting out authoritative comment and then everyone picking it to pieces in a completely separate (and largely ignored) silo, it’s about bringing that content into the conversation to create something completely new: something better than simply authoritative content with comments tacked on.  To me it seemed like an obvious leap, but the reaction of others in the room suggests that a lot of people really aren’t there yet.  It’s nice to know that the Guardian is willing to meet the challenges of the new media; I hope there were people from APN and Fairfax in the audience.

David Recordon of Six Apart talked through a number of technologies that support the social web: he discussed the growing recognition amongst people that walled gardens on social networking sites suck, it’s our data, and we need to find ways to make it easy to share the data that we want in the way that we want, and it’d be great if we didn’t have to complete a sign-up process for every site on the web either.  There’s a whole bunch of open source tools that could fix this: Microformats, OpenID, OAuth, XMPP, Open Social, FOAF, vCard, DiSo.  There’s a great Django-based product that builds on top of a lot of these tools: Pinax, and it sounds well worth investigating further.  He also mentioned the SocialWeb TV, a video podcast.  I’ve subscribed already.

Adrian Holovaty impressed the hell out of me.  I’m a Django fanboy to begin with, but Django didn’t actually figure much in his presentation.  He talked about data, and using data in exciting and interesting ways.  His latest project EveryBlock gives you an RSS feed for what’s happening in your neighbourhood: crimes, property sales, new businesses, restaurant inspections.  His presentation was quite detailed: talking about URL schemes and mapping technology (he recommended an article in A List Apart about rolling your own maps).  He suggested we encourage our government (national and local) to not waste time building badly thought-out sites for displaying data: give out the data in an API and let the public use it as they will.  It’s our data after all.

Heather Champ is the community manager at Flickr. She spoke about the ways  she’s found effective for shepherding her community.  She talked about some of Flickr’s most public challenges: the YahooID switchover, the masturbating subway pervert, laptop thieves accidentally uploading their photos to Flickr, and solving downtime issues with a colouring competition.  Heather talked frankly about the mistakes they’d made and the lessons they learned, and how they’ve become much better at managing the community as a result.  A great talk.

Michael Lopp of Apple took the entertaining route.  He spoke about nerds, geeks and dorks, showed us a Venn diagram of the three and how they intersected.  He talked about his observations on how geeks think and work: they’re systems thinkers who go deep in interesting ways: they’re obsessed with understanding the rules of things.  He believes that everything’s just a beautiful mess, but that you can fascinate a nerd by pulling bits out of that mess and getting them to make sense of it.  His talk was perhaps lacking in information, but nerds love hearing about themselves, and hearing someone who seems to be explaining the rules about understanding themselves, so he was very well received.

I really didn’t know what to expect with Ze Frank, but I was impressed.  He talked us through some of the projects he’d been involved in over the years, and some of the strange and wonderful moments that have happened as part of his many projects: Atheist Game, Flowers, 52to48, Facebook me=u, Earth Sandwich, and Remixes for Ray.  He was funny, and he was smart, really smart.  He understands the internet in such a fundamental way, and more than that, he knows how to participate in it in ways that really inspire others to go out of their way and do some really silly things.  I’m a total fan.

Tomorrow’s speakers have a tough job ahead of them to live up to the standard set today.  There wasn’t a single session where I wasn’t completely fascinated.  I just wish I could have seen the other break-out sessions. Full credit to the Webstock team!

Generating Short URLs for Django Site URLs

Saturday, February 14th, 2009

Today I developed the following mixin for generating short urls (à la tinyurl) for any model that generates a unique absolute URL for an instance.

It builds on top of django.contrib.redirects, so relies on that app being installed. Aside from that, integration is easy: just import the ShortURL mixin and add it to your model’s class declaration:

    class MyModel(models.Model, ShortURL):
        ...

The mixin code itself is:

    from os import urandom
    from random import choice, seed

    from django.conf import settings
    from django.contrib.redirects.models import Redirect
    from django.contrib.sites.models import Site

    SHORTURL_CHARS = getattr(settings, "SHORTURL_CHARS", "bcdfghjklmnpqrstvwxyz2346789")
    SHORTURL_CHAR_NO = getattr(settings, "SHORTURL_CHAR_NO", 5)
    SHORTURL_APPEND_SLASH = getattr(settings, "SHORTURL_APPEND_SLASH", True)


    class ShortURLException(Exception):
        pass


    class ShortURL(object):
        """
        A mixin that sets up short url redirects for models that have a get_absolute_url
        method.  Requires django.contrib.redirects to be installed to create redirects, and
        django.contrib.redirects.middleware.RedirectFallbackMiddleware to use them.
        """
        def __init__(self, *args, **kwargs):
            """
            Seeds randomiser
            """
            seed(urandom(256))
            super(ShortURL, self).__init__(*args, **kwargs)

        def get_short_url(self, *args, **kwargs):
            """
            Finds the short url for the object's absolute url in the Redirects model objects.
            If it doesn't exist, generate a short url and create a new Redirect object.
            """
            if not hasattr(self, 'get_absolute_url'):
                return None

            currenturl = self.get_absolute_url()
            site = Site.objects.get(id=settings.SITE_ID)
            redirects = Redirect.objects.filter(site=site, new_path=currenturl)

            # Start from None so a page with no redirects at all still gets a new short url
            shorturl = None
            for url in redirects:
                if len(url.old_path) <= SHORTURL_CHAR_NO + 1:  # allow for leading slash
                    shorturl = url.old_path
                    break

            if not shorturl:
                # Check we've got at least a 9 in ten chance of not colliding or throw an exception
                if Redirect.objects.count() > (len(SHORTURL_CHARS) ** SHORTURL_CHAR_NO) / 10:
                    raise ShortURLException
                while True:
                    shorturl = '/' + ''.join([choice(SHORTURL_CHARS) for char in range(SHORTURL_CHAR_NO)])
                    if not Redirect.objects.filter(site=site, old_path=shorturl):
                        # save shorturl without trailing slash so redirect middleware will find both forms
                        r = Redirect(site=site, old_path=shorturl, new_path=currenturl)
                        r.save()
                        break

            shorturl += '/' if SHORTURL_APPEND_SLASH else ''
            return shorturl

The default settings give you around 17 million short URL combinations; one character fewer gives you around 600,000.  If you've only got a few thousand unique pages you shouldn't be worried about collisions (the script just keeps pulling out random strings until it finds a unique one), but if you've got a big site you'll probably want to rely on a hashing function rather than just getting lucky.
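
For the curious, the arithmetic behind those figures is easy to check. The sketch below recomputes them from the default settings, and also shows one possible shape for the hashing approach. The `hashed_code` function is a hypothetical illustration of mine, not part of the mixin, and it assumes SHA-1 from the standard library is good enough for spreading paths across the available codes.

```python
import hashlib

# Defaults copied from the mixin's settings.
SHORTURL_CHARS = "bcdfghjklmnpqrstvwxyz2346789"
SHORTURL_CHAR_NO = 5


def keyspace(chars=SHORTURL_CHARS, length=SHORTURL_CHAR_NO):
    """Number of distinct short codes of the given length."""
    return len(chars) ** length


def hashed_code(path, chars=SHORTURL_CHARS, length=SHORTURL_CHAR_NO):
    """Hypothetical alternative: derive the code from a hash of the path.

    The SHA-1 digest is read as one big integer and rewritten in base
    len(chars); the lowest `length` digits become the code.
    """
    number = int(hashlib.sha1(path.encode("utf-8")).hexdigest(), 16)
    code = []
    for _ in range(length):
        number, index = divmod(number, len(chars))
        code.append(chars[index])
    return "/" + "".join(code)


total = keyspace()                  # 28 ** 5
one_shorter = keyspace(length=4)    # 28 ** 4
threshold = total // 10             # the mixin raises once redirects exceed this
```

A hashed code is repeatable for the same path, which is handy, but two different paths can still land on the same five characters, so a uniqueness check against existing redirects would still be needed.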

This code can also be found at Django Snippets.

UPDATED: 16 February

  • Now throws exception if the chance of collision is getting too great.
  • Only returns short urls and not just the first redirect it finds for a page.
  • Stores redirect in DB without trailing slash so that the redirect works with or without the slash.

Variable Swapping with Python

Wednesday, February 11th, 2009

Python’s great. It’s the little things. Like swapping variables. In most other languages, a variable swap requires a temporary variable, for example:

    tmpvar = var1
    var1 = var2
    var2 = tmpvar

Or if you’re dealing with integers, you can avoid the temporary variable with the XOR method or, as here, the addition and subtraction method:

    var1 = var1 + var2
    var2 = var1 - var2
    var1 = var1 - var2
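
The XOR variant follows the same three-step shape. A minimal sketch in Python, with arbitrary example values; like the addition and subtraction trick, it only works for integers:

```python
# XOR swap for two integers, with no temporary variable.
var1, var2 = 12, 7   # illustrative values

var1 = var1 ^ var2   # var1 now holds 12 ^ 7
var2 = var1 ^ var2   # (12 ^ 7) ^ 7 == 12, the original var1
var1 = var1 ^ var2   # (12 ^ 7) ^ 12 == 7, the original var2
```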

But in Python? Because all variables are references to objects, you can use tuple-packing/unpacking to achieve a swap in one command.

    var1, var2 = var2, var1

Nice.
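
A short sketch of why this works for any type, not just integers: the names are simply rebound to each other's objects, and nothing is copied. The values here are arbitrary examples.

```python
var1 = ["a", "list"]
var2 = {"a": "dict"}
first, second = var1, var2   # remember which object is which

# The right-hand side is packed into a tuple (var2, var1) first,
# then unpacked onto the names on the left.
var1, var2 = var2, var1

# The same trick rotates three or more names at once.
a, b, c = 1, 2, 3
a, b, c = b, c, a
```

After the swap, `var1` is the very same dict object that `var2` used to name, and vice versa.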