Fun with Beautiful Soup

Saturday, September 9, 2006

So how easy is it to yank out all the links in a post and display them as a list of their own? Well, with a little Python love and a custom Django template filter, it's as easy as microwave burritos. Before we get started you'll need a fresh copy of ElementTree and BeautifulSoup.

ElementTree (http://effbot.org/zone/element-index.htm) and BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) together form a very handy toolbox for parsing structured data. ElementTree is used a lot for parsing XML, while BeautifulSoup is tailored towards HTML, including sloppy, not-quite-valid markup. Download both of these guys and drop them on your Python path so your Django app can import them.
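To see why the two tools split the work this way, here is a minimal sketch using the ElementTree that ships with modern Python as `xml.etree.ElementTree`. It pulls hrefs out of a well-formed fragment just fine, but gives up on the kind of unclosed tags real blog posts are full of. The helper names `links_from_xhtml` and `is_well_formed` are hypothetical, and the sketch assumes un-namespaced markup.

```python
import xml.etree.ElementTree as ET


def links_from_xhtml(markup):
    """Hypothetical helper: hrefs of every <a> in a well-formed XHTML fragment."""
    # ElementTree wants a single root element, so wrap the fragment in a <div>.
    # Assumes no XHTML namespace; namespaced tags would appear as '{uri}a'.
    root = ET.fromstring("<div>%s</div>" % markup)
    return [a.get("href") for a in root.iter("a")]


def is_well_formed(markup):
    """Hypothetical helper: True if ElementTree can parse the fragment at all."""
    try:
        ET.fromstring("<div>%s</div>" % markup)
        return True
    except ET.ParseError:
        return False


print(links_from_xhtml('<p>See <a href="/one">one</a> and <a href="/two">two</a>.</p>'))
# → ['/one', '/two']
print(is_well_formed('<p>An unclosed <a href="/oops">link</p>'))
# → False
```

That strictness is exactly where BeautifulSoup earns its keep: it builds a usable tree out of markup like the second example instead of raising an error.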

Once those guys are seated, you can write a very simple template filter.

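from django import template
from django.conf import settings

register = template.Library()

@register.filter
def get_links(value):
    try:
        # BeautifulSoup 3 installs as "BeautifulSoup"; fall back to a lowercase module name.
        try:
            from BeautifulSoup import BeautifulSoup
        except ImportError:
            from beautifulsoup import BeautifulSoup
        soup = BeautifulSoup(value)
        # findAll('a') returns every anchor Tag found in the markup
        return soup.findAll('a')
    except ImportError:
        # Neither import worked: complain loudly while debugging...
        if settings.DEBUG:
            raise template.TemplateSyntaxError("Error in 'get_links' filter: BeautifulSoup isn't installed.")
        # ...otherwise return no links, so the template's for loop simply renders nothing
        return []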
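If you want to poke at what the filter hands back without installing anything, here is a rough stand-in built on Python 3's standard-library `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not what the filter above does: `LinkCollector` and `get_links_stdlib` are hypothetical names, and each link comes back as a plain dict of attributes rather than a BeautifulSoup Tag.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical stdlib stand-in for soup.findAll('a'): records each anchor's attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value) pairs.
        if tag == "a":
            self.links.append(dict(attrs))


def get_links_stdlib(value):
    """Return one dict of attributes per <a> tag found in the HTML string."""
    parser = LinkCollector()
    parser.feed(value)
    parser.close()
    return parser.links


body = '<p><a href="/one" title="One">1</a> and <A HREF="/two">2</p>'
print(get_links_stdlib(body))
# → [{'href': '/one', 'title': 'One'}, {'href': '/two'}]
```

Like BeautifulSoup, `HTMLParser` shrugs off the unclosed, upper-case anchor. Because each link is a dict, `{{ link.href }}` and `{{ link.title }}` would still resolve in a Django template through dictionary lookup.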

Save that as a module in your app's templatetags directory, pull it into your template with {% load %} followed by the module's name, and use it like so:

<ul>
  {% for link in object.body|get_links %}
  <li><a href="{{ link.href }}">{{ link.title }}</a></li>
  {% endfor %}
</ul>

The above example assumes you're working with an object that has a field called body containing a chunk of HTML. It also assumes that each anchor has a title attribute. If you don't want to mess with titles, replace the whole <a> element with {{ link }}, which outputs the original anchor tag as-is. For more on custom template tags and filters, consume some tasty Django documentation (http://www.djangoproject.com/documentation/templates_python/#extending-the-template-system). Enjoy!
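Why does {{ link.href }} work on a BeautifulSoup Tag at all? Django's template language tries a dotted name as a dictionary-style lookup first, then as an attribute, then as a list index, and a Tag answers `tag['href']` with the attribute's value. Below is a simplified sketch of that order, not Django's actual code (the real resolver also calls callables and handles more cases). `resolve_dotted` and `FakeTag` are hypothetical, and a real Tag may behave differently from this stand-in at the attribute step when the attribute is missing.

```python
def resolve_dotted(obj, name):
    """Simplified sketch of the order Django tries for {{ obj.name }} (not Django's code)."""
    try:
        return obj[name]  # 1. dictionary-style lookup, e.g. tag['href']
    except (TypeError, KeyError, AttributeError):
        pass
    try:
        return getattr(obj, name)  # 2. attribute lookup
    except AttributeError:
        pass
    try:
        return obj[int(name)]  # 3. list-index lookup
    except (TypeError, ValueError, IndexError, KeyError):
        return ""  # stands in for Django's default (empty) string_if_invalid


class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag: attributes via tag['name']."""

    def __init__(self, **attrs):
        self.attrs = attrs

    def __getitem__(self, key):
        return self.attrs[key]


link = FakeTag(href="/one")
print(resolve_dotted(link, "href"))   # → /one
print(resolve_dotted(link, "title"))  # no title attribute: prints an empty line
```

This is also why the title assumption matters: an anchor without one falls through every step, and the template quietly renders nothing where the link text should be.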

UPDATE: I may have inadvertently implied that you needed ElementTree to use BeautifulSoup. This is in fact wrong. BeautifulSoup can play by itself.