Last week I ran into some problems dealing with large book imports on Readernaut. I had tested the system with around 50 to 100 books but had no idea people would upload lists of 900+ books. This raised the question: how do you import a very large set of data before the browser times out?
Brief example
A user uploads a list of 1,000 ISBNs to be imported into their library. Each book that is not already in the system needs to be fetched from another service like Amazon.
Solution 1: Threading
Use threading to push the long-running process off to another thread while directing the browser to a status page. You could set up an Ajax request to periodically check on the status of the import and update a progress bar.
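This solution is generally a bad idea. It’s super easy to do and tests well in a development environment, but it has scary consequences down the road: import threads compete with normal request handling for the same memory and connections, and nothing survives a server restart. That hurts most if you don’t have a lot of server resources.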
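To make the trade-off concrete, here is a minimal sketch of the threading approach in plain Python. The names `start_import`, `run_import`, `import_status`, and the `import_book` callable are hypothetical and not taken from Readernaut. Note that the status dictionary exists only inside one server process, so a status poll routed to a different worker process would not see it.

```python
import threading

# job_id -> {"done": int, "total": int}; kept only in this process's memory
import_status = {}
status_lock = threading.Lock()


def run_import(job_id, isbns, import_book):
    """Import each ISBN in turn and record progress (runs in the worker thread)."""
    for isbn in isbns:
        import_book(isbn)  # hypothetical callable that fetches and saves one book
        with status_lock:
            import_status[job_id]["done"] += 1


def start_import(job_id, isbns, import_book):
    """Register the job, start a background thread, and return it immediately."""
    with status_lock:
        import_status[job_id] = {"done": 0, "total": len(isbns)}
    worker = threading.Thread(target=run_import, args=(job_id, isbns, import_book))
    worker.daemon = True  # do not block interpreter shutdown
    worker.start()
    return worker
```

A status view would read `import_status[job_id]` under the lock and hand the numbers to the progress bar.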
Solution 2: Message Queuing
Message queuing is a very basic asynchronous method of storing items in a queue to be processed later.
So for this instance I created a model called Message that has three fields:
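from django.contrib.auth.models import User
from django.db import models

class Message(models.Model):
    user = models.ForeignKey(User)  # owner of the queued work
    subject = models.CharField(max_length=100)  # queue name, e.g. 'book_import'
    message = models.TextField()  # payload: one chunk of ISBNs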
This allowed me to slurp in the list of ISBNs and break it out into 20-book chunks. Each chunk gets related to a user and a subject of “book_import.” Once that’s finished, I can send the user to a progress page where the chunks are processed one by one until they’re all gone. Here’s a simple example of a progress view:
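from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response
from django.template import RequestContext

def book_import_progress(request):
    message_list = Message.objects.filter(user=request.user, subject='book_import')
    # Process one chunk per request; handle_message is assumed to delete the message when done.
    if message_list.count() > 0:
        handle_message(message_list[0])
    # Query again: if chunks remain, show the progress page so it can refresh.
    if message_list.count() > 0:
        return render_to_response('books/progress.html', {}, context_instance=RequestContext(request))
    return HttpResponseRedirect('/user/books')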
First we get a list of messages, then we process the first chunk of 20 books. If there are more messages to be processed, we refresh the page; otherwise, we send the user to their books page.
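The chunking step that feeds this view can be sketched in plain Python. This is only an illustration under a few assumptions: the upload has one ISBN per line, each chunk is stored in the message field as a comma-separated string, and the helper names `chunk_isbns`, `chunk_to_message_body`, and `message_body_to_chunk` are hypothetical.

```python
def chunk_isbns(raw_text, chunk_size=20):
    """Split uploaded text into lists of at most chunk_size ISBNs."""
    # Assumption: one ISBN per line; blank lines are skipped.
    isbns = [line.strip() for line in raw_text.splitlines() if line.strip()]
    return [isbns[i:i + chunk_size] for i in range(0, len(isbns), chunk_size)]


def chunk_to_message_body(chunk):
    """Serialize one chunk for the Message.message text field (assumed format)."""
    return ",".join(chunk)


def message_body_to_chunk(body):
    """Reverse of chunk_to_message_body, for use when a message is processed."""
    return [isbn for isbn in body.split(",") if isbn]
```

Each chunk would then be saved as one Message row with the subject 'book_import', and the code that handles a message would call `message_body_to_chunk` on its text before looking up each book.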
Additionally, it’s super simple to Ajaxify the progress page by using Django’s handy request.is_ajax() and returning a JSON object with info on the progress of the import. Message queuing is pretty handy for a lot of situations, and Amazon even has a service for it called Simple Queue Service (SQS).
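As a rough sketch of the JSON progress object for the Ajax case, the helper below builds the response body from chunk counts. The function name `progress_payload` and its field names are hypothetical, and it assumes the total number of chunks was recorded somewhere at upload time, which the Message model shown earlier does not do on its own.

```python
import json


def progress_payload(total_chunks, remaining_chunks):
    """Return a JSON string describing how far the import has progressed."""
    done = total_chunks - remaining_chunks
    # Treat an empty import as complete to avoid dividing by zero.
    percent = 100 * done // total_chunks if total_chunks else 100
    return json.dumps({
        "done": done,
        "total": total_chunks,
        "percent": percent,
        "finished": remaining_chunks == 0,
    })
```

When request.is_ajax() is true, the progress view could return this string as the response body instead of rendering the template, and the page script would update the progress bar from `percent`.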