Time complexity of serializing / parsing JSON

I have a project that uses JSON as a cross-language serialization format for data transfer. Recently the data has grown quite large (a list of 10 thousand objects). Serializing it with Python's standard json library takes about 20 seconds.

I am working on reducing that time. Switching to another JSON serializer (cjson, simplejson, ujson) may speed things up somewhat, but I'm starting to wonder about the time complexity of JSON serialization itself. If the relationship is not linear (say, if it is n^2), I could slice the data into pieces and reduce the time significantly.

My guess is that the complexity really depends on the input. But is there a known worst-case / average-case bound? A link to a reference would also be highly appreciated.

Thanks.

1 answer

I measured how the serialization time scales with input size using this code:

import json
import random
import time

# List lengths to benchmark
Ns = 10, 100, 1000, 10000, 100000, 200000, 300000, 600000, 1000000
for N in Ns:
    # N random floats as test data
    data = [random.random() for _ in range(N)]
    t0 = time.time()
    s = json.dumps(data)
    t1 = time.time()
    dt = t1 - t0  # serialization time in seconds
    print("%s %s" % (N, dt))

On my machine, the result is:

10 7.20024108887e-05
100 0.000385999679565
1000 0.00362801551819
10000 0.036504983902
100000 0.366562128067
200000 0.73614192009
300000 1.09785795212
600000 2.20272803307
1000000 3.6590487957

First column: list length; second column: serialization time in seconds. Plotting the data (e.g. with xmgrace) shows an almost perfectly linear relationship.
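
If you would rather not plot, one way to check the same thing numerically is to fit a straight line to log(time) against log(N): the slope estimates the exponent k in time ~ N^k, so a value near 1 suggests linear scaling and a value near 2 suggests quadratic. This is only a rough sketch using the timings listed above; the names `timings` and `loglog_slope` are made up for illustration and are not part of any library.

```python
import math

# (list length, seconds) pairs copied from the measurements above
timings = [
    (10, 7.20024108887e-05),
    (100, 0.000385999679565),
    (1000, 0.00362801551819),
    (10000, 0.036504983902),
    (100000, 0.366562128067),
    (200000, 0.73614192009),
    (300000, 1.09785795212),
    (600000, 2.20272803307),
    (1000000, 3.6590487957),
]


def loglog_slope(points):
    """Least-squares slope of log(t) against log(n) (hypothetical helper)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares: covariance(x, y) / variance(x)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


slope = loglog_slope(timings)
print("estimated exponent: %.2f" % slope)
```

The smallest inputs are dominated by fixed overhead and timer resolution, so the fitted exponent can land slightly below 1 even when the underlying behaviour is linear.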
