Python: custom JSON decoder performance

I have an application that periodically dumps and loads a JSON file in Python, using the standard JSON facilities.

In the beginning, we decided that it was much more convenient to work with the loaded JSON data as objects rather than dictionaries. It really comes down to the convenience of “dot” member access, as opposed to the [] notation for dictionary key lookups. One of the benefits of JavaScript is that there is no real difference between dictionary lookups and member data access (which is why JSON is especially suitable for JavaScript, I think). But in Python, dictionary keys and object data members are two different things.

So, our solution was simply to use a custom JSON decoder with an object_hook function that returns objects instead of dictionaries.
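For illustration, a hook of this kind might look like the sketch below. The names JsonObject and as_object are hypothetical, not taken from the actual application; the only library feature used is the object_hook parameter of json.loads.

```python
import json


class JsonObject(object):
    """Hypothetical container: exposes JSON keys as attributes."""

    def __init__(self, d):
        self.__dict__.update(d)


def as_object(d):
    # The json module calls this once for every decoded JSON object,
    # innermost objects first, and uses the return value in place of the dict.
    return JsonObject(d)


text = '{"user": {"name": "ann", "tags": ["a", "b"]}}'
root = json.loads(text, object_hook=as_object)
print(root.user.name)  # ann
```

Note that JSON arrays are still decoded as plain Python lists; only JSON objects pass through the hook.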

And we lived happily ever after ... until now, when this design decision may turn out to have been a mistake. You see, the JSON dump file has now grown quite large (> 400 MB). As far as I know, the standard Python 3 JSON tools use native code to do the actual parsing, so they are quite fast. But if you provide a custom object_hook, interpreted bytecode still has to run for every JSON object decoded, which SERIOUSLY slows things down. Without object_hook, decoding the entire 400 MB file takes only about 20 seconds. But with the hook, it takes half an hour!
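A rough way to see the per-object cost on a smaller scale is to time both decoding paths on a synthetic payload. This is only a sketch: the payload shape and the Obj class are made up for the measurement, and the numbers will vary by machine and Python version.

```python
import json
import timeit


class Obj(object):
    # Hypothetical stand-in for the application's object type.
    def __init__(self, d):
        self.__dict__.update(d)


# Synthetic payload: many small objects, so the hook is called often.
text = json.dumps([{"id": i, "pos": {"x": i, "y": -i}} for i in range(20000)])

# Time three full decodes with and without the hook.
plain = timeit.timeit(lambda: json.loads(text), number=3)
hooked = timeit.timeit(lambda: json.loads(text, object_hook=Obj), number=3)
print("plain: %.3fs  with object_hook: %.3fs" % (plain, hooked))
```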

So, at the moment two options come to mind, neither of which is very pleasant. One is simply to forget about the convenience of dot access to the data and just use Python dictionaries. (This means changing a significant amount of code.) The other is to write a C extension module, use that as the object_hook, and see whether we get a speedup.

I am wondering if there is some better solution that I haven't thought of: perhaps an easier way to get “dot” access while still initially decoding into Python dictionaries.

Any suggestions or solutions to this problem?

+5
3 answers

Instead of using object_hook, you could parse the json normally and then convert the result into a namedtuple.

Something like this:

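import json
from collections import namedtuple

# data is the JSON text; its top level must be a JSON object whose keys
# are valid Python identifiers.
result = json.loads(data)
JsonData = namedtuple("JsonData", result.keys())
jsondata = JsonData(**result)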


+3

Why not wrap the dict that the JSON decoder returns, rather than converting it to an object?

Something like this:

class DictWrap(object):

    def __init__(self, d):
        self.__d = d

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        try:
            return self.__d[attr]
        except KeyError:
            raise AttributeError(attr)


dw = DictWrap({"a": "foo", "b": "bar"})

print(dw.a, dw.b)  # foo bar
print(dw.c)  # raises AttributeError


0

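Both answers above, Lennart Regebro's (namedtuple) and nemo's (dict wrapper), leave the parsing to python's json module and add attribute access afterwards.

nemo's wrapper only covers the top-level dictionary, so nested dictionaries and lists would need wrapping as well. For example: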

def __getattr__(self, attr):
  ...
  if isinstance(self.__d[attr], dict):
    return DictWrap(self.__d[attr])

  elif isinstance(self.__d[attr], list):
    return ListWrap(self.__d[attr])    # and create similar wrapper for List.
  ...
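Filling in the elided parts, one possible complete version of this idea is sketched below. The wrap helper and the ListWrap class are assumptions made for illustration (the fragment above only names ListWrap), and the attribute is stored as _d rather than __d to keep the sketch short.

```python
import json


def wrap(value):
    # Wrap containers lazily on access; scalars pass through unchanged.
    if isinstance(value, dict):
        return DictWrap(value)
    if isinstance(value, list):
        return ListWrap(value)
    return value


class DictWrap(object):
    def __init__(self, d):
        self._d = d

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr.startswith("_"):
            # Avoid recursing if _d itself is missing.
            raise AttributeError(attr)
        try:
            return wrap(self._d[attr])
        except KeyError:
            raise AttributeError(attr)


class ListWrap(object):
    def __init__(self, items):
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Also makes the wrapper iterable through the sequence protocol.
        return wrap(self._items[index])


data = json.loads('{"a": {"b": [{"c": 1}, {"c": 2}]}}')
root = DictWrap(data)
print(root.a.b[1].c)  # 2
print([item.c for item in root.a.b])  # [1, 2]
```

Because wrappers are created only when an attribute or element is accessed, decoding itself runs without any hook; the trade-off is a small allocation on every access to a nested container.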

Another option is to load the parsed dictionary straight into an object's attributes:

import json

class JsonData(object):
    pass

# data is the JSON text; its top level must be a JSON object.
jsondata = JsonData()
jsondata.__dict__.update(json.loads(data))
0