Parsing multiple json objects on the same line

I am parsing files containing JSON objects. The problem is that some files have multiple objects on the same line. For example:

{"data1": {"data1_inside": "bla{bl\"a"}}{"data1": {"data1_inside": "blabla["}}{"data1": {"data1_inside": "bla{bla"}}{"data1": {"data1_inside": "bla["}}

I wrote a function that splits off a substring whenever the count of open curly braces drops back to zero, but the values themselves may contain curly braces. I tried to skip over values by tracking where quotes open and close, but some values also contain escaped quotes. Any ideas on how to handle this?

My attempt:

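def get_lines(data):
    lines = []
    open_brackets = 0   # current nesting depth of {...}
    start = 0           # index where the current top-level object begins
    is_comment = False  # True while inside a "..." string literal
    for index, c in enumerate(data):
        if c == '"':
            # Toggles on every quote, including escaped ones such as \",
            # so values with escaped quotes break the tracking.
            is_comment = not is_comment
        elif not is_comment:
            if c == '{':
                if not open_brackets:
                    start = index
                open_brackets += 1

            if c == '}':
                open_brackets -= 1
                if not open_brackets:
                    lines.append(data[start: index+1])

    return lines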
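The brace-counting idea in the attempt above can work if, while inside a string, a backslash is treated as escaping the next character, so an escaped quote never ends the string. A minimal sketch, assuming every top-level value is an object (top-level arrays are not handled); the function name split_json_objects is hypothetical:

```python
import json


def split_json_objects(data):
    """Return the top-level {...} objects found in data, as substrings."""
    objects = []
    depth = 0          # current nesting depth of {...}
    start = 0          # index where the current top-level object begins
    in_string = False  # True while inside a "..." string literal
    escaped = False    # True if the previous character was a backslash
    for index, c in enumerate(data):
        if in_string:
            if escaped:
                escaped = False    # this character is escaped; ignore it
            elif c == '\\':
                escaped = True     # escape the next character
            elif c == '"':
                in_string = False  # an unescaped quote ends the string
        elif c == '"':
            in_string = True
        elif c == '{':
            if depth == 0:
                start = index
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                objects.append(data[start:index + 1])
    return objects


line = r'{"data1": {"data1_inside": "bla{bl\"a"}}{"data1": {"data1_inside": "blabla["}}'
for chunk in split_json_objects(line):
    print(json.loads(chunk))
```

Braces and brackets inside strings are skipped entirely, so only structural braces affect the depth count.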
3 answers

Simple but less reliable version:

>>> import re
>>> s = r'{"data1": {"data1_inside": "bla{bl\"a"}}{"data1": {"data1_inside": "blabla["}}{"data1": {"data1_inside": "bla{bla"}}{"data1": {"data1_inside": "bla["}}'
>>> r = re.split(r'(\{.*?\})(?= *\{)', s)
>>> r
['', '{"data1": {"data1_inside": "bla{bl\\"a"}}', '', '{"data1": {"data1_inside": "blabla["}}', '', '{"data1": {"data1_inside": "bla{bla"}}', '{"data1": {"data1_inside": "bla["}}']

This fails if the sequence }{ appears inside a string value.
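A quick way to see the limitation: when a string value contains }{, the same pattern cuts the first object in the middle, and neither resulting piece is valid JSON on its own:

```python
import re

s = '{"data1": "}{"}{"data2":"foo"}'
# Same pattern as above: a lazy {...} followed (after optional spaces) by '{'
parts = [p for p in re.split(r'(\{.*?\})(?= *\{)', s) if p]  # drop empty strings
print(parts)  # ['{"data1": "}', '{"}', '{"data2":"foo"}']
```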

To make it more reliable, glue consecutive pieces of r back together until they parse as valid JSON:

import json

accumulator = ''
res = []
for subs in r:
    accumulator += subs
    try:
        res.append(json.loads(accumulator))
        accumulator = ''
    except ValueError:
        # Not a complete JSON object yet; keep appending pieces.
        pass
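The two steps above (regex split, then re-joining pieces until they parse) can be wrapped into one function. A sketch; the name parse_concatenated and the final trailing-data check are additions, not part of the original answer:

```python
import json
import re


def parse_concatenated(s):
    """Split on '}' followed by '{', then re-join pieces until they parse."""
    pieces = re.split(r'(\{.*?\})(?= *\{)', s)
    results = []
    accumulator = ''
    for piece in pieces:
        accumulator += piece
        try:
            results.append(json.loads(accumulator))
        except ValueError:
            continue  # incomplete object so far; keep accumulating
        accumulator = ''
    if accumulator.strip():
        # Leftover text never became valid JSON (e.g. a truncated object).
        raise ValueError('trailing data could not be parsed: %r' % accumulator)
    return results


print(parse_concatenated('{"data1": "}{"}{"data2":"foo"}'))
```

Because a wrongly split piece simply fails json.loads and gets merged with the next one, a }{ inside a string no longer breaks the result; the final check makes a truncated last object raise instead of being silently dropped.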

Splitting with a regex cannot cope with input such as '{"data1": "}{"}{"data2":"foo"}'.

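A brute-force alternative: every JSON object starts with '{' and ends with '}' (arrays use '[' and ']' instead), so try json.loads on the text from the first '{' to the first '}', and if that fails, extend the candidate to the next '}' until it parses: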

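import json

with open('input.txt') as inp:
    s = inp.read().strip()

jsons = []

# Try the shortest candidate from the first '{' to the next '}'; if it is not
# valid JSON, extend the candidate to the following '}' and try again.
start, end = s.find('{'), s.find('}')
while s:
    try:
        jsons.append(json.loads(s[start:end + 1]))
    except ValueError:
        next_end = s.find('}', end + 1)
        if next_end == -1:
            # No more '}' to try: the remaining data is not valid JSON.
            raise ValueError('unparseable data: %r' % s[start:])
        end = next_end
    else:
        # Parsed one object; drop it and continue with the rest of the input.
        s = s[end + 1:].strip()
        start, end = s.find('{'), s.find('}')

for x in jsons:
    print(x)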

Output:

$ cat input.txt 
{"data1": {"data1_inside": "bla{bl\"a"}}{"data1": {"data1_inside": "blabla["}}{"data1": {"data1_inside": "bla{bla"}}{"data1": {"data1_inside": "bla["}}
$ python json_linereader.py 
{u'data1': {u'data1_inside': u'bla{bl"a'}}
{u'data1': {u'data1_inside': u'blabla['}}
{u'data1': {u'data1_inside': u'bla{bla'}}
{u'data1': {u'data1_inside': u'bla['}}

If s is instead set to '{"data1": "}{"}{"data2":"foo"}', the output is:

{'data1': '}{'}
{'data2': 'foo'}

You can use json.JSONDecoder.raw_decode! It parses a JSON document at the start of a string and tolerates extra data after it. Usage example:

>>> import json
>>> dec = json.JSONDecoder()
>>> json_str = '{"data": "Foo"}{"data": "BarBaz"}{"data": "Qux"}'
>>> dec.raw_decode(json_str)
({u'data': u'Foo'}, 15)
>>> dec.raw_decode(json_str[15:])
({u'data': u'BarBaz'}, 18)
>>> dec.raw_decode(json_str[33:])
({u'data': u'Qux'}, 15)

The first element of the tuple is the decoded object; the second is the index in the string where that document ended, i.e. how many characters were consumed. A loop like this therefore iterates over all JSON objects in a string:

import json

dec = json.JSONDecoder()
pos = 0
while pos < len(json_str):
    # Decode the next document; json_len is how many characters it used
    j, json_len = dec.raw_decode(json_str[pos:])
    pos += json_len
    # Do something with the json j here
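The loop above assumes the documents are packed with nothing between them: raw_decode raises ValueError (json.JSONDecodeError on Python 3) if the string starts with whitespace. A sketch of a generator that also tolerates spaces or newlines between documents; the name iter_json_objects is hypothetical:

```python
import json

JSON_WHITESPACE = ' \t\n\r'  # the four whitespace characters allowed by JSON


def iter_json_objects(text):
    """Yield each value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return
        # consumed is relative to the slice, so add it to the running offset
        obj, consumed = decoder.raw_decode(text[pos:])
        pos += consumed
        yield obj


for obj in iter_json_objects('{"data": "Foo"} {"data": "BarBaz"}\n{"data": "Qux"}'):
    print(obj)
```

Slicing copies the rest of the string on every step, which is fine for lines of moderate length; for a whole file, the same generator can be applied to each line or to the full file contents.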