How to convert dict to unicode JSON string?

It seems to me that this is not possible using the standard library module json. json.dumps automatically escapes all non-ASCII characters and then encodes the string as ASCII. I can tell it not to escape non-ASCII characters, but then it crashes when it tries to convert the output to ASCII.

The problem: I do not want ASCII! I just want my JSON string to be returned as a unicode string (or UTF-8). Is there a convenient way to do this?

Here is an example demonstrating what I want:

d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d, output_encoding='utf8')
# => '{"stilling": "Lærling", "navn": "Åge"}'

But of course there is no such option as output_encoding, so here is the actual output:

d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d)
# => '{"stilling": "L\\u00e6rling", "navn": "\\u00c5ge"}'

So, to summarize: I want to convert a Python dict to a UTF-8 JSON string without any escape sequences. How can I do this?


I will accept solutions such as:

  • Hacks (pre- and post-processing the data around dumps to achieve the desired effect)
  • Subclassing JSONEncoder (I don't know how this works, and the documentation doesn't help much)
  • Third-party libraries available on PyPI
2 answers

Requirements

  • Make sure your Python files are encoded in UTF-8. Otherwise your non-ASCII characters will turn into question marks (?). Notepad++ has excellent encoding options for this.

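  • Make sure your IDE supports Unicode. Otherwise you may get a UnicodeEncodeError.

Example:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 22-23: character maps to <undefined>

PyScripter works for this. It is available with "Portable Python": http://portablepython.com/wiki/PortablePython3.2.1.1

  • Make sure you are using Python 3+, which has better Unicode support.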
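
The UnicodeEncodeError shown in the requirements above comes from the encoding of the output stream, not from json. Here is a minimal sketch that reproduces it without depending on a particular IDE. It assumes a Windows-style cp1252 code page, which is only an illustration; the original error message does not say which code page was in use.

```python
# The JSON text we would like to print, containing non-ASCII characters.
text = '{"Japan": "日本"}'

# Simulate a console whose encoding cannot represent these characters.
# cp1252 is an assumed example code page, not taken from the original post.
try:
    text.encode("cp1252")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("a cp1252 console would fail:", exc.reason)

# Encoding explicitly to UTF-8 works for any text without lone surrogates.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes), "bytes of UTF-8")
```

On Python 3.7+, calling sys.stdout.reconfigure(encoding='utf-8') is another option, provided the terminal itself can display UTF-8.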

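json.dumps() escapes Unicode characters.

One workaround is to replace each escape sequence with the character it encodes. A simple lambda, getStringWithDecodedUnicode, does exactly that: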

import re
# Raw string avoids invalid-escape warnings; the parameter no longer shadows the built-in str.
getStringWithDecodedUnicode = lambda s : re.sub( r'\\u([\da-f]{4})', (lambda x : chr( int( x.group(1), 16 ) )), s )

Here is getStringWithDecodedUnicode as a regular function:

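def getStringWithDecodedUnicode( value ):
    # Match a literal backslash, 'u', and four lowercase hex digits (as json.dumps emits them).
    findUnicodeRE = re.compile( r'\\u([\da-f]{4})' )
    def getParsedUnicode(x):
        # Convert the captured hex digits to the corresponding character.
        return chr( int( x.group(1), 16 ) )

    return findUnicodeRE.sub(getParsedUnicode, str( value ) )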

testJSONWithUnicode.py (run in the PyScripter IDE):

import re
import json
getStringWithDecodedUnicode = lambda s : re.sub( r'\\u([\da-f]{4})', (lambda x : chr( int( x.group(1), 16 ) )), s )

data = {"Japan":"日本"}
jsonString = json.dumps( data )
print( "json.dumps({0}) = {1}".format( data, jsonString ) )
jsonString = getStringWithDecodedUnicode( jsonString )
print( "Decoded Unicode: %s" % jsonString )

json.dumps({'Japan': '日本'}) = {"Japan": "\u65e5\u672c"}
Decoded Unicode: {"Japan": "日本"}
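
One caveat with the regex approach, sketched below: json.dumps escapes characters outside the Basic Multilingual Plane as a surrogate pair (two \uXXXX escapes), and replacing each escape on its own yields two lone surrogates rather than the original character. The helper name decode_escapes is hypothetical; it just mirrors getStringWithDecodedUnicode.

```python
import json
import re

def decode_escapes(s):
    # Same idea as getStringWithDecodedUnicode: replace each \uXXXX escape on its own.
    return re.sub(r'\\u([\da-f]{4})', lambda m: chr(int(m.group(1), 16)), s)

# U+1F600 lies outside the BMP, so json.dumps escapes it as a surrogate pair.
escaped = json.dumps({"face": "😀"})
naive = decode_escapes(escaped)

# The regex result contains lone surrogates instead of the single character.
has_lone_surrogates = any(0xD800 <= ord(c) <= 0xDFFF for c in naive)
print(has_lone_surrogates)

# json.loads understands surrogate pairs and recovers the original value.
print(json.loads(escaped)["face"] == "😀")
```

A string with lone surrogates cannot be encoded to UTF-8, so the regex is only safe for text known to stay within the BMP.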

Update

A simpler way is to pass ensure_ascii=False to json.dumps:

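import json
data = {'navn': 'Åge', 'stilling': 'Lærling'}
# Pass the same variable that was defined above (the original used an undefined name d).
result = json.dumps(data, ensure_ascii=False)
print( result ) # prints {"navn": "Åge", "stilling": "Lærling"} (key order may differ before Python 3.7)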
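
Since the question asks for either a unicode string or UTF-8, here is a short sketch of both on Python 3: the str returned by json.dumps, the same text encoded to UTF-8 bytes, and a file written as UTF-8. The temporary path and file name are only for illustration.

```python
import json
import os
import tempfile

data = {'navn': 'Åge', 'stilling': 'Lærling'}

# str result: non-ASCII characters are kept as-is instead of being escaped.
text = json.dumps(data, ensure_ascii=False)

# UTF-8 bytes, e.g. for a socket or an HTTP response body.
utf8_bytes = text.encode('utf-8')

# Writing to a file: the encoding is chosen when the file is opened.
path = os.path.join(tempfile.mkdtemp(), 'data.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# Reading it back gives the original dict.
with open(path, encoding='utf-8') as f:
    round_tripped = json.load(f)
print(round_tripped == data)
```

Passing encoding='utf-8' to open() explicitly avoids depending on the platform's default encoding.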

ensure_ascii=False is the way to go, IMHO.

If you are using Python 2.7, here is an example Python file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# example.py
from __future__ import unicode_literals
from json import dumps as json_dumps
d = {'navn': 'Åge', 'stilling': 'Lærling'}
print json_dumps(d, ensure_ascii=False).encode('utf-8')