Unicode memory location

Question

Can someone explain why, when I create equal Unicode strings in Python 2.7, they don't point to the same place in memory, the way "normal" strings do?

>>> a1 = 'a'
>>> a2 = 'a'
>>> a1 is a2
True

OK, this is what I expected, but:

>>> ua1 = u'a'
>>> ua2 = u'a'
>>> ua1 is ua2
False

Why? How?

+5

python python-2.7 unicode

Zokis Mar 13 '13 at 18:48

2 answers

Normal strings can be interned, but Unicode strings cannot. For example, intern() rejects a unicode argument (Python 2.6.6):

>>> intern("string")
'string'
>>> intern(u"unicode string")

Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    intern(u"unicode string")
TypeError: intern() argument 1 must be string, not unicode

+3

Claudiu Mar 13 '13 at 18:49

abarnert · Accepted Answer · 2013-03-13T19:00:58+0000

Normal strings are not guaranteed to be interned. Sometimes they are, sometimes they are not. The rules are complex, version-dependent, and intentionally not documented.

You can rely on Python trying to intern small and commonly used objects whenever it is a good idea. You can also rely on this: if you write code that depends on a1 is a2 being true, or on it being false, it will break at the most inconvenient moment.
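As a minimal sketch of the safer habit (the variable names are illustrative only), the snippet below builds two equal strings at runtime and compares them by value. The result of the identity check is printed but deliberately not relied on, since it is an implementation detail.

```python
# Build two equal strings at runtime so they are not simply the same
# compile-time constant.
parts = ["uni", "code"]
s1 = "".join(parts)
s2 = "".join(parts)

# Value equality is guaranteed by the language.
values_equal = (s1 == s2)

# Identity depends on the implementation and version; inspect it if you
# are curious, but never branch on it.
same_object = (s1 is s2)

print("equal values: %s" % values_equal)
print("same object: %s" % same_object)
```

Code that needs to know whether two strings hold the same text should use ==, whatever the identity check happens to report.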

If you want the actual rules, you have to read the source. For CPython, that is stringobject.c in 2.6 and 2.7, and unicodeobject.c in 3.3.

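Note that in 2.x, interning applies only to str, not to unicode (that changes in 3.x). In 2.7, if you pass a unicode object to intern, it is rejected. So in 2.7, unicode strings are not interned.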

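In 3.3 it is more complicated still: str can be stored internally as UTF-8, UTF-16, or UTF-32, and there is also a legacy API for Unicode objects. So whether a1 is a2 holds there depends on the details.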
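For comparison, here is a small sketch that assumes Python 3, where str is the Unicode type and the interning function lives in the sys module as sys.intern. Explicitly interned strings with equal values are the same object, which is the one case where an identity check on strings is dependable.

```python
import sys

# Assumes Python 3: sys.intern accepts str, which holds Unicode text.
# The strings are built at runtime so they start out as separate objects
# as far as the language is concerned.
t1 = sys.intern("".join(["caf", "\u00e9"]))
t2 = sys.intern("".join(["caf", "\u00e9"]))

# Both names now refer to the single interned copy.
print(t1 is t2)
```

This only covers strings that were passed through sys.intern. For any other pair of strings, the identity result is still an implementation detail.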

In Python, do not rely on string identity. If you want to know whether two strings are equal, compare them with ==.
