Unicode memory location

Can someone explain why, when I create equal Unicode strings in Python 2.7, they don't point to the same place in memory, as "normal" strings do?

>>> a1 = 'a'
>>> a2 = 'a'
>>> a1 is a2
True

OK, this is what I expected, but:

>>> ua1 = u'a'
>>> ua2 = u'a'
>>> ua1 is ua2
False

Why? How does this happen?

+5
2 answers

Normal strings are not guaranteed to be interned. Sometimes they are, sometimes they are not. The rules are complicated, version-dependent, and intentionally undocumented.

You can rely on Python trying to intern small and commonly used objects whenever it's a good idea. You can also rely on this: if you write code that depends on either a1 is a2 or the opposite, it will break at the most inconvenient moment.
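As a minimal sketch of the safer alternative, the snippet below compares strings by value with == instead of by identity with is. The helper name same_text is hypothetical and only for illustration; the u'' prefix is accepted by Python 2.7 and by Python 3.3 and later.

```python
def same_text(left, right):
    """Compare two strings by value; identity is an implementation detail."""
    return left == right


ua1 = u'a'
# Build the second string at runtime so it need not be the same object as ua1.
ua2 = u''.join([u'a'])

print(same_text(ua1, ua2))  # True, whether or not the objects are shared
```

Whether ua1 is ua2 happens to be True here depends on the interpreter and version, which is exactly why the code should not ask.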

If you want the actual rules, you have to read the source. For CPython, that is stringobject.c in 2.6 and 2.7, and unicodeobject.c in 3.3.

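Most likely this is because in 2.x the ordinary string type is str, not unicode as in 3.x. In 2.7, intern accepts only str, not unicode, so 2.7 has no way of interning unicode objects.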

, 3.3 , str UTF-8, UTF-16 UTF-32, , , API Unicode - . , a1 is a2, , .

python , . , , .

+2

In Python 2, Unicode strings cannot be interned. For example (Python 2.6.6):

>>> intern("string")
'string'
>>> intern(u"unicode string")

Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    intern(u"unicode string")
TypeError: intern() argument 1 must be string, not unicode
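If shared objects for equal Unicode strings are really needed, one possible workaround is a small hand-written pool. This is only a sketch: the names _pool and intern_text are hypothetical, not part of Python, and the pool keeps every string alive for as long as the dictionary exists.

```python
# Hypothetical module-level pool mapping each string value to one object.
_pool = {}


def intern_text(text):
    """Return one canonical object for each distinct string value."""
    # setdefault stores text on first sight and returns the stored object after.
    return _pool.setdefault(text, text)


# Two equal strings built separately at runtime.
first = intern_text(u''.join([u'uni', u'code']))
second = intern_text(u''.join([u'uni', u'code']))

print(first is second)  # True: both names refer to the pooled object
```

Even with such a pool, == remains the right way to compare string contents; the pool only saves memory when many equal strings are kept around.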
+3
