In Python on OSX with HFS+, how can I get the correct case of an existing file name?

I store data about files that exist on the OSX HFS+ file system. Later I want to iterate over the stored data and find out whether each file still exists. For my purposes, case matters: if the case of a file name has changed, I consider the file to no longer exist.

I started by trying

os.path.isfile(filename)

but with a default OSX installation on HFS+, this returns True even if the case of the file name does not match. I am looking for a way to write an isfile() function that respects case, even when the file system does not.

os.path.normcase() and os.path.realpath() both return the file name in whatever case I pass to them.
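To confirm that a particular directory really behaves this way, a probe along these lines can help. It is a hypothetical helper, not something from the original post: it creates a temporary file with a mixed-case name and asks whether the swapped-case name also "exists". It assumes the directory is writable.

```python
import os
import tempfile


def is_case_insensitive(directory="."):
    """Hypothetical helper: True if name lookups in `directory` ignore case."""
    # Create a uniquely named file whose prefix mixes upper- and lowercase letters.
    fd, path = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        # On a case-insensitive volume (HFS+ by default) the swapped-case name
        # resolves to the same file; on a case-sensitive one it does not exist.
        return os.path.exists(os.path.join(head, tail.swapcase()))
    finally:
        os.remove(path)  # always clean up the probe file


print(is_case_insensitive(tempfile.gettempdir()))
```

On a default HFS+ volume this should print True; on a case-sensitive file system such as Linux ext4 it should print False.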

Edit:

Now I have two functions that seem to work with ASCII-only file names. I don't know how Unicode or other characters might affect them.

The first is based on the answers given here by omz and Alex L.

import os

def does_file_exist_case_sensitive1a(fname):
    if not os.path.isfile(fname):
        return False
    path, filename = os.path.split(fname)
    search_path = '.' if path == '' else path
    # the name must appear in the directory listing with exactly this case
    return filename in os.listdir(search_path)

The second is probably even less efficient.

import glob
import os
import re

def does_file_exist_case_sensitive2(fname):
    if not os.path.isfile(fname):
        return False
    # find the last ASCII letter in the path
    m = re.search(r'[a-zA-Z][^a-zA-Z]*\Z', fname)
    if m:
        # turn exactly that letter into a '?' wildcard so glob has to list
        # the directory and compare names case-sensitively
        test = fname[:m.start()] + '?' + fname[m.start() + 1:]
        return fname in glob.glob(test)
    else:
        return True  # no letters in file, case sensitivity doesn't matter

The third is based on DSM's answer below.

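def does_file_exist_case_sensitive3(fname):
    if not os.path.isfile(fname):
        return False
    path, filename = os.path.split(fname)
    search_path = '.' if path == '' else path
    # map inode number -> name as stored in the directory; lstat each entry
    # by its full path (not relative to the current working directory)
    inodes = {os.lstat(os.path.join(search_path, x)).st_ino: x
              for x in os.listdir(search_path)}
    return inodes.get(os.lstat(fname).st_ino) == filename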

I don't expect these to perform well when a directory contains thousands of files. I'm still hoping for something more efficient.
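When many stored names live in the same directory, one way to cut the cost is to list each directory once and reuse the listing. The sketch below is a hypothetical helper, not from the original post; like the functions above it only checks the last path component, and it assumes the directories do not change while the checker is in use, since the cache is never refreshed.

```python
import os


def make_exact_case_checker():
    """Hypothetical helper: returns an isfile()-like check that respects case.

    Each directory is listed at most once and the listing is cached, so
    checking many names in one directory costs a single os.listdir() call.
    """
    listings = {}  # directory -> set of entry names exactly as stored

    def exists(fname):
        directory, name = os.path.split(fname)
        directory = directory or "."
        if directory not in listings:
            try:
                listings[directory] = set(os.listdir(directory))
            except OSError:  # missing or unreadable directory
                listings[directory] = set()
        # Exact (case-sensitive) membership test against the cached names.
        return name in listings[directory] and os.path.isfile(fname)

    return exists
```

Usage: create one checker with `check = make_exact_case_checker()` before looping over the stored data, then call `check(stored_name)` for each entry.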

Another drawback I noticed during testing is that they only check the case of the file name itself. If I pass a path that contains directory names, none of these functions checks the case of those directory names.
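One way to extend the listing approach of the first function to every path component is to walk from the last name up to the root and check each name against its parent's listing. This is a hypothetical sketch, not from the original post; it assumes read permission on every parent directory, and on HFS+ non-ASCII names may need to be normalized to NFD first, because os.listdir() returns names as stored.

```python
import os


def exists_exact_case_listdir(path):
    """Hypothetical helper: True if every component of `path` exists with exactly this case."""
    if not os.path.lexists(path):
        return False
    # Drop trailing separators so os.path.split() yields the last real name.
    head = path.rstrip(os.sep) or os.sep
    while True:
        head, tail = os.path.split(head)
        if not tail:
            return True  # reached the root (absolute path) or '' (relative path)
        if tail in (".", ".."):
            continue  # not stored as directory entries, nothing to compare
        if tail not in os.listdir(head or "."):
            return False  # this component differs in case (or vanished)
```

This costs one os.listdir() call per path component, so it inherits the performance concern above for large directories.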

6 answers

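One way to do this is to map the inode numbers of the entries that os.listdir() returns to their names, then look up the inode of the path you were given. The listing reports each name with the case actually stored on disk: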

>>> !ls
A.txt   b.txt
>>> inodes = {os.stat(x).st_ino: x for x in os.listdir(".")}
>>> inodes
{80827580: 'A.txt', 80827581: 'b.txt'}
>>> inodes[os.stat("A.txt").st_ino]
'A.txt'
>>> inodes[os.stat("a.txt").st_ino]
'A.txt'
>>> inodes[os.stat("B.txt").st_ino]
'b.txt'
>>> inodes[os.stat("b.txt").st_ino]
'b.txt'

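Use os.listdir() and check whether the returned list contains the file name with exactly the case you are looking for.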


Following up on omz's answer, something like this might work:

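import os

def getcase(filepath):
    path, filename = os.path.split(filepath)
    # compare case-insensitively to find the name as actually stored
    for fname in os.listdir(path or '.'):
        if filename.lower() == fname.lower():
            return os.path.join(path, fname)
    return None  # no entry matches, even ignoring case

print(getcase('/usr/myfile.txt'))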

Adapted from Alex L's answer, here are functions that also check directory components and handle Unicode normalization:

import os, sys, unicodedata

def gettruecasepath(path): # IMPORTANT: <path> must be a Unicode string
  if not os.path.lexists(path): # use lexists to also find broken symlinks
    raise OSError(2, u'No such file or directory', path)
  isosx = sys.platform == u'darwin'
  if isosx: # convert to NFD for comparison with os.listdir() results
    path = unicodedata.normalize('NFD', path)
  parentpath, leaf = os.path.split(path)
  # find true case of leaf component
  if leaf not in [ u'.', u'..' ]: # skip . and .. components
    leaf_lower = leaf.lower() # if you use Py3.3+: change .lower() to .casefold()
    found = False
    for leaf in os.listdir(u'.' if parentpath == u'' else parentpath):
      if leaf_lower == leaf.lower(): # see .casefold() comment above
          found = True
          if isosx:
            leaf = unicodedata.normalize('NFC', leaf) # convert to NFC for return value
          break
    if not found:
      # should only happen if the path was just deleted
      raise OSError(2, u'Unexpectedly not found in ' + parentpath, leaf_lower)
  # recurse on parent path
  if parentpath not in [ u'', u'.', u'..', u'/', u'\\' ] and \
                not (sys.platform == u'win32' and 
                     os.path.splitdrive(parentpath)[1] in [ u'\\', u'/' ]):
      parentpath = gettruecasepath(parentpath) # recurse
  return os.path.join(parentpath, leaf)


def istruecasepath(path): # IMPORTANT: <path> must be a Unicode string
  return gettruecasepath(path) == unicodedata.normalize('NFC', path)
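
  • gettruecasepath() gets the case-exact representation of the specified path as stored in the file system (with its components resolved as needed), if the path exists:

    • Input paths must be Unicode strings:
      • Python 3.x: strings are natively Unicode; no extra action is required.
      • Python 2.x: for literals, prefix with u, e.g., u'Motörhead'; convert str variables with, e.g., strVar.decode('utf8').
    • The returned string is a Unicode string in NFC form (composed normal form). NFC is used even on OSX, where the file system (HFS+) stores names in NFD form (decomposed normal form).
      NFC is returned because it is far more common than NFD, and Python does not recognize equivalent NFC and NFD strings as (conceptually) identical. See below for background information.
    • The returned path keeps the structure of the input path (relative vs. absolute, components such as . and ..), except that multiple path separators are collapsed and, on Windows, the returned path always uses \ as the separator.
    • On Windows, a drive or UNC-share component, if present, is kept as-is.
    • An OSError is raised if the path does not exist or if you lack permission to access it.
    • If you use this function on a case-sensitive file system, e.g., Linux ext4, it effectively degrades to indicating whether the input path exists in exactly the specified case.
  • istruecasepath() uses gettruecasepath() to compare the input path with the path as stored in the file system.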

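Caveat: because these functions must examine all directory entries at every level of the input path, they will be slow, and unpredictably so, since performance depends on how many items the examined directories contain. Read on for background information.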


Native API support (lack thereof)

It is curious that neither OSX nor Windows offers a native API method that solves this problem directly.

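On Windows you can solve the problem by combining Windows API calls, but on OSX I'm not aware of any alternative to enumerating the directory contents at each level of the path, as done above, which is slow (unpredictably so).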

Unicode normal forms: NFC vs. NFD

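HFS+ (the OSX file system) stores file names in decomposed Unicode form (NFD), which causes problems when comparing such names with in-memory Unicode strings in most programming languages, which are usually in composed Unicode form (NFC).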

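For instance, a path containing the non-ASCII character ü that you specify as a literal in your source code is represented as a single Unicode code point, U+00FC; this is an example of NFC: the "C" stands for composed, because the base letter u and its diacritic ¨ (a combining diaeresis) form a single character.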

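By contrast, if you use ü as part of an HFS+ file name, it is translated to NFD form, which results in 2 Unicode code points: the base letter u (U+0075), followed by the combining diaeresis (̈, U+0308) as a separate code point; the "D" stands for decomposed, because the character is decomposed into the base letter and its diacritic.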

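Even though the Unicode standard deems these 2 representations (canonically) equivalent, most programming languages, including Python, do not recognize that equivalence. In Python, you must use unicodedata.normalize() to convert both strings to the same form before comparing them.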
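A small illustration of that point, using only the standard unicodedata module; `same_name` is a hypothetical helper name, not part of any library:

```python
import unicodedata

nfc = "\u00fc"   # 'ü' as one code point, the usual form of a source literal
nfd = "u\u0308"  # 'u' + COMBINING DIAERESIS, the form HFS+ stores

print(nfc == nfd)                                 # False: code points differ
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
print(unicodedata.normalize("NFD", nfc) == nfd)   # True


def same_name(a, b):
    """Hypothetical helper: compare two file names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name("Mot\u00f6rhead", "Moto\u0308rhead"))  # True
```

Normalizing both sides to the same form (either NFC or NFD) is what matters; NFC is chosen here only because it matches how most in-memory strings are written.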

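(Side note: Unicode normal forms are separate from Unicode encodings, though the differing numbers of code points typically also affect the number of bytes needed to encode each form. In the example above, the single-code-point ü (NFC) requires 2 bytes in UTF-8 (U+00FC -> 0xC3 0xBC), whereas the two-code-point ü (NFD) requires 3 bytes (U+0075 -> 0x75, and U+0308 -> 0xCC 0x88).)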
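The byte counts in the side note can be checked directly with the standard library:

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "\u00fc")  # one code point: U+00FC
nfd = unicodedata.normalize("NFD", "\u00fc")  # two code points: U+0075 U+0308

print(len(nfc), nfc.encode("utf-8"))  # 1 b'\xc3\xbc'
print(len(nfd), nfd.encode("utf-8"))  # 2 b'u\xcc\x88'  (0x75 prints as 'u')
```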


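This is only a proof of concept: it doesn't escape special characters, handle non-ASCII characters, or deal with file system encodings.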

On the plus side, this approach doesn't loop through files in Python, and it correctly checks the case of the directory names leading up to the final path segment.

It is based on the observation that (at least when using bash) the following command finds the path /my/path without error if and only if /my/path exists with this exact case:

$ ls /[m]y/[p]ath

(If the brackets are omitted from any part of the path, that part will not be checked for case.)

Here is an example function based on this idea:

import os.path
import subprocess

def does_exist(path):
    """Return whether the given path exists with the given casing.

    The given path should begin with a slash and not end with a trailing
    slash.  This function does not attempt to escape special characters
    and does not attempt to handle non-ASCII characters, file system
    encodings, etc.
    """
    parts = []
    while True:
        head, tail = os.path.split(path)
        if tail:
            parts.append(tail)
            path = head
        else:
            assert head == '/'
            break
    parts.reverse()
    # For example, for path "/my/path", pattern is "/[m]y/[p]ath".
    pattern = "/" + "/".join(["[%s]%s" % (p[0], p[1:]) for p in parts])
    cmd = "ls %s" % pattern
    return_code = subprocess.call(cmd, shell=True)
    return not return_code

You could also try opening the file:

    try:
        open('test', 'r').close()
    except IOError:
        print('File does not exist')
