Java cannot see a file on the file system whose name contains illegal characters

I am looking into an edge case that we see in production. Our business model has clients generate text files and then FTP them to our servers. We ingest these files and process them on our Java backend (running on CentOS machines). Most (95%+) of our clients know to generate these files in UTF-8, which is what we want. However, a few stubborn clients (but large accounts) generate them on Windows machines using the CP1252 character set. That is not a problem in itself, because we have set up our third-party libraries (which do most of the “processing” for us) to handle input in any character set through some kind of magic voodoo.

Occasionally we see a file whose name contains illegal UTF-8 characters (from CP1252). When our software tries to read such a file from the FTP server, the normal way of reading a file chokes and throws a FileNotFoundException:

File f = getFileFromFTPServer();
// FileReader has no readLine(), so wrap it in a BufferedReader
BufferedReader reader = new BufferedReader(new FileReader(f));

String line = reader.readLine();
// ...etc.

The exceptions look something like this:

java.io.FileNotFoundException: /path/to/file/some-text-blah?blah.xml (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at java.io.FileReader.<init>(FileReader.java:55)
    at com.myorg.backend.app.InputFileProcessor.run(InputFileProcessor.java:60)
    at java.lang.Thread.run(Thread.java:662)

So I think what is happening is that, because the file name itself contains illegal characters, we never even get to read the file. If we could, then regardless of the file's contents, our software should be able to process it correctly. So this is really a problem of reading files whose names contain illegal UTF-8 characters.

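To reproduce this, I wrote a small Java driver (see below). On a Windows machine I created a file named test£.txt. Note the character right after "test" in the file name: it is Alt-0163. I FTPed the file to our server, and when I ran ls -ltr on the directory, it was listed as test?.txt.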

Here is the Java driver, which tries to open and read the file:

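import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Driver {
    public static void main(String[] args) {
        Driver d = new Driver();
        d.run(args[0]);     // I know this is bad, but it's fine for our purposes here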
    }

    private void run(String fileName) {
        InputStreamReader isr = null;
        BufferedReader buffReader = null;
        FileInputStream fis = null;
        String firstLineOfFile = "default";

        System.out.println("Processing " + fileName);

        try {
            System.out.println("Attempting UTF-8...");

            fis = new FileInputStream(fileName);
            isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
            buffReader = new BufferedReader(isr);

            firstLineOfFile = buffReader.readLine();

            System.out.println("UTF-8 worked and first line of file is : " + firstLineOfFile);
        }
        catch(IOException io1) {
            // UTF-8 failed; try CP1252.
            try {
                System.out.println("UTF-8 failed. Attempting Windows-1252...(" + io1.getMessage() + ")");

                fis = new FileInputStream(fileName);
                // I've also tried variations "WINDOWS-1252", "Windows-1252", "CP1252", "Cp1252", "cp1252"
                isr = new InputStreamReader(fis, Charset.forName("windows-1252"));
                buffReader = new BufferedReader(isr);

                firstLineOfFile = buffReader.readLine();

                System.out.println("Windows-1252 worked and first line of file is : " + firstLineOfFile);
            }
            catch(IOException io2) {
                // Both UTF-8 and CP1252 failed...
                System.out.println("Both UTF-8 and Windows-1252 failed. Could not read file. (" + io2.getMessage() + ")");
            }
        }
    }
}

When I run it (java -cp . com/Driver t*), I get the following output:

Processing test�.txt
Attempting UTF-8...
UTF-8 failed. Attempting Windows-1252...(test�.txt (No such file or directory))
Both UTF-8 and Windows-1252 failed. Could not read file.(test�.txt (No such file or directory))

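test�.txt?!?! I did some research and found that "�" is the Unicode replacement character \uFFFD. So my guess is that the FTP server on CentOS does not know how to handle Alt-0163 (£) and replaces it with \uFFFD (�). What I do not understand is why ls -ltr shows the file as test?.txt...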
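As a side note, the two characters seen here (\uFFFD and ?) can both be reproduced in plain Java. The following is only a minimal sketch, under the assumption that the file name arrived as CP1252 bytes and was later decoded as UTF-8 somewhere along the way; the class and method names are made up for illustration and are not part of the original driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    /** Decodes raw bytes as UTF-8; malformed input is replaced by U+FFFD. */
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Encodes text as US-ASCII; unmappable characters are replaced by '?' (0x3F). */
    static byte[] encodeAsAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // In CP1252 the pound sign is the single byte 0xA3, which is not valid UTF-8 on its own.
        byte[] cp1252Name = "test\u00A3.txt".getBytes(Charset.forName("windows-1252"));

        String decoded = decodeAsUtf8(cp1252Name);
        System.out.println("Code point after \"test\": U+"
                + Integer.toHexString(decoded.charAt(4)).toUpperCase());

        byte[] ascii = encodeAsAscii("test\u00A3.txt");
        System.out.println("ASCII byte after \"test\": 0x"
                + Integer.toHexString(ascii[4] & 0xFF).toUpperCase());
    }
}
```

This does not prove which component on the server performed the substitution; it only shows that a lone 0xA3 byte is enough to produce either symptom.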

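In any case, it seems the fix is to add logic that looks for this character in the file name and, if it is found, renames the file to something the system can read and process (for example, a String-wise replaceAll("\uFFFD", "_") or something similar).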

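The problem is that Java cannot even see the file. CentOS knows the file is there (test?.txt), but when the name is passed to Java, Java interprets it as test�.txt and reports "No such file or directory"...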

How can I get Java to see the file, so that I can call File::renameTo(String) on it? Thanks!
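One way the rename idea could be sketched is to let java.nio list the directory and rename whatever it finds, instead of building the path from a String. This is only an illustration under assumptions: FileNameSanitizer, sanitize and renameUnsafeFiles are hypothetical names, the allow-list of "safe" characters is a guess at what the downstream processing tolerates, and whether the listing returns a usable Path for a name the JVM cannot decode depends on the JVM version and locale settings.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FileNameSanitizer {

    /** Replaces every character outside a conservative allow-list with '_'. */
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    /** Renames entries of dir whose names change under sanitize(); returns the new paths. */
    static List<Path> renameUnsafeFiles(Path dir) throws IOException {
        // Snapshot the listing first so the directory is not modified while it is iterated.
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }

        List<Path> renamed = new ArrayList<>();
        for (Path entry : entries) {
            String original = entry.getFileName().toString();
            String safe = sanitize(original);
            if (!safe.equals(original)) {
                // Throws FileAlreadyExistsException if the sanitized name is already taken.
                renamed.add(Files.move(entry, entry.resolveSibling(safe)));
            }
        }
        return renamed;
    }
}
```

With this sketch, sanitize("test\uFFFD.txt") yields "test_.txt". Two different bad names can map to the same safe name, so a real implementation would need a collision strategy.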

+5
2

. , .

-, ? escape- UTF-8 - ?

, Windows, . , Windows Unicode \uFFFD, , , ( \uFFFD ).

? , . Windows , test�.txt test\uFFFD.txt . Windows test\uFFFD.txt, , ( test�.txt). , .

? dos ren test*.txt test.txt. , . , , Windows, .

The root cause: FTP. If you can, do not use FTP. Use SFTP, scp, or FTAPI instead.

, FTP ASCII. FTP- umlauts... , , FTP . , FTP- , , , . , FTP ... -. . , Unicode , UTF-8 Unicode ? (\u003f).

FTP Java new String( bytes ) FTP, - .

:

  • FTP-, , /.
  • , . Windows .
  • , . , script , , .
+5

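Are you using the older java.io API, and is this perhaps on a Mac? If so, the java.nio API may handle it better. I had files with Unicode names that java.io could not see, while java.nio.Path could. I replaced apache FileUtils (which uses java.io) with java.nio.Files...

So, for example: Files.readAllLines(myPath, StandardCharsets.UTF_8)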
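Building on Files.readAllLines, the UTF-8-then-CP1252 fallback from the question could look roughly like the sketch below. CharsetFallbackReader and readLines are hypothetical names. The sketch relies on Files.readAllLines throwing a CharacterCodingException for bytes that are not valid in the requested charset, and it addresses only the file contents, not the file name problem.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CharsetFallbackReader {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Reads all lines as UTF-8 and retries as windows-1252 if the bytes are not valid UTF-8. */
    static List<String> readLines(Path file) throws IOException {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (CharacterCodingException notUtf8) {
            // The content is not well-formed UTF-8; assume the client sent CP1252.
            return Files.readAllLines(file, CP1252);
        }
    }
}
```

A missing file still surfaces as an IOException (NoSuchFileException) from the first call, so the charset fallback is never attempted for a name that cannot be resolved.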

+1
