Java language detection with langdetect - how to load profiles?

I am trying to use a Java library called langdetect, hosted here. It could not be easier to use:

Detector detector;
String langDetected = "";
try {
    String path = "C:/Users/myUser/Desktop/jars/langdetect/profiles";
    DetectorFactory.loadProfile(path);
    detector = DetectorFactory.create();
    detector.append(text);
    langDetected = detector.detect();
} 
catch (LangDetectException e) {
    throw e;
}

return langDetected;

Except when it comes to the DetectorFactory.loadProfile method. The library works fine when I give it an absolute path to the profiles directory, but ultimately I think I need to package my code and langdetect's companion profiles directory into the same JAR file:

myapp.jar/
    META-INF/
    langdetect/
        profiles/
            af
            bn
            en
            ...etc.
    com/
        me/
            myorg/
                LangDetectAdaptor --> is what actually uses the code above

I will make sure that LangDetectAdaptor, which lives inside myapp.jar, is supplied with both the langdetect.jar and jsonic.jar dependencies that langdetect needs at runtime. However, I am confused about what I need to pass to DetectorFactory.loadProfile for this to work:

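  • The langdetect JAR ships with the profiles directory, but it has to be initialized from inside my own JAR. Do I copy the profiles directory into my JAR (as laid out above), or can it stay inside langdetect.jar and still be accessed from my code?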

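Thanks in advance!

It seems that langdetect ships with its profiles but expects you to initialize them from your own JAR. The API would be nicer if it kept the profiles inside its own JAR and offered something like DetectFactory.loadProfiles().except("fr").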


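You can read the profiles straight out of the LangDetect JAR using JarURLConnection and JarEntry. The code below requires Java 7 (it uses try-with-resources).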

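    // Assumes Apache Commons IO (IOUtils) and langdetect are on the classpath.
    String dirname = "profiles/";
    Enumeration<URL> en = Detector.class.getClassLoader().getResources(
            dirname);
    List<String> profiles = new ArrayList<>();
    if (en.hasMoreElements()) {
        URL url = en.nextElement();
        // The cast only works when the profiles are served from a JAR (jar: URL).
        JarURLConnection urlcon = (JarURLConnection) url.openConnection();
        try (JarFile jar = urlcon.getJarFile()) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry jarEntry = entries.nextElement();
                String entry = jarEntry.getName();
                // Skip the "profiles/" directory entry itself; read only the profile files.
                if (entry.startsWith(dirname) && !jarEntry.isDirectory()) {
                    try (InputStream in = Detector.class.getClassLoader()
                            .getResourceAsStream(entry)) {
                        // Each profile is JSON text; read it as UTF-8.
                        profiles.add(IOUtils.toString(in, StandardCharsets.UTF_8));
                    }
                }
            }
        }
    }

    // Load the profiles from their JSON contents instead of from a directory path.
    DetectorFactory.loadProfile(profiles);
    Detector detector = DetectorFactory.create();
    detector.append(text);
    String langDetected = detector.detect();
    System.out.println(langDetected);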
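
If the langdetect version in use only accepts a directory path, a variation on the same idea is to copy the profile entries out of the JAR into a temporary directory first. The following is a minimal sketch using only the JDK; the class name ProfileExtractor and the method extract are hypothetical helpers, not part of langdetect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of langdetect): copies profile files out of a JAR.
public class ProfileExtractor {

    /** Copies every file directly under prefix (e.g. "profiles/") into a new temp directory. */
    static Path extract(Path jarPath, String prefix) throws IOException {
        Path target = Files.createTempDirectory("langdetect-profiles");
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Ignore directory entries and anything outside the prefix.
                if (entry.isDirectory() || !name.startsWith(prefix)) {
                    continue;
                }
                String fileName = name.substring(prefix.length());
                // Ignore nested paths so nothing is written outside the target directory.
                if (fileName.isEmpty() || fileName.contains("/") || fileName.equals("..")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target.resolve(fileName),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return target;
    }
}
```

The returned directory could then be handed to DetectorFactory.loadProfile(dir.toString()), the same call the question uses with an absolute path. How the JAR location is found is left open here; Detector.class.getProtectionDomain().getCodeSource().getLocation() is one possibility.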

There is also a Maven-based fork of the library:

https://github.com/galan/language-detector

With it, loading the profiles looks like this:

DetectorFactory.loadProfile(new DefaultProfile()); // SmProfile is also available
Detector detector = DetectorFactory.create();
detector.append(input);
String result = detector.detect();
// maybe work with detector.getProbabilities()

I don't like the static approach used by DetectorFactory, but I won't rewrite the whole project; feel free to create your own fork or pull request :)
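
Because DetectorFactory keeps its profiles in static state, it usually makes sense to load them only once per JVM. A minimal sketch of such a guard follows, using only the JDK; ProfileLoader and ensureLoaded are hypothetical names, and the actual loadProfile call is passed in by the caller.

```java
// Hypothetical guard (not part of langdetect): runs the given loader at most once.
public final class ProfileLoader {

    private static boolean loaded = false;

    private ProfileLoader() {
    }

    /**
     * Runs loader on the first call and returns true; later calls return false.
     * If the loader throws, the flag stays unset so the load can be retried.
     */
    static synchronized boolean ensureLoaded(Runnable loader) {
        if (loaded) {
            return false;
        }
        loader.run();
        loaded = true;
        return true;
    }
}
```

Since loadProfile declares the checked LangDetectException, the Runnable passed in would have to catch it and rethrow it as an unchecked exception.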


Setting the working directory fixed the problem for me.

 String workingDir = System.getProperty("user.dir");
 DetectorFactory.loadProfile(workingDir+"/profiles/");
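
Relying on the working directory fails quietly if the application is started from somewhere else, so it can help to resolve the path and check it before calling loadProfile. A minimal sketch, assuming the profiles sit in a directory named profiles under the working directory; ProfileDirResolver and resolveProfileDir are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of langdetect): locates the profiles directory.
public class ProfileDirResolver {

    /** Resolves name against baseDir and returns a normalized absolute path. */
    static Path resolveProfileDir(String baseDir, String name) {
        return Paths.get(baseDir).resolve(name).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Path dir = resolveProfileDir(System.getProperty("user.dir"), "profiles");
        if (!Files.isDirectory(dir)) {
            System.err.println("Profiles directory not found: " + dir);
            return;
        }
        System.out.println("Profiles directory: " + dir);
        // With langdetect on the classpath, the next step would be:
        // DetectorFactory.loadProfile(dir.toString());
    }
}
```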