How to group unknown text messages using an algorithm?

Question

How to group unknown text messages using an algorithm?

Below is an example of a dataset that I need to combine together, if you look carefully, these are basically similar text strings, but with very little difference in the presence of an identifier or a person identifier.

Unexpected error:java.lang.RuntimeException:Data not found for person 1X99999123 . Clear set not defined . Dump
Unexpected error:java.lang.RuntimeException:Data not found for person 2X99999123 . Clear set not defined . Dump
Unexpected error:java.lang.RuntimeException:Data not found for person 31X9393912 . Clear set not defined . Dump
Unexpected error:java.lang.RuntimeException:Data not found for person 36X9393912 . Clear set not defined . Dump
Exception in thread "main" javax.crypto.BadPaddingException: ID 1 Given final block not properly padded
Exception in thread "main" javax.crypto.BadPaddingException: ID 2 Given final block not properly padded
Unexpected error:java.lang.RuntimeException:Data not found for person 5 . Clear set not defined . Dump
Unexpected error:java.lang.RuntimeException:Data not found for person 6 . Clear set not defined . Dump
Exception in thread "main" java.lang.NullPointerException at TripleDESTest.encrypt(TripleDESTest.java:18)

I want to group them so that the end result looks like

Unexpected error:java.lang.RuntimeException:Data not found - 6
Exception in thread "main" javax.crypto.BadPaddingException - 2
Exception in thread "main" java.lang.NullPointerException at - 1

Is there an available API or algorithm to handle such cases?

Thanks at Advance. cheers shakti

+3

java algorithm machine-learning

Shakti May 03 '12 at 14:11

source share

3 answers

, , , ... , ....

, , , , StringTokenizer...

0

obelix 03 '12 14:13

If you know the message format, the easiest way is to use a regular expression and count matches.

Regular expressions are fully supported in Java, and their use is certainly faster than the clustering algorithm.

0

Aslan986 May 04 '12 at 8:59

source share

amit · Accepted Answer · 2012-05-03T14:21:03+0000

This question has been marked as machine learning, so I’m going to suggest a classification approach.

tokenize - , .

, , () C4.5 - . , > 1.

, "" ! , .

, , , .

:

, , , msg (NPE ) - , , , BadPaddingException.
- weka - java
, , , 10- ?

How to group unknown text messages using an algorithm?

More articles: