The problem statement looks something like this:
Given a website, we must classify it into one of two predefined classes (say, whether or not it is an e-commerce site).
We have already tried Naive Bayes for this, with several preprocessing techniques (stop-word removal, stemming, etc.) and what we believe are appropriate features.
We want to raise accuracy to 90% or close to it, which this approach is not giving us.
The problem is that when we evaluate accuracy manually, we look for several identifiers on the web page (for example, a Checkout button, "Shop" / "Shopping", PayPal, and many others), and our algorithm sometimes misses them.
We thought: if we are so confident in these identifiers, why not create a rule-based classifier that classifies the page according to a set of rules (written in some priority order)?
E.g., if the page contains "shop" / "shopping" and has a checkout button, then it is an e-commerce page. There would be many similar rules, applied in priority order.
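To make the idea concrete, here is a minimal sketch of the kind of priority-ordered rule classifier we have in mind. The rule names, regex patterns, priorities, and the helpers `has`, `all_of`, and `classify` are all hypothetical and only for illustration; a real rule set would be much larger and would probably look at the HTML structure rather than raw text.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule is (priority, name, predicate, label); a lower priority number is checked first.
Rule = Tuple[int, str, Callable[[str], bool], bool]


def has(pattern: str) -> Callable[[str], bool]:
    """Build a predicate that is True when the regex matches the page text."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda text: compiled.search(text) is not None


def all_of(*preds: Callable[[str], bool]) -> Callable[[str], bool]:
    """Combine predicates with a logical AND."""
    return lambda text: all(p(text) for p in preds)


# Hypothetical example rules, not a tested rule set.
RULES: List[Rule] = [
    (1, "shop+checkout", all_of(has(r"\bshop(s|ping)?\b"), has(r"\bcheckout\b")), True),
    (2, "paypal+cart", all_of(has(r"\bpaypal\b"), has(r"\bcart\b")), True),
]


def classify(text: str, rules: List[Rule] = RULES) -> Optional[Tuple[bool, str]]:
    """Return (label, rule_name) for the first rule that fires, or None if none does."""
    for _, name, predicate, label in sorted(rules, key=lambda r: r[0]):
        if predicate(text):
            return label, name
    return None  # no rule fired; the caller could fall back to another classifier


print(classify("Shop now! <button>Checkout</button>"))
print(classify("Welcome to my blog about gardening"))
```

Returning `None` when no rule fires leaves room to fall back to the existing Naive Bayes model for pages the rules do not cover.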
Based on some of the rules, we would also visit other pages of the site (currently we visit only the home page, which is another reason accuracy is not very high).
What potential issues would we face with a rule-based approach? Or would it actually be the better choice for our use case?
Would it be a good idea to generate these rules with rule-learning algorithms (e.g., FOIL, AQ, etc.)?