So I have a problem that basically looks like this: I have many lines and I want to build a DAG so that each path matches a line and vice versa. However, I have the right to freely rearrange my lines. The order of characters does not matter. The DAGs that I create have a cost associated with them. In principle, the cost of a branch in a DAG is proportional to the length of its child paths.
For example, let's say I have the lines BAAA, CAAA, DAAA, and I create a DAG that represents them without rearranging them. I get:
() β (B, C, D) β A β A β A
where the tuple represents the branch.
A cheaper view for my purposes would be:
() β A β A β A β (B, C, D)
The problem is this: n lines are given, rearrange the lines so that the corresponding DAG has the cheapest cost, where is the cost function: If we first cross the graph from the depth of the source, in order from left to right, the total number of nodes we visit, with multiplicity.
Thus, the cost of the first example is 12, because we have to visit A ββseveral times to bypass. The cost of the second example is 6, because we visit only once before we deal with branches.
I feel like NP Hard. This seems like a question about formal languages, and I am not familiar with such algorithms to figure out how I should go about reduction. I do not need a complete answer as such, but if someone could point out a class of well-known issues that seem to be related, I would really appreciate it.