I have a dataset that looks something like this:
id1 id2 size
1 5400 5505 7
2 5033 5458 1
3 5452 2873 24
4 5452 5213 2
5 5452 4242 26
6 4823 4823 4
7 5505 5400 11
Where id1and id2are the only nodes in the graph, and sizeis the value assigned to the directional edge connecting them from id1to id2. This data set is quite large (just over 2 million rows). What I would like to do is summarize a size column grouped by unordered node pairs from id1and id2. For example, in the first line we have id1=5400and id2=5505. There is another line in the data frame, where id1=5505and id2=5400. In grouped data, the sum of the size columns for these two rows will be added to one row. So I want to summarize the data in which I group the (unordered) set (id1, id2). I found a way to do this usingapplywith a custom function that checks the inverse of a pair of columns in a complete dataset, but it is painfully slow. Does anyone know a way to do this differently, perhaps with plyror with something in the base packages that would be more efficient?
source
share