join omits output lines when the input is sorted numerically

I have two files: aa and bb:

 $ cat aa 
84 xxx
85 xxx
10101 sdf
10301 23

 $ cat bb
82 asd
83 asf
84 asdfasdf
10101 22232
10301 llll

I use the join command to join them:

 $ join aa bb
84 xxx asdfasdf

but I expected 84, 10101 and 10301 to all be joined. Why was only 84 joined?

+5
3 answers

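Use a lexicographic sort rather than a numeric sort: join expects both inputs to be sorted lexicographically on the join field, and your files are in numeric order.

To sort on the fly with process substitution: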

$ join <(sort aa) <(sort bb)

This gives the result:

10101 sdf 22232
10301 23 llll
84 xxx asdfasdf
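
To see why the original files count as unsorted, here is a small sketch that orders the keys from aa both ways. The variable names are illustrative only, and LC_ALL=C is an assumption made here to get plain byte-wise ordering:

```shell
# The join keys from file aa in the question, one per line.
keys=$(printf '%s\n' 84 85 10101 10301)

# Numeric order, which is how the original files are arranged.
numeric=$(printf '%s\n' "$keys" | sort -n | paste -sd ' ' -)

# Lexicographic (byte) order, which is what join expects.
lexical=$(printf '%s\n' "$keys" | LC_ALL=C sort | paste -sd ' ' -)

echo "numeric: $numeric"   # numeric: 84 85 10101 10301
echo "lexical: $lexical"   # lexical: 10101 10301 84 85
```

In lexicographic order "10101" comes before "84" because the comparison is character by character, so a file that lists 84 first looks out of order to join.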
+9

You did not mention that join also prints error messages:

$ join aa bb
join: file 2 is not in sorted order
84 xxx asdfasdf
join: file 1 is not in sorted order

You can sort lexicographically for the join and then sort the result numerically:

join <(sort aa) <(sort bb) | sort -k1,1n
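
As a self-contained check of this pipeline, the sketch below recreates the sample files from the question in a scratch directory and uses temporary files instead of process substitution so that it also runs in a plain POSIX shell. Setting LC_ALL=C for both commands is an assumption made here so that sort and join agree on the collation order; the file names under $tmp are illustrative:

```shell
# Work in a scratch directory so the current directory is left untouched.
tmp=$(mktemp -d)
printf '%s\n' '84 xxx' '85 xxx' '10101 sdf' '10301 23' > "$tmp/aa"
printf '%s\n' '82 asd' '83 asf' '84 asdfasdf' '10101 22232' '10301 llll' > "$tmp/bb"

# Use one collation for sort and join so both agree on the key order.
export LC_ALL=C
sort -k1,1 "$tmp/aa" > "$tmp/aa.sorted"
sort -k1,1 "$tmp/bb" > "$tmp/bb.sorted"

# Join on the first field, then restore numeric order for display.
result=$(join "$tmp/aa.sorted" "$tmp/bb.sorted" | sort -k1,1n)
printf '%s\n' "$result"

rm -r "$tmp"
```

With the sample data this prints the three joined lines in numeric order: 84, 10101, 10301.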
+7

Alternatively, zero-pad the keys with awk so that their numeric order is also their lexicographic order:

join \
 <(awk '{printf("%05d %s\n", $1, $2)}' aa) \
 <(awk '{printf("%05d %s\n", $1, $2)}' bb) \
| awk '{print int($1),$2,$3}'

This gives the result:

84 xxx asdfasdf
10101 sdf 22232
10301 23 llll

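This avoids the Unix sort, which is O(n log n). It relies on the files already being in numeric order and on every key fitting in five digits.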
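
The zero-padding approach only works when both files are already in numeric key order. A minimal sketch of guarding that precondition with sort -c, which checks the order without sorting, might look like this. The scratch files and the padded_result variable are illustrative, and the width of 5 is taken from the awk commands above:

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '84 xxx' '85 xxx' '10101 sdf' '10301 23' > "$tmp/aa"
printf '%s\n' '82 asd' '83 asf' '84 asdfasdf' '10101 22232' '10301 llll' > "$tmp/bb"

# sort -c exits with status 0 only if the file is already in the given order.
if sort -c -k1,1n "$tmp/aa" && sort -c -k1,1n "$tmp/bb"; then
    # Pad keys to five digits so numeric order equals lexicographic order.
    awk '{printf("%05d %s\n", $1, $2)}' "$tmp/aa" > "$tmp/aa.padded"
    awk '{printf("%05d %s\n", $1, $2)}' "$tmp/bb" > "$tmp/bb.padded"
    # Join, then strip the padding from the key again.
    padded_result=$(join "$tmp/aa.padded" "$tmp/bb.padded" | awk '{print int($1), $2, $3}')
else
    padded_result=''
    echo 'inputs are not numerically sorted; sort them before joining' >&2
fi

printf '%s\n' "$padded_result"
rm -r "$tmp"
```

The check is a single linear pass over each file, so it keeps the benefit of not sorting while catching input that would otherwise be joined incorrectly.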

+3
