How do I find the set difference (set minus subset) of two files from the command line?

I have two files with sorted lines. One file (B) is a subset of the other file (A). I would like to find all lines in A that are NOT in B. Ideally, I would like to create a file (C) that contains these lines. Is this possible on Unix? I am looking for a single command to do this instead of writing a script. I looked through the join and diff commands, but I could not find the right option for this. Thanks for the help.

+5
5 answers

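This suppresses the lines common to both files (both must be sorted). Since b is a subset of a, only the lines unique to a remain; comm -23 a b drops the lines unique to b as well: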

comm -3 a b
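
As a rough sketch of the comm approach applied to the file names from the question, something like the following should work. The sample contents are made up for illustration, and LC_ALL=C is my assumption to keep the expected sort order unambiguous:

```shell
#!/bin/sh
# Sketch: lines of A that are not in B, using comm on sorted input.
# A, B and C are the file names from the question; the contents are sample data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' a b c d e > A
printf '%s\n' a b d e > B

# -2 drops lines unique to B, -3 drops lines common to both,
# leaving only column 1: lines found only in A.
LC_ALL=C comm -23 A B > C
cat C
```

With B a true subset of A, -3 alone gives the same lines; -23 just makes the intent explicit.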
+12

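How about this (-F treats each line of B as a fixed string instead of a regular expression, and -x matches whole lines only):

grep -Fxv -f B A > C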
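
One thing to watch with grep -f: without -F and -x, each line of B is used as a regular expression and may match anywhere in a line of A. A small sketch of the difference, with made-up sample lines (the file name regex_result is just for this illustration):

```shell
#!/bin/sh
# Sketch: why -F (fixed strings) and -x (whole-line match) matter here.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'a.c' 'abc' 'a.c.bak' > A
printf '%s\n' 'a.c' > B

# As a regex, "a.c" matches all three lines of A, so nothing survives -v.
grep -v -f B A > regex_result

# As a fixed whole-line string, only the exact line "a.c" is removed.
grep -Fxv -f B A > C
cat C
```

The order of A is kept, and neither file has to be sorted for this to work.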
+5

Use diff. It is an alternative to both the grep answer from @johlo and the comm reply from @johnshen64:

$ cat a
a
b
c
d
e
$ cat b
a
b
f
d
e
$ diff -dbU0 a b
--- a   2012-05-18 16:02:30.603386016 -0400
+++ b   2012-05-18 16:02:45.547817122 -0400
@@ -3 +3 @@
-c
+f

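To print only the lines that are in a but not in b, keep the lines that start with - and strip that prefix: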

$ diff -dbU0 a b | tail -n +4 | grep ^- | cut -c2-
c
+3

The join command will do what you ask for:

join -v 1 fileA fileB > fileC

Demonstration:

$ cat fileA
a
c
d
g
h
t
u
v
z
$ cat fileB
a
d
g
t
u
z
$ join -v 1 fileA fileB
c
h
v

This requires the files to be sorted, as you indicated in your question. For unsorted files (the <( ) syntax needs bash or a similar shell):

join -v 1 <(sort fileA) <(sort fileB)
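
A caveat worth sketching: by default join compares only the first whitespace-separated field, not the whole line. If your lines contain spaces, one possible workaround is to set the separator to a character that never appears in the data. The sketch below assumes no line contains a tab; the sample data and the output names by_field and by_line are made up:

```shell
#!/bin/sh
# Sketch: whole-line comparison with join, assuming no line contains a tab.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'x 1' 'x 2' 'y 1' > fileA
printf '%s\n' 'x 1' > fileB

tab=$(printf '\t')

# Default splitting: only the first field ("x" or "y") is compared,
# so "x 2" is treated as present in fileB and is not reported.
LC_ALL=C join -v 1 fileA fileB > by_field

# With a tab as separator, every tab-free line is one single field,
# so the comparison covers the whole line.
LC_ALL=C join -t "$tab" -v 1 fileA fileB > by_line
cat by_line
```

For single-word lines like the demonstration above, the plain join -v 1 is enough.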
+1

Awk solution

Input files

a

aaa
bbb
ccc

b

ccc
ddd
eel

The code

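awk '
  NR == FNR { A[$0] = 1; next }                # first file (a): remember every line
  $0 in A   { A[$0] = 0 }                      # second file (b): mark lines also found in b
  END { for (k in A) if (A[k] == 1) print k }  # print lines seen only in a (order unspecified)
' a b > c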

c (output file)

bbb
aaa
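
The lines come out in no particular order because awk does not define the order of for (k in A). If you need the order of a preserved, a possible variant is to read b first and print the lines of a as they are read. This is only a sketch using the same sample files; the array name seen is my own:

```shell
#!/bin/sh
# Sketch: order-preserving variant, reading b first and then filtering a.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' aaa bbb ccc > a
printf '%s\n' ccc ddd eel > b

# While reading the first file (b), record each line as an array key.
# For the second file (a), print only the lines not recorded from b.
awk 'NR == FNR { seen[$0]; next } !($0 in seen)' b a > c
cat c
```

Note that the NR == FNR test misfires when b is empty (every line of a then looks like part of the first file and nothing is printed), so check for that case separately if it can happen.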
0