How to find Set - Subset from two files from the command line?

Question

I have two files with sorted lines. One file (B) is a subset of the other file (A). I would like to find all lines in A that are NOT in B. Ideally, I would like to create a file (C) that contains these lines. Is this possible on Unix? I am looking for a single command to do this instead of writing a script. I looked through the join and diff commands, but I could not find the right option for this. Thanks for the help.

+5

set linux unix bash zsh

drbunsen May 18 '12 at 19:56

5 answers

How about this:

grep -v -f B A > C
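
One caveat with the command above: grep -f treats each line of B as a regular expression and matches it anywhere in a line, so a line "a" in B would also remove "abc" from A. A minimal sketch, assuming literal whole-line comparison is what is wanted, adds -F (fixed strings) and -x (whole-line match). The sample file contents are made up for illustration:

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical sample data: B is a subset of A.
printf '%s\n' 'a' 'a.c' 'abc' 'b' > A
printf '%s\n' 'a' 'a.c' > B

# -v invert the match, -x match whole lines only,
# -F treat patterns as literal strings, -f read patterns from B
grep -vxFf B A > C

cat C
```

With this data C contains "abc" and "b", whereas plain grep -v -f B A would keep only "b".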

+5

johlo May 18, '12 at 20:03

You can also do this with diff, as an alternative to grep (@johlo's answer) and comm (@johnshen64's answer):

$ cat a
a
b
c
d
e
$ cat b
a
b
f
d
e
$ diff -dbU0 a b
--- a   2012-05-18 16:02:30.603386016 -0400
+++ b   2012-05-18 16:02:45.547817122 -0400
@@ -3 +3 @@
-c
+f

Then, to keep only the lines that are in a but not in b (the ones prefixed with -):

$ diff -dbU0 a b | tail -n +4 | grep ^- | cut -c2-
c

+3

derobert May 18, '12 at 20:09

The join command will do what you ask for:

join -v 1 fileA fileB > fileC

Demonstration:

$ cat fileA
a
c
d
g
h
t
u
v
z
$ cat fileB
a
d
g
t
u
z
$ join -v 1 fileA fileB
c
h
v

This assumes the files are sorted, as you indicated in your question. For unsorted files (in a shell with process substitution, such as bash or zsh):

join -v 1 <(sort fileA) <(sort fileB)
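
Note that join compares only the first whitespace-separated field by default, so the commands above compare whole lines only when each line is a single word. A sketch for multi-word lines, assuming the lines contain no tab characters: set the field separator to a tab so the whole line becomes the join key, and sort and join under the same locale (LC_ALL=C here). The file contents are hypothetical, and temporary files stand in for process substitution so the example also runs in plain sh:

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical, unsorted sample data with multi-word lines.
printf '%s\n' 'pear tart' 'apple pie' 'apple jam' > fileA
printf '%s\n' 'apple pie' > fileB

# Sort both files with the same collation that join will use.
LC_ALL=C sort fileA > fileA.sorted
LC_ALL=C sort fileB > fileB.sorted

# With a tab as separator (and no tabs in the data), the join field is the whole line.
tab=$(printf '\t')
LC_ALL=C join -t "$tab" -v 1 fileA.sorted fileB.sorted > fileC

cat fileC
```

Here fileC contains "apple jam" and "pear tart". Without the -t option, both "apple" lines would be paired with "apple pie" on the key "apple" and only "pear tart" would be printed.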

+1

Dennis Williamson May 18 '12 at 22:13

Awk solution

Input files

a

aaa
bbb
ccc

b

ccc
ddd
eel

Code

awk ' NR==FNR { A[$0]=1; next; }
{ if ($0 in A) { A[$0]=0; } }
END { for (k in A) { if (A[k]==1) { print k; } } } ' a b > c

c (output file)

bbb
aaa
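
The for (k in A) loop visits keys in an unspecified order, which is why the output above is not in file order. A shorter sketch that prints the lines in the order they appear in a, reading b first to remember its lines (the array name seen is arbitrary):

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Same sample data as above.
printf '%s\n' aaa bbb ccc > a
printf '%s\n' ccc ddd eel > b

# While reading the first file argument (b): remember every line.
# While reading the second (a): print each line that was not seen in b.
awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' b a > c

cat c
```

This writes "aaa" then "bbb" to c. The test on FILENAME is used instead of the common NR==FNR idiom so that an empty b does not cause a to be treated as the first file.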

0

Debaditya May 18, '12 at 20:21

johnshen64 · Accepted Answer · 2012-05-18T20:05:16+0000

This will suppress the lines common to both files (both files must be sorted):

comm -3 a b
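
comm -3 still prints lines found only in b, in a second tab-indented column; that column is empty when b is a true subset of a. A sketch that also suppresses that column with -23 and writes the result to a file, reusing the sample data from the join answer under the names a, b and c:

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Sorted sample data; b is a subset of a.
printf '%s\n' a c d g h t u v z > a
printf '%s\n' a d g t u z > b

# -2 suppresses lines only in b, -3 suppresses lines common to both,
# leaving just the lines that appear only in a.
comm -23 a b > c

cat c
```

The resulting file c contains "c", "h" and "v", matching the join demonstration.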
