Array intersect in Hive

I have two string arrays in Hive like

{'value1','value2','value3'}
{'value1', 'value2'}

I want to combine the arrays without duplicates, so that the result is:

{'value1','value2','value3'}

How can I do this in Hive?

2 answers

You will need a UDF for this. Klout has published a collection of open-source Hive UDFs in the Brickhouse package, which is available on GitHub. It includes UDFs that match your purpose exactly. Download the source, build it, and add the resulting JAR to your session. Here is an example:

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]

A native solution might be the following:

-- my_table, id and list are placeholder names (TABLE itself is a reserved word in Hive)
SELECT id, collect_set(item)
FROM my_table
LATERAL VIEW explode(list) lTable AS item
GROUP BY id;

First explode the array with a lateral view, then group the rows and remove duplicates using collect_set.
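The query above works on a single array column. To merge two arrays as in the question, one possible sketch is to explode each array separately, stack the rows with UNION ALL, and then apply collect_set. The table name my_table and the columns id, arr1 and arr2 are hypothetical and stand in for your own schema.

```sql
-- Assumed schema (hypothetical): my_table(id, arr1 array<string>, arr2 array<string>)
SELECT id, collect_set(item) AS merged
FROM (
  -- one row per element of the first array
  SELECT id, item
  FROM my_table
  LATERAL VIEW explode(arr1) t1 AS item
  UNION ALL
  -- one row per element of the second array
  SELECT id, item
  FROM my_table
  LATERAL VIEW explode(arr2) t2 AS item
) u
GROUP BY id;  -- collect_set drops the duplicates within each id
```

Note that collect_set does not guarantee the order of the elements in the resulting array, and a row whose two arrays are both empty or NULL produces no output row, because explode emits nothing for it.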

