I have an external table in Hive:
CREATE EXTERNAL TABLE FOO (
TS string,
customerId string,
products array< struct <productCategory:string, productId:string> >
)
PARTITIONED BY (ds string)
ROW FORMAT SERDE 'some.serde'
WITH SERDEPROPERTIES ('error.ignore'='true')
LOCATION 'some_locations'
;
A record in the table might hold data such as:
1340321132000, 'some_company', [{"productCategory":"footwear","productId":"nik3756"},{"productCategory":"eyewear","productId":"oak2449"}]
Does anyone know whether there is a simple way to extract all of the productCategory values from this record and return them as an array, without using explode? Something like the following:
["footwear", "eyewear"]
Or do I need to write my own GenericUDF? If so, I don't know much Java (I'm a Ruby guy), so can someone give me some pointers? I have read some of the UDF instructions from Apache Hive, but I do not know which collection type is best for handling the array, and which collection type to use for the structs.
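To make the goal concrete, here is a minimal plain-Java sketch of the extraction I am after. It is not a GenericUDF and has no Hive dependency: it assumes the array arrives as a java.util.List and each struct as a Map keyed by field name, and the class and method names (ProductCategories, extract) are made up for illustration. As far as I understand, inside a real GenericUDF the array and the structs would be read through Hive's ListObjectInspector and StructObjectInspector rather than cast to plain collections, so treat this only as the shape of the logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive: models array<struct<...>> as List<Map<String, String>>.
final class ProductCategories {

    // Returns the "productCategory" field of every struct, in array order.
    // A null array yields an empty list; a null struct yields a null entry.
    static List<String> extract(List<Map<String, String>> products) {
        List<String> categories = new ArrayList<>();
        if (products == null) {
            return categories;
        }
        for (Map<String, String> product : products) {
            categories.add(product == null ? null : product.get("productCategory"));
        }
        return categories;
    }

    public static void main(String[] args) {
        // Same data as the sample record above.
        List<Map<String, String>> products = List.of(
                Map.of("productCategory", "footwear", "productId", "nik3756"),
                Map.of("productCategory", "eyewear", "productId", "oak2449"));
        System.out.println(extract(products)); // prints [footwear, eyewear]
    }
}
```

The loop keeps the original order of the array, which is what I would expect from the UDF as well.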
===
I have partly answered this question by writing a GenericUDF, but I ran into two other problems, which are described in this SO question