Retrieving an array of structs in Hive

I have an external table in Hive:

CREATE EXTERNAL TABLE FOO (  
  TS string,  
  customerId string,  
  products array< struct <productCategory:string, productId:string> >  
)  
PARTITIONED BY (ds string)  
ROW FORMAT SERDE 'some.serde'  
WITH SERDEPROPERTIES ('error.ignore'='true')  
LOCATION 'some_locations'  
;

A row in the table might hold data such as:

1340321132000, 'some_company', [{"productCategory":"footwear","productId":"nik3756"},{"productCategory":"eyewear","productId":"oak2449"}]

Does anyone know of a way to simply extract all the productCategory values from this record and return them as an array of productCategories, without using explode? Something like the following:

["footwear", "eyewear"] 

Or do I need to write my own GenericUDF? If so, I don’t know much Java (I'm a Ruby guy), so can someone give me some pointers? I have read some of the UDF instructions from Apache Hive, but I don't know which collection type is best for handling the array, and which for handling the structs.

===

I have partly answered this question myself by writing a GenericUDF, but I ran into two other problems, which are described in this SO question.


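If the size of the array is fixed (say, 2), you can simply use:

products[0].productCategory,products[1].productCategory

Otherwise, a UDF is the way to go. Since you know Ruby, you could write it in JRuby. GL!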
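
For the fixed-size case, a complete query might look like the sketch below. It assumes the FOO table from the question and that no row holds more than two products; the productCategories alias is only illustrative. As far as I know, Hive returns NULL for an out-of-range array index rather than raising an error, so rows with a single product would yield a trailing NULL element.

```sql
-- Sketch only: assumes every row has at most two products.
-- array() wraps the two indexed fields back into a single array column.
SELECT
    customerId,
    array(products[0].productCategory,
          products[1].productCategory) AS productCategories
FROM FOO;
```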


One way to handle this is to use the inline or explode functions, like this:

SELECT 
    TS,
    customerId,
    pCat,
    pId
FROM FOO 
LATERAL VIEW inline(products) p AS pCat, pId
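
The query above produces one row per product. If an array per record is needed, the rows can be aggregated back, as in this sketch. It assumes a Hive version that provides collect_list (0.13 or later, to my knowledge) and that TS plus customerId identify a record; the productCategories alias is illustrative. Element order in the result is not guaranteed to match the original array.

```sql
-- Sketch only: flatten the structs, then regroup the categories per record.
-- Rows that share the same TS and customerId would be merged into one array.
SELECT
    TS,
    customerId,
    collect_list(pCat) AS productCategories
FROM FOO
LATERAL VIEW inline(products) p AS pCat, pId
GROUP BY TS, customerId;
```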

Otherwise, you can write a UDF.


You can use a JSON SerDe or the built-in functions get_json_object and json_tuple.

With rcongiu's Hive-JSON SerDe, the usage looks like this.

Define your table:

CREATE TABLE complex_json (
DocId string,
Orders array<struct<ItemId:int, OrderDate:string>>)

Load JSON like this (each record on a single line, not pretty-printed):

{"DocId":"ABC","Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}

Then fetching the item IDs is as easy as:

SELECT Orders.ItemId FROM complex_json LIMIT 100;

It returns the list of IDs as an array:

itemid
[1111,2222]

I verified that this returns correct results in my environment. Full listing:

add jar hdfs:///tmp/json-serde-1.3.6.jar;

CREATE TABLE complex_json (
  DocId string,
  Orders array<struct<ItemId:int, OrderDate:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

LOAD DATA INPATH '/tmp/test.json' OVERWRITE INTO TABLE complex_json;

SELECT Orders.ItemId FROM complex_json LIMIT 100;

More details here:

http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html
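
Applied to the table from the question, the same field access would presumably look like the sketch below. It assumes the SerDe behind FOO exposes products as a genuine array of structs, as the JSON SerDe does for Orders above; the productCategories alias is illustrative.

```sql
-- Sketch only: selecting a struct field through the array column
-- yields an array of that field's values, one per struct.
SELECT
    customerId,
    products.productCategory AS productCategories
FROM FOO;
```

For the sample record in the question, this should give the requested array of the two category names without explode or a custom UDF.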
