I store the following JSON objects in a Hive table:
    {
        "main_id": "qwert",
        "features": [
            {
                "scope": "scope1",
                "name": "foo",
                "value": "ab12345",
                "age": 50,
                "somelist": ["abcde", "fghij"]
            },
            {
                "scope": "scope2",
                "name": "bar",
                "value": "cd67890"
            },
            {
                "scope": "scope3",
                "name": "baz",
                "value": ["A", "B", "C"]
            }
        ]
    }
"features" is a variable-length array, and every object in it is optional. The objects can contain arbitrary fields, but each one always has a "scope", a "name" and a "value".
This is the Hive table I created:
    CREATE TABLE tbl (
        main_id  STRING,
        features ARRAY<STRUCT<
            scope:    STRING,
            name:     STRING,
            value:    ARRAY<STRING>,
            age:      INT,
            somelist: ARRAY<STRING>
        >>
    )
I need a Hive query that returns main_id together with the value of the struct whose name is "baz", i.e.
    main_id    baz_value
    qwert      ["A","B","C"]
My problem is that the Hive UDF get_json_object supports only a limited subset of JSONPath. In particular, it does not support filter expressions, so a path such as get_json_object(features, '$.features[?(@.name='baz')]') does not work.
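One direction I have considered is to skip JSONPath and flatten the array with LATERAL VIEW explode, then filter on the name. This is only a minimal, untested sketch: it assumes the data actually loads into tbl as declared above, and the aliases t, feats and f are names I made up for illustration.

```sql
-- Sketch only: turn each element of "features" into its own row,
-- then keep the rows whose struct has name = 'baz'.
-- Assumes tbl is readable with the schema declared above.
SELECT
    t.main_id,
    f.value AS baz_value          -- declared as ARRAY<STRING> in the table
FROM tbl t
LATERAL VIEW explode(t.features) feats AS f
WHERE f.name = 'baz';
```

I am not sure this is sound, because "value" is sometimes a plain string and sometimes an array in the JSON, while the table declares it as ARRAY<STRING>.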
How can I get this result with a Hive query? Would a different table structure make it easier?