I need to filter a list of URLs with a JSONPath expression in Python, keeping only the URLs that contain a given substring. I tried the following, but could not get the desired result.
I referred to http://goessner.net/articles/JsonPath/ and http://mikelev.in/2012/08/implementing-jsonpath-in-python-with-examples/
Here are the details of everything I tried:
My JSON response:
{
  "127.0.0.1": {
    "URLs": [
      "http://www.test.ca/",
      "http://b.scorecardresearch.com/p?ns__t=1387392184071&ns__c=ISO-8859-1&c1=3&c3=_es_7948950&c4=56568219&c5=105139691&c6=&c10=1&c11=1016510&c13=728x90&c16=dfa&c2=14397547&ax_iframe=2&ns_ce_mod=vce_st&ns__p=1387391507295&ax_cid=14397547&ax_bl=0&ax_blt=1228&ns_ad_event=show&ns_ad_id=DCF277937840&ns_ad_sz=728x90",
      "http://cdn.media.ca/a/mediative/sites/test_en.js",
      "http://pt200233.unica.com/ntpage.gif?js=1&ts=1387392184554.791&lc=http%3A%2F%2Fwww.test.ca%2F%3Fni_title%3D%2Fhome%2Fhomepage&rf=http%3A%2F%2Fwww.test.ca%2F&rs=1680x1050&cd=32&ln=en&tz=GMT%20-05%3A00&jv=1&ck=UnicaID%3DwQVZatfvXZ5-YZ0yaPj&m.pn=homepage&m.mlc=%2Fhome&m.cv_c13=ctest-new&m.cv_c14=en&m.utv=ut.ctest.2.2.131022.74&m.host=www.test.ca&m.page=%2Fhome%2Fhomepage&m.mlc0=home&ets=1387392184559.194&site=test"
    ]
  }
}
The JSON response above is parsed as:
parsed_input = json.loads(urllib.urlopen('<URL for the above JSON response>').read())
To get a list of all the URLs from the JSON response, I tried the following, which works fine:
'\n'.join(jsonpath.jsonpath(parsed_input, '$..URLs[*]'))
Output:
http://www.test.ca/
http://b.scorecardresearch.com/p?ns__t=1387392184071&ns__c=ISO-8859-1&c1=3&c3=_es_7948950&c4=56568219&c5=105139691&c6=&c10=1&c11=1016510&c13=728x90&c16=dfa&c2=14397547&ax_iframe=2&ns_ce_mod=vce_st&ns__p=1387391507295&ax_cid=14397547&ax_bl=0&ax_blt=1228&ns_ad_event=show&ns_ad_id=DCF277937840&ns_ad_sz=728x90
http://cdn.media.ca/a/mediative/sites/test_en.js
http://pt200233.unica.com/ntpage.gif?js=1&ts=1387392184554.791&lc=http%3A%2F%2Fwww.test.ca%2F%3Fni_title%3D%2Fhome%2Fhomepage&rf=http%3A%2F%2Fwww.test.ca%2F&rs=1680x1050&cd=32&ln=en&tz=GMT%20-05%3A00&jv=1&ck=UnicaID%3DwQVZatfvXZ5-YZ0yaPj&m.pn=homepage&m.mlc=%2Fhome&m.cv_c13=ctest-new&m.cv_c14=en&m.host=www.test.ca&m.page=%2Fhome%2Fhomepage&m.mlc0=home&ets=1387392184559.194&site=test
Next, I want to get only those URLs that contain the word "unica". I tried all of the following, but I get a TypeError. What am I missing?
'\n'.join(jsonpath.jsonpath(parsed_input, '$..URLs[?(/unica/)]'))
'\n'.join(jsonpath.jsonpath(parsed_input, '$..URLs[?(@(unica))]'))
'\n'.join(jsonpath.jsonpath(parsed_input, '$..URLs[?(@.(*.unica.*))]'))
'\n'.join(jsonpath.jsonpath(parsed_input, '$.*.URLs[?(unica)]'))
'\n'.join(jsonpath.jsonpath(parsed_input, '$.*.URLs[?:unica]'))
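To make the desired result concrete, here is a minimal sketch of the same filtering done in plain Python, without a JSONPath filter expression. The `sample` data is a shortened stand-in for the response above, and `urls_containing` is a hypothetical helper name used only for illustration; it assumes every top-level value is a dict that may hold a "URLs" list.

```python
import json

# Shortened stand-in for the JSON response above (illustrative data only).
sample = json.loads("""
{
  "127.0.0.1": {
    "URLs": [
      "http://www.test.ca/",
      "http://cdn.media.ca/a/mediative/sites/test_en.js",
      "http://pt200233.unica.com/ntpage.gif?js=1&site=test"
    ]
  }
}
""")


def urls_containing(parsed, needle):
    """Return every URL under any host's "URLs" list that contains needle."""
    matches = []
    for host_data in parsed.values():
        # .get() with a default skips hosts that have no "URLs" key.
        for url in host_data.get("URLs", []):
            if needle in url:
                matches.append(url)
    return matches


unica_urls = urls_containing(sample, "unica")
# Joining an empty list is safe, so this does not fail when nothing matches.
print("\n".join(unica_urls))
```

What I would like is a JSONPath expression that gives this same list directly.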