I have several hundred thousand documents in an Elasticsearch index, each with a latitude and longitude (stored as a geo_point type). I would like to create a map visualization that looks something like this: http://leaflet.imtqy.com/Leaflet.markercluster/example/marker-clustering-realworld.388.html
So I think I want to run a query with a bounding box (i.e. the borders of the map the user is looking at) and return a summary of the clusters inside that bounding box. Is there a good way to achieve this in Elasticsearch? Perhaps a new indexing strategy? Something like geohash might work, but it would cluster points into a rectangular grid rather than into arbitrary polygons based on the density of points, as seen in the example above.
@kumetix - Good question. I am responding to your comment here because the text is too long to fit in another comment. The geohash_precision parameter sets the maximum precision at which a geohash aggregation can return results. For example, if geohash_precision is set to 8, we can run a geohash aggregation on that field with a precision of at most 8. According to the reference, this groups results into geohash cells of approximately 38.2 m x 19 m. A precision of 7 or 8 would probably be accurate enough to display a web map similar to the one I mentioned in the example above.
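To make the aggregation concrete, here is a minimal sketch of a search body that restricts results to the visible map area and buckets them by geohash cell. The field name `location`, the helper name `cluster_query`, and the aggregation name `clusters` are made up for illustration, and the `bool`/`filter` wrapper is the form used by recent Elasticsearch versions; older versions wrap the filter differently, so treat this as a starting point rather than a drop-in query.

```python
import json


def cluster_query(top, left, bottom, right, precision=7, field="location"):
    """Build a search body that buckets points in a bounding box by geohash cell.

    `field` is a hypothetical geo_point field name; adjust it to your mapping.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        },
        "aggs": {
            "clusters": {
                # precision must not exceed what the field was indexed with
                "geohash_grid": {"field": field, "precision": precision}
            }
        },
    }


# Example: a box around New York Harbor
body = cluster_query(top=40.75, left=-74.10, bottom=40.65, right=-73.95)
print(json.dumps(body, indent=2))
```

Each returned bucket should carry a geohash key and a document count, which the client can decode to a cell center and draw as a single cluster marker.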
As for how geohash_precision affects the cluster internally, I assume that with this parameter each geo_point stores geohash strings of length <= geohash_precision. Let's say we have a point at the Statue of Liberty: 40.6892, -74.0444. Its 12-character geohash is dr5r7p4xb2ts. Setting geohash_precision to 8 on the geo_point will internally store the strings: d dr dr5 dr5r dr5r7 dr5r7p dr5r7p4 dr5r7p4x
and a geohash_precision of 12 will additionally store the strings: dr5r7p4xb dr5r7p4xb2 dr5r7p4xb2t dr5r7p4xb2ts
which leads to slightly more storage overhead for each geo_point. Setting geohash_precision to a distance value (1 km, 1 m, etc.) probably just stores it at the closest geohash string length.
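The relationship between a distance and a geohash string length can be sketched as follows. Each geohash character adds 5 bits, alternating between longitude and latitude, so the cell size at the equator can be estimated from the string length. The rounding rule in `length_for_distance` (pick the shortest length whose cell fits within the distance) is only an assumption for illustration; I have not checked how Elasticsearch actually rounds. Both function names are made up.

```python
EQUATOR_M = 40_075_017  # approximate equatorial circumference in metres


def cell_size_m(length):
    """Approximate (width, height) in metres of a geohash cell at the equator."""
    bits = 5 * length            # each geohash character encodes 5 bits
    lon_bits = (bits + 1) // 2   # bits alternate, starting with longitude
    lat_bits = bits // 2
    width = EQUATOR_M / 2 ** lon_bits          # 360 degrees of longitude
    height = (EQUATOR_M / 2) / 2 ** lat_bits   # 180 degrees of latitude
    return width, height


def length_for_distance(meters, max_length=12):
    """Shortest geohash length whose cell is no larger than `meters` per side.

    This rule is an assumption, not confirmed Elasticsearch behaviour.
    """
    for length in range(1, max_length + 1):
        width, height = cell_size_m(length)
        if max(width, height) <= meters:
            return length
    return max_length


for n in (6, 7, 8):
    w, h = cell_size_m(n)
    print(n, round(w, 1), round(h, 1))
```

For length 8 this gives roughly 38.2 m x 19.1 m, in line with the figure quoted from the reference.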
Note: how to calculate a geohash using Python
$ pip install python-geohash
>>> import geohash
>>> geohash.encode(40.6892,-74.0444)
'dr5r7p4xb2ts'
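Starting from that hash, the prefix strings discussed for a given geohash_precision can be listed with plain Python. This is only a sketch of the idea and needs no extra library; the helper name `geohash_prefixes` is made up, and whether Elasticsearch stores exactly these terms internally is my assumption.

```python
def geohash_prefixes(geohash, precision):
    """Return every prefix of `geohash` from 1 up to `precision` characters."""
    limit = min(precision, len(geohash))  # cannot be longer than the hash itself
    return [geohash[:n] for n in range(1, limit + 1)]


statue_of_liberty = "dr5r7p4xb2ts"
print(geohash_prefixes(statue_of_liberty, 8))
# The extra strings stored when going from precision 8 to 12:
print(geohash_prefixes(statue_of_liberty, 12)[8:])
```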