Score dependent on the position in an array


I'm indexing documents in Elasticsearch that contain arrays.

Sample documents:

    doc1: {
      ...
      actors: ["tom cruise", "brad pitt", ...],
      ...
    }

    doc2: {
      ...
      actors: ["brad pitt", "tom cruise", ...],
      ...
    }

When searching these documents, I'd like the score to depend on the position of the match in the array. In the sample documents, searching for "tom cruise" should boost the first document, doc1, because the match is at position 1.

The only solution I can think of right now is adding a limited number of fields (something like 5) containing the first actors, and putting boosts on them, like:

    doc1: {
      ...
      actors: ["tom cruise", "brad pitt", ...],
      actor1: "tom cruise",
      actor2: "brad pitt",
      ...
    }

with actor1 having a boost of 5, actor2 a boost of 4, and so on.
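A minimal sketch of that workaround in Python, assuming the documents are built client-side before indexing and that the boosts are applied at query time through `bool`/`should` clauses rather than at index time. The helper names, the cap of 5 and the linear boost scale are illustrative assumptions; `match_phrase` is used so that "tom cruise" does not partially match a different "tom".

```python
import json

MAX_POSITIONS = 5  # hypothetical cap: only the first 5 actors get their own field


def add_position_fields(doc, max_positions=MAX_POSITIONS):
    """Return a copy of doc with actor1..actorN copied from the actors array."""
    out = dict(doc)
    for i, actor in enumerate(doc.get("actors", [])[:max_positions], start=1):
        out[f"actor{i}"] = actor
    return out


def positional_query(name, max_positions=MAX_POSITIONS):
    """Build a bool query where earlier actorN fields carry a higher boost.

    actor1 gets boost max_positions, actor2 gets max_positions - 1, ..., down to 1.
    """
    should = [
        {"match_phrase": {f"actor{i}": {"query": name, "boost": max_positions - i + 1}}}
        for i in range(1, max_positions + 1)
    ]
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    doc1 = {"actors": ["tom cruise", "brad pitt"]}
    print(json.dumps(add_position_fields(doc1)))
    print(json.dumps(positional_query("tom cruise"), indent=2))
```

The obvious limitation is the fixed cap: an actor at position 6 or later gets no positional boost at all.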

Do you have a better solution to handle that, maybe using custom_score?

Thanks!

Given this data:

    curl -XPOST localhost:9200/films
    curl -XPOST localhost:9200/films/film/1 -d '{
        "actors": ["tom cruise", "brad pitt", "patrick stewart", "christopher walken"]
    }'
    curl -XPOST localhost:9200/films/film/2 -d '{
        "actors": ["brad pitt", "patrick stewart", "tom cruise", "christopher walken"]
    }'

the following query

    {
        "query": {
            "custom_score": {
                "query": {"match_all": {}},
                "script": "length = _source.actors.size();
                    found = false; index = 0;
                    while (!found && index < length) {
                        if (_source.actors[index] == target) {
                            found = true;
                        } else {
                            index += 1
                        }
                    }
                    length - index;",
                "params": {
                    "target": "tom cruise"
                }
            }
        }
    }

calculates a score of 4 for the first film and 2 for the last one. (If you're pasting this into curl, I had to remove the line breaks in the custom script.)
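To make the script's arithmetic easy to check outside Elasticsearch, here is a plain-Python port of the same loop. It is only an illustration of the logic, not something Elasticsearch runs; the function name is hypothetical.

```python
def length_minus_offset(actors, target):
    """Mirror of the custom_score script: len(actors) minus the index of target.

    If target is absent, the loop runs to the end, index == len(actors),
    and the score is 0.
    """
    index = 0
    while index < len(actors) and actors[index] != target:
        index += 1
    return len(actors) - index


film1 = ["tom cruise", "brad pitt", "patrick stewart", "christopher walken"]
film2 = ["brad pitt", "patrick stewart", "tom cruise", "christopher walken"]
print(length_minus_offset(film1, "tom cruise"))  # 4
print(length_minus_offset(film2, "tom cruise"))  # 2
```

Note that a document without the target still scores 0 rather than being excluded, because the outer query is `match_all`.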

Some caveats:

  • You probably want a better way of converting the offset into a score: this code returns length - offset as the score, so you can only compare documents whose arrays have the same length.
  • It looks like doc.actors (i.e. the indexed data) holds an alphabetically sorted version of the array, which isn't useful here, so I had to use _source, which I believe is a lot slower. That might be acceptable performance-wise if the custom_score query wraps a filtered query.
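One possible length-independent conversion, sketched in Python as an assumption rather than a tested Elasticsearch script, is a reciprocal-rank score: 1 / (offset + 1), and 0 when the target is missing. A match in first position then scores the same whether the array has 2 entries or 10.

```python
def reciprocal_rank_score(actors, target):
    """Hypothetical alternative score: 1 / (offset + 1), independent of array length.

    Returns 0.0 when target is not in actors.
    """
    try:
        return 1.0 / (actors.index(target) + 1)
    except ValueError:
        return 0.0


short_film = ["tom cruise", "brad pitt"]
long_film = ["tom cruise"] + [f"extra {i}" for i in range(9)]
print(reciprocal_rank_score(short_film, "tom cruise"))  # 1.0
print(reciprocal_rank_score(long_film, "tom cruise"))   # 1.0
```

The same expression could replace `length - index` as the last line of the script, but the decay curve (reciprocal, linear, exponential) is a design choice that depends on how sharply you want later positions to be penalized.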
