What is the best way to filter out Elasticsearch results that contain hate words?


I want to filter out documents containing hate words from Elasticsearch results. Putting a bool filter with the list of words into every search query results in tons of slow queries, since the list of hate words is long (so much hatred around :( ).

I am wondering what the best practices are for spam/hate word filtering.

Here is what I am considering:

  1. Pre-process: scan each document prior to indexing, and either mark it as bad or do not index it at all. Problem: documents are indexed by several processes, so it is difficult to enforce this rule on every new component one writes.

  2. Create a percolator and run it periodically (not sure of the best frequency and timing) to tag documents containing bad words with "baddoc": true, so that queries only need a filter on that flag. Problem: I am not sure of the performance impact of running the percolator periodically; secondly, there is the same discipline problem, since every query must remember to exclude baddoc.
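Option 1, together with the query-side filter that both options rely on, can be sketched in Python as below. This is a minimal sketch under assumptions: `HATE_WORDS`, `tag_document`, `exclude_bad_docs` and the `title`/`body` field names are hypothetical, tokenization is a naive regex rather than a real Elasticsearch analyzer, and the returned dict is the JSON body you would send to `_search` with whatever client you use.

```python
import re

# Hypothetical hate-word list; in practice it would be loaded from a file.
HATE_WORDS = {"badword1", "badword2"}

# Naive tokenizer: runs of word characters. Not the same as an ES analyzer.
TOKEN_RE = re.compile(r"\w+")


def tag_document(doc, text_fields=("title", "body")):
    """Return a copy of doc with a boolean "baddoc" flag, set before indexing."""
    tokens = set()
    for field in text_fields:
        tokens.update(t.lower() for t in TOKEN_RE.findall(doc.get(field, "")))
    tagged = dict(doc)
    # Flag the document if any of its tokens is in the hate-word list.
    tagged["baddoc"] = not tokens.isdisjoint(HATE_WORDS)
    return tagged


def exclude_bad_docs(user_query):
    """Wrap a query so documents flagged with baddoc=true are excluded."""
    return {
        "query": {
            "bool": {
                "must": user_query,
                "must_not": {"term": {"baddoc": True}},
            }
        }
    }


if __name__ == "__main__":
    doc = tag_document({"title": "Hello", "body": "some BadWord1 here"})
    print(doc["baddoc"])  # True
    print(exclude_bad_docs({"match": {"body": "hello"}}))
```

The word check is a set lookup per token, so a long word list does not make tagging noticeably slower, and each search only carries a single `term` clause instead of the whole list.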

Personally I favor a pure ES solution. I am sure this is not a new problem, and hence I am seeking expert guidance and best practices.

Thanks and regards,
Varun

If you use the percolator to tag bad documents, you need to define percolator queries that contain the search criteria for the "hate words".
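A percolator setup might look like the following request bodies, built as Python dicts. Assumptions: it uses the `percolator` field type and `percolate` query of Elasticsearch 5.x and later with a 7.x-style typeless mapping (older versions registered queries under a special `.percolator` type instead), and `HATE_WORDS`, the function names, and the `body` field are hypothetical. A single stored `match` query with `"operator": "or"` matches any document containing at least one of the words.

```python
import json

# Hypothetical hate-word list.
HATE_WORDS = ["badword1", "badword2"]


def percolator_index_body(text_field="body"):
    """Index mapping: a percolator field for stored queries plus the field they search."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                text_field: {"type": "text"},
            }
        }
    }


def hate_word_query(words, text_field="body"):
    """A single stored query that matches if any of the words appears in text_field."""
    return {"query": {"match": {text_field: {"query": " ".join(words), "operator": "or"}}}}


def percolate_request(doc):
    """Search body asking which stored queries match the given (unindexed) document."""
    return {"query": {"percolate": {"field": "query", "document": doc}}}


if __name__ == "__main__":
    print(json.dumps(percolator_index_body(), indent=2))
    print(json.dumps(hate_word_query(HATE_WORDS), indent=2))
    print(json.dumps(percolate_request({"body": "some text to check"}), indent=2))
```

The dict from `hate_word_query` would be indexed once as a document in the percolator index; sending `percolate_request(doc)` to that index's `_search` endpoint then returns a hit whenever `doc` contains a listed word, which is the signal to set `"baddoc": true`.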

One possible solution without the percolator is to define a synonym list (if you are not using one already) or to extend the existing synonym file in your analyzer. You can define synonyms so that the "hate words" get replaced by a single term, such as "badbaddocument". At query time you can then filter out bad documents with a simple boolean filter on that single term.
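The synonym approach could be configured roughly as follows. This sketch assumes a single text field `body`, single-word entries in the list, and hypothetical names (`HATE_WORDS`, `hate_synonyms`, `hate_aware`); the rule uses the Solr synonym syntax `a, b => c`, which replaces each listed word with the marker term during analysis. `simulate_analysis` is only a rough local imitation of that analyzer (whitespace split plus lowercase), included so the idea can be checked without a cluster.

```python
# Hypothetical hate-word list (single words only) and marker term.
HATE_WORDS = ["badword1", "badword2"]
MARKER = "badbaddocument"


def index_settings(words, marker=MARKER):
    """Index settings whose analyzer rewrites every listed word to the marker term."""
    # Solr-style explicit mapping: "a, b => c" replaces a and b with c.
    rule = ", ".join(words) + " => " + marker
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "hate_synonyms": {"type": "synonym", "synonyms": [rule]},
                },
                "analyzer": {
                    "hate_aware": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first so the rule matches regardless of case
                        "filter": ["lowercase", "hate_synonyms"],
                    },
                },
            }
        },
        "mappings": {
            "properties": {"body": {"type": "text", "analyzer": "hate_aware"}},
        },
    }


def clean_query(user_query, marker=MARKER):
    """Wrap a query so any document whose body contains the marker term is excluded."""
    return {
        "query": {
            "bool": {
                "must": user_query,
                "must_not": {"term": {"body": marker}},
            }
        }
    }


def simulate_analysis(text, words, marker=MARKER):
    """Rough local imitation of the analyzer: whitespace split, lowercase, replace."""
    bad = {w.lower() for w in words}
    return [marker if t in bad else t for t in text.lower().split()]


if __name__ == "__main__":
    print(simulate_analysis("Hello BadWord1 world", HATE_WORDS))
    # ['hello', 'badbaddocument', 'world']
```

Because the replacement happens at index time, documents indexed before the word list changes keep their old tokens, so changing the list generally means reindexing. Note also that with a `=>` rule the hate words themselves are no longer stored as terms in `body`.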

