Innopolis University DevOps Playground
Replace text_similarity_checker.py

Daniil Abrosimov requested to merge text_similarity into changes

This text similarity algorithm may be less accurate, but it is much faster. The previous implementation used a pretrained language model, which improves the accuracy of text comparison, but it had some problems:

  • Poor computation speed
  • Texts had to be compared one pair at a time

The new implementation uses the TF-IDF algorithm with text preprocessing (not sensitive to context, but much faster) and compares multiple texts simultaneously.
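
For reviewers unfamiliar with the approach, here is a minimal sketch of TF-IDF with preprocessing and an all-pairs cosine similarity matrix. It is not the code from this merge request: the function names (`preprocess`, `tfidf_vectors`, `cosine`, `similarity_matrix`), the tokenization rule, and the smoothed IDF formula are all assumptions made for illustration, and it uses only the standard library.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase and split into alphanumeric tokens (assumed, minimal preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict of term -> weight) per input text."""
    docs = [Counter(preprocess(t)) for t in texts]
    n = len(docs)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption), so terms found in every text keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(texts):
    """Vectorize all texts once, then compare every pair."""
    vectors = tfidf_vectors(texts)
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

Calling `similarity_matrix(["first text", "second text", "third"])` returns a square list of lists in which entry `[i][j]` is the similarity between texts `i` and `j`. All texts are vectorized in a single pass, which is what lets many texts be compared at once instead of invoking a model per pair. Because the weights depend only on word counts, paraphrases that share no vocabulary score 0, which is the context insensitivity noted above.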
