Dec 30, 2024
Picking a hard similarity threshold and keeping or discarding listings based on it enforces a binary decision. Such binary decisions are not "smooth" and are usually hard to get good performance out of, because they throw away a lot of information (the degree of similarity, its relation to the scores, etc.) and map everything to 1/0. It is usually better to have scoring functions that generalize smoothly.
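
To make the contrast concrete, here is a minimal sketch rather than a recommended implementation. It assumes similarity lies in [0, 1] and that higher similarity means the listing should be kept (flip the sign if your case is the opposite). The function names, the 0.8 cutoff, the logistic weighting, and the `steepness` value are all illustrative assumptions, not something from the original setup.

```python
import math


def hard_filter(score: float, similarity: float, threshold: float = 0.8) -> float:
    """Binary decision: keep the score as-is or discard the listing (score 0)."""
    return score if similarity >= threshold else 0.0


def smooth_weight(similarity: float, midpoint: float = 0.8, steepness: float = 20.0) -> float:
    """Logistic weight in (0, 1); equals 0.5 at the midpoint (hypothetical choice)."""
    return 1.0 / (1.0 + math.exp(-steepness * (similarity - midpoint)))


def smooth_score(score: float, similarity: float) -> float:
    """Scale the score by a weight that changes gradually with similarity."""
    return score * smooth_weight(similarity)


if __name__ == "__main__":
    # Two listings that are almost equally similar, on either side of the cutoff.
    for sim in (0.79, 0.81):
        print(sim, hard_filter(1.0, sim), round(smooth_score(1.0, sim), 3))
```

With the hard filter, two listings at 0.79 and 0.81 similarity get completely different outcomes, while the smooth version scores them close to each other and still ranks the more similar one higher. Any monotone, continuous weighting would illustrate the same point; the logistic is just one convenient option.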