Regular expression matches do not always line up nicely on word boundaries, so the inverted index cannot be based on words. Then _________ old information retrieval trick and build an index of n-grams, substrings of length n. This sounds more general than it is. In practice, there are too few distinct 2-grams and too many distinct 4-grams, so 3-grams (trigrams) it is.
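
To make the trigram idea concrete, here is a minimal sketch of a trigram inverted index in Python. The function names (`trigrams`, `build_index`, `candidates`) and the sample documents are illustrative assumptions, not part of the quoted passage, and the query side handles only a literal string rather than a full regular expression.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all length-3 substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def candidates(index, literal, all_ids):
    """Return ids of documents containing every trigram of `literal`.

    The result is a superset of the true matches, so each candidate
    still has to be checked against the actual pattern afterwards.
    """
    grams = trigrams(literal)
    if not grams:
        # Literal shorter than 3 characters: the index cannot narrow anything.
        return set(all_ids)
    # Intersect the posting lists; a trigram missing from the index
    # contributes an empty set, which empties the result.
    return set.intersection(*(index.get(g, set()) for g in grams))


# Hypothetical sample documents, chosen only for illustration.
docs = {
    1: "Google Code Search",
    2: "Google Code Project Hosting",
    3: "Google Web Search",
}
index = build_index(docs)
hits = candidates(index, "Code", docs.keys())
```

Here `hits` contains documents 1 and 2, the only ones holding both `Cod` and `ode`. Because matching trigrams do not guarantee a matching string, a real search would still run the pattern over each candidate document.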

This question is part of the practice quiz "Data Science - Information Retrieval".

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing.

