Substring Index

From Handwiki

Short description: Data structure

In computer science, a substring index is a data structure which gives substring search in a text or text collection in sublinear time. If you have a document [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ n }[/math], or a set of documents [math]\displaystyle{ D=\{S^1,S^2, \dots, S^d\} }[/math] of total length [math]\displaystyle{ n }[/math], you can locate all occurrences of a pattern [math]\displaystyle{ P }[/math] in [math]\displaystyle{ o(n) }[/math] time. (See Big O notation.)

The phrase full-text index is also often used for an index of all substrings of a text. But this is ambiguous, as it is also used for regular word indexes such as inverted files and document retrieval. See full text search.

Substring indexes include:

  • Suffix tree
  • Suffix array
  • N-gram index, an inverted file for all N-grams of the text
  • Compressed suffix array[1]
  • FM-index
  • LZ-index

References

  1. R. Grossi and J. S. Vitter, Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching, SIAM Journal on Computing, 35(2), 2005, 378–407.



Retrieved from "https://handwiki.org/wiki/index.php?title=Substring_index&oldid=3369794"

Categories: [Algorithms on strings] [String data structures] [Database index techniques] [Substring indices]


Download as ZWI file | Last modified: 03/04/2024 11:38:28 | 3 views
☰ Source: https://handwiki.org/wiki/Substring_index | License: CC BY-SA 3.0

ZWI is not signed. [what is this?]