Substring Index

Short description: Data structure

In computer science, a substring index is a data structure which gives substring search in a text or text collection in sublinear time. If you have a document [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ n }[/math], or a set of documents [math]\displaystyle{ D=\{S^1,S^2, \dots, S^d\} }[/math] of total length [math]\displaystyle{ n }[/math], you can locate all occurrences of a pattern [math]\displaystyle{ P }[/math] in [math]\displaystyle{ o(n) }[/math] time. (See Big O notation.)

The phrase full-text index is also often used for an index of all substrings of a text. But this is ambiguous, as it is also used for regular word indexes such as inverted files and document retrieval. See full text search.

Substring indexes include:

Suffix tree
Suffix array
N-gram index, an inverted file for all N-grams of the text
Compressed suffix array^[1]
FM-index
LZ-index

References

↑ R. Grossi and J. S. Vitter, Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching, SIAM Journal on Computing, 35(2), 2005, 378–407.

0.00

(0 votes)

Original source: https://en.wikipedia.org/wiki/Substring index. Read more

[1] R. Grossi and J. S. Vitter, Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching, SIAM Journal on Computing, 35(2), 2005, 378–407.