TY - JOUR

T1 - Negative selection algorithms on strings with efficient training and linear-time classification

AU - Elberfeld, Michael

AU - Textor, Johannes

PY - 2011/2/16

Y1 - 2011/2/16

N2 - A string-based negative selection algorithm is an immune-inspired classifier that infers a partitioning of a string space Σℓ into "normal" and "anomalous" partitions from a training set S containing only samples from the "normal" partition. The algorithm generates a set of patterns, called "detectors", to cover regions of the string space containing none of the training samples. Strings that match at least one of these detectors are then classified as "anomalous". A major problem with existing implementations of this approach is that the detector generating step needs exponential time in the worst case. Here we show that for the two most widely used kinds of detectors, the r-chunk and r-contiguous detectors based on partial matching to substrings of length r, negative selection can be implemented more efficiently by avoiding generating detectors altogether: for each detector type, training set S⊆Σℓ and parameter r≤ℓ one can construct an automaton whose acceptance behaviour is equivalent to the algorithm's classification outcome. The resulting runtime is O(|S|ℓr|Σ|) for constructing the automaton in the training phase and O(ℓ) for classifying a string.

AB - A string-based negative selection algorithm is an immune-inspired classifier that infers a partitioning of a string space Σℓ into "normal" and "anomalous" partitions from a training set S containing only samples from the "normal" partition. The algorithm generates a set of patterns, called "detectors", to cover regions of the string space containing none of the training samples. Strings that match at least one of these detectors are then classified as "anomalous". A major problem with existing implementations of this approach is that the detector generating step needs exponential time in the worst case. Here we show that for the two most widely used kinds of detectors, the r-chunk and r-contiguous detectors based on partial matching to substrings of length r, negative selection can be implemented more efficiently by avoiding generating detectors altogether: for each detector type, training set S⊆Σℓ and parameter r≤ℓ one can construct an automaton whose acceptance behaviour is equivalent to the algorithm's classification outcome. The resulting runtime is O(|S|ℓr|Σ|) for constructing the automaton in the training phase and O(ℓ) for classifying a string.

UR - http://www.scopus.com/inward/record.url?scp=78651312881&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2010.09.022

DO - 10.1016/j.tcs.2010.09.022

M3 - Journal articles

AN - SCOPUS:78651312881

SN - 0304-3975

VL - 412

SP - 534

EP - 542

JO - Theoretical Computer Science

JF - Theoretical Computer Science

IS - 6

ER -