Delphi's TStringList is one of the objects I love the most. If it's sorted (StringList.Sorted := true) then you can use it to parse huge chunks of data quickly.
For instance, looping through an enormous amount of IP addresses and keeping count of how many times they appeared, is easily done using the following code (not compiled or checked for syntax errors):
ls := TStringList.Create; ls.Sorted := true; for ip in ipAddresses do begin n := ls.IndexOf(ip); if n = -1 then ls.AddObject(ip, TObject(1)) else ls.Objects[n] := TObject(Integer(ls.Objects[n]) + 1); end;
It's very efficient. Since TStringList.IndexOf always does a binary search, it operates in log2(n) time, and using Objects as integers allows us to keep track of count without messing with the string data.
But there are things we can do to speed it up. For instance, TStringList.IndexOf relies on TStringList.Find, which itself uses AnsiCompareStr, which is a slow Windows call, taking locale and its mother into consideration. Overriding this with our own method should be worthwhile. (The code below is adapted straight from the Classes unit.)
type TStringListEx = class(TStringList) public function Find(const S: string; var Index: Integer): Boolean; override; end;
function TStringListEx.Find(const S: string; var Index: Integer): Boolean; var L, H, I, C: Integer; begin Result := False; L := 0; H := Count - 1; while L <= H do begin I := (L + H) shr 1; C := CompareStr(Get(I), S); if C < 0 then L := I + 1 else begin H := I - 1; if C = 0 then begin Result := True; if Duplicates <> dupAccept then L := I; end; end; end; Index := L; end;
We've replaced AnsiCompareStr with Delphi's own CompareStr, which is a highly optimized FastCode routine. There are some drawbacks - things will always be sorted in byte order and no case-sensitivity is done. But we don't care about this - it can always be done afterwards; right now, speed is the main importance.
And it turns out that using the above code, in pure examples, can slash execution time with up to about 80%. Dramatic savings, indeed. In my own example, where I analyze ftp log data, I managed to cut execution time on 122 MB of data down from 7 seconds down to 3.1 seconds.
Best of all, since TStringList.Find is declared virtual, we don't need to change any types anywhere, just do a TStringListEx.Create instead of a TStringList.Create and you're good to go.