findCharIndex :: Char -> Text -> Maybe Int
While looking at #369, it occurred to me that having access to this primitive, with a fast C implementation that exploits the known byte length of each char in UTF-8, might make isSubsequenceOf quite fast, particularly when the haystack is large and the needle chars are spread far apart.
#include <string.h>

/* Returns the byte offset of the first occurrence of target in haystack, or -1. */
int findCharIndex(const char *haystack, int length, int target) {
    const char *offset;
    if (target < 128) {
        // ASCII: the char is a single byte, so memchr can be used directly.
        offset = memchr(haystack, target, (size_t)length);
    } else {
        // Implementation left as an exercise to the reader, but it would look for
        // the UTF-8 encoded bytes of the Char.
        offset = efficientlyFindUTF8Codepoint(haystack, length, target);
    }
    if (offset == NULL) return -1;
    return (int)(offset - haystack);
}
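For the non-ASCII branch, one possible approach (a minimal sketch, not a tuned implementation) is to encode the code point to its 1 to 4 UTF-8 bytes, use memchr to jump to candidates for the lead byte, and confirm each with memcmp. The names `utf8_encode` and `find_utf8_codepoint` are hypothetical stand-ins for the helper left as an exercise above, and the sketch assumes the haystack is valid UTF-8, in which case a match of the full encoded sequence always falls on a character boundary.

```c
#include <stddef.h>
#include <string.h>

/* Encode cp as UTF-8 into out. Returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(unsigned int cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: pointer to the first occurrence of the code point
 * target in haystack[0..length), or NULL if it does not occur. */
const char *find_utf8_codepoint(const char *haystack, size_t length,
                                unsigned int target) {
    unsigned char enc[4];
    size_t n = utf8_encode(target, enc);
    const char *end = haystack + length;
    if (n == 0) return NULL;
    while ((size_t)(end - haystack) >= n) {
        /* Only scan positions where the whole n-byte sequence still fits. */
        const char *p = memchr(haystack, enc[0], (size_t)(end - haystack) - n + 1);
        if (p == NULL) return NULL;
        if (memcmp(p, enc, n) == 0) return p;
        haystack = p + 1; /* lead byte matched but the tail did not; keep going */
    }
    return NULL;
}
```

Note that, like the function above, this yields a byte position. Turning it into a character index for a `Maybe Int` result would additionally require counting the characters that precede the match.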
Having this also means we could add the rules:
{-# RULES "findIndex Char" forall c. findIndex (c ==) = findCharIndex c #-}
{-# RULES "findIndex Char2" forall c. findIndex (== c) = findCharIndex c #-}
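To illustrate the isSubsequenceOf use case from the description, here is a rough sketch of how a subsequence check could hop through the haystack one needle char at a time, letting memchr skip the stretches in between. All names here (`utf8_seq_len`, `find_bytes`, `is_subsequence_utf8`) are hypothetical, and the sketch assumes both buffers hold valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte b
 * (assumes b is a valid lead byte). */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    return 4;
}

/* First occurrence of the n-byte sequence pat in hay[0..len), or NULL.
 * Requires n >= 1. */
static const char *find_bytes(const char *hay, size_t len,
                              const char *pat, size_t n) {
    const char *end = hay + len;
    while ((size_t)(end - hay) >= n) {
        const char *p = memchr(hay, (unsigned char)pat[0],
                               (size_t)(end - hay) - n + 1);
        if (p == NULL) return NULL;
        if (memcmp(p, pat, n) == 0) return p;
        hay = p + 1;
    }
    return NULL;
}

/* Returns 1 if the chars of needle occur in hay in the same order
 * (not necessarily adjacent), 0 otherwise. */
int is_subsequence_utf8(const char *needle, size_t nlen,
                        const char *hay, size_t hlen) {
    const char *hend = hay + hlen;
    size_t i = 0;
    while (i < nlen) {
        size_t n = utf8_seq_len((unsigned char)needle[i]);
        const char *p;
        if (n > nlen - i) return 0; /* truncated needle char */
        p = find_bytes(hay, (size_t)(hend - hay), needle + i, n);
        if (p == NULL) return 0;
        hay = p + n; /* resume the search just past the matched char */
        i += n;
    }
    return 1;
}
```

Each needle char costs one forward scan over the part of the haystack not yet consumed, so the whole check is a single pass over the haystack in the worst case, with most of the work done inside memchr when the needle chars are far apart.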