Add efficient findCharIndex #445

@ghost

Description

findCharIndex :: Char -> Text -> Maybe Int

While looking at #369, it occurred to me that having access to this primitive, backed by a fast C implementation that exploits the known encoded size of each Char in UTF-8, could make isSubsequenceOf quite fast, particularly when the haystack is large and the needle's characters are spread far apart.

#include <string.h>

// Returns the byte offset of the first occurrence of target, or -1 if absent.
int findCharIndex(const char * haystack, int length, int target) {
    const char * offset;
    if (target < 128) {
        // ASCII code points occupy a single byte in UTF-8, so memchr suffices.
        offset = memchr(haystack, target, (size_t)length);
    } else {
        // Implementation left as an exercise to the reader, but it would look for
        // the utf-8 encoded bytes of the Char.
        offset = efficientlyFindUTF8Codepoint(haystack, length, target);
    }
    if (offset == NULL) return -1;
    return (int)(offset - haystack);
}
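
For the non-ASCII branch, one possible shape is sketched below. This is only an illustration under my own assumptions, not a proposed final implementation: the names `utf8_encode` and `find_utf8_codepoint` are hypothetical stand-ins (the latter for `efficientlyFindUTF8Codepoint`), and the haystack is assumed to be valid UTF-8. The idea is to encode the code point once, use memchr to jump to each candidate lead byte, and compare the remaining bytes with memcmp.

```c
#include <stddef.h>
#include <string.h>

/* Encode a Unicode scalar value as UTF-8 into buf (4 bytes of space).
   Returns the number of bytes written, or 0 for invalid input
   (surrogates or values above U+10FFFF). */
static size_t utf8_encode(unsigned int cp, unsigned char buf[4]) {
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical stand-in for efficientlyFindUTF8Codepoint.
   Returns a pointer to the first encoded occurrence of target,
   or NULL if it does not occur. Assumes haystack is valid UTF-8. */
const char *find_utf8_codepoint(const char *haystack, size_t length,
                                unsigned int target) {
    unsigned char enc[4];
    size_t n = utf8_encode(target, enc);
    if (n == 0 || n > length) return NULL;

    const char *p = haystack;
    const char *last = haystack + (length - n); /* last possible start */
    while (p <= last) {
        /* Jump straight to the next candidate lead byte. */
        p = memchr(p, enc[0], (size_t)(last - p) + 1);
        if (p == NULL) return NULL;
        if (memcmp(p, enc, n) == 0) return p;
        p++;
    }
    return NULL;
}
```

Because a UTF-8 lead byte never appears as a continuation byte, every memchr hit in valid input is already on a character boundary, so no extra boundary check should be needed.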

Having this also means we could add the rules

{-# RULES "findIndex Char" forall c. findIndex (c ==) = findCharIndex c #-}
{-# RULES "findIndex Char2" forall c. findIndex (== c) = findCharIndex c #-}
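
To make the isSubsequenceOf motivation a bit more concrete, here is a rough sketch of the search loop I have in mind, restricted to ASCII bytes so it can lean on memchr alone. The function name `is_subsequence_ascii` is hypothetical, and a real version would go through the per-Char search rather than assuming single-byte characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if every byte of needle occurs in haystack in order,
   not necessarily contiguously. Assumes ASCII-only input, so each
   character is exactly one byte. */
bool is_subsequence_ascii(const char *needle, size_t nlen,
                          const char *haystack, size_t hlen) {
    const char *p = haystack;
    const char *end = haystack + hlen;
    for (size_t i = 0; i < nlen; i++) {
        /* Skip ahead to the next occurrence of the current needle byte. */
        const char *hit = memchr(p, (unsigned char)needle[i],
                                 (size_t)(end - p));
        if (hit == NULL) return false;
        p = hit + 1; /* continue searching after the match */
    }
    return true;
}
```

When the needle characters are far apart, each step is a single memchr call over a long run of bytes, which is where I would expect the gain over a per-character loop to come from.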
