Here's an implementation of the GNU function [[http://linux.die.net/man/3/memmem|memmem]] that appears to be about 2x faster than the standard memmem shipped with my Debian system, at least for my workload.

  #include <string.h>
  #include
  // Implements the GNU function memmem much faster (2x) than the standard memmem included on my Debian system
  char* memmem(char* haystack, int hlen, char* needle, int nlen) {
    if (nlen > hlen) return 0;
    int i=0,j=0;
    switch(nlen) { // we have a few specialized compares for certain needle sizes
    case 0: // no needle? just give the haystack
      return haystack;
    case 1: // just use memchr for 1-byte needle
      return memchr(haystack, needle[0], hlen);
    case 2: // use 16-bit compares for 2-byte needles
      for (i=0; i

Also, for no particular reason, here's a really dumb (naive) implementation.

  char* DUMB_memmem(char* haystack, int hlen, char* needle, int nlen) {
    // naive implementation
    if (nlen > hlen) return 0;
    int i;
    for (i=0; i
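For reference, here is a minimal, self-contained sketch of the same idea: dispatch on the needle length, with a naive version alongside for comparison. It is not the original code. The names `sketch_memmem` and `naive_memmem` are hypothetical (chosen so they don't clash with the system memmem), the `void*`/`size_t` signature follows the GNU prototype rather than the `char*`/`int` one above, and the general case (memchr to find a candidate first byte, then memcmp) is my assumption about a reasonable fallback. The 2-byte case copies into a `uint16_t` with memcpy so the 16-bit compare doesn't depend on alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of a length-specialized memmem. */
void *sketch_memmem(const void *haystack, size_t hlen,
                    const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;

    if (nlen > hlen)
        return NULL;

    switch (nlen) {
    case 0:                     /* no needle: just give the haystack */
        return (void *)h;
    case 1:                     /* 1-byte needle: memchr does the job */
        return memchr(h, n[0], hlen);
    case 2: {                   /* 2-byte needle: 16-bit compares */
        uint16_t want, have;
        memcpy(&want, n, 2);
        for (size_t i = 0; i + 2 <= hlen; i++) {
            memcpy(&have, h + i, 2);    /* alignment-safe 16-bit load */
            if (have == want)
                return (void *)(h + i);
        }
        return NULL;
    }
    default: {                  /* assumed fallback: memchr, then memcmp */
        const unsigned char *p = h;
        size_t remaining = hlen;
        while (remaining >= nlen) {
            /* only scan start positions where the whole needle still fits */
            const unsigned char *q = memchr(p, n[0], remaining - nlen + 1);
            if (q == NULL)
                return NULL;
            if (memcmp(q, n, nlen) == 0)
                return (void *)q;
            remaining -= (size_t)(q - p) + 1;
            p = q + 1;
        }
        return NULL;
    }
    }
}

/* Hypothetical name; the naive version: try memcmp at every offset. */
void *naive_memmem(const void *haystack, size_t hlen,
                   const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;

    if (nlen > hlen)
        return NULL;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(h + i, needle, nlen) == 0)
            return (void *)(h + i);
    return NULL;
}
```

Both functions return a pointer to the first match, or NULL when the needle does not occur. Any speedup over the system memmem would have to be measured on your own workload; this sketch makes no claim about it.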