banner
李大仁博客

李大仁博客

天地虽大,但有一念向善,心存良知,虽凡夫俗子,皆可为圣贤。

[Algorithm] Comparison of two string matching algorithms (Index method, KMP algorithm), implemented in C language.

Today I made a simple character comparison program that implements the operation of deleting the most characters containing B from string A. For example, if A = "aaaaabbbbbbabababa" and B = "aaccbaab", "aab" should be deleted, not "aa". I believe friends who know search engines must know about this algorithm. This algorithm is mainly used to remove invalid keywords from web pages, reducing the computational cost of indexing. Well, let's talk about two commonly used string matching algorithms today: KMP algorithm and index method.

KMP algorithm is a fast string matching algorithm proposed by Knuth, Morris, and Pratt, hence the name KMP algorithm. It is a classic algorithm, and there are also BM and AB-BM algorithms developed later. But don't worry, I'll talk about them next time. I haven't had time to write a blog recently, but the principle is very simple. It uses additional values to record the number of index matches, and then calculates the result based on this result. Don't know? You can refer to http://www.chinaitpower.com/A/2003-01-04/45995.html

The index method is the simplest string matching algorithm. The principle is to try one by one, find the first one, and then see if it matches the second one, and so on until the end of the string. It is a very classic algorithm.

Now let's make a simple comparison and look at the code:

#include "stdio.h"
#include "string.h"
#include "stdlib.h"
/getnext/
void getNext(char p[],int next[]){
int i, j, slen;
slen = strlen(p);
i = 0;
j = -1;
next[0] = -1;
while(i < slen){
if((j == -1) || (p[i] == p[j])){
i++;
j++;
next[i] = j;
}/if/
else{
j = next[j];
}/else/
}/while/
}/get_next/

int kmp(char s[], char p[], int pos, int next[]) {
/KMP algorithm/
int i, j, slen, plen;
i = pos;
j = 0;
slen = strlen(s);
plen = strlen(p);
while((i < slen) && (j < plen)){
if((j == -1) || (s[i] == p[j])){
i++;
j++;
}
else
{
j = next[j];
}
}/while/
if(j >= plen) {
return (i-plen);
}/if/
else {
return -1;
}/else/
}/kmp/

int index(char s[], char p[], int pos){
/Index method/
int i, j, slen, plen;
i = pos;
j = 0;
slen = strlen(s);
plen = strlen(p);
while((i < slen) && (j < plen)){
if((s[i] == p[j])){
i++;
j++;
}/if/
else{
i = i-j+1;
j = 0;
}/else/
}/while/
if(j >= plen) {
return (i-plen);
}
else {
return -1;
}
}/index/

void main(){
char s[] = "acbaabcaacabaabaabcacaabc"; /Test string/
char p[] = "abaabca";
int next[50];
getNext(p, next);
printf("found index at %dn", kmp(s, p, 0, next));
printf("found index at %dn", index(s, p, 0));
}/main/

The KMP algorithm is the kmp() function, and the index method is the index() function. I believe you all can see it. KMP requires an additional calculation array, which is the essence and efficiency of the algorithm. It improves the efficiency by an order of magnitude. I believe you all know the result of improving efficiency in this way. However, in the worst case, it may waste a lot of space. Well, the algorithm is provided. If you don't understand the algorithm part, you can leave a comment or contact me directly.

Attachment: KMP matching algorithm C++ implementation algorithm from the internet, sourced from http://www.chinaitpower.com/A/2003-01-04/45995.html. Author, please don't mind, I'm just borrowing it for everyone's reference. I personally prefer pure C, and I haven't been using C++ much recently.

KMP algorithm to find the number of occurrences of string P in string S, count
#include
#include
using namespace std;

inline void NEXT(const string& T,vector& next)
{
//Generate vector next(T.size()) according to pattern string T
next[0]=-1;
for(int i=1;i<T.size();++i)
{
int j=next[i-1];
while(j>=0 && T[i]!=T[j+1])
j=next[j];
if(T[i]==T[j+1])next[i]=j+1;
else next[i]=0;
}
}

inline string::size_type COUNT_KMP(const string& S,const string& T)
{
//Count the number of occurrences of pattern string T in main string S using KMP algorithm
//where T is not empty
vector next(T.size());
NEXT(T,next);
string::size_type index,count=0;
for(index=0;index<S.size();++index)
{
int pos=0;
for(int i=index;i<S.size();++i)
{
if(S[i]==T[pos])
{
if(pos==T.size()-1)
{
++count;
break;
}
++pos;
}
else
{
break;
}
}
}
return count;
}

int main()
{
string S="acbaabcaacabaabaabcacaabc"; //Test string
string T="abaabca";
cout<<"found index at "<<COUNT_KMP(S,T)<<endl;
system("PAUSE");
return 0;
}

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.