Friday 23 May 2014

DS : TRIE

Trie is an indexed based approach on building trees to store strings (same as that of a dictionary) and then searching them in linear time. The tree formed is different from that of the conventional binary trees that we study, but the approach is more or less quite simple.
The intention is very clear. We need to store a bunch of strings in a data structure first, and then later on when we have a sample string that has to be searched for existence our data structure would serve as a database in which we will search for the string, the same way we search word by word in a dictionary. This reduces the complexity to linear time as opposite to a string comparison search in n^2 complexity.

3 basic things we need to do when we encounter a trie:
  (+)  Tree Initialization - setup tree structure
  (+)  Filling/Insertion - insert you elements/strings
  (+)  Searching - search you favorite hero or movie name ;)

Another point to ponder -
Unlike binary trees, where we have the node of type of a struct and we needed only one of it, here we will have two types of struct.
  (-)  Type1 - With a character array of pointers that will keep track of the string letters as it passes by. And an integer value to keep track of a termination. Obviously multiple occurrences of a string automatically gets ignored.
  (-)  Type2 - With an integer value keeping track of all the nodes it encountered and a pointer to point to nodes of type 1.

The way we will design it will be clear diagrammatically below.



So lets see the code that does the trick. I have furnished it with comments as i usually do so that the picture is clear of what we are currently doing.


#include<iostream>
#include<stdio.h>
#include<string.h>
#include<stdlib.h>

//dont forget to include this, you might get strange errors, LOL :P
using namespace std;

/**
 * trie nodes
 * type1 as mentioned in text
 */
typedef struct trie_node trie_node_t;
struct trie_node
{
    /**
    * contains and integer to count the occurence of
    * any string ending and what number did that
    * string entered on.
    * Also contains an array of size 26 to keep track
    * of all strings passing by.
    **/
    int count;
    trie_node_t * children[26];
};

/**
 * trie root
 * type2 as mentioned in text
 **/
typedef struct trie trie_t;
struct trie
{
    /**
    * contains an integer to keep track of total
    * entries in the structure.
    * also contains a pointer that points to a
    * node. NOTE THE TYPE OF POINTER IT CONTAINS.
    **/
    int value;
    trie_node_t * root;
};


trie_node_t * getNewTrieNode()
{
    /**
    * this function creates a new trie node of type1.
    * all values have been flushed and returned then.
    **/
    trie_node_t * pTemp = (trie_node_t *)malloc(sizeof(trie_node_t));
    pTemp->count=0;                 //reset val
    for(int i=0;i<26;++i)
        pTemp->children[i]=NULL;    //reset val
    return pTemp;

    /**********************************************************/
}

void initializeTRIE(trie_t * pRoot)
{
    /**
    * initialize the trie root node to
    * start linking trees.
    **/
    pRoot->value=0; //reset val
    pRoot->root = getNewTrieNode();
}

void insertTRIE_NODE(trie_t * pRoot,char key[])
{
    /**
    * this function takes in a string and travels
    * through it, and places contents accordingly
    * in the trie.
    **/

    trie_node_t * pCrawl = pRoot->root; //a walking pointer
                                        //through the trie
    pRoot->value++;     //we can increase the value,
                        //as we know we are about to add a new
                        //string to the structure.
    int len = strlen(key);
    for(int i=0; i<len; ++i)    //while the string is not
                                //fully travelled
    {
        if( !pCrawl->children[ key[i]-'a' ] )   //has not been set
        {
            //then set it
            pCrawl->children[ key[i]-'a'] = getNewTrieNode();
        }
        pCrawl = pCrawl->children[key[i]-'a'];  //move ahead
    }
    pCrawl->count = pRoot->value;   //set the end point on the node
                                    //better set it as the entry
                                    //number in the structure
    /***********************************************************/
}




int searchTRIE(trie_t * pRoot,char sample[])
{
    //returns 0 on failure
    //returns 1 on success
    /**
    * first as usual get to the root node
    * and then gather data to start searching
    **/
    trie_node_t * pCrawl = pRoot->root;
    int len = strlen(sample);
    for(int i=0; i<len; ++i)
    {
        /**
        * if a mismatch is found, break off and return
        * a failure flag, else continue searching.
        **/
        if(!pCrawl->children[sample[i]-'a'])
            return 0;   //failure
        pCrawl = pCrawl->children[sample[i]-'a'];
    }

    /**
    * It is possible that the string we were looking for
    * was a substring of a another bigger string.
    * Take for example the string already entered
    * could be 'jaiwardhan', and the search string could
    * be 'jai'.
    * Had that been the case the upper loop would not
    * return a failure.
    * to check if a string did end, we simple check
    * the count value. If it is not 0 means, some
    * entry was made in the past, else there exists
    * none of it.
    **/
    return (pCrawl->count && pCrawl);

    /*******************************************************/
}


int main()
{
    trie_t pRoot;
    initializeTRIE(&pRoot); //INIT trie
    char s1[]="jaiwardhan"; //first entry
    char s2[]="jai";        //second entry
    char s3[]="jayesh";     //third entry

    insertTRIE_NODE(&pRoot,s1); //DO INSERTION
    insertTRIE_NODE(&pRoot,s2); //DO INSERTION
    insertTRIE_NODE(&pRoot,s3); //DO INSERTION

    //search now
    char s[]="jai";
    int result;
    result = searchTRIE(&pRoot,s);
    if(result)cout<<"Found"<<endl;
    else cout<<"Not Found"<<endl;
    return 0;
}
So this was a basic method to setup and use a trie, a very useful tool in programming. It has a wide application prospect and later we will come to know that a special part called suffix array (or suffix tree) with 'Compressed Trie'.
Hope you enjoyed reading it! All comments and suggestions invited :)
Haste-la-vista!!

Wednesday 21 May 2014


Problem : Given two numbers in the form of linked lists. Get the sum of these two numbers in another linked list.


Hi! This is a common question that you may often encounter in interviews. Having a close look, i found that this question is primarily asked by Amazon during first round of technical (f2f / telephonic) interviews. The question is pretty simple. You have two numbers in the form of linked lists. You wanna add them and store the result in another linked list.



Two approaches :
     (+)  Using Stack
     (+)  Using Recursion
We would go the stack way. Would require O(m+n) space but faster access times. Recursion is a better option when you are concerned with space.

Algorithm :
- start from the lower LSB of both numbers and go on adding.
- maintain a variable for the carry (initialize by 0) and if the sum of any two digits exceeds 9 then extract the msb into the carry variable and place the lsb number on result's place.
- Go on doing this till one or both of the numbers end up.
- Add the remaining digits with the carry (if any) and place them to their respective places.

Time analysis :
 (+) Extracting digits and setting them to their respective places seems to be linear as we shall see.
 (+) O(m+n), where m is the size of first number (length of linked list) and n is of the second number.

SPOILER ALERT :
The following code requires some kind knowledge of using STL library of C++. A simple library that we can use.
Stack and its internal functions can be used by the members of the stack data structure.

#include<iostream>
#include<stack>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

using namespace std;

/**
total complexity = O(m+n)
**/

/*** llist ***/
typedef struct nodej node;
struct nodej
{
    int data;
    struct nodej * next;
};

node * insertNum(int num)
{
    node * temp = (node *)malloc(sizeof(node));
    temp->data = num;
    temp->next = NULL;
    return temp;
}
/*************/

int main()
{
    node *first, *second, *sum;
    node * sum_t=sum;

    /**
    * Given numbers
    * insert them into LL
    **/

    first = insertNum(9);
    first->next = insertNum(1);
    first->next->next = insertNum(3);
    first->next->next->next = insertNum(5);
    second = insertNum(9);
    second->next = insertNum(4);
    second->next->next = insertNum(2);



    stack<int> my1,my2,my3;
    node * temp;
    temp=first;
    /**
    * push the whole linked list into
    * a stack
    **/
    while(temp!=NULL)
    {
        my1.push(temp->data);
        temp = temp->next;
    }

    temp=second;
    while(temp!=NULL)
    {
        my2.push(temp->data);
        temp = temp->next;
    }
    /***********************************/

    /** now solve **/
    int rem=0;
    int a,b,c;

    /**
    * solve by taking top digits, while both exist,
    * and add them with the remainder.
    * if sum>9, extract msb into the remainder and
    * push the lsb into the result stack.
    **/
    while(my2.size()!=0 && my1.size()!=0)
    {
        a=my1.top();
        b=my2.top();
        c=a+b+rem;
        if(c>9)
        {
            rem = c/10;
            c = c%10;
        }
        my1.pop();
        my2.pop();
        my3.push(c);
    }
    /**********************************************/


    /**
    * it may be possible that either of their lengths
    * are greater than the other or not equal.
    * in such a case, add the remaining digits with
    * the remainder from above, extract msb into
    * the remainder and push the lsb into the stack
    **/
    if(my1.size()>0)
    {
        int a;
        while(my1.size()!=0)
        {
            a = my1.top();
            a = a+rem;
            rem = a/10;
            a = a%10;
            my3.push(a);
            my1.pop();
        }
    }
    if(my2.size()>0)
    {
        int a;
        while(my2.size()!=0)
        {
            a = my2.top();
            a = a+rem;
            rem = a/10;
            a = a%10;
            my3.push(a);
            my2.pop();
        }
    }
    /*********************************************/

    /**
    * it is possible that the remainder still
    * goes further into msb than the numbers
    * themselves. hence if there is a remainder
    * which is greater than 0 (obviously), then
    * push to stack
    **/
    if(rem>0)
    {
        my3.push(rem);
    }
    /**********************************************/


    /**
    * now inserting back the stack contents
    * into the linked list.
    **/
    sum = insertNum(my3.top());
    my3.pop();
    sum_t=sum;
    while(my3.size()!=0)
    {
        sum_t->next = insertNum(my3.top());
        sum_t = sum_t->next;
        my3.pop();
    }
    /***********************************************/

    /**
    * printing the result
    **/
    sum_t=sum;
    while(sum_t!=NULL)
    {
        cout<<sum_t->data<<" ";
        sum_t = sum_t->next;
    }
    cout<<endl;

    return 0;
}


So this was way I thought it could be done using stack. However if you are cheesy on space then do proceed for the recursive code. Its process is similar to the above with some modifications in coding style. Any suggestions or pointing out errors would be really helpful (to me and to others).