Friday 31 January 2014

Problem : Finding if a given string is a rotation of another given string.

Aaah! After bit of a break I am back writing somthing again.
Now this one is a common problem in strings. Two strings be given to you.

Let us assume there are two strings A & B.

A : BCAD B : DBCA

Clearly we can see that if we rotate the string A towards right by 1 then we get the string B. And hence these two are rotations of each other.
One possible way of thinking :
One can possibly think about one of the ways as like follows. I take the first character in A and find this character in B. I do this in O(n). And then I traverse forward in B and A simultaneously by keeping index condition as i%length. Where 'length' is the length of the strings. Obviously this will avoid index overflow till we reach the start index in A again (i hope you did the math). This way if each character matches we reach the start index in A again meaning the strings are identical and hence only differ in rotation. Hence we did this in a linear time complexity.

A sample code for this :

int checkStrRotate(int org[],int sample[],int strLength)
{
  /**
  ** function to check string rotations, provided repetition is not allowed
  ** and that we have already handled the case that both the strings are of equal length
  ** return type :
  **   1  if  string is a rotation
  **  -1  else wise
  **/
  char c=org[0];
  int flag=0,index;
  for(int j=0;j<strLength  && !flag ;j%=strLength)
  {
    if(c==sample[j])//found
    {
      index=j;
      flag=1;
    }
    ++j;
  }
  if(flag==0) //not found this character, return false
    return -1;
  index=(index+1)&37;strLength;
  flag=1;
  for(int j=1;j!=0&&flag;j=(j+1)&37;strLength)
  {
    if(org[j]!=sample[index])//does not match, setup exit condition
      flag=0;
    index=(index+1)&37;strLength; //increase index without overflow
  }
  if(flag)//successfull attempt
    return 1;
  else
    return -1;
}

So far so good.
The Catch : The above method works very well as far as distinct components are considered in the string. Hence the other case that we left was including repeated characters. In that case the method was saw earlier would simply fail.
Hence we must change the approach. Okay what is special about A & B? Its just if we rotate A about some parameter we would eventually land up at B a.w.a. all the letters in A are present in B.

Hence another approach is as follows.
* Take a temporary sting.
* Copy the string A in the temp string.
* Concatenate the string A with temp string. Hence what you get is two times A concatenated in temp string.
* now simple use the library function substr from strings.h to know if the string B is a substring of A.

This may be a bit hard to click on the first go with a newbie programmer. But believe me this method is very elegant and the beauty lies in its simplicity of idea. Using the inbuilt library function is a very good practice (unless specified) and especially with the STL.
Here is the code of the above said :

# include <stdio.h>
# include <string.h>
# include <stdlib.h>
 
/* Function to check if str1 and str2
   are rotations of each other */
int checkRot(char *str1, char *str2)
{
  /**
  ** this function returns 0 on failure
  ** returns 1 on success
  **/
  int size1   = strlen(str1); //get length
  int size2   = strlen(str2); //get length
  char *temp;  //we take pointer and dynamic allocation
    //is intended as the length of string might
  //be variable
  void *ptr;
 
  /* First check if sizes of two strings are equal or not */
  if (size1 != size2)//failed
     return 0;
 
  /* now create a temp string with value concat(str1,str1) */
  temp   = (char *)malloc(sizeof(char)*(size1*2 + 1));
  temp[0] = '\0';
  strcat(temp, str1);
  strcat(temp, str1);
 
  /* Now checking for presence of the other */
  ptr = strstr(temp, str2);
 
  free(temp); // Free dynamically allocated memory
 
  /* Property of strstr returns NULL if
 the second string is NOT a
     substring of first string */
  if (ptr != NULL)//success
    return 1;
  else //failure
    return 0;
}
 
/************ MAIN **************/
int main()
{
    char *str1 = "AACD";
    char *str2 = "ACDA";
 
    if (checkRot(str1, str2))
       printf("They are rotations");
    else
       printf("They are not rotations");
    return 0;
}

I hope the code above was pretty explainable and it did the trick. This is a good question at starter levels in strings and maybe asked at interviews or level testers. I hope this article made a good ground and I was explainable at all key points. I thank everyone referred to my page and appreciated me. Do refer with your doubts if any. I will keep posting and maybe next topic be some STL topics :) Bye for now!

Friday 3 January 2014

Code : Fastinput / Fastread Integers

Fastread (for online programming sites)

Some of you who might had been writing code for online programming websites like codechef often use scanf or cin inbuilt methods to get your input. This article that i am posting today is based on several arguments that programmers believe and is true to the most extent. I will be telling about gathering input from the stream in a faster way into variables which i am assuming here to be integer types. Later you will come to realize that with a small change in the code (by appending some if's and else's) you can do the same for floats as well. However i would be concerned with integers and their bigger counterparts only.

Getting that fast from input might save some of your time and is good when you are in a competition where the ranking is separated by the amount of time your code eats. You might want to make it tighter and fast enough to beat your opponents.

Also i used a word 'stream' above in my article. If you are not familiar with it follow this wiki link (i am sorry its from wikipd but its reliable for the moment ;) ).

Anyways, gathering input such as numbers can be done by gathering characters, and then internally manipulating them to form the number. Everything that is an input in the stream to your code is a character or a string. Hence apart from using scanf/cin you can go for opting string methods for input. One of which - getchar() - you might had probably heard about. Now the code that you are going to see below uses a specific targeted function called getchar_unlocked(). As the name would suggest it has something to do with locks. Every time an input source directs data into your code, all other input sources get notified and are locked i.e. they would not keep on producing input that while one of them in already doing it. Hence functions like getchar check for a lock on input streams that are pointing to the code. However in case when there is only one input stream (the stdin), it becomes useless to check for stream locks as there is just one source. Hence getchar_unlocked() function is designed to not to check stream locks assuming they are already open. We will talk about a feature called thread unsafe of getchar_unclocked() later.

Below is a pretty simple code that might convince you on this input method.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void fastread_int (int * a)
{
  char c=0; *a=0;  //first flush all previous data
  
  // now we shall eat all spaces before the actual num
  //as it would also count otherwise (remember ' ' has
  // also got an ASCII value, i guess 32
  while(c<33){c=getchar_unlocked();}

  //now we shall manipulate the number. i hope
  //you know the math 
  while(c>33)
  {
   *a=(*a<<3)+(*a<<1)+c-'0'; //why? :D
   c=getchar_unlocked();
  }
}

I hope the above code was clear and explainable. You can similarly create other functions which could accept long long int 's as well and functions that read a variable and return it so that you can avoid usage of a variable in your main code if not required. However this creates a problem. For a problem that includes more than one or two data types, your code would certainly look ugly. If you know C++ and have some knowledge of templates, you can avoid this code-jizz around you by shortening your function. You must had observed that all of them are doing the same task but only differ in data type, hence why not use a template. :D

template <typename DataType>;
void fasread_custom ( DataType * a)
{
  char c=0; *a=0; 

  while(c<33){c=getchar_unlocked();}
  while(c>33){*a=(*a<<3)+(*a<<1)+c-'0'; 
  c=getchar_unlocked();  }
}

As a matter of many asked arguments and queries on the relative speeds of input methods, a standard argument given is something like the one given below (you can also refer stackoverflow) :

cin (~=) < printf < getchar < getchar_unlocked

This means that getchar_unlocked() takes the least time and is the fastest of them.

Two things i am gonna talk about :
(+) The template code : Here the code seems very well and short and definitely is a good way to handle things. But dont think that this would be just one piece of code at runtime. At runtime, if you queried on a datatype 'int' and other as 'long int', both would maintain seperate spaces during runtime once they are created. This is something called as code bloat.

(+) thread safe : Well this is a concept that comes from operating systems (if you had heard of mutexes and deadlocks and mutual exclusion principal). Standard streams lock mutexes to avoid access to their stream buffer and hence they are thread safe. However this increases the overhead of locking and mutual exclusion. _unlocked allows to avoid that overhead and is marginally faster.

I hope this article was good enough for some fastread enthusiasts. A small modification can give you results for strings and floats as well. Thats where the fun lies. Give it a try. シ

Thursday 2 January 2014

Algorithm : Longest Common Subsequence (a.k.a. LCS)

The problem statement :

We are given two strings 'S' of length 'n' and 'T' of length 'm'. Our goal is to produce the length of their longest common subsequence : the longest sequence of characters that apperar from left to right (not necessarly continos) in both strings, a.w.a. this string.

Example : S is "ABAZDC" & T is :BACBAD". Their LCS string is ABAD

Time complexity : O( m*n )

This problem is probably one the most fundamental problems in strings and is frequently asked or implemented in programming websites like codechef. Typically we might be asked to display the length of the LCS string and then display the string, or maybe even vice-versa. There are two methods through which we can do it, i will explain with code one of it which has a fair advantage of using Dynamic Programming (DP) while the other one would be for you to work upon.

Methods :

Bottom-Up approach
Top-Down approach

I will be very explanatory and describe you the approach from grass root level. I expect you have some time and patience to read and understand.

Now how can we start thinking. One way which is pretty bizarre to think and probably the framework of code wont set in mind is find all sub-sequences of both strings, store them in 2-d array locations and then compare them once we have computed all sub sequences.

Probably as easy it is to see, we can easily infer that this is waste of space and a lot of waste of time. In tight situations we need something where we can avoid work that has nothing to do with the question. Although no doubt this problem is not the simplest of all and hence requires considerable manipulation time for the string.

1. Lets start with Bottom-Up approach. We find the length of the LCS string and then get the string. In this approach we break the solution in simpler sub problems. Once we clear the sub-problems we should be able to combine them all to get to the answer.

Let us say that LCS[ i , j ] denotes the length of LCS in string S from index 1 to i and string T from index 1 to j. For the sake of simplicity we shall denote them as LCS[ i , j ] from S[ 1...i ] and T[ 1...j ]. Since both S and T are character arrays hence for any index, say 'x', S[x] or T[y] would yield the respective charcaters at those indices. So now here there are 2 main possibilities.

S[ i ] != T[ j ] : Since these two characters are a mismatch hence we can say that at least either of them don't deserve to be in the LCS[ i , j ]. Hence one thing is sure as given by the expression below :

LCS[ i , j ] = __MAX__ ( LCS[ i-1 , j ] , LCS[ i , j-1 ] )

where __MAX__ is a function that returns the maximum length of the two inputs. See if you can convince yourself for the argument above. The one that would produce the largest string would be automatically accepted. We have used some recursion over here as well.
S[ i ] == T[ j ] : Since we have a match, hence it is clear that this character (common to both strings) has to be included in the LCS string and would certainly contribute to a 'unit' of length of the length of the final answer string. Hence we can certainly agree upon the expression below

LCS[ i , j ] = 1 + LCS[ i-1 , j-1 ]

Hence proceeding this way for 'i' and 'j' to cover the length of their whole strings, we can find the length of the end LCS string.

I give here a pictorial representation of a matrix which has the string S on its left panel arranged downwards, and the string T arranged on the top panel left to right. The contents of this matrinx are nothing but the outcomes of the two inequalities we had discussed above.

Just grab a page and manipulate if you get the same output or not. Obviously if you are computing values outside the lowerbound (i.e. 0 ) then return value should be 0.

Length of the LCS : the length of the resulting LCS will be the bottom-rightmost corner of this matrix, i.e. 4.

The LCS String : Keep these two rules in mind and get the string. Start from the bottom-rightmost corner.

Rule 1 : if cell above or cell to the left have equal value as to the current cell, then move to that cell. If both cells have equal value to the current cell, then choose either of it.

Rule 2 : If both cells have value strictly less than the current cell value, then push the current cell character on to a stack and move one cell diagonally up-left (ie, i-1,j-1).

Once you have done this, pop all the elements of the stack and this gives you the lcs string, We used the stack, coz otherwise it would had given us a reversed string. Hence popping all elements of the stack would give ABAD.
I think i have given you a very fair and detailed description. I leave to code this part to you people.

2 . Top-Down DP method.
Time Complexity : O( m*n )
We follow the approach of DP which we shall discuss here. Once you are fairly known with DP (in case you havn't ) this method would be clear in a matter of seconds. Lets first write a recursive function to get the length of the LCS, which would be a pseudo code.

typedef unsigned long long int ullint;
ullint LCS( S , n , T , m )
{

  llint result;
  if( n==0 || m==0)  return 0;  //obviously
  if( S[n] == T[m])  result= 1 + LCS( S, n-1, T, m-1);
  else result = MAX(LCS[ S, n-1, T, m] , LCS[S, n, T, m-1]);
  return result;

}

Definitely this seems now a piece of cake, but wait! For large strings it will rip you off your time bounds. We are doing a lot of work that has been done already once in the past in this recursive loop. What I am trying to say, that a lot of work that might had been done due to same calls again in sub problems, can be saved (also called memoization).

A very interesting article and explanation on memoization is given here. Click to jump in!

(+) first let us take matrix m*n and initialize each of its element to 0 (we will refer it to as a macro UNKNOWN)
(+) if Current LCS is unknown in the matrix (untouched or hasnt been replaced by a new data yet) then compute the result and save it at the location for further use.
(+) if LCS is already known (the UNKNOWN was replaced by some data previously) then simple return the result.
Let us assume that there is an array 'arr' that has been initially initialized to UNKNOWN.
The above recursion can be modified as.

llint LCS ( S, n, T, m)

{
 llint result;
 if(n==0 || m==0)  return 0;
 if(arr[n][m] != UNKNOWN)  return arr[n][m];  //since it is known already
 if(S[ n ]==T[ m ]) result = 1 + LCS(S,n-1,T,m-1);
 else result = MAX( LCS[ S, n-1, T, m] , LCS[ S, n, T, m-1] );
 arr[n][m]=result;   //store this work for future
 return result;
}

If you have understood the above code then it would be a matter of minutes before you figure out where can we insert a small code so that it yields the string as well. ッ

Any suggestions are formally welcome. I hope i presented a very detailed description of this problem and must had increased your understanding.
You can also follow me on facebook!

Algorithm : Exponentiate

Time complexity : O( log(n) )

Problem Description : To calculate x^n (x to the power of n) using minimum steps. Trick lies in the Algorithm being followed.

One general way to calculate x^n that is very crude is shown below.

void exponentaite(long int x,lont int n)
{

    long int result=1;  //to store the final result

    while( n-- )
    {
      result  *= x;
    }

    printf("%lld\n",result);
    return;
} //end of function

Definitely it takes O(n) linear time to do the work. But in tighter cases, it might be handy to do a task in short hand just if you know this small trick. I will write a short code below in C again, and i hope you might get the logic.

 typedef long int lint;
 typedef long long int llint;

 llint exponentiate_fast ( llint sample ,  llint n )
 {
   llint result=1 ;
   while( m > 0 )
   {
     while( m%2 == 0 )
     {
       sample *= sample;
       m /= 2;
     }
     --m;
     result *= sample;
   }
   return result;
 }

An Ideone sample running example has been provided to get you a clear insight. You can change the datatype as per your usage.

Concept behind this computation :

Every computation of x^n can be expressed as x^(2m+k), for some 'm' and 'k'. Hence instead of multiplying n times - which obviously was O(n) - we calculate its squares until the the inner loop terminates when m equals its value to 1.

Hence every time we square it, we are dividing are task into halves and hence its easy to show up that the time req is order of log(n).

Tip : You can examine the execution flow by placing 'printf' check points at suspected positions and hence analyse them. Its a good practice for newbies and helps improve debugging skills.