Sunday, 5 June 2016

**Akhil And Random String( smallest subsequence string of length k)

Akhil And Random String 


Akhil comes across a string S of length N. He started wondering about the smallest lexicographical subsequence of string S of length K.
A subsequence of a string is formed by deleting some characters (possibly none) from it's original string.
A string A is said to be lexicographically smaller than the string B of the same length if at the first position where A and B differ, A contains a letter which appears earlier in the dictionary than the corresponding letter in B.

Input

  • The first line of the input contains an integer T denoting the number of test cases. The description of Ttest cases follows:
  • First line of each test case will contain string S
  • Second line of each test case will contain an integer K.

Output

  • For each test case, output a single line containing the lexicographically smallest subsequence of S of length K.

Constraints

  • 1 ≤ T ≤ 5
  • 1 ≤ K ≤ N
  • S consists of lowercase English alphabet characters, i.e. from 'a' to 'z'.

Subtasks

  • Subtask #1 (10 points) : 1 ≤ |S| ≤ 100
  • Subtask #2 (20 points) : 1 ≤ |S| ≤ 103
  • Subtask #3 (70 points) : 1 ≤ |S| ≤ 105
  • Example

    Input:
    2
    abdc
    3
    bacb
    2
    
    Output:
    abc
    ab
    

    Explanation

    Example case 1. "abc" is the smallest lexicographical subsequence out of ["abd", "bdc", "abc", "adc"].
    Example case 2. "ab" is the smallest lexicographical subsequence of length 2.



    ====================editorial==================================
    seg tree ----
    this problem can be easily solve using seg tree , start building the ans string from msb , each time try to fix  smallest character , supposed we are going to fix ith characters , and we have X number of characters remain and we have to fix Y more characters , than we  will fix ith character as min among characters ( 0 to x-y)
    ===================================code===========================
    #include<bits/stdc++.h>
    using namespace std;
    typedef long long int lli;
     int  seg[1000000];
     int arr[1000000];
     int build(int idx,int start,int end)
      {
      if(start>end)return 100;
      
      else if(start==end)
      {
         seg[idx]=arr[start];
         return 0;
      }
      else
      {
       build(2*idx,start,(start+end)/2);
       build(2*idx+1,(start+end)/2+1,end);
       seg[idx]=min(seg[2*idx],seg[2*idx+1]);
      }
      }
      
      
      int query(int idx,int start,int end,int qs,int qe)
       {
        if(start>end || qe<start || qs>end)
            {
        return 100;
    }
    else if(start>=qs && end<=qe)
    {
    return seg[idx];
    }
    else
    {
    int p1=query(2*idx,(start),(start+end)/2,qs,qe);
    int p2=query(2*idx+1,((start+end)/2)+1,end,qs,qe);
    return min(p1,p2);
    }
       }
    int main()
     {
     
      int t;
      cin>>t;
      while(t--)
       {
          
          string s;
          cin>>s;
          int k;
           cin>>k;
           
           int len=s.length();
           for(int i=0;i<len;i++)
            {
            arr[i]=s[i]-'a';
    }
    build(1,0,len-1);
    string res="";
    int l=0;
    int r;
    for(int i=1;i<=k;i++)
    {
     int rem=k-i;
     int r=(len-1)-rem;
    //   cout<<" l  "<<l<<" r "<<r<<endl;
     int mini=query(1,0,len-1,l,r);
     // cout<<" mini is "<<mini<<endl;
     res+=(char)('a'+mini);
     for(int j=l;j<=r;j++)
      {
     //  cout<<" check ";
      if(arr[j]==mini)
         {
      l=j+1;
      break;
    }
      }
        //cout<<l<<endl;
    }
     
     
     
    cout<<res<<endl;
      }
     }
    ========================another approach===========================

    DIFFICULTY:

    Easy

    PREREQUISITES:

    Sets, Implementation

    PROBLEM:

    Find the smallest lexicographical subsequence of length K from a given string of length N

    EXPLANATION


    Solution for Subtask 1:
    The smallest subtask can indeed be solved by dynamic programming. Basically, you have a choice at every position of the string S i.e, whether to include a character at particular position in final subsequence of K characters or not. Now, let's define a DP state as dp(i, j) denoting the smallest lexicographical subsequence of length j formed from first i characters of the original string S. Hence, the following equations hold true.
    dp(0, 0) = null
    dp(0, j) = INFj  (1, N)
    dp(i, j) = min( dp(i  1, j) , si+dp(i  1, j  1)


    Let's have a look at the pseudo code for the bottom up solution of the above approach.
        dp[0][0] = ""
        for ( j = 1 to K ) dp[0][j] = "INF"
    
        for ( i = 1 to N )  {
            for ( j = 1 to K  ) { 
                dp[i][j] = "INF"
                if ( dp[i - 1][j] != "INF" ) dp[i][j] = dp[i - 1][j]
                if ( dp[i - 1][j - 1] != "INF" )  {
                    if ( dp[i][j] == "INF" ) dp[i][j] = dp[i - 1][j - 1] + s[i - 1]
                    else dp[i][j] = min(dp[i][j], dp[i - 1][j - 1] + s[i - 1])
                }
            }
        }
    
        print(dp[N][K])
    

    The same thing can be implemented by recursive memoization solution. You might like to read this blog post to know more about Dynamic Programming. The time complexity for the above approach is O(N2) but the space complexity is around O(N3). Due to the higher space complexity, it won't be possible to implement the same solution for subtask 2 as it can cause memory overflow leading to Runtime Error.

    Solution for subtask 2:
    The approach for this subtask is based on the fact that the subsequence formed at last has to be of length K exactly. Having said that, it is not difficult to realize that the first character of the smallest lexicographical subsequence has to be from substring s[0, n  k]. Why? If the first character is not from this substring, then it would not be possible to form a subsequence of length K. Similarly, the ith character should be from the substring s[prev + 1, n  k + i], where prev is the position found for (i1)th character of the subsequence. Hence, we should always choose the smallest characters in the substrings. And in case, there are two smallest characters, we should choose the one with leftmost position, since it will give a larger substring to choose from, for the rest of the characters in the subsequence. Let us look at the pseudo code for this greedy approach.
        subseq = ""
        prev_pos = -1
        for ( i from 0 to k - 1 ) {
            for ( j from last_pos + 1 to n - k + i ) {
                if ( j == last_pos + 1 ) smallest_char = s[j], leftmost_pos = j
                else if ( smallest_char > s[j] ) smallest_char = s[j], leftmost_pos = j
            }
            prev_pos = leftmost_pos
            subseq.append(smallest_char)
        }
    
        print(subseq)
    
    Since, for all K characters of the subsequence, it traverses the whole string in the worst case to find the smallest character, the time complexity for this pseudo code is O(NK). But, this does not yet suffice to pass all the input files.

    Solution for subtask 3:
    The logical approach for the problem remains the same as mentioned in the subtask 2. But, the implementation can be optimized to reduce the overall time complexity. For this, we can maintain a data structure that will give the smallest character in a range and in case, there are several smaller characters, it can give us the position of the leftmost smallest character in that range. The data structure which seems most easier ti implement at this stage is a set of pairs. Hence, the pairs of (characters, indexes) will be inserted into the set when required and removed when they go outside the current range. The first element of the set will give us the required character and it's corresponding index for that particular position. Let's denote pos[x] as the index where the xth character of the K-length subsequence was found.

    To calculate the pos[i], the range that to be considered will be (pos[i  1] + 1, n  k + i). Hence, the pairs within the range (pos[i  2] + 1, pos[i  1]) are removed and a new pair of (sn  k + i, n  k + i) should be inserted into the set. This would allow you to remove and insert each pair exactly once. Let us now have a look at the pseudo code for the same.
        for ( i = 0 to n - k ) Set.insert(make_pair(s[i], i))
    
        subseq.append(Set.top().character)
    
        a = 0, b = Set.top().index
        for (i = n - k + 1 to n - 1 ) {
            while ( a <= b )  Set.erase(make_pair(s[a], a)), a++
            Set.insert(make_pair(s[i], i))
            b = Set.top().second
            subseq.append(Set.top().character)
        }
    
        print(subseq)
    
    For more details on the implementation of any subtasks, have a look at the tester's solution.