Question

Find all repeating substrings of a specified length in a large string.

For example:

Input String: "ABCACBABC" 
repeated sub-string length: 3 
Output: ABC 

Another example:

Input String: "ABCABCA" 
repeated sub-string length: 2 
Output: AB, BC, CA

Solution

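Similar to [Amazon] Longest Repeating Substring, the best solution is to build a suffix tree or a suffix array. With a suffix tree, we then need to print every path of length k from the root (k being the given substring length) that leads into a subtree with more than one leaf, since each leaf below that point is one occurrence of the substring.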

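However, since the length of the substring is given, we can also simply iterate: insert every substring of the given length into a HashSet, and report a substring whenever it is already in the set. For an input of length n and substring length k, this visits n - k + 1 substrings.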

Code

Suffix tree solution: not written.
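
For the suffix array variant mentioned above, a minimal sketch could look like the following. It is not the suffix tree solution, and it builds the array naively by sorting suffix start positions (roughly O(n^2 log n) in the worst case), so it only illustrates the idea; a real suffix array construction would be much faster. The class name SuffixArrayRepeats and the method name repeated are hypothetical, chosen for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SuffixArrayRepeats {

    // Returns every distinct substring of length k that occurs at least twice,
    // in lexicographic order (the order in which the sorted suffixes are scanned).
    public static List<String> repeated(String input, int k) {
        List<String> ans = new ArrayList<>();
        int n = input.length();
        if (k <= 0 || k > n) {
            return ans;
        }

        // Naive suffix array: sort suffix start positions by the suffix text.
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) {
            sa[i] = i;
        }
        Arrays.sort(sa, (a, b) -> input.substring(a).compareTo(input.substring(b)));

        // Suffixes that share their first k characters are adjacent after sorting,
        // so comparing each suffix with its predecessor is enough.
        for (int i = 1; i < n; i++) {
            int prev = sa[i - 1];
            int cur = sa[i];
            if (prev + k > n || cur + k > n) {
                continue; // one of the two suffixes is shorter than k
            }
            if (!input.regionMatches(prev, input, cur, k)) {
                continue; // the first k characters differ
            }
            String sub = input.substring(cur, cur + k);
            // A substring occurring 3+ times yields several adjacent matches; keep one.
            if (ans.isEmpty() || !ans.get(ans.size() - 1).equals(sub)) {
                ans.add(sub);
            }
        }
        return ans;
    }
}
```

Calling SuffixArrayRepeats.repeated("ABCABCA", 2) on the second example from the question returns AB, BC, CA. Note that results come back in sorted order rather than in order of appearance in the input.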

HashSet code:

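import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public List<String> solve(String input, int k) {
    // A LinkedHashSet reports each repeated substring once, in order of first
    // repetition, even when it occurs three or more times.
    Set<String> repeated = new LinkedHashSet<String>();
    Set<String> seen = new HashSet<String>();
    if (k <= 0) {
        return new ArrayList<String>(); // no valid substring length
    }
    for (int i = 0; i <= input.length() - k; i++) {
        String sub = input.substring(i, i + k);
        // add() returns false when sub is already in the set, i.e. it repeats.
        if (!seen.add(sub)) {
            repeated.add(sub);
        }
    }
    return new ArrayList<String>(repeated);
}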