Question
Find all substrings of a specified length that occur more than once in a large string. Report each repeated substring once.
For example:
Input String: "ABCACBABC"
repeated sub-string length: 3
Output: ABC
Another example:
Input String: "ABCABCA"
repeated sub-string length: 2
Output: AB, BC, CA
Solution
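Similar to [Amazon] Longest Repeating Substring, the best solution is to build a suffix tree or a suffix array. With a suffix tree, we walk down to string depth k (not node depth, because one edge can carry several characters) and print every path label whose subtree has more than one leaf. The number of leaves below a position equals the number of occurrences of that substring.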
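No suffix tree code is given in this post. The following is a minimal sketch that uses the suffix array option instead. It sorts all suffixes, and any two neighbouring suffixes that share at least k leading characters reveal a substring of length k that occurs at least twice. The class name `RepeatsBySuffixArray` and its helper methods are hypothetical. The suffix array is built naively by sorting suffix strings, which is O(n² log n) in the worst case, so this illustrates the idea rather than the fast construction a truly large input would need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepeatsBySuffixArray {
    // Naive suffix array: suffix start positions sorted lexicographically.
    // Each comparison builds and compares two substrings, costing O(n).
    static Integer[] suffixArray(String s) {
        Integer[] sa = new Integer[s.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> s.substring(a).compareTo(s.substring(b)));
        return sa;
    }

    // Length of the common prefix of the suffixes starting at a and b.
    static int lcp(String s, int a, int b) {
        int n = 0;
        while (a + n < s.length() && b + n < s.length()
                && s.charAt(a + n) == s.charAt(b + n)) {
            n++;
        }
        return n;
    }

    // Repeated substrings of length k, in lexicographic order.
    static List<String> repeats(String s, int k) {
        List<String> ans = new ArrayList<>();
        if (k <= 0) return ans;
        Integer[] sa = suffixArray(s);
        String last = null;
        for (int i = 1; i < sa.length; i++) {
            // Neighbours sharing >= k chars: their k-prefix occurs at least twice.
            if (lcp(s, sa[i - 1], sa[i]) >= k) {
                String sub = s.substring(sa[i], sa[i] + k);
                // Suffixes with equal k-prefixes are contiguous after sorting,
                // so comparing with the previous hit avoids duplicates.
                if (!sub.equals(last)) ans.add(sub);
                last = sub;
            }
        }
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(repeats("ABCACBABC", 3)); // [ABC]
        System.out.println(repeats("ABCABCA", 2));   // [AB, BC, CA]
    }
}
```

Unlike the HashSet version, this returns the repeats in lexicographic order rather than in order of their second occurrence.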
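However, since the substring length is given, a simple iteration also works. Insert every substring of that length into a HashSet and report any substring that is already present. For a string of length n this takes O(nk) time, because each substring is copied and hashed in O(k). (ref)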
Code
Suffix tree solution: not written.
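HashSet code:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public List<String> solve(String input, int k) {
    List<String> ans = new ArrayList<String>();
    if (k <= 0) {
        return ans; // no non-empty substrings to report
    }
    Set<String> seen = new HashSet<String>();     // every substring seen so far
    Set<String> reported = new HashSet<String>(); // repeats already added to ans
    for (int i = 0; i <= input.length() - k; i++) {
        String sub = input.substring(i, i + k);
        // seen.add() returns false when sub was already present, i.e. it repeats;
        // reported.add() stops a substring that occurs 3+ times from being added twice
        if (!seen.add(sub) && reported.add(sub)) {
            ans.add(sub);
        }
    }
    return ans;
}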
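For a very large input, the HashSet approach keeps up to n − k + 1 strings of k characters each, so memory grows as O(nk). One alternative, swapped in here and not part of the original post, is a Rabin-Karp rolling hash. It updates the hash of each window in O(1) and stores only start positions. The class name `RepeatsByRollingHash` and the constants `BASE` and `MOD` are hypothetical, illustrative choices. Different windows can share a hash, so every hash match is confirmed with `String.regionMatches`. As a result, the worst-case time is still O(nk) (for example, on a string of identical characters), but memory drops to O(n) plus the output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RepeatsByRollingHash {
    // Illustrative constants only; any base and large prime modulus will do.
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    // Repeated substrings of length k, in order of their second occurrence.
    static List<String> repeats(String s, int k) {
        List<String> ans = new ArrayList<>();
        int n = s.length();
        if (k <= 0 || k > n) return ans;

        // BASE^(k-1) mod MOD, used to remove the character leaving the window.
        long high = 1;
        for (int i = 1; i < k; i++) high = high * BASE % MOD;

        // hash -> start positions of distinct windows having that hash
        Map<Long, List<Integer>> buckets = new HashMap<>();
        Set<String> reported = new HashSet<>();
        long h = 0;
        for (int i = 0; i < n; i++) {
            if (i >= k) {
                // Drop s[i - k] from the front of the window.
                h = (h - s.charAt(i - k) * high % MOD + MOD) % MOD;
            }
            h = (h * BASE + s.charAt(i)) % MOD; // append s[i]
            if (i < k - 1) continue;            // window not full yet

            int start = i - k + 1;
            List<Integer> starts = buckets.computeIfAbsent(h, x -> new ArrayList<>());
            boolean seenBefore = false;
            for (int j : starts) {
                // Equal hashes may be a collision, so compare the characters.
                if (s.regionMatches(j, s, start, k)) {
                    seenBefore = true;
                    break;
                }
            }
            if (seenBefore) {
                String sub = s.substring(start, start + k);
                if (reported.add(sub)) ans.add(sub); // report each repeat once
            } else {
                starts.add(start); // first occurrence of this window
            }
        }
        return ans;
    }
}
```

Like the corrected HashSet code, this reports each repeat once, in the order in which it is first seen repeating.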