187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
这题一开始只想出了暴力的方法,要O(n^2)的时间和空间。后来看到一种rolling hash的解法,可以O(n)搞定。(待补代码)
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<>();
if (s == null || s.length() == 0) {
return result;
}
HashMap<String, Integer> checkDup = new HashMap<>();
for (int i = 0; i < s.length() - 10 + 1; i++) {
String subString = s.substring(i, i + 10);
if (checkDup.containsKey(subString)) {
int times = checkDup.get(subString);
if (times == 1) {
result.add(subString);
}
checkDup.put(subString, times + 1);
} else {
checkDup.put(subString, 1);
}
}
return result;
}
Last updated
Was this helpful?