字串演算法 - 練習題¶
Q1. KMP Prefix Function (110)¶
The Knuth-Morris-Pratt (KMP) algorithm is a string-matching algorithm. Its core is the prefix function. Given a pattern \(P[1..m]\), the prefix function for this pattern \(P\) is the function \(\pi : \{1,2,...,m\} \rightarrow \{0,1,...,m-1\}\). What is \(\pi[7]\) for the following pattern \(P\)?
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| P[i] | a | b | a | b | a | b | a | b | c | a |
| π[i] | ? |
| (A) 4 | (B) 5 | (C) 6 | (D) 7 |
答案與解析
答案: (B) 5
解析:
\(\pi[i]\) = P[1..i] 的最長 proper prefix 也是 suffix 的長度
計算過程:
P[1..7] = "abababa"
檢查各長度的 prefix 是否也是 suffix:
- 長度 1: "a" vs "a" ✓
- 長度 2: "ab" vs "ba" ✗
- 長度 3: "aba" vs "aba" ✓
- 長度 4: "abab" vs "baba" ✗
- 長度 5: "ababa" vs "ababa" ✓
- 長度 6: "ababab" vs "bababa" ✗
最長的是長度 5,所以 π[7] = 5
完整的 prefix function: | i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |---|---|---|---|---|---|---|---|---|---|---| | P[i] | a | b | a | b | a | b | a | b | c | a | | π[i] | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 0 | 1 |
Q2. KMP Prefix Function 執行次數 (112)¶
The pseudo code function to compute the prefix function in the Knuth-Morris-Pratt (KMP) string matching algorithm is given below. Given a pattern string P=ABACABACABAD, how many times is line 7 executed?
COMPUTE-PREFIX-FUNCTION(P)
1 m = P.length
2 let π[1..m] be a new array
3 π[1] = 0
4 k = 0
5 for q = 2 to m
6 while k > 0 and P[k + 1] ≠ P[q]
7 k = π[k]
8 if P[k + 1] == P[q]
9 k = k + 1
10 π[q] = k
11 return π
答案與解析
答案: 需要逐步追蹤
解析:
Pattern: A B A C A B A C A B A D Index: 1 2 3 4 5 6 7 8 9 10 11 12
逐步執行: - q=2: k=0, P[1]='A' ≠ P[2]='B', k=0 保持, π[2]=0 - q=3: k=0, P[1]='A' = P[3]='A', k=1, π[3]=1 - q=4: k=1, P[2]='B' ≠ P[4]='C', line 7 執行 1 次, k=0, π[4]=0 - q=5: k=0, P[1]='A' = P[5]='A', k=1, π[5]=1 - q=6: k=1, P[2]='B' = P[6]='B', k=2, π[6]=2 - q=7: k=2, P[3]='A' = P[7]='A', k=3, π[7]=3 - q=8: k=3, P[4]='C' = P[8]='C', k=4, π[8]=4 - q=9: k=4, P[5]='A' = P[9]='A', k=5, π[9]=5 - q=10: k=5, P[6]='B' = P[10]='B', k=6, π[10]=6 - q=11: k=6, P[7]='A' = P[11]='A', k=7, π[11]=7 - q=12: k=7, P[8]='C' ≠ P[12]='D', line 7 執行, k=π[7]=3 k=3, P[4]='C' ≠ P[12]='D', line 7 執行, k=π[3]=1 k=1, P[2]='B' ≠ P[12]='D', line 7 執行, k=π[1]=0 k=0, 退出 while, π[12]=0
Line 7 執行次數:1 + 3 = 4 次
Q3. Finite Automata 字串匹配 (110)¶
Many string-matching algorithms build a finite automata. Below is a partially-completed state transition function for a string s of length 10 over the alphabet \(\{A, B, C\}\). It is possible to reconstruct s from the partial table. How many A's does the string s have?
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 4 | 5 | 2 | |||||||
| B | 0 | 0 | 7 | 3 | ||||||
| C | 0 | 0 | 0 | 10 |
| (A) 4 | (B) 5 | (C) 6 | (D) 7 |
答案與解析
答案: (B) 5
解析:
根據 Finite Automata 的特性: - 狀態 i 表示已經匹配了 pattern 的前 i 個字元 - 如果在狀態 i 讀入字元 c 後轉移到狀態 j,表示 P[j] = c 且前面匹配
從表格推斷: - 狀態 0 → 1:需要確定 P[1] - 狀態 1 讀 A → 4:說明 P[1..4] 是某種以 A 結尾的串 - 狀態 9 讀 C → 10:說明 P[10] = C
逐步重建 pattern,統計 A 的數量為 5。
Q4. Hamming Distance (114)¶
Consider the following five DNA sequences: - a) ATCGTA - b) ATCGTT - c) GCATAG - d) GCATAC - e) TTAGCG
Which of the following statements is correct regarding the application of a c-means clustering algorithm to these sequences \((c = 3)\)?
- (A) The algorithm will always halt with exactly three clusters regardless of the chosen distance metric.
- (B) Using Hamming distance as the similarity measure, sequences a and b will always be in the same cluster.
- (C) The algorithm guarantees finding the globally optimal clustering solution for these sequences.
- (D) Using k-mer frequency vectors \((k = 2)\) as features, sequences c and d will never be in the same cluster.
- (E) The time complexity of the k-means algorithm for clustering these sequences is \(O(n)\), where n is the number of sequences.
答案與解析
答案: (B)
解析:
計算 Hamming distance(相同位置不同字元數):
- (A) 錯誤: k-means 可能收斂到不同結果
- (B) 正確: a 和 b 只差 1,是最相似的一對,一定會在同一 cluster
- (C) 錯誤: k-means 不保證全局最優
- (D) 錯誤: c 和 d 很相似,可能在同一 cluster
- (E) 錯誤: k-means 複雜度不只是 O(n)
延伸練習¶
更多字串演算法相關題目: - 110 Data Structure.Pdf Q18 - Finite Automata 詳細分析