恢复空格
难度:
标签:
题目描述
Oh, no! You have accidentally removed all spaces, punctuation, and capitalization in a lengthy document. A sentence like "I reset the computer. It still didn't boot!" became "iresetthecomputeritstilldidntboot''. You'll deal with the punctuation and capitalization later; right now you need to re-insert the spaces. Most of the words are in a dictionary but a few are not. Given a dictionary (a list of strings) and the document (a string), design an algorithm to unconcatenate the document in a way that minimizes the number of unrecognized characters. Return the number of unrecognized characters.
Note: This problem is slightly different from the original one in the book.
Example:
Input: dictionary = ["looked","just","like","her","brother"] sentence = "jesslookedjustliketimherbrother" Output: 7 Explanation: After unconcatenating, we got "jess looked just like tim her brother", which containing 7 unrecognized characters.
Note:
0 <= len(sentence) <= 1000
The total number of characters in dictionary
is less than or equal to 150000.- There are only lowercase letters in
dictionary
andsentence
.
代码结果
运行时间: 68 ms, 内存: 16.5 MB
/*
* 思路:
* 使用Java Stream和集合来解决这个问题。先将词典转换为集合,
* 然后通过流处理来更新动态规划数组dp。
*/
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;
public class SolutionStream {
public int respace(String[] dictionary, String sentence) {
Set<String> dict = new HashSet<>(Set.of(dictionary));
int n = sentence.length();
int[] dp = new int[n + 1];
IntStream.rangeClosed(1, n).forEach(i -> {
dp[i] = dp[i - 1] + 1;
IntStream.range(0, i).filter(j -> dict.contains(sentence.substring(j, i))).forEach(j -> {
dp[i] = Math.min(dp[i], dp[j]);
});
});
return dp[n];
}
}
解释
方法:
此题解采用了动态规划的方法。首先,将词典中在句子中出现的词筛选出来,以减少后续的计算量。然后,定义一个动态规划数组dp,其中dp[i]表示句子中前i个字符中未识别的字符数量的最小值。初始时,dp[0]=0,表示空字符串中没有未识别的字符。接着,遍历句子中的每个字符,对于每个位置i,首先将dp[i]更新为dp[i-1]+1,因为句子中的第i个字符可能是未识别的,然后遍历筛选后的词典,检查以当前位置为结尾的子字符串是否是词典中的单词,如果是,则尝试更新dp[j](j是单词结束的位置),使得dp[j]=min(dp[j], dp[i]),这表示如果以i结尾的子字符串是一个词典中的单词,那么前j个字符中未识别的字符数量可以不包括这个单词。最后,返回dp数组的最后一个元素,即为整个句子中未识别的字符数量的最小值。
时间复杂度:
O(nmk)
空间复杂度:
O(n)
代码细节讲解
🦆
为什么在动态规划数组`dp`中,初始值设置为`float('inf')`而非其他数字,如0或一个具体的大数?
▷🦆
在进行词典筛选`dictionary = [x for x in dictionary if x in sentence]`时,这种筛选是否可能导致一些必要词汇被遗漏,特别是当词汇形成嵌套或重叠时?
▷🦆
在内层循环中,判断子字符串是否为词典中的单词`if sentence[i:j] == x:`,这种方法的效率如何?是否有更高效的字符串匹配方法可用于此场景?
▷