句子相似性 II

难度:

标签:

题目描述

代码结果

运行时间: 62 ms, 内存: 19.2 MB

// Problem: Sentence Similarity II
// Given two sentences and a list of similar word pairs, determine if the two sentences are similar.
// Sentences are represented as arrays of strings, and similar word pairs are given as a 2D array.
// Two sentences are similar if the words in one sentence can be replaced with similar words to get the other sentence.

import java.util.*;
import java.util.stream.Collectors;

public class SentenceSimilarityIIStream {
    private Map<String, String> parent;

    public SentenceSimilarityIIStream() {
        this.parent = new HashMap<>();
    }

    private String find(String word) {
        parent.putIfAbsent(word, word);
        if (!parent.get(word).equals(word)) {
            parent.put(word, find(parent.get(word)));
        }
        return parent.get(word);
    }

    private void union(String word1, String word2) {
        String root1 = find(word1);
        String root2 = find(word2);
        if (!root1.equals(root2)) {
            parent.put(root1, root2);
        }
    }

    public boolean areSentencesSimilarTwo(String[] sentence1, String[] sentence2, List<List<String>> similarPairs) {
        if (sentence1.length != sentence2.length) {
            return false;
        }

        similarPairs.forEach(pair -> union(pair.get(0), pair.get(1)));

        return IntStream.range(0, sentence1.length)
                .allMatch(i -> find(sentence1[i]).equals(find(sentence2[i])));
    }

    public static void main(String[] args) {
        SentenceSimilarityIIStream similarityChecker = new SentenceSimilarityIIStream();
        String[] sentence1 = {"great", "acting", "skills"};
        String[] sentence2 = {"fine", "drama", "talent"};
        List<List<String>> similarPairs = Arrays.asList(
                Arrays.asList("great", "good"),
                Arrays.asList("fine", "good"),
                Arrays.asList("drama", "acting"),
                Arrays.asList("skills", "talent")
        );
        System.out.println(similarityChecker.areSentencesSimilarTwo(sentence1, sentence2, similarPairs)); // true
    }
}

解释

方法:

该题解使用了并查集的思想。首先，将相似单词对构建成一个无向图，然后对两个句子中的每个单词进行遍历，对于每个单词，如果它没有出现在并查集中，就对它进行深度优先搜索，将所有与它相连的单词都归属到同一个集合中，并给该集合分配一个编号。最后，对于两个句子中的每个单词，检查它们是否属于同一个集合，如果有任意一对单词不属于同一个集合，就返回 False，否则返回 True。

时间复杂度:

O(m + n)

空间复杂度:

O(m + n)

代码细节讲解

🦆

为什么选择使用并查集而不是其他数据结构如哈希表或树来解决句子相似性的问题？

▷

并查集是一种高效处理数据分组问题的数据结构，特别适合处理动态连通性问题，即快速判断和合并集合。在句子相似性问题中，我们需要频繁判断两个单词是否通过一系列中间单词相互连接，这种'动态连通性'是并查集的强项。使用哈希表或树虽然可以实现相似功能，但在动态查询和合并操作上，并查集提供了更优的时间复杂度，尤其是在路径压缩和按秩合并优化后。

🦆

在构建无向图时，相似单词对的处理为什么要使得每对单词双向连接，即a到b和b到a都要连接？

▷

无向图的特性要求每条边双向连接，这意味着从任一节点都可以到达与之连接的任何节点。在句子相似性问题中，如果单词a与单词b相似，则单词b也必然与单词a相似。因此，为了正确反映这一逻辑关系并使得图的遍历（如深度优先搜索）可以从任一节点双向进行，我们需要将每对相似单词双向连接。这样做也简化了图的构造和遍历过程。

🦆

在DFS中，如果一个单词已经被访问过，为什么选择跳过它而不是重新探索其相似单词？这样做有哪些潜在的优势或风险？

▷

在DFS过程中，如果一个单词已经被访问过，意味着它已被分配到一个特定的集合中，并且所有与它相连的单词也都已经被处理过。跳过已访问的单词可以避免重复工作和无限循环，大大提高算法的效率。优势包括减少不必要的计算和提高执行速度。风险是如果图的构建或DFS的实现错误，可能导致某些连接不被正确识别，但这主要是实现错误而非方法本身的问题。

🦆

在将单词归属到集合中时，为什么在单词a处理后立即增加集合编号k，而不是在一系列相连单词处理完后再增加？

▷

在DFS中处理单词时，我们采用递归方式将所有相连的单词归属到同一个集合中。只有当我们从一个新的未被访问的单词启动DFS时，才意味着我们开始探索一个新的集合，因此在这时增加集合编号k是合适的。这样做确保了每次遇到未访问的单词，我们都正确地标记新的集合开始，而不是在一系列相连单词处理完后再增加，这有助于更清晰地管理不同集合的界限和识别。

相关问题

省份数量

有 n 个城市，其中一些彼此相连，另一些没有相连。如果城市 a 与城市 b 直接相连，且城市 b 与城市 c 直接相连，那么城市 a 与城市 c 间接相连。

省份是一组直接或间接相连的城市，组内不含其他没有相连的城市。

给你一个 n x n 的矩阵 isConnected ，其中 isConnected[i][j] = 1 表示第 i 个城市和第 j 个城市直接相连，而 isConnected[i][j] = 0 表示二者不直接相连。

返回矩阵中省份的数量。

示例 1：

输入：isConnected = [[1,1,0],[1,1,0],[0,0,1]]
输出：2

示例 2：

输入：isConnected = [[1,0,0],[0,1,0],[0,0,1]]
输出：3

提示：

1 <= n <= 200
n == isConnected.length
n == isConnected[i].length
isConnected[i][j] 为 1 或 0
isConnected[i][i] == 1
isConnected[i][j] == isConnected[j][i]

账户合并

给定一个列表 accounts，每个元素 accounts[i] 是一个字符串列表，其中第一个元素 accounts[i][0] 是 名称 (name)，其余元素是 emails 表示该账户的邮箱地址。

现在，我们想合并这些账户。如果两个账户都有一些共同的邮箱地址，则两个账户必定属于同一个人。请注意，即使两个账户具有相同的名称，它们也可能属于不同的人，因为人们可能具有相同的名称。一个人最初可以拥有任意数量的账户，但其所有账户都具有相同的名称。

合并账户后，按以下格式返回账户：每个账户的第一个元素是名称，其余元素是 按字符 ASCII 顺序排列 的邮箱地址。账户本身可以以 任意顺序 返回。

示例 1：

输入：accounts = [["John", "[email protected]", "[email protected]"], ["John", "[email protected]"], ["John", "[email protected]", "[email protected]"], ["Mary", "[email protected]"]]
输出：[["John", '[email protected]', '[email protected]', '[email protected]'],  ["John", "[email protected]"], ["Mary", "[email protected]"]]
解释：
第一个和第三个 John 是同一个人，因为他们有共同的邮箱地址 "[email protected]"。 
第二个 John 和 Mary 是不同的人，因为他们的邮箱地址没有被其他帐户使用。
可以以任何顺序返回这些列表，例如答案 [['Mary'，'[email protected]']，['John'，'[email protected]']，
['John'，'[email protected]'，'[email protected]'，'[email protected]']] 也是正确的。

示例 2：

输入：accounts = [["Gabe","[email protected]","[email protected]","[email protected]"],["Kevin","[email protected]","[email protected]","[email protected]"],["Ethan","[email protected]","[email protected]","[email protected]"],["Hanzo","[email protected]","[email protected]","[email protected]"],["Fern","[email protected]","[email protected]","[email protected]"]]
输出：[["Ethan","[email protected]","[email protected]","[email protected]"],["Gabe","[email protected]","[email protected]","[email protected]"],["Hanzo","[email protected]","[email protected]","[email protected]"],["Kevin","[email protected]","[email protected]","[email protected]"],["Fern","[email protected]","[email protected]","[email protected]"]]

提示：

1 <= accounts.length <= 1000
2 <= accounts[i].length <= 10
1 <= accounts[i][j].length <= 30
accounts[i][0] 由英文字母组成
accounts[i][j] (for j > 0) 是有效的邮箱地址

句子相似性

小行星碰撞单调递增的数字