统计相似字符串对的数目

难度:

标签:

题目描述

You are given a 0-indexed string array words.

Two strings are similar if they consist of the same characters.

For example, "abca" and "cba" are similar since both consist of characters 'a', 'b', and 'c'.
However, "abacba" and "bcfd" are not similar since they do not consist of the same characters.

Return the number of pairs (i, j) such that 0 <= i < j <= word.length - 1 and the two strings words[i] and words[j] are similar.

Example 1:

Input: words = ["aba","aabb","abcd","bac","aabc"]
Output: 2
Explanation: There are 2 pairs that satisfy the conditions:
- i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'. 
- i = 3 and j = 4 : both words[3] and words[4] only consist of characters 'a', 'b', and 'c'.

Example 2:

Input: words = ["aabb","ab","ba"]
Output: 3
Explanation: There are 3 pairs that satisfy the conditions:
- i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'. 
- i = 0 and j = 2 : both words[0] and words[2] only consist of characters 'a' and 'b'.
- i = 1 and j = 2 : both words[1] and words[2] only consist of characters 'a' and 'b'.

Example 3:

Input: words = ["nba","cba","dba"]
Output: 0
Explanation: Since there does not exist any pair that satisfies the conditions, we return 0.

Constraints:

1 <= words.length <= 100
1 <= words[i].length <= 100
words[i] consist of only lowercase English letters.

代码结果

运行时间: 27 ms, 内存: 16.4 MB

/*
 * 思路：
 * 使用Java Stream来处理字符串数组。
 * 对于每个字符串，提取其字符集（使用Set存储字符）。
 * 通过比较两个字符串的字符集是否相同来判断它们是否相似。
 * 使用flatMap和filter来生成并过滤满足条件的下标对。
 */
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

public class SolutionStream {
    public long similarPairs(String[] words) {
        return IntStream.range(0, words.length)
            .flatMap(i -> IntStream.range(i + 1, words.length)
                .filter(j -> {
                    Set<Character> set1 = wordsToSet(words[i]);
                    Set<Character> set2 = wordsToSet(words[j]);
                    return set1.equals(set2);
                })
            ).count();
    }

    private Set<Character> wordsToSet(String word) {
        Set<Character> set = new HashSet<>();
        for (char c : word.toCharArray()) {
            set.add(c);
        }
        return set;
    }
}

解释

方法:

题解使用了一个简单而有效的方法来统计相似字符串对的数量。它首先创建一个字典 `char_sets` 来存储每个字符串的字符集合作为键，和这些集合出现的次数作为值。对于数组 `words` 中的每个单词，算法会计算其字符的集合（使用 `frozenset` 以确保集合可以作为字典的键）。如果这个字符集合已经出现在字典中，那么就将字典中该集合的值（即之前这种集合出现的次数）加到总的相似对计数 `count` 上。这样，对于任何新出现的相同字符集合的字符串，我们都能立即知道它与之前多少个字符串是相似的，从而更新计数器。最后，算法返回计数器的值，即为相似字符串对的总数。这种方法巧妙地利用了集合的唯一性和字典的高效查找特性，避免了复杂的双重循环比较。

时间复杂度:

O(n * m)

空间复杂度:

O(n)

代码细节讲解

🦆

为什么在计算字符串相似对时选择使用frozenset而不是其他数据结构，如列表或元组？

▷

在计算字符串相似对时选择使用 `frozenset` 主要是因为 `frozenset` 是不可变的，可以作为字典的键。此外，`frozenset` 自动处理了字符的去重和排序问题，确保只要组成字符相同，无论字符顺序如何，都映射到相同的键。而列表或元组虽然可以保存字符，但它们保存的是字符的顺序信息，不适合此类应用场景，因为相似字符串的定义仅与字符种类和数量有关，与顺序无关。

🦆

在题解中，如果两个字符串字符完全相同但顺序不同，它们被认为是相似的。这种设计选择的原因是什么？

▷

题解中将字符完全相同但顺序不同的字符串视为相似的设计选择，是基于对相似字符串的定义——即两个字符串包含完全相同的字符集合。这种设计允许算法以高效的方式通过字符集合来识别和统计相似字符串对，避免了复杂的排序或多重循环比较，显著提高了算法效率。

🦆

如何处理极端情况，比如输入数组`words`为空？

▷

如果输入数组 `words` 为空，根据题解的算法逻辑，字典 `char_sets` 也将保持为空状态。因为没有字符串进行处理，循环不会执行任何操作，相似对的计数 `count` 将保持初始值0。因此，算法会返回0，表明没有找到任何相似的字符串对。这种处理方式自然地解决了空输入数组的情况，无需额外的特殊处理代码。

🦆

题解算法在遇到重复的字符串时如何处理？例如，如果`words`数组中有多个相同的字符串会怎样？

▷

题解算法在遇到重复的字符串时，会将这些字符串对应的字符集 `char_set` 的出现频率进行累加。每次遇到已存在的字符集时，算法会将该字符集的当前频率（即之前出现的次数）加到相似对计数 `count` 上，并将频率增加1。这意味着，如果数组中有多个相同的字符串，每一个后续出现的相同字符串都会与前面的所有相同字符串形成新的相似对，从而正确地统计总的相似对数量。

统计相似字符串对的数目

题目描述

代码结果

解释

代码细节讲解

相关问题