统计同位异构字符串数目

难度:

标签:

题目描述

You are given a string s containing one or more words. Every consecutive pair of words is separated by a single space ' '.

A string t is an anagram of string s if the i^th word of t is a permutation of the i^th word of s.

For example, "acb dfe" is an anagram of "abc def", but "def cab" and "adc bef" are not.

Return the number of distinct anagrams of s. Since the answer may be very large, return it modulo 10⁹ + 7.

Example 1:

Input: s = "too hot"
Output: 18
Explanation: Some of the anagrams of the given string are "too hot", "oot hot", "oto toh", "too toh", and "too oht".

Example 2:

Input: s = "aa"
Output: 1
Explanation: There is only one anagram possible for the given string.

Constraints:

1 <= s.length <= 10⁵
s consists of lowercase English letters and spaces ' '.
There is single space between consecutive words.

代码结果

运行时间: 118 ms, 内存: 17.2 MB

/*
 * 思路：
 * 1. 将字符串 s 按照空格分割成单词流。
 * 2. 对于每个单词，计算它的排列数目。
 * 3. 计算排列数目时，使用计数因子方法，计算每个字符的频率并求排列数目。
 * 4. 最终将所有单词的排列数目相乘，并取模 10^9 + 7。
 */

import java.util.*;
import java.util.stream.*;

public class AnagramCountStream {
    private static final int MOD = 1000000007;

    public static int countAnagrams(String s) {
        return Arrays.stream(s.split(" "))
                     .mapToLong(AnagramCountStream::countAnagramsForWord)
                     .reduce(1, (a, b) -> (a * b) % MOD);
    }

    private static long countAnagramsForWord(String word) {
        int[] freq = new int[26];
        word.chars().forEach(c -> freq[c - 'a']++);
        long count = factorial(word.length());
        return Arrays.stream(freq).filter(f -> f > 0)
                     .reduce(count, (acc, f) -> (acc * modInverse(factorial(f), MOD)) % MOD);
    }

    private static long factorial(int n) {
        return IntStream.rangeClosed(2, n).reduce(1, (a, b) -> (a * b) % MOD);
    }

    private static long modInverse(long a, int m) {
        return power(a, m - 2, m);
    }

    private static long power(long x, long y, int p) {
        long res = 1;
        x = x % p;
        while (y > 0) {
            if ((y & 1) == 1) {
                res = (res * x) % p;
            }
            y = y >> 1;
            x = (x * x) % p;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(countAnagrams("too hot")); // 18
        System.out.println(countAnagrams("aa")); // 1
    }
}

解释

方法:

此题解利用了组合数学的原理来计算每个单词的异构字符串数目。对于每个单词，我们首先计算每个字符的出现次数。然后，我们利用组合公式 C(n, k) 来计算每个字符能构成的异构字符串数目，并将这些数目相乘得到该单词的异构字符串总数。最后，我们将所有单词的异构字符串总数相乘得到整个字符串的异构字符串数目。

时间复杂度:

O(n^2)

空间复杂度:

O(n)

代码细节讲解

🦆

题解中提到利用组合公式C(n, k)来计算每个字符能构成的异构字符串数目，请问具体是如何应用这个公式来得出每个单词的异构字符串总数的？

▷

在题解中，每个单词中的字符首先被统计其出现次数。利用组合公式 C(n, k)，计算一个字符可以在不同位置出现的组合数。例如，单词中有 n 个位置，若某个字符出现 k 次，则这个字符在单词中的排列组合数为 C(n, k)。对于单词中的每个字符，计算完这种组合后，n 需要减去这个字符的出现次数（因为这些位置已被占用），然后对下一个字符使用更新后的 n 继续计算。这样，对于单词中的所有字符，他们的组合数乘积给出了该单词的所有可能的异构字符串数目。

🦆

为什么在计算过程中需要对结果取模10^9+7？这个操作有什么特别的意义或者是必要性吗？

▷

在计算大数的组合数时，结果很快就会非常大，可能导致计算机处理这些大数时出现溢出错误。使用模数 10^9+7 进行取模操作是常见的技术，主要是因为 10^9+7 是一个大的质数，这有助于减小计算时的冲突和保持结果的稳定。此外，取模操作还可以保持数值在一个可管理的范围内，避免因数值过大而导致的性能问题。

🦆

在解法中，每次计算组合数后都进行了取模操作，这样做是否可能影响最终的计算结果或计算过程的效率？

▷

进行取模操作不会影响最终的计算结果的正确性，因为模运算满足分配律。即，(a * b) % c = [(a % c) * (b % c)] % c。因此，每次计算后进行取模可以保证结果始终在模数范围内，避免溢出。关于效率，虽然取模操作本身需要额外计算，但它可以防止数字过大，从而加快其他数学运算的速度，总体上是提升了效率。

🦆

题解中使用了Python的collections.Counter和math.comb函数，它们的内部实现是怎样的，对于算法的性能有何影响？

▷

Python的 `collections.Counter` 用于快速统计各元素的数量，其内部实现基于哈希表，因此能在平均O(n)时间内完成对元素的计数，这对于算法性能是非常有利的。`math.comb` 函数用于计算组合数 C(n, k)，它可能使用递归方法或更优化的迭代方法来计算阶乘和组合，确保计算效率。这些函数的使用大大简化了代码的复杂度并提升了执行效率，使得处理大量数据或复杂计算成为可能。

统计同位异构字符串数目

题目描述

代码结果

解释

代码细节讲解

相关问题