统计一个字符串的 k 子序列美丽值最大的数目

难度:

标签:

题目描述

You are given a string s and an integer k.

A k-subsequence is a subsequence of s, having length k, and all its characters are unique, i.e., every character occurs once.

Let f(c) denote the number of times the character c occurs in s.

The beauty of a k-subsequence is the sum of f(c) for every character c in the k-subsequence.

For example, consider s = "abbbdd" and k = 2:

f('a') = 1, f('b') = 3, f('d') = 2
Some k-subsequences of s are:
- "abbbdd" -> "ab" having a beauty of f('a') + f('b') = 4
- "abbbdd" -> "ad" having a beauty of f('a') + f('d') = 3
- "abbbdd" -> "bd" having a beauty of f('b') + f('d') = 5

Return an integer denoting the number of k-subsequences whose beauty is the maximum among all k-subsequences. Since the answer may be too large, return it modulo 10⁹ + 7.

A subsequence of a string is a new string formed from the original string by deleting some (possibly none) of the characters without disturbing the relative positions of the remaining characters.

Notes

f(c) is the number of times a character c occurs in s, not a k-subsequence.
Two k-subsequences are considered different if one is formed by an index that is not present in the other. So, two k-subsequences may form the same string.

Example 1:

Input: s = "bcca", k = 2
Output: 4
Explanation: From s we have f('a') = 1, f('b') = 1, and f('c') = 2.
The k-subsequences of s are: 
bcca having a beauty of f('b') + f('c') = 3 
bcca having a beauty of f('b') + f('c') = 3 
bcca having a beauty of f('b') + f('a') = 2 
bcca having a beauty of f('c') + f('a') = 3
bcca having a beauty of f('c') + f('a') = 3 
There are 4 k-subsequences that have the maximum beauty, 3. 
Hence, the answer is 4.

Example 2:

Input: s = "abbcd", k = 4
Output: 2
Explanation: From s we have f('a') = 1, f('b') = 2, f('c') = 1, and f('d') = 1. 
The k-subsequences of s are: 
abbcd having a beauty of f('a') + f('b') + f('c') + f('d') = 5
abbcd having a beauty of f('a') + f('b') + f('c') + f('d') = 5 
There are 2 k-subsequences that have the maximum beauty, 5. 
Hence, the answer is 2.

Constraints:

1 <= s.length <= 2 * 10⁵
1 <= k <= s.length
s consists only of lowercase English letters.

代码结果

运行时间: 72 ms, 内存: 17.0 MB

/*
 * Java Stream Approach:
 * The problem can be solved using a similar approach as above but with streams.
 * 1. Calculate character frequencies.
 * 2. Filter and sort them using streams.
 * 3. Sum the largest 'k' values for the maximum beauty value.
 * 4. Count the occurrences of this maximum value.
 */
import java.util.*;
import java.util.stream.*;

public class SolutionStream {
    private static final int MOD = 1000000007;

    public int maxBeautySubsequences(String s, int k) {
        int[] freq = new int[26];
        s.chars().forEach(c -> freq[c - 'a']++);

        int[] sortedFreq = Arrays.stream(freq)
                .filter(f -> f > 0)
                .boxed()
                .sorted(Collections.reverseOrder())
                .mapToInt(Integer::intValue)
                .toArray();

        long maxBeauty = 0;
        long maxBeautyCount = 0;
        for (int i = 0; i <= sortedFreq.length - k; i++) {
            long beauty = IntStream.range(i, i + k).mapToLong(j -> sortedFreq[j]).sum();
            if (beauty > maxBeauty) {
                maxBeauty = beauty;
                maxBeautyCount = 1;
            } else if (beauty == maxBeauty) {
                maxBeautyCount++;
            }
        }
        return (int) (maxBeautyCount % MOD);
    }
}

解释

方法:

这个题解首先通过计算每个字符在字符串中的出现次数，然后进一步统计每个出现次数的字符有多少个。接着，从出现次数最高的字符开始，尝试构建长度为k的子序列，计算其美丽值。如果某个出现次数的字符数量足以构成长度k的子序列，那么直接计算该子序列的美丽值并返回。如果不足以构成长度k，就将这些字符的美丽值加入总和，并减少k的值，继续寻找下一组出现次数较多的字符。这种方法保证了在满足子序列长度为k的条件下，尽可能地使用出现次数高的字符，从而最大化子序列的美丽值。

时间复杂度:

O(u log u) 其中 u 是不同出现次数的数量

空间复杂度:

O(u) 其中 u 是不同出现次数的数量

代码细节讲解

🦆

在题解中提到首先计算每个字符的出现次数，然后是每个出现次数的字符数量。请问如何确保这种方法在所有情况下都能正确统计？

▷

在Python中，使用`collections.Counter`可以直接计算每个字符的出现次数。首先，通过`Counter(s)`获取字符串`s`中每个字符的出现次数，然后再次使用`Counter`对这些出现次数进行计数。这种方法是基于哈希表实现的，可以准确快速地统计字符及其出现次数，以及出现次数的频率。因此，只要输入是一个有效的字符串，这种方法都可以正确地统计每个字符的出现次数以及每个出现次数的字符数量。

🦆

题解中使用了逆序遍历出现次数的方式来构建子序列，这种方法是否总是保证可以得到最大的美丽值？是否有可能遗漏更优的字符组合？

▷

题解中采用逆序遍历是基于假设：越高的字符出现次数对美丽值的贡献越大，因此首先使用出现次数最多的字符组合尝试构建子序列，以期达到最大美丽值。这种方法在大部分情况下是有效的，因为使用次数高的字符构建的子序列美丽值较大。然而，理论上可能存在特定情况，例如当需要的子序列长度k较长且高频字符不足以组成多个子序列时，可能会需要组合不同出现次数的字符以达到最优解。但在大多数实际情况下，此策略已足够接近最优解。

🦆

在计算美丽值时使用了组合数 `comb(num, k)`，这需要计算阶乘。在实际实现中如何高效地计算这些值，尤其是对于大数如何处理溢出或大计算量？

▷

在实际应用中，可以通过预计算阶乘值并使用模逆元来高效计算组合数。特别是在模数是质数的情况下，利用费马小定理，可以通过计算阶乘的逆元来快速求解组合数，从而避免直接计算大数的阶乘和可能的溢出。此外，使用模数来进行所有运算可以有效防止溢出，同时保持计算的正确性和效率。

🦆

题解中的 `ans` 变量在循环中不断与 `pow(c, num, mod)` 相乘，这里的 `pow` 函数是如何确保不会因为大数运算而导致错误？

▷

Python的`pow`函数内置了模幂运算的功能，可以直接计算`(base^exp) % mod`的值。这种方式不仅计算效率高，而且可以有效避免在幂运算过程中产生过大的中间值导致的溢出问题。通过在每次计算后立即应用模数约束，确保了运算结果始终在安全的数值范围内。这是一种在处理大数运算时常用且高效的方法。

统计一个字符串的 k 子序列美丽值最大的数目

题目描述

代码结果

解释

代码细节讲解

相关问题