奖励最顶尖的 K 名学生

难度:

标签:

题目描述

You are given two string arrays positive_feedback and negative_feedback, containing the words denoting positive and negative feedback, respectively. Note that no word is both positive and negative.

Initially every student has 0 points. Each positive word in a feedback report increases the points of a student by 3, whereas each negative word decreases the points by 1.

You are given n feedback reports, represented by a 0-indexed string array report and a 0-indexed integer array student_id, where student_id[i] represents the ID of the student who has received the feedback report report[i]. The ID of each student is unique.

Given an integer k, return the top k students after ranking them in non-increasing order by their points. In case more than one student has the same points, the one with the lower ID ranks higher.

Example 1:

Input: positive_feedback = ["smart","brilliant","studious"], negative_feedback = ["not"], report = ["this student is studious","the student is smart"], student_id = [1,2], k = 2
Output: [1,2]
Explanation: 
Both the students have 1 positive feedback and 3 points but since student 1 has a lower ID he ranks higher.

Example 2:

Input: positive_feedback = ["smart","brilliant","studious"], negative_feedback = ["not"], report = ["this student is not studious","the student is smart"], student_id = [1,2], k = 2
Output: [2,1]
Explanation: 
- The student with ID 1 has 1 positive feedback and 1 negative feedback, so he has 3-1=2 points. 
- The student with ID 2 has 1 positive feedback, so he has 3 points. 
Since student 2 has more points, [2,1] is returned.

Constraints:

1 <= positive_feedback.length, negative_feedback.length <= 10⁴
1 <= positive_feedback[i].length, negative_feedback[j].length <= 100
Both positive_feedback[i] and negative_feedback[j] consists of lowercase English letters.
No word is present in both positive_feedback and negative_feedback.
n == report.length == student_id.length
1 <= n <= 10⁴
report[i] consists of lowercase English letters and spaces ' '.
There is a single space between consecutive words of report[i].
1 <= report[i].length <= 100
1 <= student_id[i] <= 10⁹
All the values of student_id[i] are unique.
1 <= k <= n

代码结果

运行时间: 107 ms, 内存: 23.9 MB

/*
思路：
1. 使用Java Streams操作数据。
2. 创建两个集合存储正面和负面的词汇。
3. 对每个学生的评语进行流操作，计算每个学生的得分。
4. 使用Comparator根据得分和ID进行排序。
5. 获取前k个学生的ID。
*/

import java.util.*;
import java.util.stream.*;

public class Solution {
    public List<Integer> topStudents(String[] positive_feedback, String[] negative_feedback, String[] report, int[] student_id, int k) {
        Set<String> positiveWords = new HashSet<>(Arrays.asList(positive_feedback));
        Set<String> negativeWords = new HashSet<>(Arrays.asList(negative_feedback));

        return IntStream.range(0, report.length)
            .mapToObj(i -> new int[]{student_id[i], calculateScore(report[i], positiveWords, negativeWords)})
            .sorted((a, b) -> b[1] != a[1] ? Integer.compare(b[1], a[1]) : Integer.compare(a[0], b[0]))
            .limit(k)
            .map(a -> a[0])
            .collect(Collectors.toList());
    }

    private int calculateScore(String report, Set<String> positiveWords, Set<String> negativeWords) {
        return Arrays.stream(report.split(" "))
            .mapToInt(word -> positiveWords.contains(word) ? 3 : (negativeWords.contains(word) ? -1 : 0))
            .sum();
    }
}

解释

方法:

此题解采用的方法是：首先将正面反馈词汇和负面反馈词汇存入集合中，方便快速查找。对于每个学生的评语，通过分割字符串获取到评语中的每个单词，然后判断这些单词是否存在于正面或负面词汇的集合中，根据匹配结果更新学生的得分。最后，将学生根据得分从高到低排序，得分相同则按ID升序排序。最终返回得分最高的k名学生的ID。

时间复杂度:

O(n*m + n log n)

空间复杂度:

O(p+q+n)

代码细节讲解

🦆

你是如何处理学生评语中多次出现相同正面或负面单词的情况？是否每次都计算其分数影响？

▷

在题解中，每次一个单词在评语中出现，无论是正面还是负面单词，都会计算其对总分的影响。如果正面单词多次出现，每次出现都会增加3分；如果负面单词多次出现，每次出现都会减少1分。这种处理方法确保了评语中每个词的影响都被准确计入最终的得分。

🦆

在进行学生排序时，如果有多名学生分数相同，是否已经确保了ID较小的学生排名更前？代码中的排序逻辑能否正确实现这一要求？

▷

是的，代码中使用了排序功能，首先按照分数降序排序，如果分数相同，则按照学生ID升序排序。这是通过元组`(g, sid)`进行的排序，其中`g`是分数，`sid`是学生ID。排序键`key=lambda x: (-x[0], x[1])`确保了在分数相同的情况下，ID较小的学生排在前面。

🦆

请问在此题解中，为何选择将正面和负面反馈词汇存储为集合而不是列表？集合在这里有什么特别的优势吗？

▷

在此题解中选择将正面和负面词汇存储为集合而不是列表，是因为集合在Python中基于哈希表实现，提供了平均时间复杂度为O(1)的查找效率。这使得每次检查一个单词是否为正面或负面词汇时更快，特别是当词汇列表较大时。相比之下，列表的查找效率为O(n)。因此，使用集合可以显著提高算法的性能。

🦆

在处理学生的评语分割时，是否考虑了诸如标点符号、大小写不一致等实际问题？这些问题在代码中是如何处理的？

▷

题解代码中没有直接处理评语中可能出现的标点符号或大小写不一致的问题。这意味着如果评语中的单词由于标点符号或大小写问题而与集合中的单词不完全匹配，它们将不会被识别为正面或负面词汇。在实际应用中，可以通过预处理步骤来改进，例如使用正则表达式移除标点符号，以及将所有单词转换为小写或大写形式，以确保匹配的一致性。

奖励最顶尖的 K 名学生

题目描述

代码结果

解释

代码细节讲解

相关问题