Problem G
That's How I Roll!
A new Mario Party minigame “Dicey Dictionary” has been introduced and Bowser is trying to win it all. The mysterious language $L$ of the Mario Universe consists of some set of words. Bowser is trying to decide what letters to put on his customized regular $6$-sided dice in order to maximize his chances at winning it all. Instead of numbers, each die has a letter on each side.
In Dicey Dictionary, you play a game where you build a word $w$ by repeatedly rolling the die for as long as you please. $w$ starts as an empty string, and after each roll, the letter rolled is appended to the end of $w$. After you stop rolling, if $w$ is in the language $L$, you receive a score equal to the length of $w$. You also can receive partial credit: if $w$ is a prefix of some word in $L$, you also receive $|w|$ points where $|w|$ represents the length of $w$. Otherwise, you receive a score of zero.
Bowser wants to maximize his expected score, so he can be the best Mario Party character. In order to maximize his expected score, which letters should he choose to have on his die?
Input
The first line of the input consists of a single integer $1\leq T \leq 1\, 000$, which denotes the number of testcases which follow. The first line of each testcase consists of a single integer $1\leq L \leq 200\, 000$, which denotes the number of words in the language $L$. Then follow $L$ lines, each containing exactly one string. Each string in $L$ consists only of lowercase english letters $a$-$z$, and no letter appears more than once in the same word.
The total length of all strings across all testcases does not exceed $1\, 000\, 000$.
Output
For each testcase, output two lines. The first line must contain Bowser’s optimal choice of die, represented as a string that consists of exactly 6 distinct lowercase letters in lexicographic order. If there are multiple solutions, output the lexicographically smallest one. The second line must contain the expected score if Bowser chooses to play the game with this die. Your answer will be judged correct if it has an absolute or relative error of at most $10^{-6}$.
Sample Explanation
The choice of letters on the die is clearly optimal as those are the only letters present in the language $L$.
With probability $\frac{5}{6}$, Bowser rolls a letter that is not an $a$, and receives a score of $0$ regardless of any future rolls. This is because there is no word in the language $L$ that begins with $b$, $c$, $d$, $e$, or $f$.
With probability $\frac{1}{6}$, he rolls an $a$. In this case, if he chooses to stop rolling, his score is $1$, leading to an expected score of $\frac{1}{6} \cdot 1 = \frac{1}{6}$. However, if he chooses to roll again, there’s a $\frac{5}{6}$ chance that he spells one of the words in $L$, and a $\frac{1}{6}$ chance he spells the word $aa$ which is not a prefix of any word in $L$. Thus, the expected score from rolling again is $\frac{1}{6} \cdot \frac{5}{6} \cdot 2= \frac{5}{18}$, which is greater than $\frac{1}{6}$.
Thus, if Bowser plays optimally, his expected score using this die is $\frac{5}{18}$.
Sample Input 1 | Sample Output 1 |
---|---|
1 5 ab ac ad ae af |
abcdef 0.27777777777777777778 |