Week 204: Literary Characters and Automation

Scripting languages and quiz bowl, and the need for an editor.

Feb 08, 2024

Last week we talked about building a new study guide for world literature. A by-product of that work was a books and authors list that I could convert into simple questions. I came back to it this week because I realized two things:

This sort of thing could be used for television study.
The list I had was too small to be used even for television study.

The first task I set myself this week was to locate my larger collections of data. Over a decade ago, I pulled together the most relevant portions of the Larousse Dictionary of Literary Characters and compiled them into a spreadsheet. (As a side note, the whole Larousse collection is an excellent source, but it goes into more depth than quiz bowl needs, and many of its titles cover categories that are too narrow a wedge of the distribution. The books have immense potential, but a thorough study leaves you with more information than you'll need: the classic trap of specialization.)

Once I found the spreadsheet, I saved it as a comma-delimited CSV file and slurped it into a Python script, which converted it into two dictionary variables. For those of you who aren't programmers or aren't familiar with Python, a dictionary stores information as a set of key:value pairs. The value can be a list, as it is in this case. The Authors dictionary has the author's full name as the key, and the list of that author's works recorded in the Dictionary of Literary Characters as the value. The Books dictionary has a key for each work that appears in the Authors values, and its value is the list of that work's characters included in the Dictionary of Literary Characters.
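Here is a minimal sketch of that conversion. The spreadsheet's real column layout isn't described here, so this assumes a hypothetical CSV with one row per character and three columns (author, title, character); the sample rows are placeholders.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: one row per character, columns author,title,character.
SAMPLE_CSV = """author,title,character
Charles Dickens,Oliver Twist,Oliver Twist
Charles Dickens,Oliver Twist,Fagin
Charles Dickens,Great Expectations,Pip
Jane Austen,Emma,Emma Woodhouse
"""

def load_dictionaries(csv_file):
    """Build Authors (author -> list of works) and Books (work -> list of characters)."""
    authors = defaultdict(list)
    books = defaultdict(list)
    for row in csv.DictReader(csv_file):
        author, title, character = row["author"], row["title"], row["character"]
        # Record each work once per author, keeping first-seen order.
        if title not in authors[author]:
            authors[author].append(title)
        books[title].append(character)
    return dict(authors), dict(books)

Authors, Books = load_dictionaries(io.StringIO(SAMPLE_CSV))
print(Authors)
print(Books["Oliver Twist"])
```

For a real file, you would pass an open file handle instead, e.g. `open("characters.csv", newline="", encoding="utf-8")` (the filename is hypothetical). One limitation of this layout: if two authors wrote works with the same title, their characters end up merged under a single Books key.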

Once you have set up this arrangement, what can you do? Utter magic… at least when it comes to creating practice questions.

The value of literary characters.

Last week I showed you the pyramid of details for literary works. The idea is that while we need to know the basic facts on the cover of a book (title and author) to start, we need more details about its content as we progress through quiz bowl.

The pyramid of detail:

Books and Authors lists (almanac, best sellers list, reading lists)
Benet's (one-paragraph summary)
Masterplots (one page)
Other questions (infused with noise and filtered by the preferences of players and question writers)
Actually reading the books

Literary characters are one of the easiest and most common second sets of clues to build on. Very few works of fiction have no characters, and most works give their characters names. A dictionary of literary characters, with single-paragraph sketches of the listed characters, devotes the same depth to each character that Benet's devotes to each work, and these sketches can be turned into questions just as easily. Describing a character and then giving their name in the next sentence is a pattern seen in every circuit packet's distribution. Since television questions are limited in length, they're also limited in detail, so both Benet's and Larousse can slide into those questions where other details are not a good fit. And now that we've converted that information into a machine-readable, manipulable format, we can make practice questions that look a lot like what we see on TV.
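As a sketch of that describe-then-name pattern, assume a hypothetical third dictionary, `Sketches`, mapping each character to its work and a short description. The two dictionaries described above store only names, so this layout is an assumption, and the descriptions below are placeholders rather than Larousse's text.

```python
# Hypothetical: character -> (work, short description). Placeholder text, not Larousse.
Sketches = {
    "Fagin": ("Oliver Twist", "He trains a gang of young pickpockets in the London underworld."),
    "Pip": ("Great Expectations", "He is an orphan who receives a fortune from an unknown benefactor."),
}

def character_question(name):
    """Describe the character first, then ask for the name; returns (question, answer)."""
    work, description = Sketches[name]
    question = f"{description} Name this character from {work}."
    return question, name

for name in Sketches:
    q, a = character_question(name)
    print(f"{q}\t{a}")
```

The tab between question and answer keeps each line easy to split or import into a flashcard tool.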

You'll notice in this edition that I'm still smarting a bit about that 60-second round featuring titles with missing words. If it happens again, I'm not letting my team go into it unprepared.

So with your two dictionaries (of data), you can run a particular algorithm over them to produce practice questions.

An algorithm like this:

Find all keys in Authors that have a value of length 3 or more.

Shuffle the value of each key and pull the first three entries off the list.

Then iterate over the filtered Authors dictionary and output each question with a print statement:

print(" and ".join(Authors[x][1:3]) + " were written by what author of " + Authors[x][0] + "?\t" + x)

This outputs a set of very simplistic questions that nonetheless teach the lesson, and that can be read in practice, fed into Quizlet or Kahoot, or dropped into Anki for spaced repetition.
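Put together, the filter, shuffle, and print steps might look like the following Python 3 sketch. The Authors entries are placeholders, and writing a tab-separated file is an assumption about your import route: Anki can import plain text with a tab between the fields, but check the import options of whichever tool you use.

```python
import csv
import io
import random

# Placeholder data: author -> list of works.
Authors = {
    "Charles Dickens": ["Oliver Twist", "Great Expectations", "Bleak House", "Hard Times"],
    "Jane Austen": ["Emma", "Persuasion", "Mansfield Park"],
    "Emily Brontë": ["Wuthering Heights"],
}

def author_questions(authors, rng=random):
    """Yield (question, answer) pairs for authors with at least three works."""
    for name, works in authors.items():
        if len(works) < 3:
            continue  # step 1: keep only authors with three or more works
        picks = list(works)   # copy so the source list is not reordered
        rng.shuffle(picks)    # step 2: shuffle, then take the first three
        first, second, third = picks[:3]
        question = f"{second} and {third} were written by what author of {first}?"
        yield question, name

def write_tsv(pairs, out):
    """Write one tab-separated question/answer pair per line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)

buffer = io.StringIO()
write_tsv(author_questions(Authors, random.Random(204)), buffer)
print(buffer.getvalue())
```

Passing a seeded `random.Random` makes a run reproducible; swap `buffer` for an open file to produce something you can import.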

Now, this is automation. Automation is ugly: it needs an editor, someone willing to nuke plenty of these and accept a 10% return on a production run. It needs someone to notice the flaws in the construction (these obviously produce questions too similar to use in a game without dilution) and the flaws in the dataset (in this case, Larousse is overwhelmingly deep on British authors and shallow on world lit). But this is not AI. Anyone with a passing interest in scripting languages could build up a database of questions. The key is figuring out the sort of questions you want to output, how to format the data, and what you need to cut.

So if you wanted questions that highlighted title characters, you could find the entries in the Books dictionary whose list of characters includes the entry's own key (the title). Then find that title among the values of the Authors dictionary and extract the corresponding key (the author's name). Pick ten and you have a lightning round.
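A sketch of that lookup, using small placeholder dictionaries in the same layout. It assumes the title and the character's name match exactly, so a character listed as "Emma Woodhouse" would not match the title "Emma" without extra handling.

```python
import random

# Placeholder data in the two-dictionary layout.
Authors = {
    "Charles Dickens": ["Oliver Twist", "Great Expectations"],
    "George Eliot": ["Silas Marner", "Middlemarch"],
}
Books = {
    "Oliver Twist": ["Oliver Twist", "Fagin", "Nancy"],
    "Great Expectations": ["Pip", "Estella"],
    "Silas Marner": ["Silas Marner", "Eppie"],
    "Middlemarch": ["Dorothea Brooke"],
}

def title_character_pairs(books, authors):
    """Return (title, author) for every work with a character who shares its title."""
    pairs = []
    for title, characters in books.items():
        if title in characters:  # the title character is listed among the characters
            # The author is the key whose list of works contains this title.
            author = next(a for a, works in authors.items() if title in works)
            pairs.append((title, author))
    return pairs

pairs = title_character_pairs(Books, Authors)
# Pick up to ten for a lightning round.
lightning_round = random.sample(pairs, min(10, len(pairs)))
for title, author in lightning_round:
    print(f"Which author created the title character of {title}?\t{author}")
```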

If you wanted to create a question that gives three titles sharing the same word, you'd break each title in Books.keys() into words using .split(" ") and populate a Counter with the output:

from collections import Counter

You can filter out the common words ("and", "to", …) and then filter out any word that appears fewer than three times. Iterating over that list of words, finding matching titles among Books.keys(), and pulling the corresponding Authors key for each book gives you something like this:

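import collections, itertools, random
# Books: title -> list of characters; Authors: author -> list of titles
Titles = list(Books.keys())
# Split every title into words and flatten them into one list
Words = list(itertools.chain.from_iterable(t.split(" ") for t in Titles))
W = collections.Counter(Words)
# Keep words that occur three or more times (stopwords still need filtering)
CommonWords = [w for w in W if W[w] > 2]
for i in CommonWords:
    # Titles containing the word as a whole word, not just as a substring
    Matches = [t for t in Titles if i in t.split(" ")]
    if len(Matches) < 3:
        continue  # the word repeats within fewer than three titles
    clues = []
    for title in random.sample(Matches, 3):
        # The author is the key whose list of works contains this title
        author = next(a for a, works in Authors.items() if title in works)
        blanked = " ".join("_" * len(w) if w == i else w for w in title.split(" "))
        clues += [blanked, author]
    print(*clues)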

This prints the title and author of each of three books, with the common word replaced by underscores. The output is perfect for formatting into Anki, Quizlet, a Google Slides deck, or the chyron on It's Academic. (The show occasionally used this format, but with only two works displayed. We use three because we're trying to build associations.)
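The script above counts every word, so "The" and "of" would crowd the list. Here is a sketch of the filtering step: a small hand-picked stopword set (an assumption; extend it for real data) plus the three-or-more threshold. It counts each word at most once per title, so a word repeated inside a single title isn't over-counted. The titles are just sample input.

```python
from collections import Counter

# Hand-picked stopwords (an assumption); extend the set for real data.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on"}

def shared_title_words(titles, minimum=3):
    """Return words that appear in at least `minimum` distinct titles, ignoring stopwords."""
    counts = Counter()
    for title in titles:
        # A set counts each word once per title, even if the title repeats it.
        words = set(title.split(" "))
        counts.update(w for w in words if w.lower() not in STOPWORDS)
    return sorted(w for w, n in counts.items() if n >= minimum)

titles = [
    "The Winter's Tale",
    "A Tale of Two Cities",
    "The Handmaid's Tale",
    "The House of Mirth",
    "Bleak House",
]
print(shared_title_words(titles))
```

This returns `['Tale']`; lowering `minimum` to 2 also returns "House". Punctuation attached to words (for example "Tale,") would still need stripping on real data.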

Remember me saying I was still smarting about that round on television? I got some medicine for that.

Finally, I'd like to note that while this approach lets you produce nearly complete questions for televised competition, it's not far removed from the basic skeleton used for circuit questions, even at the highest levels of difficulty. With the dictionary of literary characters you extracted your data from, you'd have the basic meat to fill the bones of a question that starts like this:

In one work by this author, [description of Character1 taken from Larousse]. In another book by this author, [description of Character2]. In addition to [Character1.name] and [Character2.name], the title character in another novel [description of Character3]. For 10 points, name this author of [Book1], [Book2], and [Book3].
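Here is a sketch of filling that skeleton with `str.format`. It assumes a hypothetical list of three (book, character, description) tuples already pulled for one author. The placeholder names are made up, and the descriptions are placeholder paraphrases, not Larousse text. The output still needs an editor.

```python
# The skeleton as a format string with named placeholders.
TEMPLATE = (
    "In one work by this author, {d1} "
    "In another book by this author, {d2} "
    "In addition to {c1} and {c2}, the title character in another novel {d3} "
    "For 10 points, name this author of {b1}, {b2}, and {b3}."
)

def fill_skeleton(clues):
    """clues: three (book, character, description) tuples, in reading order."""
    (b1, c1, d1), (b2, c2, d2), (b3, _, d3) = clues
    return TEMPLATE.format(b1=b1, b2=b2, b3=b3, c1=c1, c2=c2, d1=d1, d2=d2, d3=d3)

clues = [
    ("Middlemarch", "Dorothea Brooke", "an idealistic young woman marries the pedantic scholar Casaubon."),
    ("Adam Bede", "Hetty Sorrel", "a dairymaid is convicted of killing her infant."),
    ("Silas Marner", "Silas Marner", "is a weaver who hoards gold until he adopts the orphan Eppie."),
]
print(fill_skeleton(clues))
```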

Automation can serve as inspiration, but it takes human effort to make the output usable. However, if you're looking at this and thinking it needs to be made pyramidal, remember that we could plug in a frequency list (how often each work has come up in past questions) and get a very good first approximation of the right order for Book1, Book2, and Book3.
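One way to sketch that ordering is to assume a hypothetical frequency dictionary counting how often each work has appeared. Sort the books from least to most frequent, so the hardest clue leads and the most familiar work lands in the giveaway. The counts below are invented for illustration.

```python
# Hypothetical appearance counts; the numbers are made up.
frequency = {"Middlemarch": 40, "Adam Bede": 12, "Silas Marner": 25}

def pyramidal_order(books, freq):
    """Sort books from least to most frequent; unseen works count as 0 and lead."""
    return sorted(books, key=lambda b: freq.get(b, 0))

order = pyramidal_order(["Middlemarch", "Adam Bede", "Silas Marner"], frequency)
print(order)
```

This puts Adam Bede first and Middlemarch last. A work with a count of zero may simply be too obscure to lead with, which is one more thing for the editor to check.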

If we were to substitute a list of philosophical works and authors, we could easily turn them into a tossup based on a word common to several titles. In fact, I'm sure this process, done manually over a list of philosophy texts, has inspired more than a few questions.

I'm not doing this to demonstrate that automation or AI is going to eat every writer's lunch. I'm doing it because this structure can create questions for practice, or questions that can be coaxed into usable shape for competition, arranging clues in exactly the same way a writer would. Players can use it to train, to guide their study of a subject, or even to inspire them to write questions. The knowledge gained from those activities will carry over and be valuable in competition, because the selection and arrangement of clues will be the same.


OTW

This week plays the hits, mostly. Atalanta is the mythology pick because she falls into the gap of common answers that weren't covered in 9th-grade mythology. The You Gotta Know selection comes from a slideshow presented yesterday, which we will discuss next week. The Star this week is based on my hunch that the show's writer will dip back into the James Webb camera roll. And I was originally going to pair this week's In Our Time with last week's battle, but that seemed like overkill.

# Poem OTW: Harlem

https://poets.org/poem/harlem-0

# Poet OTW: Langston Hughes

https://poets.org/poet/langston-hughes

# YouTube Terminology Video OTW