In-class activity: making simple network graphs with Palladio

[note: this is an activity I developed for my Smith College course “Doing Digital History” taught in the Spring of 2020]

Early American Newspaper Citations

  1. Download this dataset, which I produced as part of my research.
    • It represents nearly all of the citations that late-18th-century North American newspapers made to newspapers outside North America. The citations are aggregated by city and weighted by the number of citations, and the data is structured in five-year increments.
  2. Go to Palladio. Click start.
  3. Paste in the data from the above file. Click “load.”
  4. Click “Graph” in the header (between “Map” and “Table”).
  5. Under settings, for “Source,” choose “City of Origin.” For “Target,” choose “City of Publication.” (It doesn’t matter which you assign to which.) Tick “Highlight” for one of them.
  6. Tick “Size Nodes,” then size them according to “Sum of Weight.”
  7. You’ve just created a very simple network graph.
  8. Click “Timeline” below. Palladio will recognize “Year” as a temporal variable and let you see what the network looks like during specified time periods.
    • Note that the data is structured in five-year increments: 1755–1759, 1760–1764, etc. You can’t get more granular than that.
    • How does this network of citation change over time?
  9. Click “Facet.” This will allow you to explore the data based on particular cities.
    • How does Boston’s role in the network change over time?
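To make the idea of “Sum of Weight” concrete, here is a minimal Python sketch of how a weighted edge list can be turned into node sizes, with an optional time window like the one the Timeline filter applies. It assumes that Palladio sizes a node by adding up the weights of every edge touching it, and that the dataset has columns named “Year” and “Weight” alongside “City of Origin” and “City of Publication”; the column names “Year” and “Weight” and all of the sample rows are hypothetical, invented for illustration rather than taken from my data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows shaped like the citation dataset: one row per
# (origin city, publication city, five-year period), weighted by the number
# of citations. The numbers and the "Year"/"Weight" column names are invented.
SAMPLE = """City of Origin,City of Publication,Year,Weight
London,Boston,1765,12
London,Philadelphia,1765,9
Paris,Boston,1780,4
London,Boston,1780,3
"""

def node_sizes(rows, start=None, end=None):
    """Sum the weights of all edges touching each city, optionally
    restricted to rows whose Year falls within [start, end]."""
    sizes = defaultdict(int)
    for row in rows:
        year = int(row["Year"])
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        weight = int(row["Weight"])
        # A citation counts toward both ends of the edge, so a city's size
        # reflects how often it cites or is cited.
        sizes[row["City of Origin"]] += weight
        sizes[row["City of Publication"]] += weight
    return dict(sizes)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(node_sizes(rows))                        # whole period
print(node_sizes(rows, start=1775, end=1784))  # a Timeline-style window
```

Narrowing the window changes which cities look central, which is one way to think about how Boston’s role shifts across periods.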

Quaker dataset

Try it on your own with a different dataset. Use sample data about 17th-century Quaker networks from The Programming Historian’s tutorial for Python: edgelist; node list.

  1. Open up the edgelist and examine it.
  2. Copy and paste the edgelist into the screen in Palladio where you loaded the previous data.
  3. Under the Untitled table, click “Source.” Then click “Add a New Table.” Paste in the data from the node list. Click “done.” Do the same for “Target.”
  4. See if you can make a network graph from here.
  5. What does this graph tell you? What doesn’t it tell you? If you knew nothing about the history of 17th-century Quakerism, what would you conclude about it based on this data?
  6. What’s different about this dataset from the previous dataset on citations?
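One way to see what linking a node list to the “Source” and “Target” columns accomplishes is this minimal Python sketch, which indexes a node table by name and looks up attributes for each edge endpoint. It assumes an edge list with “Source” and “Target” columns and a node list with a “Name” column plus attributes such as “Gender”; check those column names against the files you downloaded. The sample rows are illustrative, not copied from the tutorial’s files. Note that these edges carry no weight and no date, so the only size measure available is how many connections each person has.

```python
import csv
import io
from collections import Counter

# Illustrative rows shaped like an edge list and a node list. The column
# names are assumptions about the tutorial's files, not confirmed.
EDGES = """Source,Target
George Fox,Margaret Fell
George Fox,William Penn
"""
NODES = """Name,Gender
George Fox,male
Margaret Fell,female
William Penn,male
"""

edges = list(csv.DictReader(io.StringIO(EDGES)))
# Index the node table by name so each edge endpoint can be looked up,
# roughly what linking "Source" and "Target" to a second table does.
nodes = {row["Name"]: row for row in csv.DictReader(io.StringIO(NODES))}

# With no weights or dates, a person's prominence can only come from
# how many edges touch them (their degree).
degree = Counter()
for edge in edges:
    degree[edge["Source"]] += 1
    degree[edge["Target"]] += 1

for name, count in degree.most_common():
    print(name, nodes[name]["Gender"], count)
```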

In-class activity: topic modeling the State of the Union with MALLET

[note: this is an activity I developed for my Smith College “Doing Digital History” class from Spring 2020]

This quick guide is based on a more complete tutorial in The Programming Historian.

Step 1: Download these three things

Download the Java Development Kit (JDK).

Download MALLET.

Download the SOTU corpus. [this is a slightly modified version of a corpus I found here]

  • N.B.: the Java kit takes up about 300 MB of space. If you’d prefer not to install software on your personal computer, use one of the classroom PCs.
  • I have written these instructions for Mac users, but they are easy to adapt for PCs: mostly, make sure the slashes go the opposite way (\ instead of /). See the Mac and PC directions on The Programming Historian site.

Step 2: Install MALLET

  • Unzip the MALLET file.
  • Place the unzipped folder in your user (home) directory, e.g., /Users/yourname/.
  • Place the “sotu” folder, without renaming it, inside the MALLET folder.

Step 3: Open Terminal and begin entering some commands

In the command line, enter:

cd mallet-2.0.8

This tells Terminal to change the directory to the MALLET folder. If you get an error message, the MALLET folder probably isn’t in your user directory, or it has a different name (this command assumes it is called mallet-2.0.8).

Next, enter this code at the command line:

./bin/mallet import-dir --input sotu --output sotu.mallet --keep-sequence --remove-stopwords

For PCs, enter this command:

bin\mallet import-dir --input sotu --output sotu.mallet --keep-sequence --remove-stopwords

This command tells MALLET to take the corpus in the “sotu” folder and transform it into a single file that MALLET can work with (sotu.mallet). It also removes common “stopwords,” like “and” or “the,” which aren’t helpful for textual analysis. It should respond with:

Labels =

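To illustrate what “--remove-stopwords” and “--keep-sequence” do to each document, here is a minimal Python sketch: it lowercases the text, splits it into words, drops stopwords, and keeps the remaining words in their original order. The stopword list is a tiny hypothetical one and the tokenizer is deliberately simplified; MALLET’s built-in English list is much longer and its tokenizer differs, so treat this as an illustration of the idea, not of MALLET’s actual code.

```python
import re

# A tiny, hypothetical stopword list; MALLET's built-in English list is
# much longer. This only illustrates the idea.
STOPWORDS = {"and", "the", "of", "to", "a", "in", "our"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop
    stopwords, keeping the remaining words in their original order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(tokenize("The state of the Union and the strength of our defense"))
# → ['state', 'union', 'strength', 'defense']
```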
Next, enter this command:

bin/mallet train-topics --input sotu.mallet --num-topics 20 --optimize-interval 20 --output-state sotu.gz --output-topic-keys sotu_keys.txt --output-doc-topics sotu_composition.txt

For PCs, the command is:

bin\mallet train-topics --input sotu.mallet --num-topics 20 --optimize-interval 20 --output-state sotu.gz --output-topic-keys sotu_keys.txt --output-doc-topics sotu_composition.txt

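This is more complicated. It tells MALLET to take the file it just created and analyze it for 20 topics (“--num-topics 20”). The other options save the sampler’s final state (sotu.gz), the list of topic keys (sotu_keys.txt), and the topic composition of each document (sotu_composition.txt); “--optimize-interval 20” tells MALLET to re-estimate its hyperparameters every 20 iterations, which lets some topics become more prevalent than others. Depending on the speed of your machine, processing may take about a minute.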

Step 4: Analyze the Keys

MALLET produces two files that you need to look at. The first, the “keys,” will look something like this (though your topics will differ):

[Image: sample topic keys]

In the MALLET folder, which should still be in your user directory, you will find a document called sotu_keys.txt. Open it up.

If something prevented you from running the topic model, here are my sample keys and composition.

Each line, numbered 0 through 19 on the left, represents a “topic.” The decimal number next to each topic number roughly indicates how prevalent that topic is across the corpus. The string of 20 words that follows lists the words MALLET most strongly associates with that topic. MALLET can’t tell you what each topic is about; you need to figure that out for yourself.
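If you want to sort topics by that decimal number rather than scanning the file by eye, a minimal Python sketch like the following could help. It assumes each line of sotu_keys.txt has the form topic number, tab, decimal weight, tab, then space-separated words; confirm that against your own file. The sample lines are invented, with their word lists trimmed.

```python
def parse_keys(text):
    """Parse topic-keys lines of the form: topic<TAB>weight<TAB>word word ..."""
    topics = []
    for line in text.strip().splitlines():
        number, weight, words = line.split("\t", 2)
        topics.append((int(number), float(weight), words.split()))
    return topics

# Invented sample lines; weights are made up and word lists trimmed.
SAMPLE = (
    "0\t0.05\tgovernment public congress\n"
    "16\t0.21\twar free nations world military\n"
)

topics = parse_keys(SAMPLE)
# List the topics with the largest weight first.
for number, weight, words in sorted(topics, key=lambda t: t[1], reverse=True):
    print(number, weight, " ".join(words[:5]))
```

To use it on your own results, read the file first, e.g. `parse_keys(open("sotu_keys.txt").read())`.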

Some of the topics may be difficult to discern. But others might be intriguing. Consider topic 16, above:

war free nations world military peace production forces united communist men defense effort strength aggression fighting soviet attack europe peoples

Because these topics come from States of the Union, we can probably guess that this topic has something to do with war, particularly in the 20th century, since we see words like “communist” and “soviet.” To test that hypothesis, we need to look at the “composition” file.

Step 5: Composition

Open your second file, “sotu_composition.txt,” in Microsoft Excel (if possible), which will make it easier to analyze. It’s still a bit bewildering, so we’ll need to organize it. Add a new top row, then use “find and replace” to strip the extraneous material from column B (everything other than the year and .txt). Sort the file by column B so that the rows are ordered by year. You can add the name of each topic to the first row.

Your file should look something like this:

[Image: sample composition spreadsheet]

Each row represents a text file, or a State of the Union. Each column represents a topic that MALLET has identified.

Now you can graph the topics. In this example, I’ll use topic 16, which we looked at earlier. It is in column S: column A holds the document number, column B the file name, and column C topic 0, so topic 16 falls sixteen columns after C. By graphing column S, you can see the relative frequency of that topic across the corpus. Because we’ve ordered the corpus by year, and because there’s one SOTU every year (with very few exceptions), you get a year-by-year breakdown of when each topic appeared.
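If Excel isn’t available, the same cleanup (pull the year out of the file name, sort by year, pick one topic’s column) can be sketched in Python. This assumes each data line in sotu_composition.txt is tab-separated as document index, file name, then one proportion per topic in topic order, which matches the column layout described above; it also assumes every file name contains a four-digit year. Lines starting with “#” are treated as a header and skipped. The sample rows and paths are invented.

```python
import re

def topic_by_year(text, topic):
    """Return (year, proportion) pairs for one topic, sorted by year.

    Assumes each data line is tab-separated as: document index, file name,
    then one proportion per topic in topic order (topic 0 first).
    """
    series = []
    for line in text.strip().splitlines():
        if line.startswith("#"):
            continue  # skip a header line, if the file has one
        fields = line.rstrip().split("\t")
        # Look for a four-digit year in the file name itself, not the path.
        basename = re.split(r"[/\\]", fields[1])[-1]
        match = re.search(r"\d{4}", basename)
        if match is None:
            continue  # no year in this file name; skip it
        # fields[2] is topic 0, so topic n sits at index n + 2
        # (the same offset that puts topic 16 in spreadsheet column S).
        series.append((int(match.group()), float(fields[topic + 2])))
    return sorted(series)

# Invented three-topic sample; a real file has one column per topic.
SAMPLE = (
    "#doc name topic proportion\n"
    "1\tfile:/path/to/sotu/1942.txt\t0.10\t0.60\t0.30\n"
    "0\tfile:/path/to/sotu/1790.txt\t0.50\t0.02\t0.48\n"
)
print(topic_by_year(SAMPLE, 1))
```

On your own file, `topic_by_year(open("sotu_composition.txt").read(), 16)` would give the year-by-year values for topic 16, ready to pass to whatever plotting tool you prefer.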

Once I’ve selected column S, inserted a line graph of it, and added horizontal labels to it, I get a graph that looks like this:

[Image: line graph of topic 16 by year]

Okay, so that’s not too surprising. Since we thought this topic might be about 20th-century wars, it makes sense that it appears much more often in the 20th century.

Our model shows that this topic spikes a bit during the War of 1812, more during World War I, and much more during World War II and the Cold War. Given words like “soviet” and “communist,” it’s not surprising that this topic is most prominent during the wars of the mid-20th century.

But what is this topic missing? We don’t see such large spikes during the Civil War or the wars in the Middle East during the late 20th and 21st centuries. Why? Well, let’s return to our keys:

war free nations world military peace production forces united communist men defense effort strength aggression fighting soviet attack europe peoples

Now that I have a bit more context, my eye goes straight to “Europe” here. Perhaps MALLET has really identified a “European wars” topic. In doing so, perhaps it’s indicating that the rhetoric in States of the Union surrounding the Civil War and wars in the Middle East differed in significant ways from the rhetoric surrounding wars between the United States and European powers.

That may be worth investigating in more depth. If I ever wanted to write a book or article about presidential rhetoric about war, that would be a potentially interesting line of inquiry. So MALLET has helped me to generate a historical question: “How did U.S. presidential rhetoric about wars in Europe differ from rhetoric surrounding non-western wars?”

Step 6: Analyze your own keys and composition

A crucial thing to note about MALLET is that every time you run it on the same corpus, even with the same commands in the terminal, you’ll generate different results, because its sampling process involves randomness. Your results will differ from mine and from your classmates’.

If you want to run this process a second time, enter the command again with new output file names so that you don’t overwrite your first results:

bin/mallet train-topics --input sotu.mallet --num-topics 20 --optimize-interval 20 --output-state sotu_2.gz --output-topic-keys sotu_keys_2.txt --output-doc-topics sotu_composition_2.txt

Try to find a topic that has some coherence, and graph it to show how it changed over time. Does it make sense given what you know about American history? What surprises you? Is there anything that doesn’t make sense at all? What kinds of historical questions does this raise for you? How would you answer them?

Step 7: Think about potential applications to other text corpora

The SOTU is a convenient example because the addresses are annual and their texts are easily accessible on the internet. But most of you probably aren’t that interested in States of the Union or presidential rhetoric as a historical subject. What might you be interested in investigating using this tool?

Anonymity in Early American Print Culture

[Note: I wrote a different piece on the same topic for the Washington Post’s Made by History section, available here]

Lately, anonymity has, oddly, become rather public. Today, the New York Times published an unusual anonymous op-ed, arousing considerable debate. President Trump has taken to tweeting about his distrust of anonymous sources in news stories. On the fringes, the conspiracy theory known as “QAnon” posits that there is a high-level pseudonymous government source named “Q” who is slowly revealing the plot of a massive conspiracy. In each of these cases, an individual has chosen to reveal information or opinions without attaching their (real) name to it—likely in order to avoid compromising their job (or in the case of “Q,” exposing the charade for what it is).

Recently, some commentators have attempted to defend, or attack, anonymity today by looking to revolutionary America. I responded to one argument about anonymity in early American newspapers with a few Twitter threads. But the anonymity and pseudonymity that people such as Alexander Hamilton, Arthur Lee, John Jay, Ben Franklin, Thomas Jefferson, and James Madison used often expressed something very different, specific to the moment in which they were writing.

In the era of the imperial crisis and revolutionary war, Anglo-American newspaper printers published hundreds of political essays expressing their authors’ opinions, most of them under pseudonyms. Additionally, a great deal of the news that they published was anonymously sourced. Most prominently, newspapers regularly printed extracts of letters that had arrived in town from other cities. Usually, these letters did not name the author, though they might describe him (it was usually a him) as a “gentleman of veracity” or as a merchant from a very “respectable house.”

Beyond newspapers, authors of political pamphlets often chose to use pseudonyms or remain anonymous. The most famous pamphlets of the American Revolution, Tom Paine’s Common Sense and John Dickinson’s Letters from a Farmer in Pennsylvania (originally a series of newspaper essays), were at least initially anonymous.

For the most part, anonymous sourcing for essays and for news was uncontroversial and expected. These were longstanding practices in the eighteenth-century Anglophone Atlantic world that predated the American Revolution. But on a few occasions, an author’s anonymity rankled commentators. Tracing the example of one of these occasions helps us to understand the use and limitations of anonymity in revolutionary America.

In 1764, a Philadelphian named William Smith published an anonymous pamphlet attacking Benjamin Franklin. Even though Smith did not use his name, Franklin and his allies quickly surmised who was behind the pamphlet. This speaks to an important fact about anonymity in this era: places like Philadelphia and New York City were, effectively, small towns. In many cases, it was not difficult for a city’s elite to figure out who was behind a pseudonym. This is quite different from the internet age, when anonymity can effectively shield a person’s identity.

Franklin and his allies organized a response which attacked Smith for remaining anonymous. In response, an anonymous writer explained, “the World, in general, seldom considers a Paper to be more or less true for want of a Name.” If authors were forced to use their real names, he continued, “The Cause of Liberty would often be left to suffer,” because they would be exposing themselves to the “Clamour of Party, or the Resentment of Power.”[1] This person also, correctly, pointed out that Franklin himself had made use of the device of anonymity in the past.

In response, someone writing under the pseudonym “Poplicola” explained that some uses of anonymity were legitimate, while others were not. When an author attacked a “particular Character,” Poplicola argued, it was “proper and necessary” for that author to affix their name. Because this was a “private Cause,” or in other words not something of public concern, it “would be expected” that any responses would not be anonymous.

This logic also provided Poplicola with some justification for using a pseudonym himself. According to his view, pseudonyms and anonymity were legitimate when they were used to speak truth to power and to discuss important topics of public interest. Anonymity was not legitimate to simply attack individuals or spread gossip.

But of course, it was not unusual for a writer to disregard this rule and hide behind a pseudonym in order to circulate scandalous rumors and assail rivals. As historian Joanne Freeman has written, “Because they enabled men of honor to behave dishonorably, anonymous print warfare had equivocal status. Many considered it a cowardly means of attacking one’s foes without fear of retribution.”[2] Crucially, when an author’s identity could be widely inferred, anonymity allowed them to have it both ways: insulated from consequence and response, but with their identity lending their words authority.

While anonymity could be used for selfish reasons, many American commentators also recognized that it could be extremely valuable to the broader community. As scholar Michael Warner has written, pseudonyms enabled “the virtue of the citizen by the very fact that writing is not regarded as a form of personal presence. The difference between the private, interested person and the citizen of the public sphere appears both as a condition of political validity and as the expression of the character of print.”[3] Speaking through a pseudonym allowed an individual to embody a broader public, or at least appear to. Writers reinforced this by choosing pseudonyms that suggested that they were virtuous and focused on the good of the broader public: Publius (for Publius Valerius, who helped to found the Roman Republic), Brutus (great opponent of Julius Caesar), Cato (for Cato the Younger, another enemy of Caesar), and Catullus (who opposed the conspirator Catiline).

Moreover, anonymity offered an opportunity for writers of opinion essays to speak openly without fear of retribution. By refusing to provide their names, writers protected themselves from ad hominem attacks or political persecution, an important consideration in the era of the American Revolution, when many writers were well-known elites who could face punishment from the imperial government. Perhaps most importantly, anonymity encouraged readers to focus on arguments rather than personalities. In these ways, contemporaries recognized that anonymity could be valuable, especially for political essays.

For news items, the relative status of anonymous news sources became somewhat controversial, and even partisan, during the 1790s. Federalist printers began to emphasize the character and status of the sources who contributed to their paper. They regularly recommended news that they knew had come from “high authority,” or from a ship captain’s “own mouth.”[4] In contrast, Republican printers such as Philip Freneau cared less about knowing the identity or status of their sources. Freneau complained that some people wanted to know about the character or status of his correspondents: “whether he be a foreigner, or home born, or well-born… A man of property, or a no property man?” He concluded by asking “such inquisitive persons” to “mind your own business.”[5]

These examples all suggest that anonymity was political, contested, and nuanced in revolutionary America. Much more can be, and has been, said on the topic. Today, people rely on anonymity for many of the same reasons: to protect themselves from “doxxing,” to speak freely, and, unfortunately, to abuse others without fear of consequences. But anonymity has different meanings and uses today as well. Most prominently, it is much easier today to speak anonymously without fear of being unmasked; it’s as simple as starting a Twitter account. But today, being anonymous carries no real responsibilities. Anonymous trolls, for example, show no recognition that their protection is an important privilege, not an inevitable condition.

There are dangers in anonymity. As we have seen, it can be used against democratic institutions, as Russian “bots” have aspired to do. But I see the ability to speak anonymously as an essential tool in a democracy, particularly in our present moment. When power aligns against truth, truth must have a safe harbor from power.


[1] Pennsylvania Gazette, Dec. 27, 1764. The incident is recounted by the editors of the Benjamin Franklin Papers here:

[2] Joanne Freeman, Affairs of Honor: National Politics in the New Republic (New Haven, 2001), 129.

[3] Michael Warner, The Letters of the Republic: Publication and the Public Sphere in Eighteenth-Century America (Cambridge, 1990), 43.

[4] Columbian Herald, or the Southern Star, Nov. 9, 1793; American Minerva, March 17, 1795.

[5] National Gazette, June 18, 1792.

The best thing about looking through a 1775 Philadelphia tax list…

…is the names. Some finds:

  • Moses Bozee
  • Fergus Purdon
  • Lancelot Harrison
  • Everhard Bolton
  • Dorcas Montgomery
  • Hilarius Baker
  • Wendle Zerban
  • Blaze Boyer
  • Urban Fribley
  • Baltus Flisher
  • Christian Hero
  • Jabiz Buzby
  • Benjamoses Kite
  • George Shittz

In the “historical namesake” category:

  • Peter Campbell
  • Elizabeth Taylor
  • William Faulkner
  • Richard Nixon
  • John Updike

In the “parents lacking in creativity” category:

  • Evan Evans
  • John Johnston
  • Wendle Wendlyn
  • George George
  • Christian Christian
  • James James
  • William Williams

And the aptronym category:

  • John Baker (who was a baker)
  • James Smith (who was, you guessed it, a smith!)

Trump and the Sedition Act

I wrote a piece for the Washington Post’s “Made by History” blog about Donald Trump’s recent tirades against the press.

In just the past week, we’ve seen the following unfold:

  • Trump has called for the Senate to investigate the “Fake News Networks”
  • Sarah Huckabee Sanders, his press secretary, has insisted that there’s “no difference” between actual “fake news” and mainstream reporting that the Trump administration has deemed inaccurate.
  • Trump has tweeted twice about challenging and potentially taking away NBC’s network license (despite the fact that the FCC doesn’t license entire networks).

Combined with his previous references to opening up libel laws (despite the fact, again, that there is no federal libel law) and his continual attacks on so-called “fake news,” this looks like a dangerous assault on the First Amendment.

It also looks a lot like the final chapter of my dissertation. Something that historians don’t often discuss is that when the Federalist party passed the Alien and Sedition Acts, they used the circulation of false information (“fake news”) as a pretext. Like Trump, they made no distinction between actual made-up nonsense (though it’s funny to imagine the story “The Pope endorses Thomas Jefferson” playing out in the 1800 election) and what they believed to be falsehoods directed at them. The Alien and Sedition Acts were largely an attempt to take control of information networks.

We’re living in a different moment than 1798. Most notably, we benefit from the more expansive definitions of free speech and press freedom that commentators such as Tunis Wortman and George Hay articulated in response to the Alien and Sedition Acts. Yet that doesn’t mean that press freedom is as safe as we would like to imagine.

In fact, the Trump administration’s campaign of intimidation against the media may already be having an effect. Some members of the media may be covering the administration more cautiously than they otherwise would. They’re only human. Moreover, his attacks on the mainstream press are reducing his supporters’ trust in these institutions and increasing their credulity toward alternative news sources. A Morning Consult poll of American “brands” indicates considerable partisan polarization over news outlets. Even if Trump doesn’t actually attempt to regulate or control media outlets, his rhetoric has already had powerful consequences.

If President Trump does take action beyond tweeting, we could see a reaction much like the one in 1798–1800 that ousted John Adams and the Federalist party from political power. It’s worth noting that trust in the media overall has risen from 39% to 48% since Trump’s election (now quite a bit higher than President Trump’s approval rating). If I were advising Trump, I would tell him to keep his hands off the First Amendment.

A few of my favorite historical names

[Image: Judge William Wayne Justice]

I used to keep a running list of amazing names of historical figures as I came across them. Here are a few of my favorites in no particular order.

  • Waltrina Stovall, restaurant critic.
  • Bacon Tait, Virginia slave trader.
  • Everardus Bogardus, minister in New Netherland.
  • Flavel Shurtleff, Massachusetts attorney.
  • Sir Henry George Outram Bax-Ironside, British diplomat.
  • Hayward Turnipseed, Georgia dog owner.
  • Reverdy Cassius Ransom, AME bishop and civil rights activist.
  • Judge William Wayne Justice, American jurist, unsurprisingly from Texas.
  • Fulwar Skipwith, American consul in Paris during the French Revolution.
  • Wager Swayne, Union army colonel.

Citation Visualization

I’ve been exploring Palladio, the Stanford Humanities + Design Lab’s tool for data visualization. I’ve been focusing on American newspaper citations to foreign papers in the late eighteenth century. This map depicts citations from the years 1755 through 1804. This data forms a part of my second chapter, “English Channels,” which examines the impact of the American revolutionary war on information networks.

[Map: citations, 1755–1804]