An honest thief
Aaron Swartz, an American computing prodigy, was a zealous young advocate for the free exchange of information and creative content online. From the age of 15, when he worked to launch Creative Commons, through his years as a fighter for copyright reform and his protests against the US Stop Online Piracy Act, to his posthumous status as an icon of internet culture, Swartz was inextricably connected to the free culture movement.
Five years ago, Swartz was caught illegally downloading millions of academic articles from the nonprofit online database JSTOR, and was subsequently indicted by the government for the theft. The justice system decided to make an example of Swartz, an anti-copyright activist, charging him with 13 felonies that carried a maximum prison sentence of 95 years and a fine of over $3 million. He committed suicide in 2013.
In a new book, released today, Justin Peters examines Swartz's life in the context of two hundred years of struggle over the control of information. Peters also explains how government-funded academic research came to be considered private property, and how downloading that material in bulk came to be construed as a state crime. In an edited extract from the book, Peters tells the story of that decisive period of Swartz's life.
Guerilla open access
'Open access' is an anodyne term for a profoundly transformative idea. Advocates argue that academic research should be made freely available to the world at the time of publication, and that access should not be contingent on an individual’s or institution’s ability to afford a subscription to a given journal or database. Academic authors do not usually write for profit; rather, their work aims to augment the common store of knowledge. What’s more, since the government often funds their research, it’s not a stretch to claim that the fruits of that research should belong to the public. So why should this material be subject to the same access restrictions as a mystery bestseller or a Hollywood film? As with many other inexplicable policies, the blame belongs to a vestigial middleman.
When a university professor finishes a research project, she typically records her results in an academic paper, which she submits for publication in a peer-reviewed journal. These journals—the reputable ones, at least—operate via volunteers, with authors, editors, and peer reviewers all working for free. Nobody gets paid, or expects to get paid, except the publisher. In exchange for the publisher’s services, which include coordinating the publication and peer-review processes, formatting, and distribution, the author cedes the copyright to her article in perpetuity. It’s a simple trade: the academic publisher assumes the financial risk of preparing and distributing an esoteric work for which there's a limited audience and in exchange retains all the profits that might come from its sale.
In commercial trade publishing, publishers realise profit by selling a book for a relatively low price to a wide audience. Since no wide audience exists for academic papers, academic publishers realise profit by selling them at high prices to the few entities who can’t do without them—libraries and scholars, mostly—which renders these papers functionally inaccessible to the casual or impoverished user.
Today, several multinational publishing conglomerates, such as the Anglo-Dutch company Reed Elsevier, dominate the academic journal business. Tens of thousands of scholarly journals exist, and since the 1970s their subscription prices have risen at a rate higher than the rate of inflation, leading to what librarians have dubbed the "serials pricing crisis". These journals are expensive — yearly subscriptions to specialised STEM (science, technology, engineering, math) journals can cost more than $10,000 — but academic libraries are, more or less, compelled to subscribe. Many academic libraries wind up spending the bulk of their yearly acquisitions budgets on journal subscriptions.
That’s assuming that a library has a meaningful acquisitions budget at all. Many of them do not. This plight is especially common in poor, underdeveloped countries, where librarians have enough trouble keeping their computers on, let alone keeping up with the latest research in a thousand microdisciplines. The result is an ever-widening gap between rich institutions and poor ones.
The open access movement emerged in the early 1990s, when librarians and researchers realised that the internet had the potential to transform academic publishing. Online content distribution could reduce the physical production costs that publishers cited to justify their journals’ high prices. Publishers could let underresourced clients access proprietary material online and incur no direct financial loss by doing so. Academics might even choose to eliminate the middleman and simply publish their research online, for free. “In the old, Faustian days, the reluctant choice was to accept the Faustian pact (of allowing access to a work only to paid ticket-holders) because that was the only way to reach an audience AT ALL,” the open access pioneer Stevan Harnad wrote in 1994. “But now that there is another option, it’s time to rethink all of this.”
JSTOR, which stands for "journal storage," is an online database of academic journal articles that was conceived in 1993. With complete archival runs of scholarly journals in many academic disciplines available to institutional subscribers in an instant, JSTOR, in many ways, could be considered the incarnate dream of the infinite library.
JSTOR launched in 1997 and has expanded ever since, to the delight of the many students and scholars who have come to rely on its vast digital archives. But soon enough, observed Roger Schonfeld in his comprehensive history of the service, "JSTOR began to behave like a business, with proprietary rights that required protection." And when those rights were threatened, JSTOR did not hesitate to act in self-defence.
Hacks and hackers
In the early evening of September 25, 2010, a JSTOR employee noticed something strange. The JSTOR website was sluggish: tasks were accumulating and going uncompleted, web forms weren’t loading. At 6:48 p.m., the staffer reported the problem in an e-mail to colleagues with the subject line "website sad." Nobody likes a sad website, especially not those people tasked with keeping it happy. The JSTOR tech team examined the problem and three minutes later identified its cause: someone was bombarding the JSTOR servers with download requests. Hundreds per minute. And those requests were unravelling the system. The user in question was clearly using a computer program to initiate download sessions in rapid succession and acquire articles from JSTOR's database in the process. These actions violated JSTOR’s terms of service—and, of more immediate concern to the employees on duty that Saturday night, they threatened the stability of JSTOR servers in Ann Arbor, Michigan. Soon, other JSTOR staffers chimed in. "Any chance the offending scraper has an IP from the Portland area?" one asked. "We had a tool from Portland State University apologize and admit he was using 3+ PCs to mass download after they went to his house and punched him in the face (if only)."
But the activity was coming from the Massachusetts Institute of Technology—and as the night went on, the scraper started to pick up speed.
The JSTOR system couldn't handle these voluminous download requests. Much as a home computer might freeze when launching a dozen programs simultaneously, a computer server can easily stall if hit with lots of requests in rapid succession. It takes a powerful machine to survive such an onslaught, and the JSTOR servers were, apparently, relatively feeble, or at least unprepared.
After some internal debate about what was happening and how to respond, one of the JSTOR staffers sent out the order to "Jack 'em"—that is, to ban the offending MIT IP address from the system. "You mess with the bull," said another, "you get the horns."
And that, for the moment, was that. But by eight the next morning, the scraper had reactivated at a different IP address and resumed downloading. While JSTOR limited the number of articles that a given user could download per session, it did not limit the number of sessions a user could initiate. The MIT scraper had identified this loophole and, at peak activity, had initiated over two hundred thousand download sessions in a single hour—an average of 55.5 new sessions per second. "This is too much activity for the system," one staffer wrote, and JSTOR responded by again banning the offending IP address. Moves and countermoves: the scraper neatly evaded JSTOR's second ban by adopting yet another IP address. This time, JSTOR retaliated by showing the entire school its horns, temporarily banning a wide range of MIT IP addresses. The downloads ceased, and the JSTOR site slowly recovered. Employees of both MIT and JSTOR proceeded to assess the damage.
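The loophole described above, a cap on downloads per session but no cap on the number of sessions, can be illustrated with a small sketch. This is a toy model under stated assumptions, not JSTOR's actual system: the class `SessionLimitedArchive`, its method names, and the per-session cap of one article are all hypothetical. It also checks the arithmetic behind the per-second session rate.

```python
from dataclasses import dataclass


@dataclass
class SessionLimitedArchive:
    """Toy model (hypothetical): caps downloads per session, not session count."""
    per_session_cap: int  # assumed value; the article does not state the real cap
    sessions_opened: int = 0
    articles_served: int = 0

    def open_session(self) -> int:
        # Nothing limits how many sessions may be opened: this is the loophole.
        self.sessions_opened += 1
        return self.per_session_cap  # downloads remaining in the new session

    def download(self, remaining: int) -> int:
        # Serve one article only if this session still has quota left.
        if remaining <= 0:
            raise PermissionError("per-session download cap reached")
        self.articles_served += 1
        return remaining - 1


def sessions_per_second(sessions_per_hour: int) -> float:
    """Convert an hourly session count into an average per-second rate."""
    return sessions_per_hour / 3600


archive = SessionLimitedArchive(per_session_cap=1)
for _ in range(1000):
    quota = archive.open_session()  # a fresh session resets the quota
    archive.download(quota)

print(archive.articles_served)
print(round(sessions_per_second(200_000), 1))
```

In this model the total number of articles served grows with the number of sessions, so the per-session cap alone bounds nothing. Two hundred thousand sessions in an hour works out to about 55.6 per second, in line with the figure of roughly 55.5 quoted above.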
In an e-mail to Ellen Finnie Duranceau, JSTOR's contact at MIT Libraries, JSTOR user services manager Brian Larsen refrained from characterising the incident as a hostile attack, noting, "This activity is normally a compromised username and password or a student/researcher unaware of the impact of their activities or that this method of gathering PDFs is in violation of our Terms and Conditions of Use." The method—"robotic harvesting"—was not only prohibited, it was unnecessary. Larsen noted that JSTOR was accustomed to working with scholars who required bulk access to articles for research purposes "and would be happy to do so in this case as well if that turns out to be the motivation."
Duranceau and MIT's information technology department soon determined that the download requests originated on a computer that had logged in to the school's network with a guest account, which meant that MIT could not precisely identify the guilty party. Nevertheless, MIT recorded the offending computer's MAC address—basically, an identification number that is unique to every computer’s network adapter, like a fingerprint—and banned that address from the network. Duranceau told Larsen that the harvesting was unlikely to happen again.
However, on October 9, 2010, a JSTOR employee sent an ominous e-mail informing colleagues, "The MIT scraper is back." Just as before, the scraper would start a session and download a document, then start another session and download another document, then repeat the process ad infinitum. With its servers suffering under the strain of the scraping and other users' activities affected by these actions, JSTOR, in an unprecedented move, blocked access to its database for the entire MIT campus to maintain its server stability. Or, as one JSTOR employee put it, "MIT went Rambo on us, and we suspended the whole range."
For JSTOR, this was a drastic measure. Blanket bans for entire institutions risk eliciting angry screeds from scholars wondering why, exactly, the database had failed them in their hour of need. Access remained suspended for several days. Drastic though the measure may have been, it worked: the downloads ceased.
In October, the scraper had downloaded 8,422 articles in 8,515 total sessions. In September, however, the scraper had acquired 453,570 articles from 562 different journals over 1,256,249 sessions. "This is an extraordinary amount and blows away any recorded abuse case that I am aware of," Larsen noted.
What could anyone want with that many articles? The extent and pattern of the robotic harvesting indicated intentionality; the scraping clearly wasn't the work of a student or a professor who had fallen down a research hole. Worries mounted after a dive into the downloaded content pointed toward a disturbing conclusion. The first document that had been downloaded in the October scraping session was the article "The Mystery of Misspelling" from a 1957 issue of the Elementary School Journal. The final article downloaded came from a 1950 issue of the Elementary School Journal. After presumably considering and dismissing the possibility that the system had been breached by a nostalgic fourth-grade English teacher, a JSTOR employee stated what appeared to be obvious: "They’re clearly going after substantially the entire corpus."
"The entire corpus" referred to the whole of the JSTOR database: more than 5 million articles from more than a thousand academic journals, all of which had been legally licensed and carefully digitised by the nonprofit organisation. In September, MIT told JSTOR that a guest had been responsible for the downloads and that the problem was unlikely to recur. But it had recurred, prompting questions that MIT seemed reluctant or unable to answer. Who was draining the database? And why? JSTOR officials worried that voracious overseas hackers had downloaded the files. "By doing a simplified Chinese language Google search on 'EZProxy password,' you will find numerous lists with valid authentication information for hundreds if not thousands of schools," one JSTOR employee wrote, implying that unscrupulous foreigners might be siphoning the archives.
A senior JSTOR official reacted with alarm, asserting that the "activity noted is outright theft and may merit a call with university counsel, and even the local police, to ensure not only that the activity has stopped but that—e.g. the visiting scholar who left—isn't leaving with a hard drive containing our database." Another JSTOR employee concurred: "This is an astronomical number of articles—again, real theft (and one can assume wilful malfeasance given the use of a robot, etc.). Does the university contact law enforcement? Would they be willing to do so in this instance?"
In September 2010, Aaron Swartz purchased a new Acer laptop and visited the Massachusetts Institute of Technology, planning to download as many articles as possible from JSTOR. Logging on to the school's network under the alias Gary Host (G Host, or "ghost"), Swartz played patty-cake with JSTOR's and MIT's tech teams for months before finding a way to access the database without arousing attention.
His actions shouldn't have surprised anyone. If the city of Cambridge had compiled a yearbook of all its residents, Aaron Swartz would surely have been named Most Likely to Try to Download the Entire JSTOR Corpus. Swartz was an ideologue who had spent the past few years not only bulk-downloading large data sets that were inaccessible to the public, but also writing and speaking on the moral necessity of doing so. The JSTOR hack derived directly from the Guerilla Open Access playbook and the Content Liberation Front's to-do list.
In late September 2010, Swartz travelled to Budapest for the Internet at Liberty conference, where he spoke on "online free expression and enforcing ethics & accountability for corporations & governments." At the conference, Noam Scheiber of the New Republic reported, Swartz dined with some activists who, with the backing of George Soros, had tried to get JSTOR to make its archives available to the public. But the price had been prohibitive—securing all the necessary copyrights would have cost Soros hundreds of millions of dollars—and Swartz’s dinner companions decried "the outrageous sum of money it would take to free up JSTOR for public consumption." Scheiber makes clear that Swartz's companions did not propose any sort of guerrilla downloading campaign or suggest that Swartz take matters into his own hands. The conference concluded on September 22, 2010. Three days later, Swartz set up shop at MIT.
There is not necessarily any causal connection to be found here. Swartz never announced his plans for the JSTOR documents—not publicly, at least. If he confided in friends or family members, they have kept his secret. "Maybe he was downloading them because he’d figured out a way to do it and he was going to wait to see what to do next," his friend Ben Wikler would later suggest. "Maybe he did it so he didn’t have to have an Internet connection to read whatever journal he wanted." Feel free to examine the evidence and draw your own conclusions—the federal government certainly did.
By the time he started his JSTOR operation, Aaron Swartz had been living in Cambridge, Massachusetts, for more than two years. "Surrounded by Harvard and MIT and Tufts and BC and BU and on and on it's a city of thinking and of books, of quiet contemplation and peaceful concentration," he wrote on his blog upon his departure from San Francisco in 2008. "And it has actual weather, with real snow and seasons and everything, not this time-stands-still sun that San Francisco insists upon."
Although Swartz was never formally enrolled in or employed by MIT, he was nevertheless a member of the broader community there. Officials recalled that Swartz had been "a member of MIT’s Free Culture Group, a regular visitor at MIT’s Student Information Processing Board (SIPB), and an active participant in the annual MIT International Puzzle Mystery Hunt Competition." Mystery Hunt is a puzzle-solving contest that is half scavenger hunt, half Mensa entrance exam. The annual event attracts participants from around the world, many of them grown adults unaffiliated with MIT. Teams spend the weekend of the hunt running around the MIT campus solving a series of difficult puzzles, occasionally sneaking into rooms and campus locations that are technically off-limits.
In Cambridge, Swartz started to treat his own life as a puzzle to be solved. He designed various lifestyle experiments to optimise his efficiency and happiness. He dabbled in creative sleep schedules. In the spring of 2009, he spent a month away from computers and the internet for the first time in his adult life. His laptop had become "a beckoning world of IMs to friends, brain-gelatinizing television shows, and an endless pile of emails to answer. It's like a constant stream of depression," he wrote. "I want to be human again. Even if that means isolating myself from the rest of you humans."
He spent June offline, an experience he later described as revelatory. "I am not happy. I used to think of myself as just an unhappy person: a misanthrope, prone to mood swings and eating binges, who spends his days moping around the house in his pajamas, too shy and sad to step outside. But that’s not how I was offline," Swartz wrote, recounting how he had come to enjoy simple human pleasures such as shaving and exercising in the absence of perpetual connectivity. "Normal days weren't painful anymore. I didn't spend them filled with worry, like before. Offline, I felt solid and composed. Online, I feel like my brain wants to run off in a million different directions, even when I try to point it forward."
In 2010, Swartz was named a fellow at the Edmond J. Safra Center for Ethics at Harvard University. Lawrence Lessig, who had also returned to Cambridge from Palo Alto, brought him aboard. Lessig was supervising a Safra Center program that examined institutional corruption and its effect on public life. The fellowship was well suited for Swartz, who had spent so much of his life fixated on institutional and personal ethics. Individual ethicality had obsessed Swartz for years, and as he aged, it became perhaps his chief concern.
"It seems impossible to be moral. Not only does everything I do cause great harm, but so does everything I don't do. Standard accounts of morality assume that it's difficult, but attainable: don't lie, don't cheat, don't steal. But it seems like living a moral life isn't even possible," Swartz declared in August 2009. The next month, he extrapolated from this line of thought:
The conclusion is inescapable: we must live our lives to promote the most overall good. And that would seem to mean helping those most in want—the world’s poorest people.
Our rule demands one do everything they can to help the poorest—not just spending one’s wealth and selling one’s possessions, but breaking the law if that will help. I have friends who, to save money, break into buildings on the MIT campus to steal food and drink and naps and showers. They use the money they save to promote the public good. It seems like these criminals, not the average workaday law-abiding citizen, should be our moral exemplars.
This section ignited a debate in the comments section of Swartz’s blog. Readers chided Swartz for sanctioning the theft of services from MIT. The next day, in a blog post titled "Honest Theft," Swartz defended his position: "There's the obvious argument that by taking these things without paying, they're actually passing on their costs to the rest of the MIT community." But perhaps that wasn't as bad as it seemed, since "MIT receives enormous sums from the wealthy and powerful..."
Other readers argued that the freeloaders' actions just forced MIT to spend more money on security. "I don’t see how that’s true unless the students get caught," Swartz responded. "Even if they did, MIT has a notoriously relaxed security policy, so they likely wouldn't get in too much trouble and MIT probably wouldn't do anything to up their security." Swartz had good reason to think this way. MIT was the birthplace of the hacker ethic. The university tacitly encourages the pranks and exploits of its students; stories abound of clever undergraduates breaking into classrooms, crawling through air ducts, or otherwise evading security measures for various esoteric and delightful reasons, and these antics have been catalogued in museum exhibits and coffee-table books. By officially celebrating these pranks, MIT sends the message that it is an open society, a place where students are encouraged to pursue all sorts of creative projects, even ones that break the rules.
As a Safra Center fellow, Swartz had access to JSTOR via Harvard's library. So why did he choose to deploy his crawler at MIT, a school with which he was not formally affiliated? One possible reason is that computer-aided bulk-downloading violated JSTOR's stated terms of service, and, for that reason, Swartz may have preferred to remain anonymous. MIT might even have seemed to him like the sort of place that would be unbothered by, and possibly encourage, his actions. But Swartz would soon realise that MIT's public image did not directly align with reality.
On December 26, 2010, JSTOR realised that the MIT hacker had returned. "Woot . . . mit scraper is back," read the subject line of an internal e-mail speculating that the hackers were working out of the Dorrance Building on MIT's campus. "87 GB of PDFs this time, that’s no small feat, requires Organization," one JSTOR employee wrote. "The script itself isn't very smart, but the activity is organized and on purpose."
The harvesting sessions had ceased only because Swartz had been out of town for a couple of months. In mid-October of 2010, Swartz travelled to Urbana, Illinois, to speak at Reflections | Projections, a conference hosted by computing students at the University of Illinois. Swartz's topic was 'The Social Responsibility of Computer Science', and he argued that computer programmers have an ethical responsibility to advance the public welfare. He spoke about utilitarianism, and the coder's special ability to write simple programs that could automate and speed tedious, mundane activities, and complete countless tasks in the time it would ordinarily take to complete just one. "Now, as programmers, we have sort of special abilities. We almost have a magic power," Swartz said. "But with great power comes great responsibility, and we need to think about the good that we can do with this magical ability. We need to think about, from a utilitarian perspective, what's the greatest good we can achieve in the world at small cost to ourselves?"
In early November 2010, Swartz and a friend, Ben Wikler, went to Washington, DC, to volunteer for the Democratic National Committee (DNC) in the days preceding that year's midterm elections. Swartz was assigned to work under Taren Stinebrickner-Kauffman, a political activist who served as project manager for a telephone-outreach tool that helped volunteers contact voters in key states and districts. "Nobody really knew Aaron, so he sort of got plopped on to my team and, needless to say, was very helpful," Stinebrickner-Kauffman remembered.
When Swartz wasn’t working on robocalls and performing other menial tasks for the DNC, he was observing how campaign technology worked and thinking about ways to make it work better. "We were crashing at a friend’s place, talking until five in the morning afterwards, talking about how technology could do something vastly more powerful and politically impactful," Wikler recalled. The election ended, and Wikler tried to persuade Swartz to remain in the capital for a few days to attend Stinebrickner-Kauffman's birthday party. Swartz declined and returned to Cambridge to revisit his own powerful and politically impactful utilitarian project.
In November of 2010, he found a wiring and telephony closet in the basement of the Dorrance Building (also known as Building 16), jacked his laptop directly into the campus network, and resumed his downloading. Swartz had refined his tactics: the script no longer triggered any of JSTOR's download thresholds. (The revolution will, after all, be A/B tested.) The downloads weren't detected until late December.
"I am starting to feel like they [MIT] need to get a hold of this situation and right away or we need to offer to send them some help (read FBI)," an aggravated JSTOR staffer wrote on December 26. At the time, MIT's libraries had closed, and the school's librarians were on budget furlough until January and unable to do any work in the meantime. Thus Ellen Finnie Duranceau of MIT didn't receive any of JSTOR's increasingly frantic e-mails until January 3. On January 4, she noted that MIT was unlikely to be able to identify the culprit. "I wish I could say otherwise," she wrote, "because I realize that JSTOR would like more information and would like us to track the downloaded content to the source."
But by the time Duranceau sent JSTOR the disappointing news, an MIT network engineer had already traced the downloads to a network switch in the Building 16 wiring closet. When he entered the closet, the engineer immediately noticed something odd. Though MIT exclusively uses light blue internet cables, an off-white cable was plugged into the network switch, leading from the switch to an object concealed under a cardboard box. Lifting the box, the engineer discovered Swartz's laptop. He called one of his colleagues. Then he called the MIT police.
The MIT police, concluding that the investigation required specialised skills that they did not themselves possess, promptly called a Cambridge police detective named Joseph Murphy, who belonged to a regional computer-crime task force. Murphy drove to the scene, accompanied by two other task-force members: a Boston police officer named Tim Laham and a Secret Service agent named Michael Pickett. They arrived at the Dorrance Building's basement around 11:00 a.m., and soon the wiring closet was outfitted with a motion-activated camera connected to the campus security network and devices that would log the laptop's download activity and alert MIT officials if the computer was removed from the network. At 3:26 p.m. on January 4, the camera captured footage of Swartz entering the closet to check on the laptop and swap out hard drives. The police had a face. Now they needed a name.
"I’ve just had an update," Duranceau wrote to her JSTOR contact on January 5. "The investigation has moved beyond MIT and is now being handled by law enforcement, including federal law enforcement." Duranceau was referring to Agent Pickett, who would subsequently become very active in the investigation. "If you have the time, I would appreciate if you would take a look at a new development that came to our attention yesterday," Pickett wrote in an e-mail to the US Attorney’s Office in Boston on January 5, 2011, noting that the task force had discovered a laptop at MIT downloading valuable technical journals, and that the laptop matched the description of one that had been stolen from the MIT Student Center days before. (It wasn't actually the same laptop.) "I would like to get your opinion on what offenses the suspect could be charged with in this case and what evidence would best support prosecution."
The next morning, Assistant US Attorney Stephen Heymann, the Boston office's resident computer-crime specialist, responded to the agent with a blank e-mail and a terse subject line: "Please call. Steve." That same morning, the investigators made plans to remove Swartz's laptop from the closet, dust for fingerprints, and image its hard drive. But Swartz returned before they had a chance to do so. He entered the closet shortly after noon on January 6, and this time, he covered his face with a bicycle helmet, as if he suspected that cameras had been installed. Though MIT police captain Jay Perault was watching the video feed in real time, Swartz was too quick for him: in less than two minutes, he disconnected the laptop, retrieved it, and left the closet before officers could scramble to the scene. "It is gone, he just left—my guys are looking for him," an MIT officer wrote at 12:55 p.m.
From there, things moved quickly. A couple of hours later, MIT police captain Albert Pierce spotted someone who resembled the man in the video riding his bicycle up Massachusetts Avenue toward Central Square, away from MIT. When Pierce approached, Aaron Swartz informed him that he didn't talk to strangers. Pierce showed Swartz his badge, but Swartz wasn't impressed, retorting that MIT police were not "real cops." Then Swartz dropped his bicycle and ran.
Pierce chased after him, but could not keep pace, so he followed Swartz in his car. Jay Perault joined the chase, accompanied by Michael Pickett. Swartz was two blocks from his apartment when his pursuers ran him down on Lee Street. They surrounded Swartz in a parking lot. They chased him through the cars. They caught him, handcuffed him, and brought him to the station.
Swartz's lawyer later bailed him out of custody. "I think I saw him on the day he was arrested," Ben Wikler remembered. "He was just totally, totally freaked-out. White as a ghost." Earlier that morning, Swartz had posted on Twitter a quote by the philosopher Willard Van Orman Quine: "'Ouch' is a one-word sentence which a man may volunteer from time to time by way of laconic comment on the passing show."