This year, two designs – one proposed and one built – for elevated cycletracks, which create bicycle highways above street level, have gained considerable media attention. They highlight questions at the heart of urban design: Should cities blend or separate transportation options? How can cities best mitigate the hazards created when cars, bikes, mass transit, […]
The Louisiana Museum of Modern Art in Denmark is considered a milestone of modern Danish design, noted for its synthesis of art, architecture, and landscape. Now, a new exhibition from artist Olafur Eliasson seeks to blur these boundaries even further with Riverbed, an exhibition that transforms “the entire South Wing into a rocky landscape.” Riverbed […]
An Award-Winning Landscape Embraces Bay Views – Houzz, August 2014 “Landscape architect Scott Lewis repeats the sentiments of many architects and designers talking about their projects when he says that his favorite part of this project was witnessing its transformation. ‘I know what it looked like before,’ he says.” Placemaking Done Right: Three Successful Approaches – Planetizen, […]
On Aug. 5, the Federal Communications Commission announced the bulk release of the comments from its largest-ever public comment collection. We’ve spent the last three weeks cleaning and preparing the data and leveraging our experience in machine learning and natural language processing to try to make sense of the hundreds of thousands of comments in the docket. Here is a high-level overview, along with our cleaned version of the full corpus, which is available for download in the hopes of making further research easier.
Our first exploration uses natural language processing techniques to identify topical keywords within comments and use those keywords to group comments together. We analyzed a corpus of 800,959 comments. Some key findings:
We estimate that less than 1 percent of comments were clearly opposed to net neutrality.
At least 60 percent of comments submitted were form letters written by organized campaigns (484,692 comments); while these make up the majority of comments, this is actually a lower percentage than is common for high-volume regulatory dockets.
At least 200 comments came from law firms, on behalf of themselves or their clients.
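For readers who want a feel for the keyword step, here is a minimal sketch that scores terms with TF-IDF using only the Python standard library. It is an illustration under stated assumptions, not the pipeline behind the findings above: the stopword list, the tokenizer, and the function names are all hypothetical, and a real analysis of this size would use a library built for large corpora.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "to", "of", "and", "is", "for", "in", "on", "as", "be"}


def tokenize(text):
    """Lowercase a comment and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def top_keywords(comments, k=3):
    """Return the k highest-scoring TF-IDF terms for each comment."""
    docs = [tokenize(c) for c in comments]
    n_docs = len(docs)
    # Document frequency: in how many comments does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results
```

Comments that share top-scoring terms can then be bucketed together, which is the general idea behind grouping by topical keyword.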
Below is an interactive visualization that lets you explore these groupings and view individual comments within the groups.
In-depth exploration of the topical keywords revealed several prominent recurring themes, both in form letter and non-form letter submissions (see below for a more detailed exploration of form letter submissions). Among the most common:
Around two-thirds of commenters objected to the idea of paid priority for Internet traffic, or division of Internet traffic into separate speed tiers. This topic was discussed in many independent comments, as well as form letter campaigns organized by the Nation, Battle for the Net, CREDO Action, Daily Kos and Free Press. Common keywords in this group included “slow/fast lane,” “pay to play,” “wealthy,” “divide” and “Netflix.”
About the same number of comments, including submissions from form letter campaigns organized by the Nation, Badass Digest, CREDO Action, Daily Kos and Free Press, asked the FCC to reclassify ISPs as common carriers under the 1934 Communications Act. Common keywords in these comments included “common carrier,” “(re)classify,” “authority” and “Title II” (a part of the act that might grant the FCC this authority). A smaller portion of commenters advocated a regulatory strategy with a similar effect but a different legal basis, relying on section 706 of the 1996 Telecommunications Act.
More than half of comments, including form letters from the Nation, Battle for the Net, CREDO Action and Daily Kos, addressed Internet access as an essential freedom. Common topic words included “important,” “vitally,” “economy,” “essential,” “resource” and “cornerstone.”
Almost half of comments, including form letters from Electronic Frontier Foundation, the Nation, Battle for the Net, Daily Kos and Free Press, discussed the economic impact, or the impact on small businesses and innovation, of the end of net neutrality. Typical terms in these comments included “work,” “competition,” “startup,” “kill,” “barrier” and “entry.”
Around 40 percent of comments, including campaign letters from EFF, Battle for the Net and Daily Kos, discussed the importance of consumer choice, or the impact of regulations on consumer fees. Topic words included “access,” “choice,” “entertainment,” “fee,” “content,” “extort” and “extract.”
About one-third of comments, including those in Battle for the Net’s campaign, discussed the importance of competition among ISPs. Frequent terms included “monopoly” and “competition,” “Comcast,” “Verizon” and “Warner.”
Several form letters either from the Daily Kos or of unknown provenance (combined with non-form letters) advocated treating broadband providers like a public utility. About 15 percent of comments discussed this topic.
A small number of comments (around 5 percent, including letters from Stop Net Neutrality and a Tea Partier blog) had anti-regulation messages. Interestingly, some of these comments seemed to emphasize freedom for consumers while others advocated freedom for ISPs, two positions seemingly at odds with one another.
Additionally, a couple of topics came up in enough comments to be noteworthy despite not appearing in any of the form letter campaigns. About 2,500 comments called for the resignation of FCC Chairman Tom Wheeler or other FCC commissioners or staff. About 1,500 comments either mentioned John Oliver by name or used the words “dingo” or “f*ckery,” again typically directed at Tom Wheeler; these were likely motivated by the use of those terms in Oliver’s net neutrality segment.
Wait, where are the 1.1 million comments?
The comments were originally released by the FCC as six continuous XML files, with two caveats:
First, mailed comments postmarked prior to July 18 are still being scanned and entered into the ECFS and may not be reflected in the files. We will post an updated XML file when they are completed, so stay tuned.
We haven’t received word of any updates since the original release.
Second, certain handwritten comments may not be searchable. For this reason, source links to these comments are included in the files.
More than 500 comments had text fields which were blank. Our guess is that these may correspond to handwritten comments.
The XML files contained 446,719 records. Many of these contained a single comment each, but some contained multitudes. We wrote custom processing scripts to break up the multiple-comment records, revealing the total count of 801,781 comments. Of these, some were discarded as unparseable or too long (both Les Misérables and War and Peace were submitted as comments), leaving the final count at 800,959 comments.
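By the figures above, 801,781 minus 800,959 leaves 822 comments discarded. The splitting and filtering step might look roughly like the sketch below. The delimiter, the length cutoff, and the function names are assumptions made for illustration; the actual bulk records used formats that are not described here.

```python
# Hypothetical cutoff: anything longer is treated as "too long" (think novels).
MAX_CHARS = 100_000


def split_record(record_id, text, delimiter="\n-----\n"):
    """Split one bulk record into (comment_id, text) pairs.

    The delimiter is an assumed placeholder. Records holding a single
    comment keep their original ID; split-out comments get a hyphenated
    suffix so the original record ID stays recoverable.
    """
    parts = [part.strip() for part in text.split(delimiter)]
    parts = [part for part in parts if part]
    if len(parts) <= 1:
        return [(record_id, text.strip())]
    return [(f"{record_id}-{i}", part) for i, part in enumerate(parts, start=1)]


def is_too_long(comment_text, max_chars=MAX_CHARS):
    """Flag comments that exceed the assumed length cutoff."""
    return len(comment_text) > max_chars
```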
Detecting expert submissions
After speaking with policy experts from the Open Technology Institute and Public Knowledge, we learned some interesting details about comment submission. While most public comments were submitted using a simplified form or via email, experienced submitters made use of a more complex form. Comments submitted by these “experts” were marked in the data, giving us an easy way to isolate them.
Once isolated, they provided the basis for training a piece of artificial intelligence software called a text classifier. We trained the classifier to detect expert language based on examples from submissions that we knew were from experts. It was then able to read comments submitted through the simple form or via email and tell us whether or not each was likely to have been written by an expert. The classifier found approximately 6,700 such comments. Approximately 3,900 of these were form letters with this basic structure:
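To make the idea of a text classifier concrete, here is a minimal sketch of a multinomial naive Bayes classifier in plain Python. This is a stand-in chosen for brevity; the kind of classifier actually used is not stated above, and the class name, the tokenizer, and the labels "expert" and "public" are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        best_label, best_score = None, -math.inf
        for label in self.classes:
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on comments known to come from the complex form (labeled "expert") and a sample of other comments (labeled "public"), a model like this can then be asked to label comments that arrived through the simple form or by email.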
To Chairman Tom Wheeler and the FCC Commissioners To the FCC Please build any net neutrality argument upon solid legal standing. Specifically, this means reclassifying broadband under Title II of the Telecommunications Act of 1934. 706 authority from the Telecommunications Act has been repeatedly struck down in court after legal challenges by telecom companies. Take the appropriate steps to prevent this from happening again. Sincerely, *XXXX*
While this was almost certainly penned by an expert, we’re considering it a non-expert submission, because it seems to have been part of a broader organized campaign. Of the remaining 2,846 comments, 567 of them contain at least 200 words, which we feel is an appropriate heuristic to apply to expert submissions. In summary, our back-of-the-envelope estimate of the number of expert submissions is 600, or about 0.07 percent of the 800,959 comments analyzed.
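The 200-word heuristic and the share-of-corpus arithmetic are easy to reproduce. The sketch below assumes whitespace-separated word counting, which may differ from how words were counted in the analysis; the function names are hypothetical.

```python
def word_count(text):
    """Count whitespace-separated words in a comment."""
    return len(text.split())


def likely_expert(text, min_words=200):
    """Apply the length heuristic: at least min_words words."""
    return word_count(text) >= min_words


def percent_of_corpus(count, total=800_959):
    """Express a comment count as a percentage of the analyzed corpus."""
    return 100.0 * count / total


# Share of the corpus for the exact count (567) and the rounded estimate (600).
for n in (567, 600):
    print(n, round(percent_of_corpus(n), 4))
```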
We searched within the topical groupings that powered the visualization above to find groups of comments with very low amounts of text variation from one comment to another, yielding a similar result (though using different technology better suited to the extreme size of this docket) to the form letter detection visualizations employed in our Docket Wrench tool. After manual review of these groups, we estimate that at least 20 separate form letter writing campaigns drove submissions to this docket, ranging in size from a few hundred comments to more than 100,000 and together comprising almost 500,000 comments, or about 60 percent of the corpus that we examined. We made a cursory attempt at trying to find the organizations that orchestrated each form letter writing campaign. In the interactive visualization below, we’ve shown each group, along with its sponsoring organization if we were able to find it. The visualization is color-coded by whether each group appears to support or oppose net neutrality (the lone opposing group is difficult to see, but is shown in red near the center):
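One common way to find groups of comments with very little text variation is to compare overlapping word sequences ("shingles") with Jaccard similarity. The sketch below illustrates that general technique only; it is not the technology used for this docket, and the shingle size, the threshold, and the function names are assumptions. The greedy pairwise comparison is quadratic in the worst case, so a corpus of this size would need something more scalable.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_form_letters(comments, threshold=0.6):
    """Greedily assign each comment to the first group it closely matches.

    Returns a list of groups, each a list of indices into `comments`.
    """
    groups = []  # pairs of (shingles of the group's first comment, member indices)
    for index, text in enumerate(comments):
        current = shingles(text)
        for representative, members in groups:
            if jaccard(current, representative) >= threshold:
                members.append(index)
                break
        else:
            groups.append((current, [index]))
    return [members for _, members in groups]
```

Groups with many members are candidate form letter campaigns; singletons are more likely to be independently written comments.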
While form letters do appear to make up the majority of the comments, it’s actually surprising how many of the submitted comments seemed not to have been driven by form letter writing campaigns. In previous analyses of high-volume dockets, we’ve found that it’s not unusual for form letter contributions to make up in excess of 90 percent of a docket’s total submissions, with the percentage of comments coming from form letter campaigns being well-correlated with the total number of comments received. The two largest dockets in Docket Wrench, the Department of State Keystone XL rulemaking and the Internal Revenue Service docket on political activity undertaken by social welfare organizations, both from earlier this year, are each dominated by form letter comments, with more than 75 percent of the comments in each having been classified as form letter submissions by our detection systems.
It’s difficult to know why, exactly, more members of the public apparently wrote letters themselves in this rulemaking than is typical for large dockets. It could be an indicator of a genuinely higher level of personal investment and interest in this issue, or perhaps this docket drew organizers who employed different “get out the comment” techniques than we have seen in the past.
Even within the form letters, we see evidence of various kinds of innovation in terms of the way form letter campaigns have been run. EFF’s campaign gives submitters several opportunities to choose from a menu of options at various points within the text, for example. More intriguingly, several groups of comments that we were unable to attribute show subtle textual variations that don’t seem to alter the meaning of the text in the way that EFF’s do. These groups appear to all be about the same size, leading us to believe that a single overall population of users might have been solicited to submit comments and was then automatically uniformly segmented in some fashion. This could have been to test which versions of the comment text got the most users to submit (along the lines of the A/B testing commonly used in software development). It could also perhaps be an effort to foil exactly the kind of automated grouping tools we (and some federal agencies) might employ to make large volumes of comments like this one easier to review.
Finally, while comments submitted as part of form letter campaigns are similar to one another, it’s important to note that they’re not identical. Many submitters take the opportunity to personalize their comment beyond what was supplied by the campaign’s template language. How exactly they vary is an interesting question, and worth pursuing.
If you’re interested in doing your own analysis with this data, you can download our cleaned-up versions below. We’ve taken the six XML files released by the FCC and split them out into individual files in JSON format, one per comment, then compressed them into archives, one for each XML file. Additionally, we’ve taken several individual records from the FCC data that represented multiple submissions grouped together, and split them out into individual files (these JSON files will have hyphens in their filenames, where the value before the hyphen represents the original record ID). This includes email messages to email@example.com, which had been aggregated into bulk submissions, as well as mass submissions from CREDO Mobile, Sen. Bernie Sanders’ office and others. We would be happy to answer any questions you may have about how these files were generated, or how to use them.
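A minimal sketch for reading the per-comment JSON files and recovering the original record ID from a hyphenated filename might look like this. It assumes only what is described above (one JSON file per comment, with the value before any hyphen being the original record ID); the directory layout, the example filenames in the test data, and the function names are hypothetical, and no assumptions are made about the fields inside each file.

```python
import json
from pathlib import Path


def original_record_id(path):
    """Return the part of the filename before the first hyphen, if any."""
    return Path(path).stem.split("-", 1)[0]


def was_split_out(path):
    """True when the filename carries a hyphen, i.e. the comment came from a bulk record."""
    return "-" in Path(path).stem


def load_comments(directory):
    """Yield (original_record_id, parsed_json) for every JSON file in a directory."""
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            yield original_record_id(path), json.load(handle)
```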
We’ve only just scratched the surface of what could be learned from such a rich dataset. Here are some other promising avenues of investigation that have occurred to us. If you pursue them, please let us know! Bonus points for research that comes complete with links to open source code and data.
How do commenters augment the template responses provided by form letter campaigns? What do they add, delete or modify? What consistently stays intact?
Do models of non-form submissions surface topics that we haven’t found? What about models of expert submissions?
How are individual words related to one another? For example, what modifiers are used for terms like “ISP,” “Wheeler,” “Internet,” etc.?
Looking at email addresses, which domains are most popular?
How often are key political figures or elements of government mentioned?
Which other services or utilities is broadband Internet compared with, and how often?
How do commenters break out by gender? (This is more difficult than it seems, even if you’re using the way fun Genderize API. Often the commenter’s real name can only be found in the body of the comment itself, not in the “applicant” field.)
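As a starting point for the email-domain question in the list above, here is a small sketch that tallies domains from a list of addresses. How addresses are stored in the data is not described here, so the input is assumed to be a plain list of strings, and the function names are hypothetical.

```python
from collections import Counter


def email_domain(address):
    """Return the lowercased domain of an address, or None if it looks malformed."""
    address = address.strip().lower()
    if address.count("@") != 1:
        return None
    local, domain = address.split("@")
    return domain if local and "." in domain else None


def domain_counts(addresses):
    """Count how often each email domain appears, skipping malformed entries."""
    domains = (email_domain(address) for address in addresses)
    return Counter(domain for domain in domains if domain)
```

Calling `domain_counts(addresses).most_common(10)` would then list the ten most frequent domains.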
To help get you started, we’ve released all of the code we used to do our analysis in a GitHub repository, and it depends entirely on open-source tools.
We’d like to thank Michael Weinberg and his colleagues at Public Knowledge, and Sarah Morris of the New America Foundation’s Open Technology Institute for their invaluable advice in better understanding this data. We’d also like to thank Radim Řehůřek, maintainer of the gensim library, which was crucial to our text analysis.
Keep reading for today’s look at #OpenGov news, events, and analysis, including hesitancy at the IRS, open data in Connecticut, and Tom Steyer spending in California.
Before the Congressional recess CIA Director John Brennan was feeling pressure to resign over revelations that the agency spied on Senate staffers. The momentum has slowed over the break, but key members of Congress have reiterated their calls for him to step down. (The Hill)
In honor of the return of the NFL this week, a profile of the Sports Fans Coalition. The group focuses its lobbying on proposals that would limit the power of major sports interests and improve the lot of fans. (Ars Technica)
Remember the news that the DEA paid an Amtrak secretary over $850,000 for information that it could have obtained for free? The Department of Justice does and has launched a probe into the situation. (Washington Post)
In the face of seemingly unending pressure from Congress and a growing workload, it appears that the IRS may be overwhelmed and a bit gun-shy when it comes to regulating tax-exempt “nonprofits” that are often used for political purposes. (Government Executive)
Australia’s Department of Immigration and Border Protection is committed to using free and open source technology. A spokesperson explained some of the reasons why in this post. (Future Gov)
TheyWorkForYou, mySociety’s parliamentary monitoring service, allows users to set up topic alerts, making it easier to track specific issues as they are discussed in parliament. (mySociety)
State and Local News
Hartford, Connecticut launched a data portal to provide easier public access to crime, finance, housing, and public health data from across the city government. It is the first of its kind in Connecticut, according to city officials. (Hartford Courant)
Tom Steyer, the billionaire who has spent big to make the environment an issue in races around the country, is reportedly turning his attention to the California State Senate, where he is preparing to target Democrats who might not agree with him. (Washington Post)
Events This Week
Do you want to track transparency news? You can follow the progress of relevant bills, court cases, and regulations using Scout. You can also get Today in #OpenGov sent directly to your preferred news reader. If you would like to suggest an event, please email firstname.lastname@example.org by 7 am on the Monday prior to the event.
The FCC acted quickly on our petition, issuing a notice of proposed rulemaking just one week after receiving it. For those familiar with the FCC, the speed at which the agency moved on this was shocking.
And even better? They called for comments on whether to bring radio ad files online, too. Ours are here.
We’re still going through the comments, but opponents of disclosure generally argue that putting these files online will (1) be hard and (2) reveal company secrets. No matter how true (1) may be, that difficulty pales next to how hard it is for the public to access these files. Getting them in their current paper form requires showing up at the broadcast center (do you know where your satellite provider’s headquarters is?) during regular business hours, possibly paying a fee, and going through the files one by one, station by station. The companies are in the best position to make this change, and they’re the only ones with the information. And, let’s be honest, who keeps paper records anymore, anyway? Argument (2) is a bit slicker: it holds that online publishing reveals too much about what a station charges for ad time. This might make sense if the files weren’t public anyway. In other words, if anyone is running from station to station, it isn’t the researcher or journalist or citizen; it’s the companies who make billions every year from ad spending.
Ultimately, the proof will be in the pudding. Rulemaking takes time — a lot more than a week — but the swift response is a welcome amount of attention to a critical issue. If the FCC does decide to require cable, satellite and radio companies to put their files online, we may have an unprecedented view into political spending across all major media platforms (the biggest by revenue is broadcast television) by the 2016 elections, which we’ll now call the oncoming storm.
Deleted photo via Politwoops. In this week’s roundup of deleted tweets from politicians archived by Politwoops, we examine a number of recent examples of messaging changes that came in the form of image deletions and replacements. We star…
Calculating the social cost of carbon: In a report released August 25, the Government Accountability Office (GAO) concluded that the White House’s analysis of the politically charged “social cost of carbon” (reported here by Sunlight) estimate passes muster. The review was requested by Sen. David Vitter, R-La., and Reps. Duncan Hunter, R-Calif. and John Culberson, R-Texas, all harsh critics of President Barack Obama’s efforts to combat climate change. Vitter and Culberson count the oil and gas industry among their major donors: Vitter has gotten more than $1 million; and Culberson, nearly $600,000. Hunter’s top contributing industries include manufacturers, who have sent his campaigns more than $85,000. The 32-page report documents the by-the-book process used by the White House coordinating with other agencies to calculate the estimate, while also disclosing “several limitations of the estimates and areas that the working group identified as being in need of additional research.” (Credit: Scout, Influence Explorer.)
Restricting common narcotic pain killers: Vicodin and other common hydrocodone combination products prescribed widely to patients for pain will now be considered “class II” drugs subject to stricter regulation, according to a new final rule published by the Drug Enforcement Administration (DEA) on August 22. This action follows the Food and Drug Administration’s (FDA) recommendation last year that the drugs be reclassified because of increasing concern “about the abuse and misuse of opioid products, which have sadly reached epidemic proportions in certain parts of the United States.” While most of the commenters on the earlier proposed rule appeared to support the decision, not everyone was happy. For example, drug maker Actavis in April wrote the DEA, “if finalized, this action would impose significant and burdensome new regulatory controls and administrative, civil, and criminal sanctions on companies that handle….HCPs.” These comments were prepared by one of the lobbying firms hired by the company, Arent Fox, whose lobbyists include former Rep. Phil English, R-Pa. Actavis has been in the news lately as a poster company for the controversial tax avoidance strategy of “tax inversion,” where a company locates base operations overseas to avoid paying full U.S. tax rates. (Credit: Docket Wrench, Influence Explorer.)
Comments still rolling in on greenhouse gases, cigars: Last week the Environmental Protection Agency received more than 3,000 more comments on a proposal to regulate greenhouse gases emitted by existing power plants, described in our report here. Overall, Regulations.gov reports nearly 35,000 comments received since the proposal was issued in early July. Another top magnet for comments, garnering more than 4,700, was the FDA’s proposal to regulate cigars, as we reported here. (Credit: Docket Wrench.)
Keep reading for today’s look at #OpenGov news, events, and analysis, including a military grade lobbying effort, teaching OAS members about transparency, and balancing bike share.
Executive Order 12333, issued by President Reagan, is one of the foundational documents of the current surveillance system. Some say that it allows a wide-ranging and unconstitutional amount of data collection. (Ars Technica)
Last year stories emerged about an Iowa State Senator who was paid by Ron Paul’s presidential campaign to switch his endorsement from Michele Bachmann to Paul. Kent Sorenson denied the allegations at the time, both in public and in court. But this week, he pleaded guilty to accepting money from both campaigns and eventually switching sides when Paul offered him a better deal. (Washington Post)
The tragic events in Ferguson, Missouri, publicized the military-grade equipment being used in many police departments across the country. Now, police associations are gearing up for a major lobbying fight to save their access to military surplus like grenade launchers, automatic weapons, and heavily armored vehicles. (The Hill)
The Organization of American States is launching a virtual class to teach more than 200 officials from the region “strategies for open government in the Americas”. (NFOIC)
State and Local News
Email retention practices in Pennsylvania have advocates and archivists worried that seemingly innocuous but potentially historically relevant emails may be deleted without a second thought. Currently, employees are encouraged to clean up their emails on a regular basis and archives are only kept for 5 days, putting decisions about the future value of these records in the hands of those who originally created them rather than an impartial professional. (Government Technology)
Bike share systems are becoming popular in cities around the world. Making sure that users can access and park bikes when they want to is creating some interesting problems for mathematicians, who are working up algorithms to ensure that systems are properly balanced, ensuring customer satisfaction and saving operators time and money. (Government Executive)
Do you want to track transparency news? You can follow the progress of relevant bills, court cases, and regulations using Scout. You can also get Today in #OpenGov sent directly to your preferred news reader. If you would like to suggest an event, please email email@example.com by 7 am on the Monday prior to the event.