THOMAS bulk data access
From OpenCongress Wiki
Introduction
This wiki page gathers information on public bulk access to the legislative information stored in THOMAS, a comprehensive, Internet-accessible database that makes federal legislative information available to the public at no cost. THOMAS is operated by the Library of Congress and was launched in January 1995, at the start of the 104th Congress.
Quick Facts
- At least twice as many people access congressional legislative information through third-party sources as directly through the THOMAS website. Major third-party sources include GovTrack.us, OpenCongress.org, and Sunlight's Congress app for Android.
- Providing “bulk access to data” means releasing an entire database for use by others.
- GPO currently publishes 6 datasets in bulk (including the Federal Register); Data.gov (launched May 2009) has 400,000 datasets; New Jersey and New Hampshire publish legislative data in bulk.
- A coalition of organizations issues the major Open House Report calling on Congress to "embrace structured data by publishing the status of legislation and other information to the Web not only as it is now, but also in structured data formats." (May 2007) (http://bit.ly/HkPycb)
- The Explanatory Statement accompanying the Committee Print of the House Committee on Appropriations for Public Law 111-8 (March 2009) articulates Congress' support for bulk access to legislative information. (http://1.usa.gov/I2UvJG p. 1770)
- In 2008, the Library of Congress said it expected to report, within the first part of that calendar year, on the resources necessary to supply the public with raw legislative data. It established a bulk data task force that has never completed its deliberations. (http://bit.ly/A4c5le)
- Rep. Bill Foster introduced H.R. 6289 in the 111th Congress, a bill that would require some legislative data to be made available in bulk and would create a THOMAS advisory committee. (Sep. 2010) (http://1.usa.gov/HZthAp)
- Congressional Facebook Hackathon endorses bulk access to legislative data as an action item: "Release Structured Machine-Readable Legislative Data: Providing legislative data in a bulk format to enable third-party developers to create more dynamic interfaces for legislative information." (November 2011) (http://1.usa.gov/ygzQpl)
- 30 organizations and companies call for bulk access to legislative data and the creation of an advisory committee. (April 6, 2012)
Blog Posts
- "Rep. Honda Speaks on Bulk Access on the House Floor" by Daniel Schuman (6/8/2012)
- "Major Transparency Milestone in Bulk Access Statement" by Daniel Schuman (6/6/2012)
- "Issa amendment denied, but leadership supports bulk access" by Matt Rumsey (6/6/2012)
- "Issa Offers #FreeTHOMAS Amendment to Leg Approps Bill" by Daniel Schuman (6/5/2012)
- "Media Spotlight on Congress Stalling Open Access to Legislation" by Nicko Margolies (6/5/2012)
- "Bulk Access Language Tweaked by Approps" by Daniel Schuman (6/5/2012)
- "#FreeTHOMAS" by Daniel Schuman (6/4/2012)
- "Bulk Access Developments after the H. Approps Hearing" by Daniel Schuman (6/1/2012)
- "THOMAS Talking Points" by Daniel Schuman (5/30/2012)
- "Appropriators May Undercut Legislative Transparency" by Daniel Schuman and Eric Mill (5/30/2012)
- "Full Committee Markup on Leg Approps Set for Thursday" by Daniel Schuman (5/24/2012)
- "Will the House's Leg Spending Bill Match Its Transparency Priorities?" by Daniel Schuman (5/24/2012)
- "Two Steps Forward on Improving Public Access to Legislative Information" by Daniel Schuman (5/18/2012)
- "Appropriators Should Consider Public Access to Leg Info at Friday Mark-up" by Daniel Schuman (5/17/2012)
- "News Without Transparency: House Passes Bridge Bill After an Earmark Debate" by Matt Rumsey and Melanie Buck (5/10/2012)
- "Improve Public Access to Legislative Information" by Daniel Schuman (4/10/2012)
- "Help improve public access to Congressional/legislative information #FDLP" by James Jacobs (3/28/2012)
- "GovTrack Users Want Better Transparency From Congress" by Josh Tauberer (3/16/2012)
- "Tell Congress to Open Up" by Nicole Aro (3/12/2012)
- "Your Government Transparency 'To-Do'" by Jim Harper (3/12/2012)
- "Partners in Data Transparency: Parliaments and Non-Profits" by Daniel Schuman (3/1/2012)
- "Put THOMAS on the Fast Track" by Daniel Schuman (2/9/2012)
- "Benchmarks for Measuring Success for Legislative Data Transparency" by Daniel Schuman (2/2/2012)
- "Bulk Data at the House Legislative Data Conference" by John Wonderlich (2/2/2012)
- "Liberate OpenGovData Now" by David Moore (2/1/2012)
- "In #HackWeTrust - The House of Representatives Opens Its Doors to Transparency Through Technology" by Daniel Schuman (12/8/2011)
- "House Holding Wonk-a-thon on Public Access to Congressional Info This Thursday" by Daniel Schuman (12/5/2011)
- "Sunlight Testimony: Bulk Access to THOMAS and Access to CRS Reports" by Daniel Schuman (12/5/2011)
- "Read the Bill 2.0" by Daniel Schuman (11/14/2010)
- "Rep. Foster Introduces Bill To Improve THOMAS" by Daniel Schuman (9/30/2010)
- "Apps for THOMAS: 3 Wishes" by Daniel Schuman (7/29/2010)
- "Birds of a Feather: What's in the DISCLOSE Bills" by Daniel Schuman (5/3/2010)
- "Tip of the Hat to THOMAS" by Daniel Schuman (1/6/2010)
- "House Leg Branch Appropriations Review" by John Wonderlich (6/27/2009)
- "Legislative Databases recommendation makes it to House Leg Branch Appropriations markup" by Josh Tauberer (4/14/2008)
- "Congressman Honda on the Open House cause" by Josh Tauberer (2/1/2008)
- Discussion on the Open House Project email list (11/14/2007)
- "Mash-ups for government transparency" by Josh Tauberer (1/25/2007)
- "Finding Bills Online" by Paul Blumenthal (1/9/2007)
Policy Documents and Gov't Resources
Government Resources
- "House Leaders Back Bulk Access to Legislative Information" Speaker Boehner Press Office (6/5/2012)
- Amendment Offered to H.R. 5882, by Rep. Issa (R-CA) (6/5/2012)
- Appropriations Committee Report 112-511 on Legislative Branch Appropriations Bill, 2013 to accompany H.R. 5882 (6/1/2012)
- "House Committee on Appropriations, Omnibus Act, 2009, Committee Print of the House Committee on Appropriations H.R. 1105 / Public Law 111-8." See Book G, explanatory statement on Congressional Research Service Salaries and Expenses, the paragraph starting with the phrase "Public Access to Legislative Data" (or page 10 of this PDF) (March 2009). Key language:
Public Access to Legislative Data.--There is support for enhancing public access to legislative documents, bill status, summary information, and other legislative data through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Library of Congress, Congressional Research Service, and Government Printing Office and the appropriate entities of the House of Representatives are directed to prepare a report on the feasibility of providing advanced search capabilities. This report is to be provided to the Committees on Appropriations of the House and Senate within 120 days of the release of Legislative Information System 2.0.
- Congressional Facebook Hackathon endorses bulk access to legislative data as an action item in its report (November 2011)
- "Annual Report of the Congressional Research Service of the Library of Congress for Fiscal Year 2009" (January 2010). See page 20.
- "Remarks from the Public Printer of the United States" (October 19, 2009)
Civil Society Organization Resources
- 30 Organizations Send Letters to Appropriators and Rulemakers regarding bulk access to THOMAS (April 10, 2012)
- Comments Submitted for the Record by Joshua Tauberer for the House Committee on Appropriations Subcommittee on the Legislative Branch regarding bulk data for legislative information (February 6, 2012)
- Comments Submitted for the Record by the Sunlight Foundation for the House Committee on Appropriations Subcommittee on the Legislative Branch Hearing (February 6, 2012)
- Comments Submitted for the Record by the Sunlight Foundation for the House Committee on Appropriations Subcommittee on the Legislative Branch Hearing Regarding Bulk Access to THOMAS data (May 11, 2011)
- Open House Project Report: "Congressional Information & the Internet: A Collaborative Examination of the House of Representatives and Internet Technology" Chapter 3: Legislation Database (May 8, 2007)
News Stories
- "In Support of Legislative Transparency" Google Public Policy Blog (6/15/2012)
- "Federal News Minute" WNEW-FM - Washington, D.C. (6/8/2012)
- "Congressional data may soon be easier to use online" Washington Post (6/8/2012)
- "Rep. Crenshaw backs down, loses control over bulk data issue" GovTrack.us (6/7/2012)
- "A week ago, we wrote about Congress..." Skimmer Hat (6/6/2012)
- "Free THOMAS!" Fierce Government (6/5/2012)
- "House Appropriations trims legislative agencies budget request" Fierce Government (6/5/2012)
- "Report May Hinder Goal of Open Congress" Roll Call (6/5/2012)
- "Of, By and For: A Short Legislative History of THOMAS, the Spirit of the Law, and Elle Woods" Lulu in the Library (6/5/2012)
- "Rep. Crenshaw thinks American public can’t be trusted with overseeing Congress" GovTrack.us (6/4/2012)
- "Can we stop talking about accountability for a minute? Please?" Legal Information Institute - Cornell University Law School (6/2/2012)
- "House Appropriators May Limit Public Availability of Pending Bills" Slashdot (6/1/2012)
- "For Transparency Advocates, the Honeymoon with House Republicans May Be Over" Tech President (6/1/2012)
- "Hill may freeze THOMAS in digital past" Washington Examiner (5/31/2012)
- "Transparency group decries legislative data bulk download prohibition" Fierce Government IT (5/31/2012)
- "Congress Refuses to #FreeTHOMAS" Open Congress (5/17/2012)
- "Open government advocates seek greater access to congressional data" Federal News Radio.Com (4/16/2012)
- "GovTrack users want better transparency from Congress" GovTrack.us (4/16/2012)
- "US Agency Takes 'Private' Approach to Streamlining IT Procurement" E-Commerce Times (4/14/2012)
- "Your Government Transparency 'To Do'" Washington Watch (4/12/2012)
- "Transparency Groups Call for THOMAS bulk downloads" Fierce Government IT (4/11/2012)
- "Transparency Groups Say THOMAS website is outdated" Federal Computer Week (4/10/2012)
- "An API for Federal Legislation? Congress Wants Your Opinion" Threat Level (3/5/2009)
- "Congressional Data Mining: Coming Soon?" Mother Jones (3/5/2009)
- "Bulk Data Downloads: A Breakthrough in Government Transparency" O'Reilly Radar (3/4/2009)
- "Lawmakers favor outside access to legislative data" Government Executive (1/23/2008)
Additional Resources
- "Government: Do you really need an API?" by Eric Mill (3/21/2012)
- Sites that use GovTrack data
- THOMAS RSS feeds
- How often is THOMAS updated?
- Josh Tauberer on Civic Technology
- House of Representatives Adopts Standards for Electronic posting of House and committee documents and data (committee resolution as PDF) (document naming conventions as PDF)
- House of Representatives launches transparency portal docs.house.gov
- Library of Congress letter to Committee on House Administration on THOMAS (4/31/2008)
The History of THOMAS Generally
- "Congress on the Internet: New Web Server Organizes Online Information" Library of Congress Information Bulletin (1/25/1995) - Announces the creation of THOMAS and includes then-Speaker Gingrich's introductory remarks at the Jan. 5 launch event
- "Access to Government Information on the Internet" Interpersonal Computing and Technology Journal (10/1993) - Discusses the precursor to THOMAS, the Library of Congress Information System (LOCIS)
- "The Hill on the Net: Congress Enters the Information Age," by Chris Casey (1996) - Has history of creation of THOMAS.
States that provide bulk access to legislative data
- New Hampshire
- New Jersey
- The Sunlight Foundation scrapes and provides bulk access to legislative data from all 50 states
Congress's Bulk Data Task Force Questions
Page 18 of the Legislative Branch Appropriations report (H. Rept. 112-511):
http://appropriations.house.gov/uploadedfiles/crpt-112hrpt511.pdf
"The GPO and Congress are moving toward the use of XML as the data standard for legislative information. The House and Senate are creating bills in XML format and are moving toward creating other congressional documents in XML for input to the GPO. At this point, however, the challenge of authenticating downloads of bulk data legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues: Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized. How would ‘‘House’’ information be differentiated from ‘‘Senate’’ information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternative that can enhance congressional openness and transparency without relying on bulk data downloads in XML? The Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate. "
A coalition of organizations, including the Sunlight Foundation, drafted answers to those questions and made them publicly available.
Ideas for Upgrading THOMAS
Top Suggestions
- Bulk Access to THOMAS data
- Incorporate open data principles
Meta Suggestions
- Have regular roundtable discussions with members of public and government to discuss ideas for improving THOMAS
- Create a THOMAS users group (perhaps an email discussion list)
- Programmer access page: for XML access, RSS feeds, email sign ups, etc.
- Work to improve parsability of all search results; more structured data
- All bills in XML
- Single page (no pagination) that lists every bill in Congress with its status, updated daily on a new page (for scraping), preferably in a feed or XML format
- Create and make public unique IDs for commonly used entities (or draw upon those created by others)
- List of all committee and subcommittee members
- Incorporate Senate amendments (see S. Res. 562)
- Consider a redesign of the site (look at LIS, GovTrack, and OpenCongress for ideas, plus public input)
- Provide more detailed history of how THOMAS came to be
Specific Suggestions
- Make public laws searchable by law number and by name
- Provide an email alert system for bills and topics
- Add short name of bill to weekly top 5 (plus link to archives)
- Allow highlighting of "hot" bills -- where there's some kind of legislative action
- Word/Phrase vs. Bill Number
  - Have the search box handle both
  - Allow search of the entire bill text
  - Make the selection of phrase vs. number sticky
- Improve "related bills" -- run comparison of bill summaries/ text -- both in this Congress and over past Congresses
- Make it easier to trace bills through the process, especially when one bill is substituted for another
  - e.g., H.R. 3200 became H.R. 3590
- Is legislation searchable by CRS tags? (Make the list of tags available.) Add tags to each bill so users can search for related bills.
- Organize the front page of THOMAS around what's going on today in Congress, with information on yesterday and what's upcoming
- Permalink: "save" on the share/save tab is confusing; perhaps give it its own link
- Daily Digest: when sending email, include the contents of the Daily Digest, not just a link
- Increase size of search fields
- 3 organizing links:
  - What's going on today: running information from the floor embedded into THOMAS
  - What happened yesterday
  - What's upcoming this week
- Order plain-language search results for bills by topic, frequency, and tags
- Is search Boolean?
  - Users want to be able to exclude terms from a search (the "NOT" function, e.g., Israel NOT steve)
- When a search result references a calendar, link to it automatically
Fun Suggestions
- Create a Twitter account that tweets whenever a bill is introduced (see OLRC), goes to committee, is enacted, etc.; tweet the five most-viewed bills
- Mobile version
