THOMAS bulk data access

From OpenCongress Wiki

Revision as of 13:40, August 4, 2012 by Johnwonderlich


This page is part of the Transparency Hub project.
Add what you know.


Introduction

This wiki page gathers information about public bulk access to the data stored in THOMAS, a comprehensive, Internet-accessible database that makes federal legislative information available to the public at no cost. THOMAS is operated by the Library of Congress and was launched in January 1995, at the start of the 104th Congress.

Quick Facts

  • At least twice as many people access congressional legislative information through third-party sources as directly through the THOMAS website. Major third-party sources include GovTrack.us, OpenCongress.org, and the Sunlight Foundation's Congress app for Android.
  • Providing “bulk access to data” means releasing an entire database for use by others.
  • The Government Printing Office (GPO) currently publishes six datasets in bulk (including the Federal Register); Data.gov (launched March 2010) has 400,000 datasets; New Jersey and New Hampshire publish legislative data in bulk.
  • A coalition of organizations issued the Open House Report, calling on Congress to "embrace structured data by publishing the status of legislation and other information to the Web not only as it is now, but also in structured data formats." (May 2007) (http://bit.ly/HkPycb)
  • The Explanatory Statement accompanying the Committee Print of the House Committee on Appropriations for Public Law 111-9 (March 2009) articulates Congress' support for bulk access to legislative information. (http://1.usa.gov/I2UvJG p. 1770)
  • In 2008, the Library of Congress said it expected to report early in the calendar year on the resources needed to supply the public with raw legislative data. It established a bulk data task force that has never completed its deliberations. (http://bit.ly/A4c5le)
  • Rep. Bill Foster introduced H.R. 6289 in the 111th Congress, a bill that would require some legislative data to be made available in bulk and would create a THOMAS advisory committee. (Sep. 2010) (http://1.usa.gov/HZthAp)
  • The Congressional Facebook Hackathon endorsed bulk access to legislative data as an action item: "Release Structured Machine-Readable Legislative Data: Providing legislative data in a bulk format to enable third-party developers to create more dynamic interfaces for legislative information." (November 2011) (http://1.usa.gov/ygzQpl)
  • Thirty organizations and companies called for bulk access to legislative data and the creation of an advisory committee. (April 6, 2012)


Blog Posts

Policy Documents and Gov't Resources

Government Resources

Public Access to Legislative Data.--There is support for enhancing public access to legislative documents, bill status, summary information, and other legislative data through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Library of Congress, Congressional Research Service, and Government Printing Office and the appropriate entities of the House of Representatives are directed to prepare a report on the feasibility of providing advanced search capabilities. This report is to be provided to the Committees on Appropriations of the House and Senate within 120 days of the release of Legislative Information System 2.0.

Civil Society Organization Resources

News Stories

Additional Resources

The History of THOMAS Generally

States that provide bulk access to legislative data

Parliaments with Bulk Data Access

Congress's Bulk Data Task Force Questions

Page 18 of the Legislative Branch Appropriations report (H. Rept. 112-511):

http://appropriations.house.gov/uploadedfiles/crpt-112hrpt511.pdf

"The GPO and Congress are moving toward the use of XML as the data standard for legislative information. The House and Senate are creating bills in XML format and are moving toward creating other congressional documents in XML for input to the GPO. At this point, however, the challenge of authenticating downloads of bulk data legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues: Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized. How would 'House' information be differentiated from 'Senate' information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternative that can enhance congressional openness and transparency without relying on bulk data downloads in XML? The Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate."

Ideas for Upgrading THOMAS

Top Suggestions

  • Bulk Access to THOMAS data
  • Incorporate open data principles

Meta Suggestions

  • Have regular roundtable discussions with members of public and government to discuss ideas for improving THOMAS
  • Create a THOMAS users group (e.g., an email discussion list)
  • Programmer access page for XML access, RSS feeds, email sign-ups, etc.
  • Work to improve parsability of all search results; more structured data
    • All bills in XML
    • Single page (no pagination) listing every bill in Congress with its status, updated daily (for scraping), preferably in a feed or XML format
  • Create and make public unique IDs for commonly used entities (or draw upon those created by others)
    • List of all committee and subcommittee members
  • Incorporate Senate amendments (see S. Res. 562)
  • Consider a redesign of the site (look to LIS, GovTrack, OpenCongress, and the public for ideas)
  • Provide more detailed history of how THOMAS came to be
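A daily, unpaginated status listing with stable bill IDs would let third parties track changes with very little code. The sketch below is purely illustrative: the XML layout (`<bills>`/`<bill>` elements, `id` and `status` attributes) and the sample bill IDs are hypothetical assumptions, not a format THOMAS publishes. It parses two daily snapshots and reports only the bills that are new or whose status changed.

```python
import xml.etree.ElementTree as ET

# Hypothetical daily snapshots of a single-page bill status listing.
# Element/attribute names and bill IDs are invented for illustration;
# this is NOT a published THOMAS format.
DAY1 = """<bills congress="112" date="2012-08-02">
  <bill id="hr1234-112" status="Introduced"/>
  <bill id="s567-112" status="Referred to committee"/>
</bills>"""

DAY2 = """<bills congress="112" date="2012-08-03">
  <bill id="hr1234-112" status="Passed House"/>
  <bill id="s567-112" status="Referred to committee"/>
  <bill id="hr1300-112" status="Introduced"/>
</bills>"""


def parse_status_listing(xml_text: str) -> dict[str, str]:
    """Map each bill's stable ID to its current status string."""
    root = ET.fromstring(xml_text)
    return {bill.get("id"): bill.get("status") for bill in root.iter("bill")}


def changed_since(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return sorted IDs of bills that are new or whose status changed."""
    return sorted(
        bill_id
        for bill_id, status in current.items()
        if previous.get(bill_id) != status
    )


if __name__ == "__main__":
    before = parse_status_listing(DAY1)
    after = parse_status_listing(DAY2)
    print(changed_since(before, after))  # ['hr1234-112', 'hr1300-112']
```

The diff only works because each bill keeps the same ID from day to day, which is why the list above pairs the single status page with public unique IDs.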

Specific Suggestions

  • Make public laws searchable by law number and by name
  • Offer an email alert system for bills and topics
  • Add each bill's short name to the weekly top 5 list (plus a link to archives)
  • Highlight "hot" bills, i.e., those with recent legislative action
  • Word/Phrase vs. Bill Number
    • Have the search box handle both
    • Allow searching of the entire bill text
    • Make the selection of phrase vs. number sticky
  • Improve "related bills" by comparing bill summaries and text, both within the current Congress and across past Congresses
    • Make it easier to trace a bill through the process, especially when a substitute is used
    • e.g., HR 3200 became HR 3590
  • Is legislation searchable by CRS tags? (Publish the list of tags.) Add tags to each bill so users can search for related bills.
  • Organize the THOMAS front page around what is happening in Congress today, with information on yesterday's actions and upcoming business
  • Permalink: the "save" option on the share/save tab is confusing; consider giving permalinks their own link
  • Daily Digest: when sending email, include the contents of the Daily Digest, not just a link
  • Increase size of search fields
  • 3 organizing links:
    • what's going on today -- running info from floor embedded into THOMAS
    • what happened yesterday
    • what's upcoming this week
  • Order plain-language search results for bills by topic, frequency, and tags
  • Is search boolean?
    • Want to be able to exclude terms from a search (the "not" function, e.g., Israel not steve)
  • When a search result refers to a calendar, link to it automatically
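To make the requested "not" function concrete, here is a minimal sketch of how a query such as "Israel not steve" could be interpreted. It assumes simple semantics chosen for illustration (plain terms are all required, a term after "not" is excluded, matching is whole-word and case-insensitive) and runs over hypothetical bill titles; it does not describe how THOMAS's search engine actually works.

```python
import re

# Hypothetical bill titles used only to exercise the query logic.
TITLES = [
    "A bill on trade with Israel",
    "A bill honoring Steve and the people of Israel",
    "A bill on farm subsidies",
]


def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query such as 'israel not steve' into required and excluded terms.

    Every plain term is required (implicit AND); a term preceded by 'not'
    is excluded. A trailing 'not' with nothing after it is kept as a plain term.
    """
    required: set[str] = set()
    excluded: set[str] = set()
    words = query.lower().split()
    i = 0
    while i < len(words):
        if words[i] == "not" and i + 1 < len(words):
            excluded.add(words[i + 1])
            i += 2
        else:
            required.add(words[i])
            i += 1
    return required, excluded


def search(titles: list[str], query: str) -> list[str]:
    """Return titles containing every required word and none of the excluded ones."""
    required, excluded = parse_query(query)
    results = []
    for title in titles:
        # Whole-word matching on lowercase alphanumeric runs, so "Israel," still matches.
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        if required <= words and not (excluded & words):
            results.append(title)
    return results


if __name__ == "__main__":
    print(search(TITLES, "israel not steve"))  # ['A bill on trade with Israel']
```

Treating plain terms as an implicit AND is one common convention; a real search service might rank results instead of filtering them.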

Fun Suggestions

  • Create a Twitter account that tweets whenever a bill is introduced (see OLRC), goes to committee, is enacted, etc.; tweet the five most-viewed bills
  • Mobile version

OpenCongress is a joint project of the Participatory Politics Foundation and the Sunlight Foundation.