THOMAS bulk data access

From OpenCongress Wiki



This page is part of the Transparency Hub project.

Introduction

This wiki gathers information concerning public bulk access to information stored on THOMAS, a comprehensive Internet-accessible database that makes federal legislative information available to the public at no cost. THOMAS is operated by the Library of Congress and was launched in January of 1995 at the inception of the 104th Congress.

Quick Facts

  • At least twice as many people access congressional legislative information through third-party sources as directly through the THOMAS website. Major third-party sources include GovTrack.us, OpenCongress.org, and Sunlight's Congress app for Android.
  • Providing “bulk access to data” means releasing an entire database for use by others.
  • GPO currently publishes 6 datasets in bulk (including the Federal Register); Data.gov (launched March 2010) has 400,000 datasets; New Jersey and New Hampshire publish legislative data in bulk.
  • A coalition of organizations issues the major Open House Report calling on Congress to "embrace structured data by publishing the status of legislation and other information to the Web not only as it is now, but also in structured data formats." (May 2007) (http://bit.ly/HkPycb)
  • The Explanatory Statement accompanying the Committee Print of the House Committee on Appropriations for Public Law 111-8 (March 2009) articulates Congress' support for bulk access to legislative information. (http://1.usa.gov/I2UvJG p. 1770)
  • In 2008, the Library of Congress said it expected to report, within the first part of the calendar year, on the resources necessary to supply the public with raw legislative data. It established a bulk data task force that has never completed its deliberations. (http://bit.ly/A4c5le)
  • Rep. Bill Foster introduced HR 6289 (in the 111th Congress) that would require some legislative data to be made available in bulk and create a THOMAS advisory committee. (Sep. 2010) (http://1.usa.gov/HZthAp)
  • Congressional Facebook Hackathon endorses bulk access to legislative data as an action item: "Release Structured Machine-Readable Legislative Data: Providing legislative data in a bulk format to enable third-party developers to create more dynamic interfaces for legislative information." (November 2011) (http://1.usa.gov/ygzQpl)
  • 30 organizations and companies call for bulk access to legislative data and the creation of an advisory committee. (April 6, 2012)


Blog Posts

Policy Documents and Gov't Resources

Government Resources

Civil Society Organization Resources

News Stories

Additional Resources

The History of THOMAS Generally

Launch of Beta.Congress.gov

States that provide bulk access to legislative data

Bulk Data

  • California - Daily File, Bill Information, California Codes, California Constitution, Statutes, Rules of the Legislature, and prior session information
    (but not legislator/committee details) available as SQL files downloadable from a single location
  • New Hampshire - 100% of data available as zip file on site; within files is fixed width text
  • New Jersey - 100% of data available in bulk via FTP site; updated daily; DBF format
  • New Mexico - all bill data as zip file (but missing legislators, committee info)
  • North Carolina - Vote data and basic legislative data available as zip files, but missing certain data (e.g. sponsors)
  • The Sunlight Foundation scrapes and provides bulk access to legislative data for all 50 states
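
As an illustration of what working with one of these bulk formats can involve, the following is a minimal sketch of reading fixed-width text records out of a zip archive, the general shape described for New Hampshire above. The column layout, the field names, the file name bills.txt, and the sample rows are all hypothetical and invented for this example; a real state file defines its own layout, which should be taken from that state's documentation.

```python
import io
import zipfile

# Hypothetical fixed-width layout: (field name, start column, end column).
# Real state files define their own column positions; this one is made up.
LAYOUT = [("bill_id", 0, 8), ("session", 8, 12), ("title", 12, 52)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in layout}


def read_bulk_zip(zip_bytes, member):
    """Read every non-blank record from one text member of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            return [parse_fixed_width(line.rstrip("\n")) for line in text if line.strip()]


def make_sample_zip():
    """Build a tiny in-memory archive so the sketch runs without any download."""
    rows = [
        "HB0001".ljust(8) + "2012" + "An act relative to example data".ljust(40),
        "SB0002".ljust(8) + "2012" + "An act relative to sample records".ljust(40),
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("bills.txt", "\n".join(rows) + "\n")
    return buffer.getvalue()


records = read_bulk_zip(make_sample_zip(), "bills.txt")
for record in records:
    print(record["bill_id"], record["session"], record["title"])
```

A single downloadable archive like this lets a third party load the whole dataset in one pass, rather than scraping individual bill pages.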

APIs, but not Bulk

  • Kansas
  • Oregon
  • New York

Machine readable, but not bulk

  • Minnesota (XML)
  • Texas (XML)
  • Mississippi (XML)
  • Connecticut (CSV)

Historical Resources on the Development of Congress' Legislative Information Systems

Legislative Language and Committee Reports

H. Rept. 103-517 (accompanying P.L. 103-283, Legislative Branch Appropriations Act, 1995)

The Committee on House Administration and the Senate Committee
on Rules and Administration have indicated that there are
several instances where Congressional information systems may be
generating or tracking duplicate information. For example, there
are several data bases which maintain the status of legislation, including
House, Senate, and Library of Congress-operated systems.

The Committee directs the automation department of the Library
of Congress to work with the Committees enumerated
with the objective of identifying and eliminating such redundancies. At
the conclusion of these consultations, the Committees on Appropriations
would be pleased to receive a report of findings and recommendations.

S. Rept. 104-114 (accompanying Legislative Branch Appropriation 1996)

The Committee has recommended an administrative provision,
section 210, requesting the Library of Congress to develop and
maintain, in coordination with other appropriate legislative branch
entities, a single legislative information retrieval system to serve
the entire Congress. The purpose of this provision is to reduce the
cost of information support for the Congress by eliminating duplication
among systems which provide electronic access to legislative
information. House Report No. 103–517 directed the Library to conduct
a study to identify and eliminate such redundancies in congressional
information systems.

Based upon preliminary analysis and discussions, the Committee
sees opportunities to eliminate duplication and, therefore, directs
the Library to complete the study, and develop a plan for the Library
to create and maintain, in coordination with other appropriate
legislative branch entities, a single legislative information
retrieval system to serve the entire Congress.

To the extent possible, the Library’s system should be closely integrated
with other legislative branch systems that provide access
to information related to legislation. Such information includes, but
is not limited to, information originating from other legislative
branch support agencies and offices, publications of conference organizations,
and information by or about Members.

The officers of the Senate and House, as well as committees and
subcommittees, and any other legislative entities responsible for
the origination of legislative information, shall retain their authority
over and responsibility for the accuracy and integrity of their
information.

H. Rept. 104-141 (accompanying the Legislative Branch Appropriations Bill, 1996)

The Committee has asked the Clerk of the House to investigate
methods for increasing electronic printing of House documents. The
proposal should be coordinated with the House entities (such as
committees, legislative and law revision counsels, etc.) who require
document printing and storage to carry out their legislative responsibilities,
and with GPO, and should be presented to the appropriate authorities for
approval before implementation. The position
of Assistant Clerk (FEC) has been eliminated in a reorganization
of the Clerk’s office; funds for that position, therefore, have not
been provided. Funds for subscriptions to the U.S. Code have also
been deleted from the Clerk’s budget. For those Members who require
office copies, the Code can be purchased from official expense
funds. Alternatively, the Code is available in the House library, at
the Library of Congress, on Internet through the ‘Thomas’ connection,
through GPO ‘Access’, another online service, and on CD–
ROM which is available from the Government Printing Office.
Closed captioning funds are not provided since the Committee has
been told that the contract will be renewed with FY 1995 funds.
Also, funds for contracting out stenographic reporting of Committee
hearings are provided in the Clerk’s budget ($800,000, a savings of
$300,000 below the amount provided in FY 1995.) It should be
noted that funding for the U.S. Code, stenographic contracting, and
newspaper subscriptions have formerly been carried in the
‘‘Allowances and expenses’’ appropriating paragraph. The Clerk does not
control the use of these funds, but does the ordering or contracting
as a service to other House offices, a more convenient administrative
procedure. The Clerk should consult the users of such services
to determine their continued need and to fully inform the ultimate
consumers of their actual cost.

H. Rept. 104-212 (accompanying Legislative Branch Appropriations 1996)

Amendment numbered 32:

That the House recede from its disagreement to the amendment
of the Senate numbered 32, and agree to the same with an
amendment, as follows:

In lieu of the matter proposed by said amendment, insert:

SEC. 209.(a) The purpose of this section is to reduce the cost of
information support for the Congress by eliminating duplication
among systems which provide electronic access by Congress to legislative
information.

(b) As used in this section, the term ‘‘legislative information’’
means information, prepared within the legislative branch, consisting
of the text of publicly available bills, amendments, committee
hearings, and committee reports, the text of the Congressional
Record, data relating to bill status, data relating to legislative activity,
and other similar public information that is directly related to
the legislative process.

(c) Pursuant to the plan approved under subsection (d) and consistent
with the provisions of any other law, the Library of Congress
or the entity designated by that plan shall develop and maintain,
in coordination with other appropriate entities of the legislative
branch, a single legislative information retrieval system to serve the
entire Congress.

(d) The Library shall develop a plan for creation of this system,
taking into consideration the findings and recommendations of the
study directed by House Report No. 103–517 to identify and eliminate
redundancies in congressional information systems. This plan
must be approved by the Committee on Rules and Administration
of the Senate, the Committee on House Oversight of the House of
Representatives, and the Committees on Appropriations of the Senate
and the House of Representatives. The Library shall provide
these committees with regular status reports on the development of
the plan.

(e) In formulating its plan, the Library shall examine issues regarding
efficient ways to make this information available to the
public. This analysis shall be submitted to the Committees on Appropriations
of the Senate and the House of Representatives as well
as the Committee on Rules and Administration of the Senate, and
the Committee on House Oversight of the House of Representatives
for their consideration and possible action.

And the Senate agree to the same.

Amendment numbered 34:

That the House recede from its disagreement to the amendment
of the Senate numbered 34, and agree to the same with an
amendment, as follows:

Restore the matter stricken by said amendment, amended to
read as follows:

ADMINISTRATIVE PROVISION

SEC. 210. The fiscal year 1997 budget submission of the Public
Printer to the Congress for the Government Printing Office shall include
appropriations requests and recommendations to the Congress
that—

(1) are consistent with the strategic plan included in the
technological study performed by the Public Printer pursuant to
Senate Report 104–114;

(2) assure substantial progress toward maximum use of
electronic information dissemination technologies by all departments,
agencies, and other entities of the Government with respect
to the Depository Library Program and information dissemination
generally; and

(3) are formulated so as to require that any department,
agency, or other entity of the Government that does not make
such progress shall bear from its own resources the cost of its
information dissemination by other than electronic means.

And the Senate agree to the same.

H. Rept. 104-657 (accompanying Legislative Branch Appropriations bill 1997)

The Committee bill continues to stress the use of cost-effective
electronic format and telecommunications technologies. The agencies
of the Legislative Branch are striving toward a CyberCongress
mode whereby information can be shared more easily among the
agencies and with the public at large. It has been estimated that
the accompanying legislative branch appropriations bill contains
over $211 million for computer, telecommunications, and other information
processing operations and investments. These resources,
amounting to about 12.5% of the entire amount appropriated, include
the investments necessary to maintain an effective legislative
process during times of continued budget restraint while, at the
same time, continuing to develop capabilities that will facilitate information
exchange among agencies and the public.

This is not a small undertaking. In the House of Representatives,
funds are provided to equip Member, committee, and staff offices
with up-to-date computing and communications capabilities to facilitate
information processing within and between Congressional
offices, including district office locations. The THOMAS system at
the Library of Congress has made tremendous progress in making
Congressional information products available to both Congress and
the general public through Internet. The Library of Congress continues
to develop the technology for a digital library. The Government
Printing Office continues to upgrade their own electronic data
base, ACCESS, which also provides a great deal of legislative information
in direct access, on line format. The Superintendent of Documents
is pursuing a program to transition the Federal Depository
Library program to electronic format within a reasonable period of
time. In addition, the General Accounting Office has virtually completed
a ‘‘shared resources’’ project which facilitates audit and program
evaluation work done in the field by that agency. The Congressional
Research Service and Copyright Office are investing in
optical storage systems and other advanced technologies, and the
Architect of the Capitol continues to maintain the basic telecommunications
infrastructure ‘‘CAPNET’’, which provides the
communications pathway for legislative agencies to share this data
with each other.

These are only a few examples of the inexorable movement toward
CyberCongress. All of these and other related efforts are
funded in this appropriations bill. Much of the savings made necessary
due to the constraints on funding of legislative activities are
only possible because of the continued investments made in information
processing technology. This bill maintains the commitment
to going forward with the infrastructure necessary to utilize modern
telecommunications capabilities.

In a related matter, the Committee on House Oversight and the
Senate Committee on Rules and Administration have begun a process
to develop a common information dissemination system. The
Clerk of the House and the Secretary of the Senate have been
called upon to coordinate the project with the oversight of those
Committees and to ultimately propose the standards for a legislative
branch wide information system to the Committees for approval.
An open exchange of technology, projects, plans and developments
are crucial to the success of a legislative branch wide information
system. It is expected, therefore, that the following organizations
will be relied upon to participate and assist in all the efforts
of the Clerk and the Secretary: the Library of Congress, the
Government Printing Office, House Information Resources, the
Senate Computer Center, the General Accounting Office, the Congressional
Budget Office, and the Architect of the Capitol.

Section 209 of the Legislative Branch Appropriations Act, 1996,
directed the Library of Congress to develop a plan and supporting
analyses for this system. In so doing, the Library identified the
major programs under development in various parts of the legislative
branch as well as a significant amount of duplication. The
process begun by the oversight committees will enable the
strengths of each program to be recognized and integrated into a
system that will benefit Congress as a whole.

♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦

The Clerk of the House is encouraged to continue with efforts to
implement various print on demand capabilities related to legislative
documents, subject to appropriate approvals. In particular, the
Clerk should establish, in consultation with the Committee on
House Oversight and the Joint Committee on Printing, a formal
system in accordance with Title 44, United States Code, to expand
print on demand use in the House Document Room. The Clerk
should prepare a report for submission to the Committee on House
Oversight outlining the various print on demand goals, a timetable
for their implementation, and a projection of the benefits, costs,
and cost reductions associated with each.

The Clerk has indicated there has been a nearly 80 percent reduction
in requested document reprints from the Government
Printing Office for use in the House Document Room. The Committee
supports this cost saving effort and encourages the Clerk, in
consultation with the Joint Committee on Printing, to continue
these efforts to minimize reprinting where feasible.

The Clerk is also directed, in consultation with the Secretary of
the Senate, the Joint Committee on Printing, and the Government
Printing Office, to study and determine alternatives to the current
procedures being used for creating, formatting and transmitting
Committee and other House documents in preparation for printing.

As the Congress moves toward modernization of technology and
print-on-demand capability, alternatives to continued reliance upon
GPO details should be evaluated. It may be that in-house expertise
and technology can be used more cost-effectively. The Clerk will be
expected, after due consultation as noted above, to present recommendations
in the next appropriations cycle. Funds for this effort,
which should not exceed $100,000, may be derived from savings
in the Clerk’s budget, or elsewhere in the ‘‘salaries, officers
and employees’’ line item. If necessary, the Committee will consider
a reprogramming of funds presented in the customary manner.
Moreover, it is expected that the same staff resources can expedite
posting of committee legislative information on the THOMAS system.

The Clerk has also indicated that various steps are being taken
to establish common standard generalized markup language
(SGML) definitions for the creation of legislative documents in electronic
format. This is consistent with actions being taken throughout
the Legislative Branch. The Clerk should seek guidance from
the Committee on House Oversight, the Joint Committee on Printing,
Government Printing Office, House Information Resources, the
Secretary of the Senate, private industry, and other interested parties,
in establishing standards that are based upon past and ongoing
GPO, HIR, and Senate efforts. The overarching objective should
be the development of standards and systems that will be of common
use by the Clerk and other interested Legislative Branch entities.

♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦

The Committee has included $81,669,000 for printing and binding of congressional documents at the Government Printing Office
for use by Congress and by-law programs. The amount provided reflects a savings of $1,050,000 by converting the permanent, bound
Congressional Record to a CD–ROM format. The daily Congressional Record will continue to be distributed in the formats preferred by the recipients, i.e., paper or microfiche, and is also available electronically via the widely-accessible Internet distribution
network through the THOMAS system and the GPO ACCESS network. The permanent, paper-based bound Record, which is delayed
in production by 8 years at the present time, is a perfect candidate
for electronic format. Each set costs almost $12,000 to print and
bind, and is made available in limited quantities. CD–ROM’s can
be provided at a fraction of this cost and will be very flexible research tools in library or office settings, where the bound paper
sets are normally utilized. The bill provides $100,000 for a more
limited number of printed, permanent Records which can be produced from the less expensive CD–ROM format data base setup.
These copies can be distributed at the direction of the Joint Committee on Printing. For those offices and institutions that cannot
do without paper copies, CD–ROM’s can be printed by commercial
printing establishments at a much smaller cost than current charges
against the Congressional printing and binding appropriation.

The Committee has been informed that the conversion to CD–
ROM will expedite the availability of the permanent version of the
Congressional Record by several years, thereby making it available
much sooner than the current 8-year delay. The GPO is directed
to develop a plan that will minimize the time necessary to distribute this record of House and Senate debate. The plan should include the objectives and a time line for achieving the time savings.

Also, the GPO, in consultation with the Library of Congress, should plan to make the CD–ROM version of the permanent Record available on Internet to the broadest possible audience.
Both plans should be presented in the fiscal year 1998 budget
submission.

A general reduction of $1,051,000 has also been taken. The GPO,
in consultation with the Joint Committee on Printing, should review those materials which are non-legislative in nature now being
charged against this appropriation and determine the extent to
which House or Senate can provide direct reimbursement or reduce
the need for such material.


H. Rept. 104-733 (accompanying P.L. 104-197, Legislative Branch Appropriations Act, 1997)

Amendment No. 23: Deletes a provision proposed by the Senate
regarding an electronic information system. The managers on the
part of the House and Senate agree that the Congressional Research
Service, upon the request of the Senate Committee on Rules
and Administration, and in consultation with the Secretary of the
Senate and the heads of the appropriate offices and agencies of the
legislative branch, shall coordinate the development of an electronic
congressional legislative information and document retrieval
system to provide for the legislative information needs of the Senate
through the exchange and retrieval of information and documents
among legislative branch offices and agencies. The managers
on the part of the House and the Senate also agree that the Library
of Congress shall assist the Congressional Research Service
in supporting the Senate in this effort, and shall provide technical
staff and resources as may be necessary.

S. Rept. 105-16 (accompanying Supplemental Appropriations and Rescissions Act, 1997)

The Committee recommends the transfer of $5,000,000 from
funds available under the heading ‘‘Senate’’ to the Secretary of the
Senate, to be available through September 30, 2000, for development
and implementation of a comprehensive, Senatewide legislative
information system [LIS]. The accounts from which the transfers
occur are contingent upon the approval of the Committee on
Appropriations. Pursuant to section 8 of the Legislative Branch Appropriations
Act, 1997, the Secretary is required to develop and implement
LIS under the oversight of the Committee on Rules and
Administration.

H. Rept. 105-196 (accompanying H.R. 2209, Legislative Branch Appropriations Bill, 1998)

An open exchange of technology, projects, plans and developments
is crucial to the success of a legislative branch wide information
system. It is expected, therefore, that the following organizations
will continue to participate and assist in all the efforts of the
Clerk of the House and the Secretary of the Senate: the Library of
Congress, the Government Printing Office, House Information Resources,
the Senate Computer Center, the General Accounting Office,
the Congressional Budget Office, and the Architect of the Capitol.

The Committee on House Oversight and the Senate Committee
on Rules and Administration have begun a process to develop a
common information dissemination system. The Legislative Information
System (LIS) being developed by the Congressional Research
Service and the Library of Congress, when completed, will
replace the retrieval functions for legislative information systems
currently being operated by House Information Resources (HIR).
The Library and CRS must devote sufficient resources to accomplish
the following during FY1998:

Provide comparable functionality so that legacy retrieval systems
can be retired by 12/31/98;

Improve the productivity of Congressional staff by making significant
progress in implementing previously identified high
priority functionality; and

Improve the accuracy, usability, and timeliness of legislative
information retrieval.

S. Rept. 105-47 (accompanying S. 1019, Legislative Branch Appropriations for FY ending Sep. 30 1998)

In the conference report (H. Rept. 104–733) accompanying the
fiscal year 1997 legislative branch appropriation bill (Public Law
104–197), the Congressional Research Service was directed to coordinate,
and the Library of Congress was directed to provide technical
support for, the development of a legislative information retrieval
system to serve the Senate. The Senate has undertaken a
major program to rebuild its systems for creating and managing its
legislative information. This program, which will take several years
to complete, is being carried out by the Secretary of the Senate
with the oversight of the Senate Committee on Rules and Administration.
The retrieval system being coordinated by CRS and supported
by the Library is an integral part of that program. CRS and
the Library are, therefore, directed to continue their development
of the legislative retrieval system for the Senate in conjunction
with the Senate’s efforts to manage its legislative information more
efficiently.

♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 

The Senate Committee on Rules and Administration and House
Oversight Committee, per the recommendation of the Secretary of
the Senate and the Clerk of the House, have approved the establishment
of a data standards program, including standard generalized
markup language [SGML] for data interchange of legislative
information. The purpose of this program is to ensure that the
preparation and exchange of legislative information is made more
efficient through the use of data standards. Once published these
standards will be used by all legislative branch agencies, including
GPO in transmitting and producing information which is utilized
in the legislative process. The Secretary of the Senate and Clerk
of the House will be responsible for updating and maintaining and
publishing the data interchange standards for legislative information.

S. Rept. 105-204 (accompanying Legislative Branch Appropriations Act, 1999)

In the conference report (H. Rept. 104–733) accompanying the
fiscal year 1997 legislative branch appropriation bill (Public Law
104–197), the Congressional Research Service was directed to coordinate,
and the Library of Congress was directed to provide technical
support, for the development of a legislative information retrieval
system to serve the Senate.

The Senate has undertaken a major program to rebuild its systems
for creating and managing its legislative information. Although
this program is going to take a number of years to complete,
the Senate is already realizing benefits from this program.
The Secretary of the Senate, with the technical support of the Sergeant
at Arms, is providing Senate offices floor amendments electronically
minutes after being introduced on the floor.

The retrieval system being designed and maintained to provide
a comprehensive legislative resource by the CRS and supported by
the Library is proving to be a valued recourse for Senate and congressional
office. CRS and the Library are, therefore, directed to
continue their development of the legislative retrieval system for
the Senate and provide an annual report outlining the strategic objective
of this initiative.

H. Conf. Rept. 105-734 (accompanying HR 4112, Legislative Branch Appropriations Bill for FY ending Sept. 30, 1999)

The conferees agree with language in the House report directing
the Library to develop measurements of the extent of the collections
security problem and with language in the Senate report urging
the Library to continue efforts to assist the Senate with a legislative
information retrieval system.

H. Rept. 106-635 (accompanying the Legislative Branch Appropriations Bill 2001)

Information security is a collective responsibility within the legislative
branch. The Clerk of the House in consultation with the Secretary
of the Senate shall consult with all legislative branch entities
that create or store legislative information in electronic form
and prepare standards and procedures for ensuring the security of
such information as well as for establishing a process to routinely
assess risks to the security of legislative information.

The Clerk in consultation with the Secretary shall submit proposals
for standards and procedures for approval to the Committee
on House Administration and the Senate Committee on Rules and
Administration, respectively, on a date to be specified by those
Committees. Upon approval, the Clerk, the Secretary, and the legislative
branch entities shall provide their plans to the House Committee
on Appropriations and Senate Committee on Appropriations.

The Library of Congress and the Government Printing Office
shall work with the Clerk and the Secretary to test, develop, and
implement, no later than January 3, 2001, systems that will enable
them to confirm the authenticity of such legislative information.

S. Rept. 107-37 (accompanying S. 1172, Legislative Branch Appropriations 2002)

The Committee recommends an appropriation of $8,571,000 for
expenses of the Office of the Secretary. The Committee has included
$7,000,000 for the Legislative Information System Augmentation
Project.

H. Rept. 110-98 (accompanying Legislative Branch Appropriations Bill, 2008)

Improved Access to Roll Call Information.—The Committee believes
the public could benefit from more easily accessible roll call
information. To that end, the Committee requests that the Chief
Administrative Officer work with the Clerk of the House and the
Library of Congress to study how, within the public House of Representatives
website and the THOMAS website, a joint system
might be developed to allow roll call searches by specific word, and
report back to the Committee on Appropriations of the House by
December 1, 2007.

Joint Explanatory Statement, House Committee on Appropriations, Omnibus Act, 2009 (accompanying H.R. 1105 / Public Law 111-8, Omnibus Appropriations Act of 2009)

See Book G, explanatory statement on Congressional Research Service Salaries and Expenses, the paragraph starting with the phrase "Public Access to Legislative Data" (or page 10 of this PDF) (March 2009).

Public Access to Legislative Data.--There is support for enhancing public access to legislative documents, bill status, summary information, and other legislative data through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Library of Congress, Congressional Research Service, and Government Printing Office and the appropriate entities of the House of Representatives are directed to prepare a report on the feasibility of providing advanced search capabilities. This report is to be provided to the Committees on Appropriations of the House and Senate within 120 days of the release of Legislative Information System 2.0.

H. Rept. 112-511 (accompanying HR 5882, Legislative Branch Appropriations Bill for 2013)

During the hearings this year, the Committee heard testimony
on the dissemination of congressional information products in Extensible
Markup Language (XML) format. XML permits data to be
reused and repurposed not only for print output but for conversion
into ebooks, mobile web applications, and other forms of content delivery
including data mashups and other analytical tools. The Committee
has heard requests for the increased dissemination of congressional
information via bulk data download from non-governmental
groups supporting openness and transparency in the legislative
process. While sharing these goals, the Committee is also
concerned that Congress maintains the ability to ensure that its
legislative data files remain intact and a trusted source once they
are removed from the Government’s domain to private sites.

The GPO currently ensures the authenticity of the congressional
information it disseminates to the public through its Federal Digital
System and the Library of Congress’s THOMAS system by the
use of digital signature technology applied to the Portable Document
Format (PDF) version of the document, which matches the
printed document. The use of this technology attests that the digital
version of the document has not been altered since it was authenticated
and disseminated by GPO. At this time, only PDF files
can be digitally signed in native format for authentication purposes.
There currently is no comparable technology for the application
and verification of digital signatures on XML documents.
While the GPO currently provides bulk data access to information
products of the Office of the Federal Register, the limitations on
the authenticity and integrity of those data files are clearly spelled
out in the user guide that accompanies those files on GPO’s Federal
Digital System.

The GPO and Congress are moving toward the use of XML as
the data standard for legislative information. The House and Senate
are creating bills in XML format and are moving toward creating
other congressional documents in XML for input to the GPO.

At this point, however, the challenge of authenticating downloads
of bulk data legislative data files in XML remains unresolved, and
there continues to be a range of associated questions and issues:
Which Legislative Branch agency would be the provider of bulk
data downloads of legislative information in XML, and how would
this service be authorized? How would ‘‘House’’ information be differentiated
from ‘‘Senate’’ information for the purposes of bulk data
downloads in XML? What would be the impact of bulk downloads
of legislative data in XML on the timeliness and authoritativeness
of congressional information? What would be the estimated
timeline for the development of a system of authentication for bulk
data downloads of legislative information in XML? What are the
projected budgetary impacts of system development and implementation,
including potential costs for support that may be required
by third party users of legislative bulk data sets in XML, as well
as any indirect costs, such as potential requirements for Congress
to confirm or invalidate third party analyses of legislative data
based on bulk downloads in XML? Are there other data models or
alternatives that can enhance congressional openness and transparency
without relying on bulk data downloads in XML?

The Committee directs the establishment of a task force composed
of staff representatives of the Library of Congress, the Congressional
Research Service, the Clerk of the House, the Government
Printing Office, and such other congressional offices as may
be necessary, to examine these and any additional issues it considers
relevant and to report back to the Committee on Appropriations
of the House and Senate.

Congressional Hearings

Library of Congress: Ensuring Continuity and Efficiency During Leadership Transitions, Committee on House Administration (April 18, 2012)

Modernizing Information Delivery in the House, Committee on House Administration (June 16, 2011)

Oversight of the Clerk, Sergeant At Arms, Chief Administrative Officer, and Inspector General of the House of Representatives, Committee on House Administration (April 28, 2010) (notable for inclusion of Office of the Clerk's Semi-Annual Report, Office of the Sergeant At Arms Semiannual Report, CAO Semiannual Report and more)

Library of Congress IT Strategic Planning, Committee on House Administration (April 29, 2009)

Hearing on IT Assessment: A Ten Year Vision for Technology in the House, Committee on House Administration (September 27, 2006) (contains a CMF report: House IT Assessment Project)

Documents and Reports Prepared by Congress and Legislative Branch Support Agencies

  • Duplication Among Legislative Tracking Systems: Findings, A Report Prepared by the Library of Congress for the House and Senate Appropriations Committees Pursuant to House Report 103-517 and House Report 104-141, July 14, 1995
  • A Plan for a New Legislative Information System for the United States Congress, Prepared by the Library of Congress, February 16, 1996
  • The Legislative Information System Strategic Objective Report, FY2012

Ideas for Upgrading THOMAS

Top Suggestions

  • Bulk Access to THOMAS data
  • Incorporate open data principles

Meta Suggestions

  • Have regular roundtable discussions with members of public and government to discuss ideas for improving THOMAS
  • Create THOMAS users group (email discussion?)
  • Programmer access page: for XML access, RSS feeds, email sign ups, etc.
  • Work to improve parsability of all search results; more structured data
    • All bills in XML
    • Single page (no pagination) that lists every bill in Congress with its status; updated daily on a new page (for scraping); preferably in a feed or XML format
  • Create and make public unique IDs for commonly used entities (or draw upon those created by others)
    • List of all Committees and Subcommittees Members
  • Incorporate Senate Amendments (See S Res. 562)
  • Consider redesign of site (look at LIS, GovTrack, OpenCongress for ideas + public)
  • Provide more detailed history of how THOMAS came to be
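
To illustrate the "single page that lists every bill with status" idea above, here is a minimal sketch of how a third party might read such a listing if it were published as XML. THOMAS does not publish this format: the element names (bills, bill, status), the id scheme, and every value in the sample are hypothetical placeholders, not real legislative data.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing format; all values are placeholders, not real data.
SAMPLE_FEED = """<bills congress="111">
  <bill id="hr1-111">
    <status date="2009-01-01">introduced</status>
  </bill>
  <bill id="s2-111">
    <status date="2009-01-02">referred to committee</status>
  </bill>
</bills>"""


def load_bill_statuses(xml_text):
    """Return a dict mapping each bill's unique id to (status, date)."""
    root = ET.fromstring(xml_text)
    statuses = {}
    for bill in root.findall("bill"):
        status = bill.find("status")
        statuses[bill.get("id")] = (status.text, status.get("date"))
    return statuses


if __name__ == "__main__":
    for bill_id, (status, date) in load_bill_statuses(SAMPLE_FEED).items():
        print(bill_id, status, date)
```

With stable unique IDs and one unpaginated document, a daily refresh becomes a single download and a dictionary comparison rather than a crawl of paginated search results.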

Specific Suggestions

  • Make public laws searchable by law number and by name
  • Allow for bill alerts system (email) for bills and topics
  • Add short name of bill to weekly top 5 (plus link to archives)
  • Allow highlighting of "hot" bills -- where there's some kind of legislative action
  • Word/Phrase vs. Bill Number
    • have search box handle both;
    • allow search of entire bill text
    • make selection of phrase vs number sticky
  • Improve "related bills" -- run comparison of bill summaries/ text -- both in this Congress and over past Congresses
    • Make it easier to trace bills through the process, especially when there is a substitute
    • e.g., HR 3200 became HR 3590
  • Is legislation searchable by CRS tags? (Make the list of tags available.) Add tags to each bill, so users can search for related bills.
  • Organize front page of THOMAS around what's going on today in Congress, with info on yesterday and upcoming events
  • Permalink: "save" on share/save tab is confusing; perhaps make its own link
  • Daily Digest -- when sending email, include the contents of the Daily Digest, not just a link
  • Increase size of search fields
  • 3 organizing links:
    • what's going on today -- running info from floor embedded into THOMAS
    • what happened yesterday
    • what's upcoming this week
  • Order plain-language search results for bills by topic, frequency, and tags
  • Is search boolean?
    • Want to be able to exclude terms from a search (the "not" function, e.g., Israel not steve)
  • When a search result includes a calendar, link to it automatically
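
Two of the search suggestions above (one search box that handles both bill numbers and phrases, and a "not" function) can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of how THOMAS search works: the function names, the list of bill-type prefixes, and the simple word-matching rule are all hypothetical.

```python
import re

# Assumed set of bill-type prefixes (dots and spaces are stripped first).
BILL_RE = re.compile(r"^(HR|S|HRES|SRES|HJRES|SJRES|HCONRES|SCONRES)(\d+)$")


def classify_query(query):
    """Decide whether a search box entry is a bill number or a phrase."""
    compact = re.sub(r"[.\s]", "", query).upper()
    m = BILL_RE.match(compact)
    if m:
        return ("bill", m.group(1), int(m.group(2)))
    return ("phrase", query.strip())


def split_not(query):
    """Split 'a b not c' into required words [a, b] and excluded words [c]."""
    parts = re.split(r"\s+not\s+", query.strip(), flags=re.IGNORECASE)
    include = parts[0].split()
    exclude = [word for part in parts[1:] for word in part.split()]
    return include, exclude


def matches(text, query):
    """True if text has every required word and none of the excluded ones."""
    words = set(re.findall(r"\w+", text.lower()))
    include, exclude = split_not(query)
    has_all = all(w.lower() in words for w in include)
    has_excluded = any(w.lower() in words for w in exclude)
    return has_all and not has_excluded


if __name__ == "__main__":
    print(classify_query("HR 3200"))
    print(classify_query("health care"))
    print(matches("Aid to Israel", "Israel not steve"))
```

Because the classification happens inside the search box, the user never has to choose between "phrase" and "number" modes, which also removes the need to make that selection sticky.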

Fun Suggestions

  • Create Twitter account to tweet whenever a bill is introduced (see OLRC), goes to committee, is enacted, etc.; tweet top five viewed bills
  • Mobile version