Database Review: HathiTrust Digital Library

HathiTrust Digital Library

September 29, 2021

piadesangles@yahoo.com

HathiTrust Digital Library has been providing services since it was founded in 2008. This database preserves and provides access to over 17 million digitized items. HathiTrust Digital Library is one of the services offered by HathiTrust, a non-profit collaborative of academic and research libraries. HathiTrust offers reading access to the fullest extent allowed by U.S. copyright law and computational access for scholarly research. Its other services include the "Shared Print Program," a shared network of print collections with collective print retention, and the "Copyright Review Program," which is dedicated to identifying and opening public domain materials in the U.S. and around the world.[1]

Publisher:

HathiTrust

https://www.hathitrust.org/

Publisher About Page:

https://www.hathitrust.org/about

History

The HathiTrust project started in 2008, when the University of Michigan proposed building a shared digital repository with the libraries of the Committee on Institutional Cooperation (now the Big Ten Academic Alliance) and the University of California. HathiTrust quickly opened up to other members that wanted to archive and share their digital collections. HathiTrust includes digitized books and journal articles, both copyrighted and in the public domain, digitized by Google, the Internet Archive, and Microsoft. Although the repository was initially conceived as a preservation environment, it gradually developed access systems: page-viewing and collection-building applications, an information website, and other tools such as the catalog and full-text search. Since its beginning, many institutions have joined HathiTrust, including Cornell University Library, Dartmouth College Library, the New York Public Library, Princeton University, the Triangle Research Libraries Network (North Carolina), the University of Virginia, and Yale University.[2]

"Hathi" is the Hindi word for elephant, an animal associated with memory, wisdom, and strength, which represent the organization's values.[3] These values are embedded in HathiTrust's mission and goals.[4] The member libraries aim to build a comprehensive archive of published literature from around the world and to develop shared strategies for managing and developing their digital and print holdings collaboratively. HathiTrust primarily serves the communities of its member libraries and institutions (faculty, staff, and students). Still, the materials in HathiTrust are available to everyone to the extent permitted by law and contracts, providing the published record as a public good to users worldwide.

Object Types:

Books, manuscripts, journal articles, government documents, and other volumes

Location of Original Materials:

The digitized content provided through the HathiTrust Digital Library comes from various sources, including Google, the Internet Archive, Microsoft, and in-house digitization initiatives at member institutions. Items in the public domain are available in full view for everyone, while items still in copyright are searchable only. According to Angelina Zaytsev, the Chair of HathiTrust User Support, "The main body of materials, over 13 million volumes, is comprised of scanned copies of printed books held in academic institutions around the world."[5] Print copies of the works in HathiTrust must be currently or previously owned by a contributing institution's library system.

Exportable Material:

Members can download public domain works and works made available under Creative Commons licenses in their entirety. Guest users can download public domain works one page at a time, or download an entire work if it has no download restrictions. Furthermore, HathiTrust notes that it does not have the authority to grant or deny permission to use images from public domain or open access volumes. Users must make their own assessment of copyright or other legal concerns for any uses of particular works beyond those provided by HathiTrust.

Titles List link

As of September 2021, HathiTrust Digital Library contains digitized material that includes:

  • 17,490,052 total volumes
  • 8,425,057 book titles
  • 470,702 serial titles
  • 6,121,518,200 pages
  • 784 terabytes
  • 207 miles (equivalent length of the print volumes)
  • 14,211 tons (equivalent weight of the print volumes)
  • 6,885,245 volumes (~39% of total) in the public domain[6]

Search Options

HathiTrust Digital Library is a digital preservation repository and access platform that provides a collection of millions of titles digitized from libraries worldwide.

This database offers three search options: catalog search, full-text search, and full-text advanced search. Catalog search supports phrase searching, in which you enclose an exact phrase in quotation marks, and wildcards, in which you place * at the end of a word or ? in the middle of a word to retrieve alternate forms of a term. You can also combine words with the Boolean operators AND and OR. To browse the entire catalog, enter * by itself. Full-text search works similarly to catalog search, but instead of wildcards it supports multiple-term searches. In the full-text advanced search, instead of typing AND/OR, you can use the "all of these words" dropdown.[7]

In addition, you can search by OCLC number. However, because HathiTrust records come from different libraries, not all of them contain an OCLC number.
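For readers who want to run an OCLC lookup from a script instead of the search box, HathiTrust also documents a Bibliographic API that returns catalog records as JSON. The sketch below assumes the URL pattern https://catalog.hathitrust.org/api/volumes/{brief|full}/{identifier type}/{identifier}.json and the identifier types listed in ID_TYPES. The helper names bib_api_url and fetch_record are hypothetical, written only for this review, so check HathiTrust's current API documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Identifier types assumed to be accepted by the HathiTrust Bibliographic API.
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}

# Assumed base URL for Bibliographic API lookups.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_value: str, id_type: str = "oclc", level: str = "brief") -> str:
    """Build a lookup URL for one identifier (hypothetical helper)."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    if level not in ("brief", "full"):
        raise ValueError("level must be 'brief' or 'full'")
    # Trim stray whitespace and percent-encode the identifier for the URL path.
    return f"{BASE_URL}/{level}/{id_type}/{quote(id_value.strip())}.json"


def fetch_record(id_value: str, id_type: str = "oclc") -> dict:
    """Fetch and parse the JSON response (requires network access)."""
    with urlopen(bib_api_url(id_value, id_type), timeout=10) as resp:
        return json.load(resp)
```

For example, bib_api_url("424023") builds https://catalog.hathitrust.org/api/volumes/brief/oclc/424023.json. Because some records lack an OCLC number, the same helper can fall back to another identifier type, such as "isbn" or "lccn".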

HathiTrust also provides several features for searching within a book. You can change the order of the results and turn search-term highlighting on the book pages on or off. You can also resize the sidebar by dragging the Close Sidebar icon, which widens or narrows the results area. Finally, you can load the search results in the sidebar so that you can explore them while viewing the book pages at the same time. Furthermore, this database has searchable text for volumes in languages written in non-Latin scripts, including Russian, Greek, Hebrew, Chinese, Japanese, and Korean.

Access

HathiTrust currently accepts three kinds of members: academic and research libraries, university library systems, and library consortia. In addition, HathiTrust considers non-profit, non-U.S. research libraries for membership on a case-by-case basis. Eligible academic and research libraries belong to not-for-profit entities that operate one or more libraries, such as universities, colleges, museums, research centers, and agencies. Examples include the University of Pennsylvania, the Getty Research Institute, the New York Public Library, and the Library of Congress. University systems, in turn, are multi-institution, not-for-profit higher education systems that operate one or more libraries.[8]

Information from Publisher

The HathiTrust website is self-explanatory and informative, and it encourages feedback to improve the service. The Help section of the digital library guides users step by step through the search process.

https://www.hathitrust.org/help_digital_library

Citing

HathiTrust users are encouraged to cite and link to digital content and are free to do so without asking for permission. Depending on the source of the digitized work, licenses or other contractual terms may restrict the further distribution or other uses. HathiTrust recommends consulting the “Access and Use” statement included in each item (in the sidebar to the left of the viewing area, next to the “Copyright” heading) for volume-specific information. Moreover, the website indicates that where uses of public domain volumes are permitted or appropriate permissions have been secured, HathiTrust should be attributed as the source of the digital images with the addition of “courtesy of HathiTrust” to the citation, including links to the digital images when possible.[9]  

https://www.hathitrust.org/access_use

Review

In 1938, H. G. Wells said, "The time is close at hand when any student, in any part of the world, will be able to sit with his projector in his study at his or her convenience to examine any book, any document, in a replica."[10] HathiTrust Digital Library brings us closer to Wells' prediction through its commitment to sharing human knowledge by giving people worldwide access to books and articles. Unlike Google Books and the Internet Archive, HathiTrust is not a corporation but an enterprise that believes sharing information is the responsibility of research libraries.[11] Also unlike Google Books and other digital repositories, HathiTrust is dedicated to preserving digitized data. Another aspect that distinguishes HathiTrust from other digital libraries is that it has provided access for individuals with disabilities from the very beginning, a commitment included in its mission statement and goals. Finally, HathiTrust Digital Library has played a central role over the past two years, as member libraries experienced service interruptions during the COVID-19 pandemic. Its Emergency Temporary Access Service allows patrons of member libraries to obtain lawful access to specific digital materials in HathiTrust that correspond to physical books held by their library.

Other Reviews

Simone Schloss (Gottesman Libraries Teachers College, Columbia University)

https://blog.library.tc.columbia.edu/b/24490-Open-access-resources-Part-1-HathiTrust-and-the-Internet-Archive-Digital-Libraries

“The HathiTrust Digital Library, and the Internet Archive: Digital Library of Free and Borrowable Books are two valuable Open Access sources of online materials to bookmark and regularly consult for expanded access to digital resources!”[12]

Heather Christenson (California Digital Library)

https://www-proquest-com.mutex.gmu.edu/scholarly-journals/hathitrust-research-library-at-web-scale/docview/862156370/se-2?accountid=14541

"Because of the size of the HathiTrust repository and the depth of the collaboration involved, the participating libraries are uniquely positioned to leverage technical infrastructure and collective expertise for digital preservation, services, and collection management on an unprecedented scale."[13]

“Since HathiTrust metadata originates from partner libraries, the libraries have a more direct opportunity to resolve errors, collectively explore how the original cataloging of print volumes can be enhanced and extended to digital volumes, and experiment with optimally integrating bibliographic metadata with full text for search purposes.”[14]

Kevin O’Brien (Library of the Health Sciences, University of Illinois at Chicago, Chicago, IL)

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3066577/

“Although the core of HathiTrust’s content overlaps with the content produced in the Google Books scanning project, HathiTrust does contain items not included in the Google Books database. This content comes from libraries participating in the HathiTrust that were not involved in the Google Books undertaking.”[15]

“HathiTrust is distinguished from Google Books in its commitment to bibliographic standards for item records, sophisticated search options, long-term preservation efforts, and orientation toward cooperative national and international academic institutional endeavors.”[16]

Gita Gunatilleke (freelance librarian and reviewer)

https://doi-org.mutex.gmu.edu/10.5260/chara.13.4.43

“HathiTrust has certainly shown that libraries themselves can collectively work together for the common good and best results, and deliver to the end user what would be the ultimate benefits of new technology and that as more and more institutions join as partners the common good initiative can become truly global.”[17]

Notes


[1] “About,” HathiTrust, September 28, 2021, https://www.hathitrust.org/statistics_about.

[2] Kevin O’Brien, “HathiTrust,” Journal of the Medical Library Association 99, no. 2 (2011): 177, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3066577/.

[3] Iris Xie and Krystyna Matusiak, Discover Digital Libraries: Theory and Practice (Saint Louis: Elsevier, 2016), https://learning-oreilly-com.mutex.gmu.edu/library/view/discover-digital-libraries/9780124201057/B978012417112100017X/B978012417112100017X.xhtml.

[4] “Mission and Goals,” HathiTrust, September 25, 2021, https://www.hathitrust.org/mission_goals.

[5] Angelina Zaytsev, "HathiTrust and a Mission for Accessibility," Journal of Electronic Publishing 18, no. 3 (2015), https://doi.org/10.3998/3336451.0018.304.

[6] “Statistics and Visualizations,” HathiTrust, September 24, 2021, https://www.hathitrust.org/statistics_visualizations.

[7] "Searching the Collection," HathiTrust, September 28, 2021, https://www.hathitrust.org/help_digital_library#searchhelp.

[8] “Eligibility Agreements,” HathiTrust, September 28, 2021, https://www.hathitrust.org/eligibility_agreements.

[9] "Access and Use Policies," HathiTrust, September 20, 2021, https://www.hathitrust.org/access_use.

[10] Iris Xie and Krystyna Matusiak, Discover Digital Libraries: Theory and Practice.

[11] Heather Christenson, "HathiTrust: A Research Library at Web Scale," Library Resources & Technical Services 55, no. 2 (April 2011): 98, https://www-proquest-com.mutex.gmu.edu/scholarly-journals/hathitrust-research-library-at-web-scale/docview/862156370/se-2?accountid=14541.

[12] Simone Schloss, "Open Access resources (Part 1): HathiTrust and the Internet Archive Digital Libraries," Gottesman Libraries, Teachers College (blog), Columbia University, September 29, 2021, https://blog.library.tc.columbia.edu/b/24490-Open-access-resources-Part-1-HathiTrust-and-the-Internet-Archive-Digital-Libraries.

[13] Heather Christenson, "HathiTrust: A Research Library at Web Scale," 95.

[14] Ibid., 99.

[15] Kevin O’Brien, “HathiTrust,” Journal of the Medical Library Association.

[16] Ibid.

[17] Gita Gunatilleke, "HathiTrust," Charleston Advisor 13, no. 4 (2012): 43–46, https://doi-org.mutex.gmu.edu/10.5260/chara.13.4.43.
