Most LDC corpora are governed by the LDC User Agreement for Non-Members; some corpora require the execution of specific user licenses.
Jun 15, 2023
Graff, David; Cieri, Christopher; Strassel, Stephanie, 2023, "TDT3 Multilanguage Text Version 2.0", Borealis, V1
Introduction Topic Detection and Tracking (TDT) refers to automatic techniques for finding topically related material in streams of data such as newswire and broadcast news. The TDT3 corpus was created to support three TDT3 tasks: to find topically homogeneous sections (segmentat...
May 18, 2022
Weischedel, Ralph; Palmer, Martha; Marcus, Mitchell; Hovy, Eduard; Pradhan, Sameer; Ramshaw, Lance; Xue, Nianwen; Taylor, Ann; Kaufman, Jeff; Franchini, Michelle; El-Bachouti, Mohammed; Belvin, Robert; Houston, Ann, 2022, "OntoNotes Release 5.0", Borealis, V1
Introduction OntoNotes Release 5.0 is the final release of the OntoNotes project, a collaborative effort between BBN Technologies, the University of Colorado, the University of Pennsylvania and the University of Southern California's Information Sciences Institute. The goal of the...
May 18, 2022
Chinchor, Nancy, 2022, "Message Understanding Conference (MUC) 7", Borealis, V1
Introduction Message Understanding Conference (MUC) 7 was produced by the Linguistic Data Consortium (LDC), catalog number LDC2001T02 and ISBN 1-58563-205-8. In the 1990s, the MUC evaluations funded the development of metrics and statistical algorithms to support government evaluation...
May 18, 2022
Carlson, Lynn; Marcu, Daniel; Okurowski, Mary Ellen, 2022, "RST Discourse Treebank", Borealis, V1
Introduction Rhetorical Structure Theory (RST) Discourse Treebank was developed by researchers at the Information Sciences Institute (University of Southern California), the US Department of Defense and the Linguistic Data Consortium (LDC). It consists of 385 Wall Street Journal...
May 18, 2022
Ng, Hwee Tou; Lee, Hian Beng, 2022, "DSO Corpus of Sense-Tagged English", Borealis, V1
Introduction This corpus contains sense-tagged word occurrences for 121 nouns and 70 verbs which are among the most frequently occurring and ambiguous words in English. These occurrences are provided in about 192,800 sentences taken from the Brown corpus and the Wall Street Journ...
May 18, 2022
Lander, T, 2022, "CSLU: 22 Languages Corpus", Borealis, V1
Introduction This file contains documentation on the CSLU: 22 Languages v 1.2, Linguistic Data Consortium (LDC) catalog number LDC2005S26 and ISBN 1-58563-361-5. Produced by the Center for Spoken Language Understanding and distributed by the Linguistic Data Consortium, the 22 Languag...
May 18, 2022
Lander, T, 2022, "CSLU: Foreign Accented English Release 1.2", Borealis, V1
Introduction This file contains documentation on CSLU: Foreign Accented English Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2006S38 and ISBN 1-58563-392-5. CSLU: Foreign Accented English Release 1.2 consists of continuous speech in English by native speakers o...
May 18, 2022
Baayen, R H.; Piepenbrock, R; Gulikers, L, 2022, "CELEX2", Borealis, V1
Introduction This corpus contains ASCII versions of the CELEX lexical databases of English (Version 2.5), Dutch (Version 3.1) and German (Version 2.0). CELEX was developed as a joint enterprise of the University of Nijmegen, the Institute for Dutch Lexicology in Leiden, the Max P...
