Package tf


Text-Fabric

A corpus of ancient texts and (linguistic) annotations represents a large body of knowledge. Text-Fabric makes that knowledge accessible to programmers and non-programmers.

Text-Fabric is machinery for processing such corpora as annotated graphs. It treats corpora and annotations as data, much like big tables, but without losing the rich structure of text, such as embedding and multiple representations. It deals with text in a state where all markup is gone, but where the complete logical structure still sits in the data.

Whether a corpus comes from plain text, OCR output, databases, XML, or TEI, Text-Fabric has support to convert it to single-column files, where each file corresponds to one feature of the text.
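To make the single-column idea concrete, here is a minimal sketch in plain Python. It is a deliberately simplified illustration and not the actual Text-Fabric file format: the function names `to_single_column` and `from_single_column` are hypothetical, and the assumption is that line *i* of a column holds the value of position *i*.

```python
# Simplified illustration only: NOT the real Text-Fabric file format.
# Assumption: line i of the column holds the feature value of position i (1-based).

def to_single_column(values):
    """Serialize one feature as a single column, one value per line."""
    return "\n".join(values)


def from_single_column(text):
    """Parse a column back into a {position: value} mapping, counting from 1."""
    return {n: v for n, v in enumerate(text.split("\n"), start=1)}


words = ["In", "the", "beginning"]
column = to_single_column(words)
feature = from_single_column(column)
```

Each feature gets its own column, so adding a new annotation amounts to adding one more file next to the existing ones.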

The Python library tf can be used to collect a bunch of features and display them as an annotated text. What ties the features together are natural numbers, which anchor the elementary positions in the text as well as the relevant structures within the text.
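The role of the natural numbers can be sketched with toy data. The example below does not use the tf library; all names (`text`, `part_of_speech`, `spans`, `annotated`) are hypothetical and chosen only for illustration. Numbers 1 to 4 stand for elementary positions (words), and numbers 5 and 6 stand for structures that span those positions.

```python
# Toy data, not the tf API: every name here is an illustrative assumption.
# Numbers 1..4 are elementary positions; 5 and 6 are structures built on them.
text = {1: "the", 2: "king", 3: "spoke", 4: "wisely"}
part_of_speech = {1: "art", 2: "noun", 3: "verb", 4: "adv"}
node_type = {5: "phrase", 6: "sentence"}

# Which elementary positions each structural number covers.
spans = {5: [1, 2], 6: [1, 2, 3, 4]}


def annotated(node):
    """Combine two features into an annotated rendering of a structure."""
    return " ".join(f"{text[s]}/{part_of_speech[s]}" for s in spans[node])


phrase = annotated(5)
sentence = annotated(6)
```

Because every feature is keyed by the same numbers, features can be combined without knowing anything about each other.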

When Text-Fabric loads a dataset of features, you can instruct it to get the features from anywhere. That means it supports workflows where annotations are produced by third parties and can be used against the original corpus without additional work. It also facilitates mappings between successive versions of the corpus, so that annotations made on older versions can be ported to newer versions without being created anew.
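The idea of porting annotations between versions can be sketched as follows. This is a conceptual illustration in plain Python, not Text-Fabric's own mapping mechanism: the data and the helper `port` are hypothetical, and the assumption is that a version mapping is a simple lookup from old numbers to new numbers.

```python
# Conceptual sketch, not Text-Fabric's implementation; all names are hypothetical.
corpus_v1 = {1: "the", 2: "king", 3: "spoke"}

# A third-party annotation made against version 1, stored apart from the corpus.
third_party_sentiment = {2: "positive"}

# Version 2 has an extra word at the front, so all positions shift by one.
corpus_v2 = {1: "then", 2: "the", 3: "king", 4: "spoke"}
v1_to_v2 = {1: 2, 2: 3, 3: 4}


def port(feature, node_map):
    """Carry a feature over to a new version; unmapped numbers are dropped."""
    return {node_map[n]: v for n, v in feature.items() if n in node_map}


ported = port(third_party_sentiment, v1_to_v2)
```

The annotation itself is never edited; only the numbers it is attached to are translated.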

Straight to …

Author

Author: Dirk Roorda

Cite Text-Fabric as DOI: 10.5281/zenodo.592193.

Acknowledgements

Text-Fabric is a matter of putting a few good ideas by others into practice.

While I wrote most of the code, a product like Text-Fabric is unthinkable without the contributions of avid users who take the trouble to give feedback and file issues, who have the zeal and stamina to hold on when things are frustrating and bugs overwhelming, and who give encouragement when they are happy.

In particular thanks to

  • Andrea Scharnhorst
  • Cale Johnson
  • Camil Staps
  • Christian Høygaard-Jensen
  • Christiaan Erwich
  • Cody Kingham
  • Ernst Boogert
  • Eliran Wong
  • Gyusang Jin
  • Henk Harmsen
  • James Cuénod
  • Johan de Joode
  • Kyoungsik Kim
  • Martijn Naaijer
  • Stephen Ku
  • Wido van Peursen

More resources

Tutorials:


Papers:


Presentations:

Hands on with Dead Sea Scrolls, Old Babylonian Tablets, and the Q'uran (Lorentz Leiden 2020)

Text-Fabric in Context (Lorentz Leiden 2020)

Data Analysis in Ancient Corpora (Cambridge 2019, with Cody Kingham)

Text-Fabric as IKEA logistics (Copenhagen 2017)

Here is a motivational presentation, given just before SBL 2016 in the Lutheran Church of San Antonio.

"""
.. include:: docs/main/top.md
"""

Sub-modules

tf.about

Documents …

tf.advanced

Advanced API …

tf.app

Start the advanced API of TF …

tf.cheatsheet

A. Advanced API …

tf.clean

Clean …

tf.client

Layered Search …

tf.convert

Various forms of data interchange …

tf.core

Core API of TF …

tf.dataset

Dataset operations …

tf.fabric

Fabric …

tf.lib

Utility functions …

tf.parameters

Parameters …

tf.search

Guidance for searching …

tf.server

Local TF-data and web server

tf.volumes

Volume operations …

tf.writing

Writing systems support …