Module tf.about.code

Code organization

The code base of TF can be divided into a few major parts, each with its own identifiable task.

Some parts of the code are covered by unit tests (tf.about.tests).

There is also a count of code lines per module and language.

Base

(tf.core) The core API is responsible for:

Feature management
TF data consists of feature files. TF must be able to load them, save them, and import them from / export them to MQL.
Provide an API
TF must offer an API for handling its data in applications. That means: feature lookup, containment lookup, text serialization.
Pre-computation
In order to make its API work efficiently, TF has to pre-compute certain compiled forms of the data.
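At its heart, a feature is just a mapping from integer nodes to values. The toy parser below illustrates that idea with a simplified feature-file layout (`@`-prefixed metadata lines, then one value per line, implicitly numbered from node 1); the real `.tf` format has more metadata and several space-saving optimizations, so treat this strictly as a sketch.

```python
def parse_feature(text):
    """Parse a toy feature file: '@'-lines are metadata, then one
    value per line, implicitly numbered from node 1 upward."""
    meta, data = {}, {}
    node = 1
    in_header = True
    for line in text.splitlines():
        if in_header and line.startswith("@"):
            key, _, value = line[1:].partition("=")
            meta[key] = value
            continue
        if in_header and line == "":
            in_header = False
            continue
        in_header = False
        data[node] = line
        node += 1
    return meta, data

sample = """@node
@valueType=str

In
the
beginning"""

meta, values = parse_feature(sample)
print(meta["valueType"])  # str
print(values[3])          # beginning
```

Feature lookup then amounts to a dictionary access: `values[n]` gives the value of this feature for node `n`.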

(tf.search.search) TF contains a search engine based on templates, which are little graphs of nodes and edges that must be instantiated against the corpus.

Search versus MQL
The template language is inspired by MQL, but has a different syntax. It is both weaker and stronger than MQL.
Search versus hand coding

Search templates are the most accessible way to get at the data, easier than hand-coding your own little programs.

The underlying engine is quite complicated. Sometimes it is faster than hand coding, sometimes (much) slower.
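The core idea behind templates can be shown with a toy matcher (this is an illustration of the concept, not TF's real engine): nodes carry an object type and features, containment is defined by slot ranges, and the engine finds all instantiations of a parent/child pattern. Here the pattern is hard-coded instead of parsed from a template string.

```python
# Tiny corpus: each node has an otype, optional features, and a slot range.
otype = {1: "clause", 2: "word", 3: "word", 4: "word"}
features = {2: {"sp": "verb"}, 3: {"sp": "noun"}, 4: {"sp": "verb"}}
slots = {1: (1, 3), 2: (1, 1), 3: (2, 2), 4: (3, 3)}

def contains(parent, child):
    (a, b), (c, d) = slots[parent], slots[child]
    return a <= c and d <= b

def search(parent_otype, child_otype, child_conds):
    """All (parent, child) pairs where child lies inside parent and
    matches every feature condition."""
    return [
        (p, c)
        for p in otype if otype[p] == parent_otype
        for c in otype if otype[c] == child_otype
        if contains(p, c)
        and all(features.get(c, {}).get(k) == v for k, v in child_conds.items())
    ]

# Analogous to the two-line template
#   clause
#     word sp=verb
print(search("clause", "word", {"sp": "verb"}))  # [(1, 2), (1, 4)]
```

The real engine handles arbitrarily deep nesting, relational operators between nodes, and much larger corpora, which is where the complexity (and the performance variability) comes from.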

Advanced

(tf.advanced) TF contains an advanced API geared to auto-downloading corpus data and displaying corpus materials in useful ways.

Dynamic Web interface

(tf.browser) TF contains a browser interface for interacting with your corpus without programming.

The web interface lets you fire queries (search templates) to TF and interact with the results:

  • expanding rows to pretty displays;
  • condensing results to various container types;
  • exporting results as PDF and CSV.

This interface is served by a local web server, which is provided with data by a TF app. (tf.browser.start, tf.browser.kernel and tf.browser.web).

Static Web interface

(tf.client) There is also a static browser interface: out of a corpus you can build a set of static HTML pages with JavaScript files; these pages offer a search interface of a different kind than tf.search.

Volumes and collections

(tf.volumes) Machinery to support the idea that a TF dataset is a work that consists of volumes. Volumes and collections of volumes can be loaded without loading the whole work, while still maintaining a profound connection with the whole work through additional features such as owork. See also tf.about.volumes.
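The owork idea can be sketched as follows (illustrative only; the real machinery lives in tf.volumes): when a volume is extracted, its nodes are renumbered from 1, and an owork-like feature remembers each node's number in the full work, so results obtained in the volume can be mapped back.

```python
def extract_volume(selected):
    """Renumber the selected work nodes 1..n and keep, per new node,
    the original node number in the whole work (the owork idea)."""
    return {new: old for new, old in enumerate(sorted(selected), start=1)}

# The whole work has nodes 1..100; the volume covers work nodes 40..42.
owork = extract_volume({40, 41, 42})
print(owork)  # {1: 40, 2: 41, 3: 42}
```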

Dataset

(tf.dataset) Machinery to manipulate datasets as a whole.

Convert

(tf.convert) There is support for conversion to and from MQL and for converting arbitrary structured data (such as database dumps or TEI files) to TF (tf.convert.mql, tf.convert.walker).

There is also some support for round-trips of TF data into other annotation tools and back (tf.convert.recorder).
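The walker conversion follows a characteristic pattern: you write a director function that walks your source data and fires node/feature actions, while the converter collects them into TF-style data. The toy converter below mimics that pattern; the real tf.convert.walker API differs in detail (it also manages slots, node termination, and validation), so the class and its methods here are illustrative stand-ins.

```python
class ToyConverter:
    """Collects node and feature actions fired by a director."""

    def __init__(self):
        self.next_node = 0
        self.otype = {}     # node -> object type
        self.features = {}  # node -> {feature: value}

    def node(self, otype):
        self.next_node += 1
        self.otype[self.next_node] = otype
        return self.next_node

    def feature(self, n, **feats):
        self.features.setdefault(n, {}).update(feats)

def director(cv, source):
    """Walk the source (a list of sentences) and fire actions."""
    for sentence in source:
        cv.node("sentence")
        for token in sentence.split():
            w = cv.node("word")
            cv.feature(w, text=token)

cv = ToyConverter()
director(cv, ["hello world", "more text"])
print(cv.otype[1], cv.otype[2])  # sentence word
print(cv.features[2]["text"])    # hello
```

The appeal of this design is that the director only knows about the source format, not about TF's file layout; the converter takes care of the rest.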

Writing

(tf.writing) TF supports several writing systems by means of transliterations and conversions between them and Unicode.
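Such a conversion is essentially a table-driven replacement, matching the longest transliteration token first. The mapping below is a made-up miniature, not one of TF's actual tables (see tf.writing for those).

```python
# Hypothetical mini-table: Latin transliteration -> Hebrew Unicode.
TRANS = {"sh": "\u05e9", "b": "\u05d1", "r": "\u05e8", "a": "\u05d0"}

def to_unicode(text, table=TRANS):
    """Convert a transliterated string to Unicode, longest match first."""
    keys = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # pass unmapped characters through
            i += 1
    return "".join(out)

print(to_unicode("bra"))   # ברא
print(to_unicode("shab"))  # שאב
```

Longest-match-first ordering matters: without it, "sh" would be read as "s" followed by "h" whenever a single-letter "s" entry existed in the table.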

NER

(tf.ner and tf.browser.ner) This is machinery and an interface for Named Entity Recognition, based on patterns you specify. These patterns can be supplied ad hoc in a web interface, or systematically in a spreadsheet.
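In its simplest form, pattern-based NER pairs a token sequence with an entity kind and reports every occurrence in the token stream. The helper below is a hypothetical sketch of that idea, not the tf.ner API.

```python
def find_entities(tokens, patterns):
    """Report each occurrence of a pattern as (start, end, kind),
    with end exclusive, in order of position."""
    hits = []
    for seq, kind in patterns:
        n = len(seq)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == list(seq):
                hits.append((i, i + n, kind))
    return sorted(hits)

tokens = "Willem of Orange came to Amsterdam".split()
patterns = [
    (("Willem", "of", "Orange"), "PER"),
    (("Amsterdam",), "LOC"),
]
print(find_entities(tokens, patterns))  # [(0, 3, 'PER'), (5, 6, 'LOC')]
```

In a spreadsheet, each row would supply one such (sequence, kind) pattern, which is what makes the systematic workflow scale to many entities.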
