Changes in this major version¶
Consult the tutorials after changes
When we change the API, we make sure that the tutorials show off all the possibilities.
See the app-specific tutorials in annotation.
When TF apps have been updated, they will be autoloaded to the newest version, provided you call the app as follows:

```
use('appName', ...)
```

This will get you the newest stable version. To get the newest unstable version, pass the `checkout` parameter with value `hot` (see the checkout values described below).
What's going on
See the issue list on GitHub.
Queued for next release:
Support for workflows where TF data is exported to be annotated by other tools, whose results are then imported back as TF features. The first step is in place: the Recorder.
Fix of a bug spotted by Robert Voogdgeert: in search templates with quantifiers, if the line before the quantifier is not an atom line but a feature line, TF crashed. Not anymore. The fix is at the syntactic level of queries. I have tested most known queries and they gave identical results as before.
Following a suggestion by Camil Staps: in search templates, the comment sign `%` does not have to be at the start of a line; it may also be preceded by white space. You still cannot use `%` to comment out the trailing part of a line after non-blank material.
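As an illustration of this rule (plain Python, not TF's actual template parser; the function name is made up), a line counts as a comment only when its first non-blank character is `%`:

```python
import re

def drop_comment_lines(template):
    """Keep only the non-comment lines of a search template.

    A line is a comment when its first non-white character is %;
    a % appearing after non-blank material does not comment anything out.
    """
    return "\n".join(
        line
        for line in template.splitlines()
        if not re.match(r"[ \t]*%", line)
    )
```

For example, `drop_comment_lines("word\n  % a note\nphrase")` keeps `word` and `phrase` but drops the indented comment line.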
When TF wants to fetch data from GitHub but cannot get a connection, it will give some sort of message as to why.
Something new: the Recorder, a device to export plain text from TF in such a way that the position of nodes in that text is stored. Then you can annotate the plain text in some tool, e.g. BRAT, and after that the Recorder can turn those annotations into TF features. It is not documented yet, but this notebook shows you a complete example.
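The core idea can be sketched in a few lines of plain Python (this `MiniRecorder` is a made-up toy, not TF's Recorder API): while emitting text, remember which nodes are active at every character position, so character-level annotations can later be traced back to nodes.

```python
class MiniRecorder:
    """Toy sketch of the Recorder idea: emit plain text while
    remembering which nodes cover each character position."""

    def __init__(self):
        self.chunks = []
        self.active = set()
        self.nodesByPos = []  # per character: the set of active nodes

    def start(self, node):
        self.active.add(node)

    def end(self, node):
        self.active.discard(node)

    def add(self, text):
        self.chunks.append(text)
        self.nodesByPos.extend(frozenset(self.active) for _ in text)

    def text(self):
        return "".join(self.chunks)

# export two hypothetical word nodes
rec = MiniRecorder()
rec.start(101); rec.add("hello "); rec.end(101)
rec.start(102); rec.add("world"); rec.end(102)
```

An annotation made on characters 6–10 of the exported text can now be traced back to node 102 via `rec.nodesByPos[6]`, which is the step that turns tool output into TF features.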
Added fonts for the upcoming NENA corpus with TF app by Cody Kingham.
Updated docs for app writers.
All queries go a tad faster. Additional small fixes.
Performance tweaks in querying; especially long-running queries perform better. The query planning can now handle multiple relationships of the kind `a < b` and `b < c`. Before, all candidates for `a` were tried, including the ones after `c`, and they then failed. Now the ones after `c` are not tried anymore.

Yet the gain is not as high as I had hoped, because finding the right `b` turns out to be tricky. The machinery for getting that in place and then walking in the right direction worked, but was so costly itself that it defeated the purpose of a performance gain.

Have a look at some profiling results.
The performance of the new feature comparison relations turned out to be bad. They have been greatly improved and are now workable. But it is possible that new cases will turn up that perform badly.
Main thing in this release: new relations in queries, based on feature comparison, as asked for by Oliver Glanz. For more info: see #50
```
phrase
  word .nu. word
```

which gives the pairs of words in phrases that agree in `nu` (= grammatical number), provided both words are marked for number.
```
phrase
  word .nu#nu. word
```

which gives the pairs of words in phrases that disagree in `nu`, provided both words are marked for number.
```
phrase
  word .nu=prs_nu. word
```

which gives the pairs of words in phrases of which the number of the first word agrees with the number of the pronominal suffix of the second word, provided feature `nu` is present on the first word and feature `prs_nu` is present on the second word.
These are only examples; the new relations work for any combination of node features. You can also test on `<` if the node features are integer-valued.
And for string valued features, you can also reduce the values before comparing by means of a regular expression, which specifies the parts of the value that will be stripped.
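A plain-Python sketch of these comparison semantics (illustrative only, not TF's search engine; `feature_rel` and its parameters are made up): both nodes must be marked for their feature, string values may first be reduced by a stripping regex, and then the chosen comparison is applied.

```python
import re

def feature_rel(valA, valB, op="=", strip=None):
    """Toy model of a feature-comparison relation.

    valA / valB: feature values of the two nodes (None = not marked).
    op: '=' (agree), '#' (disagree), or '<' (integer comparison).
    strip: optional regex; its matches are removed from string
    values before comparing.
    """
    if valA is None or valB is None:
        # both nodes must be marked for the feature
        return False
    if strip is not None:
        valA = re.sub(strip, "", valA)
        valB = re.sub(strip, "", valB)
    if op == "=":
        return valA == valB
    if op == "#":
        return valA != valB
    if op == "<":
        return valA < valB
    raise ValueError(f"unknown relation {op!r}")
```

So `feature_rel("sg", "sg")` holds, `feature_rel("sg", None)` never holds, and `feature_rel("sg.", "sg", strip=r"\.")` holds after the regex strips the dot.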
See also the docs, jump to Based on node features.
The working of `silent=True` has been fine-tuned (i.e. it is easier to silence TF in more cases). There is also a `silent` parameter for the walker conversion. The `info()` function always checks whether it should be silent or not. There is a new `warning()` function that is silent only if `silent='deep'`, so you can use `warning()` to issue messages that you do not want to be silenced by `silent=True`.
The biggest addition is a new `tf.compose` package with operators to manipulate TF data sets.

- New `TF.loadAll()` function to load all features in one go.
- New method `items()` for all features, which yields all pairs in the mapping of the feature one by one. See the *generics for features* section of the API docs.
- tweaks in edge spinning (part of the search engine), but no real performance improvements
- nothing in TF relies on Python's `glob` module anymore, which turned out to miss file names with characters such as `[` and `]` in them.
Fixed a bug in fabric.py spotted by Ernst Boogert, where there was a confusion between
If a TF-app needs to import its own modules, there is a risk of conflicts when several TF-apps are loaded in the same program and they import modules with the same name. TF offers a function `loadModule()` by which an app can dynamically load a module; this function makes sure that the imported module gets an app-dependent internal name.
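The underlying technique can be sketched with the standard `importlib` machinery (this `load_app_module` is an illustration under assumed names, not TF's `loadModule()` itself): the same file can be loaded twice under two app-dependent internal names, so the modules never clash in `sys.modules`.

```python
import importlib.util
import sys

def load_app_module(app_name, module_name, path):
    """Load the file at `path` as a module whose internal name
    depends on the app, so apps with same-named modules coexist."""
    internal = f"tf.apps.{app_name}.{module_name}"
    spec = importlib.util.spec_from_file_location(internal, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[internal] = module  # register under the app-specific name
    spec.loader.exec_module(module)
    return module
```

Two apps loading a module called `helpers` would end up with two distinct module objects, registered as `tf.apps.app1.helpers` and `tf.apps.app2.helpers`.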
Some queries perform much better now, especially the ones with `==` (same slots), `&&` (overlapping slots), and `::` (same boundaries).
The performance of the machinery has been tuned with new parameters, and all BHSA queries in the tutorials have been tested.
There was a pair of queries in searchGaps that either took 9 seconds or 40, randomly. Now it is consistently 9 seconds.
See searchRough at the end where the performance parameters are tweaked.
New function `cv.activeTypes()` in the walker conversion (requested by Ernst Boogert).
Another 20% of the original memory footprint has been shaved off. Method: using arrays instead of tuples for sequences of integers.
Optimization: the memory footprint of the features has been reduced by ca. 30%. Method: reusing read-only objects with the same value.
The BHSA now needs 2.2 GB of RAM, instead of the 3.4 before.
Bug fixes:

- `silent` means silent again in `A.use()`
- the walk converter will not stop if there is no structure configured
Added more checks for the new structure API when using the walk converter. Made the pre-computing for structure more robust.
The `T` API has been extended with structure types: a flexible sectioning system with unlimited levels. It can be configured next to the more rigid sections that `T` already supported.
The rigid system is meant to be used by the TF browser for chunking up the material in decent portions.
The new, flexible system is meant to reflect the structure of the corpus, and gives you the means to navigate the corpus accordingly.
- You can ask for the metadata of any feature by `TF.features['featureName'].metaData`. That is not new. You can now also get it by `F.featureName.meta` for node features and `E.featureName.meta` for edge features. Both only work for loaded features. This is a bit more crisp. Thanks to Ernst Boogert for bringing this up.
- In the TF browser, in the control where you select a book/document/scroll: the chosen item disappeared from the view if you narrowed down the list by typing a capital letter. Fixed.
Big improvement on `T.text()`. It now accepts one or more nodes of arbitrary types and produces text for them all.
Largely backward compatible, in that:
- it takes the same arguments
- when it produced sensible results, it will produce the same results
- when it produced nothing, it will now produce sensible things, in many cases.
You have to use the `descend` parameter a lot less.
See the docs
There is an extra `cv.occurs()` function to check whether a feature actually occurs in the result data.
`cv.meta(feature)` without more arguments deletes the feature from the metadata.
Added the option `force=True` to the `cv.walk()` function, to continue conversion after errors.
Added the punctuation characters geresh and gershayim to the Hebrew mapping from Unicode to ETCBC transcription. The mapping previously covered only the accent variants of these characters, not the punctuation ones.
Fixed a bug in `cv.meta()` in the conversion walker.
The walker conversion module has an extra check: if you assign features to None, it will be reported.
There is an extra `cv.meta()` function to accommodate a use case brought in by Ernst Boogert.
Small addition to search templates. You could already use edges in search by means of the relational operator `-edgeFeature>`, which looks for pairs `n`, `m` such that there is an `edgeFeature` edge from `n` to `m`, and likewise `<edgeFeature-` for edges in the opposite direction.

Now you can also use `<edgeFeature>`, which looks for pairs `n`, `m` such that there is an `edgeFeature` edge from `n` to `m`, or from `m` to `n`, or both.

See the docs. This corresponds to `E.feature.b()`. See also the Banks example.
Small but important fix in the display logic of the `pretty()` function. The bug is not in the particular TF-apps that partly implement `pretty()`, but in the generic `tf.applib.display` library that implements the other part. Thanks to Gyusang Jin, Christiaan Erwich and Cody Kingham for spotting it.
I wrote an account of the bug and its fixing in this notebook.
Small fix in reporting of the location of data being used.
Simplified sharing: pushing to GitHub is enough. It is still recommended to make a release on GitHub now and then, but it is not necessary.
The `use()` function and the calling of the TF browser undergo an API change.
When calling up data and a TF-app, you can go back in history to previous releases and previous commits, using a `checkout` parameter.

You can specify the `checkout` parameter separately for

- the TF-app code (so you can go back to previous instantiations of the TF-app)
- the main data of the app plus its standard data modules
- every data module that you include by means of the `mod` parameter
The values of the `checkout` parameters tell you to use data that is:

- `clone`: locally present under `~/github` in the appropriate place
- `local`: locally present under `~/text-fabric-data` in the appropriate place
- `latest`: from the latest online release
- `hot`: from the latest online commit
- `''` (default): from the latest online release, or if there are no releases, from the latest online commit
- `2387abc78f9de...`: a concrete commit hash found on GitHub (under Commits)
- `v1.3`: a release tag found on GitHub (under Releases)
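The selection logic described by this list can be sketched as a small resolver function (an illustration under assumed names, not TF's actual implementation; `resolve_checkout` and the returned labels are made up):

```python
def resolve_checkout(checkout, releases, commits):
    """Pick a data source for a checkout value.

    releases and commits are newest-first lists of what is
    available online for the repo in question.
    """
    if checkout == "clone":
        return ("local-clone", "~/github")
    if checkout == "local":
        return ("local-data", "~/text-fabric-data")
    if checkout == "latest":
        return ("release", releases[0])
    if checkout == "hot":
        return ("commit", commits[0])
    if checkout == "":
        # default: latest release, else latest commit
        return ("release", releases[0]) if releases else ("commit", commits[0])
    if checkout in releases:
        return ("release", checkout)   # a release tag such as v1.3
    return ("commit", checkout)        # otherwise: a commit hash
```

Note in particular the fallback behaviour of the default value `''`: it only drops down to the latest commit when the repo has no releases at all.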
Or consult the repo notebook.
API deletion (backwards incompatible):¶
The `-c` flag when calling the TF browser, along with related parameters, has been removed. These parameters were all-or-nothing: they were applied to TF-app code, main data, and included data modules alike.
In most cases, just do not use the `checkout` parameters at all. Then the TF-app will be kept updated, and you keep using the newest data.

If you want to produce fixed output, not influenced by future changes, run TF once with a particular version or commit, and after that supply the value `local` for as long as you wish.
If you are developing data yourself, place the data in your repository under `~/github`, and use the value `clone` for `checkout`.
If you create your own features and want to share them, it is no longer needed to zip the data and attach it to a newly created release on GitHub. Just pushing your repo to GitHub is sufficient.
Still it is a good practice to make a release every now and then.
Even then, you do not need to attach your data as a binary. But, if you have much data or many files, doing so makes the downloading more efficient for the users.
There is a new utility function `checkoutRepo()`, by which you can maintain a local copy of any subdirectory of any repo on GitHub.
This is yet another step in making your scholarly work reproducible.
Fix in query parsing¶
```
sentence <: sentence
```

caused TF to complain erroneously about disconnected components. You had to say

```
s1:sentence
s2:sentence
s1 <: s2
```

instead. That workaround is not needed anymore.
Thanks to Oliver Glanz for mentioning this behaviour.
The TF browser now displays the total number of results clearly.
Small fix in Excel export when called by the TF kernel.
Small fix: a TF app that did not define its own text-formats caused an error. Now the generic TF applib is robust against this.
`E.feature.b()` now gives precedence to outgoing edges.
Further tweaks in layout of
API addition for edge features: `E.feature.b()` gives the symmetrical closure of the edges under `feature`. That means it combines the results of `E.feature.f()` and `E.feature.t()`. In plain speech: `E.feature.b(m)` collects the nodes that have an incoming edge from `m` and the nodes that have an outgoing edge to `m`.
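The symmetrical-closure idea can be illustrated in plain Python (a toy sketch, not the TF implementation; `edges_b` is a made-up name): collect the neighbours of a node in both edge directions.

```python
def edges_b(edges_from, m):
    """Symmetrical closure of an edge relation at node m.

    edges_from maps each node to the set of nodes it points to.
    Returns the nodes connected to m in either direction.
    """
    outgoing = set(edges_from.get(m, ()))
    incoming = {n for n, targets in edges_from.items() if m in targets}
    return outgoing | incoming
```

With `edges = {1: {2}, 3: {1}}`, node 1 points to 2 and is pointed to by 3, so its symmetrical closure is `{2, 3}`.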
`TF.save()` can now write to any absolute location by means of an optional parameter.
- The markdown display in online notebooks showed many spurious `</span>`. This is a bug in the Markdown renderer used by GitHub and NBViewer: it happens if table cells have doubly nested `<span>` elements. It did not show up in local notebooks. In order to avoid it, TF no longer works with the Markdown renderer; instead, it produces output in HTML and uses the HTML renderer in notebooks. That fixes the issue.
- When using `A.export()` to export data to Excel-friendly CSV files, some node types get their text exported, and some just a label. It used to depend on whether the node type was a section or not. Now it depends on whether the node type is small or big: we export text for small node types. A node type is small if it is not bigger than the condense type. This behaviour is now the same as for pretty displays.
- Changes in font handling
- New flag `full=False`. See the docs
- When looking for data in `lgc=True` mode, TF will report clearly when data cannot be found in local GitHub clones. In such cases TF will look for an online release of the repo with the desired data attached. Before, it was not clear enough that TF was looking online, despite the `lgc` flag, because of missing data. So if you misspelled a module path, you got messages that did not point you to the root cause.
- Some fixes in the plain display having to do with the passage label.
When converting a new corpus, Old Babylonian Letters (cuneiform), I tuned the conversion module a bit. Several improvements in the conversion program. Better warnings for potential problems. Several other small changes have been applied here and there.
When querying integer-valued features with inequality conditions, an unpleasant error was raised if not all words have a level, or if some words have level `None`. That has been fixed now. Missing values and `None` values always cause the `<` comparisons to be `False`.
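The fixed comparison rule amounts to this (a plain-Python sketch of the semantics, not TF's code; `int_lt` is a made-up name):

```python
def int_lt(a, b):
    """Inequality over feature values where None means 'missing'.

    A missing value never satisfies an inequality condition;
    it does not raise an error either.
    """
    if a is None or b is None:
        return False
    return a < b
```

So a word without a level value simply drops out of the results instead of crashing the query.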
Bug fix in data pre-computation. The bug was introduced in version 7.4.2.
If you have been running that version or a newer one, you might need to recompute your features. Here is how.
Manually: delete the `.tf` directory in `~/github/.../.../tf/version`, or in the corresponding place under your local data directory. This directory is hidden on the Mac and Linux; you can make it visible by pressing `Cmd+Shift+.` on the Mac, or you can navigate to this directory in a terminal and do `ls -al` (Mac and Linux).
The other method can be used in a Jupyter notebook:
```
from tf.app import Fabric
A = use(...)
TF.clearCache()
```

After this, restart the notebook, and run it again, except the `TF.clearCache()` step.
If you are still pre 7.4.2, you're out of trouble, and you can upgrade to 7.4.5.
Added checks to the converter for section structure.
A much simpler implementation of conversions from source data to Text-Fabric. Especially the code that the conversion writer has to produce is simplified.
Small fixes in the token converter.
Easier conversion of data sources to TF: via an intermediate token stream. For more info: see #45
Make sure it works.
Feature display within pretty displays: a newline in a feature value will cause a line break in the display, by means of a `<br>` element.
Small fix in `oslots` validation. You can save a data set without the `oslots` feature (e.g. a module). The previous release wrongly flagged an `oslots` validation error because of the missing `oslots` feature. That has been remedied.
If the `oslots` feature is not valid, weird error messages used to occur when TF tried to load a dataset containing it: the `oslots` feature itself was loaded, but the computing of derived data threw a deep error.

Now, when TF saves the `oslots` feature, it checks whether it is valid: it should map all non-slot nodes, and only non-slot nodes, to slots. So, right after you have converted a data source to TF, you can check whether the `oslots` feature is valid, during `TF.save()`.

And further down the line, if you somehow have let a faulty `oslots` pass and try to load a dataset containing such an `oslots` feature, TF checks whether the range of nodes mapped by `oslots` has holes in it. If so, it generates a clear error and stops processing.
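The validity condition can be sketched as a small check (illustrative plain Python under simplifying assumptions, not TF's validator; here slots are nodes `1..max_slot` and non-slot nodes are `max_slot+1..max_node`):

```python
def oslots_is_valid(oslots, max_slot, max_node):
    """Check an oslots-like mapping: node -> tuple of slots.

    Valid means: exactly the non-slot nodes are mapped, with no
    holes in that range, and every mapped value is a slot.
    """
    mapped = sorted(oslots)
    if mapped != list(range(max_slot + 1, max_node + 1)):
        # a slot node is mapped, a non-slot node is missing,
        # or there is a hole in the mapped range
        return False
    return all(
        1 <= s <= max_slot for slots in oslots.values() for s in slots
    )
```

For instance, with 3 slots and nodes up to 5, the mapping `{4: (1, 2), 5: (3,)}` is valid, while a mapping that skips node 5 has a hole and is rejected.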
Moved the app tutorials from the annotation/app-appName repos into a new annotation/tutorials repo.

The reason: the app-appName repos are used for downloading the app code. They should not be burdened with extra material that is also updated often, giving rise to many spurious re-downloads of the app code.

Additionally, for education purposes it is handy to have the tutorials for all apps inside one repo, for example to use in a Microsoft Azure environment.
Better browsing for corpora with very many top level sections, such as Uruk.
For more info: see #38
Small fixes in the core: the Text API can now work with corpora with only two levels of sections, such as the Quran.
Arabic transcription functions
TF-browser: Fixed a performance bottleneck in showing passages. The computation of the highlights took too much time if the query in question has many results.
The `plain()` representation had a glitch under NBconvert. We can prevent that by directly outputting the plain representation as HTML, instead of going through Markdown. Fixed that.
The TF browser could not find its templates, because I had forgotten to include the template files in the Python package. (More precisely, I had renamed the templates folder from `views`, which was included, to `templates`, and I had forgotten to adapt the packaging configuration.)
Glitch in the Uruk app: it imports other modules, but because of the dynamic way it is imported itself, a trick is needed to let it import its submodules correctly.
- Text-Fabric has moved house to the annotation organization on GitHub
- The TF-apps have been moved to separate repos with names `app-xxxx` within annotation
- The tutorials have been moved from the repos that store the corpus data to the app repos
The TF-browser can export results to Excel. Now you can also export to Excel from a notebook, using `A.export()`.
Jump to the tutorial: exportExcel
For more info: see #38
Web framework: Bottle => Flask
Plain display in Uruk: the plain display of lines and cases now outputs their ATF source, instead of merely a label such as `line 1`.
Further code reorganization: most Python files are now less than 200 lines, although there is still a code file of more than 1000 lines.
- Fix broken links to the documentation of the TF API members, after the incantation.
- Fix in the Uruk lineart option: it could not be un-checked.
- The TF kernel/server/website is also fit to be served over the internet
- There is query result highlighting in passage view (like in SHEBANQ)
- Various tweaks
TF app API:

- `prettySetup()` has been replaced with `displaySetup()` and `displayReset()`, by which you can configure a whole bunch of display parameters selectively.
- All display functions (`pretty`, `plain`, `prettyTuple`, `plainTuple`, `show`, `table`) accept a new optional parameter `withPassage`, which will add a section label to the display. This parameter can be regulated in the display setup.
- `A.search()` accepts a new optional parameter `sort=...` by which you can ask for canonically sorted results (`True`), custom sorted results (pass your own key function), or unsorted results (`False`).
- New functions, among them `A.sectionStrFromNode()`, which gives the passage string of any kind of node, if possible. Section support for apps.
- The function `A.plain()` now responds to the `highlights` parameter: you can highlight material inside plain displays. See the `A.plain` and display tutorial.
- New function `T.sectionTuple(n)`, which gives the tuple of section nodes in which `n` is embedded. See `T.sectionTuple`.
- Modified function `T.sectionFromNode(n, fillup=False)`. It used to give a tuple (section1, section2, section3), also for nodes of type section1 and section2 (like book and chapter). The new behaviour is the same if `fillup=True`, but if `fillup=False` (the default), it returns a 1-tuple for section1 nodes and a 2-tuple for section2 nodes. See `T.sectionFromNode`.
- New API member `sortKeyTuple` to sort tuples of nodes in the canonical ordering. See `sortKeyTuple`.
- The code to detect the file name and path of the script/notebook you are running in, is inherently brittle. It is unwise to base decisions on that. This code has been removed from TF. So TF no longer knows whether you are in a notebook or not. And it will no longer produce links to the online notebook on GitHub or NBViewer.
- Various other fixes
The entry points and paths from superficial to in-depth information have been adapted. Writing docs is an uphill battle.
Under the hood
As TF keeps growing, the need arises over and over again to reorganize the code, weed out duplicate pieces of near identical functionality, and abstract from concrete details to generic patterns. This release has seen a lot of that.
- Queries in the TF browser are limited to three minutes, after that a graceful error message is shown.
- Other small fixes.
- You can use custom sets in queries in the TF browser
- Reorganized the docs of the individual apps, took the common parts together
- New functions
- In the BHSA, feature values on the atom-types and subphrases are now shown too, and that includes extra features from foreign data sets
- The feature listing after the incantation in a notebook now lists the loaded modules in a meaningful order.
- Small fixes in
- Internal reorganization of the code
- Documentation updates (but internal docs are still lagging behind)
- Fixed messages and logic in finding data and checking for updates (thanks to feedback of Christian Høygaard-Jensen)
- Fixed issue #30
- Improved the doc links under features after the incantation.
- Typos in the documentation
Just before SBL Denver, two years after SBL San Antonio, where I started writing Text-Fabric, here is major version 7.
Here is what is new:
- you can call in "foreign data": tf feature files made by yourself and other researchers;
- the foreign data shows up in the text-fabric browser;
- all features that are used in a query, show up in the pretty displays in the TF browser, also the foreign features;
- there is a command to prepare your own data for distribution via GitHub;
- the incantation is simpler, but it has changed in a backwards-incompatible way;
- after the incantation, for each feature it is shown where it comes from.
Under the hood:
- apps (bhsa, peshitta, syrnt, uruk) have been refactored thoroughly;
- a lot of repeated code inside apps has been factored out
- it is easier to turn corpora into new text-fabric apps.
Quick start: the new share
See the advanced guide for concrete and detailed hints on how to make the most of this version.