# Module tf.cheatsheet

## A. Advanced API

### Initialization, configuration, metadata, and linking

`A = use('org/repo')`
:   start up and load a corpus from a repository and deliver its API.
    See `tf.about.usefunc`.

`A.hoist(globals())`
:   make the API handles `F`, `E`, `T`, `L` etc. available in the global scope.

`A.load(features)`
:   load an extra bunch of features.

`A.featureTypes(show=True)`
:   show for which types each feature is defined.

`A.showContext(...)`
:   show app settings.

`A.header(allMeta=False)`
:   show colophon.

`A.showProvenance(...)`
:   show provenance of code and data.

`A.webLink(n, ...)`
:   hyperlink to node `n` on the web.

`A.flexLink("pages")`, `A.flexLink("tut")`
:   hyperlink to app tutorial and documentation.

`A.isLoaded(features=None)`
:   show information about loaded features.

`A.footprint()`
:   show memory footprint per feature.
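A minimal start-up sketch in a notebook, assuming a corpus published as `org/repo` and a feature named `lemma` (both placeholders):

```python
from tf.app import use

# load the corpus app and its data ('org/repo' is a placeholder)
A = use('org/repo')

# make the handles F, E, T, L, N, S, C, TF available as globals
A.hoist(globals())

# load an extra feature on top of the defaults ('lemma' is hypothetical)
A.load('lemma')

# inspect what is loaded and what it costs in memory
A.isLoaded()
A.footprint()
```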
### Displaying

`A.specialCharacters()`
:   show all hard-to-type characters in the corpus in a widget.

`A.showFormats()`
:   show all text formats and their definitions.

`A.dm(markdownString)`
:   display a markdown string in the notebook.

`A.dh(htmlString)`
:   display an HTML string in the notebook.

`A.method(option1=value1, option2=value2, ...)`
:   many of the following methods accept display options as keyword arguments.

`A.displayShow(...)`
:   show display options.

`A.displayReset(...)`
:   reset display options.

`A.displaySetup(...)`
:   set up display options.

`A.table(results, ...)`
:   plain rendering of a tuple of tuples of nodes.

`A.plainTuple(tup, ...)`
:   plain rendering of a tuple of nodes.

`A.plain(node, ...)`
:   plain rendering of a node.

`A.show(results, ...)`
:   pretty rendering of a tuple of tuples of nodes.

`A.prettyTuple(tup, ...)`
:   pretty rendering of a tuple of nodes.

`A.pretty(node, ...)`
:   pretty rendering of a node.

`A.unravel(node, ...)`
:   convert a graph to a tree.

`A.getCss()`
:   get the complete CSS style sheet for this app.
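A sketch of the display workflow, assuming the handles from `A.hoist(globals())` and a hypothetical node type `word`:

```python
# run a query to get something to display ('word' is a hypothetical node type)
results = A.search('word')
node = results[0][0] if results else 1

A.displaySetup(withNodes=True)   # show node numbers in all renderings
A.plain(node)                    # compact, one-line rendering
A.pretty(node)                   # expanded, structured rendering
A.table(results, end=5)          # plain table of the first five results
A.show(results, end=5)           # pretty rendering of the first five results
A.displayReset()                 # restore the default display options
```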
### Search (high level)

`A.search(...)`
:   search, collect and deliver results, report the number of results.
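A small template-based search, assuming hypothetical node types `chapter` and `word`; indentation in the template expresses embedding:

```python
query = '''
chapter
  word
'''
results = A.search(query)   # also reports the number of results
A.show(results, end=3)      # inspect the first few hits
```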
### Sections and Structure

`A.nodeFromSectionStr(...)`
:   look up the node for a section heading.

`A.sectionStrFromNode(...)`
:   look up the section heading for a node.

`A.structureStrFromNode(...)`
:   look up the structure heading for a node.
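For illustration, in a Bible-like corpus with section headings such as `Genesis 1:1` (the heading format is corpus dependent):

```python
n = A.nodeFromSectionStr('Genesis 1:1')   # heading string to node
print(A.sectionStrFromNode(n))            # and back to the heading string
```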
### Volumes and collections

See also `tf.about.volumes`.

`A.getVolumes()`
:   list all volumes of this dataset.

`A.extract(volumes, ...)`
:   export volumes based on a volume specification.

`A.collect(volumes, ...)`
:   collect several volumes into a new collection.
### Export to Excel

`A.export(results, ...)`
:   export formatted data.
### Logging

`A.dm(markdownString)`
:   display a markdown string in the notebook.

`A.dh(htmlString)`
:   display an HTML string in the notebook.

`A.version`
:   version number of the data of the corpus.

The following methods also work on `TF.` instead of `A.`:

`A.banner`
:   banner of the TF program.

`A.isSilent()`
:   report the verbosity of TF.

`A.silentOn(deep=False)`
:   make TF (deeply) silent from now on.

`A.silentOff()`
:   make TF talkative from now on.

`A.setSilent(silent)`
:   set the verbosity of TF.

`A.indent(level=None, reset=False)`
:   set up indentation and timing of subsequent messages.

`A.info(msg, tm=True, nl=True, ...)`
:   informational message.

`A.warning(msg, tm=True, nl=True, ...)`
:   warning message.

`A.error(msg, tm=True, nl=True, ...)`
:   error message.
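A sketch of timed, indented progress messages in a long-running cell:

```python
A.indent(reset=True)        # reset the timer at the outer level
A.info('start processing')
A.indent(level=1)           # messages below are indented and timed separately
A.info('a step within the processing')
A.indent(level=0)
A.info('done')
```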
## N. F. E. L. T. S. C. Core API

### N. Nodes

Read about the canonical ordering in `tf.core.nodes`.

`N.walk()`
:   generator of all nodes in canonical ordering.

`N.sortNodes(nodes)`
:   sorts `nodes` in the canonical ordering.

`N.otypeRank[nodeType]`
:   ranking position of `nodeType`.

`N.sortKey(node)`
:   defines the canonical ordering on nodes.

`N.sortKeyTuple(tup)`
:   extends the canonical ordering on nodes to tuples of nodes.

`N.sortKeyChunk(node)`
:   defines the canonical ordering on node chunks.
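A sketch that walks all nodes in canonical order, assuming the hoisted handles `N` and `F`:

```python
from collections import Counter

# count how many nodes each node type has
counts = Counter(F.otype.v(n) for n in N.walk())
print(counts.most_common(5))

# node types with their ranking positions
print(N.otypeRank)
```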
### F. Node features

`Fall()`
:   all loaded feature names (node features only).

`F.fff.v(node)`
:   get the value of node feature `fff`.

`F.fff.s(value)`
:   get the nodes where feature `fff` has `value`.

`F.fff.freqList(...)`
:   frequency list of the values of `fff`.

`F.fff.items(...)`
:   generator of all entries of `fff` as a mapping from nodes to values.

`F.fff.meta`
:   metadata of feature `fff`.

`Fs('fff')`
:   identical to `F.fff`, usable if the name of the feature is variable.

#### Special node feature otype

Maps nodes to their types.

`F.otype.v(node)`
:   get the type of `node`.

`F.otype.s(nodeType)`
:   get all nodes of type `nodeType`.

`F.otype.sInterval(nodeType)`
:   gives the start and end nodes of `nodeType`.

`F.otype.items(...)`
:   generator of all (node, type) pairs.

`F.otype.meta`
:   metadata of feature `otype`.

`F.otype.maxSlot`
:   the last slot node.

`F.otype.maxNode`
:   the last node.

`F.otype.slotType`
:   the slot type.

`F.otype.all`
:   sorted list of all node types.
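A sketch of node feature access, assuming a hypothetical feature `lemma`:

```python
print(Fall())                      # all loaded node features
print(F.otype.slotType)            # the slot type of the corpus
print(F.otype.all)                 # all node types

n = 1                              # node 1 is always the first slot
print(F.otype.v(n))                # its node type
print(Fs('lemma').v(n))            # value of 'lemma' on that node
print(Fs('lemma').freqList()[:5])  # the most frequent 'lemma' values
```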
### E. Edge features

`Eall()`
:   all loaded feature names (edge features only).

`E.fff.f(node)`
:   get the value of feature `fff` for edges from `node`.

`E.fff.t(node)`
:   get the value of feature `fff` for edges to `node`.

`E.fff.freqList(...)`
:   frequency list of the values of `fff`.

`E.fff.items(...)`
:   generator of all entries of `fff` as a mapping from edges to values.

`E.fff.b(node)`
:   get the value of feature `fff` for edges from and to `node`.

`E.fff.meta`
:   all metadata of feature `fff`.

`Es('fff')`
:   identical to `E.fff`, usable if the name of the feature is variable.

#### Special edge feature oslots

Maps nodes to the set of slots they occupy.

`E.oslots.items(...)`
:   generator of all entries of `oslots` as a mapping from nodes to sets of slots.

`E.oslots.s(node)`
:   set of slots linked to `node`.

`E.oslots.meta`
:   all metadata of feature `oslots`.
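A sketch of edge feature access, assuming a hypothetical edge feature `mother` (not every corpus has it):

```python
print(Eall())                  # all loaded edge features

n = F.otype.maxSlot + 1        # the first non-slot node
print(E.oslots.s(n))           # the slots this node occupies

print(Es('mother').f(n))       # 'mother' edges going out of n
print(Es('mother').t(n))       # 'mother' edges coming into n
```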
### L. Locality

`L.i(node, otype=...)`
:   go to intersecting nodes.

`L.u(node, otype=...)`
:   go one level up.

`L.d(node, otype=...)`
:   go one level down.

`L.p(node, otype=...)`
:   go to adjacent previous nodes.

`L.n(node, otype=...)`
:   go to adjacent next nodes.
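A sketch of moving through the node hierarchy, assuming a hypothetical node type `sentence`:

```python
n = 1                                        # the first slot
sentences = L.u(n, otype='sentence')         # sentences containing this slot
if sentences:
    s = sentences[0]
    print(L.d(s, otype=F.otype.slotType))    # the slots inside that sentence
    print(L.n(s, otype='sentence'))          # the next sentence(s)
```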
### T. Text

`T.text(node, fmt=..., ...)`
:   give the formatted text associated with a node.

#### Sections

Rigid sectioning system of 1, 2, or 3 levels.

`T.sectionTuple(node)`
:   give the tuple of section nodes that contain a node.

`T.sectionFromNode(node)`
:   give the section heading of a node.

`T.nodeFromSection(section)`
:   give the node for a section heading.

#### Structure

Flexible multilevel sectioning system.

`T.headingFromNode(node)`
:   give the structure heading of a node.

`T.nodeFromHeading(heading)`
:   give the node for a structure heading.

`T.structureInfo()`
:   give a summary of the dataset structure.

`T.structure(node)`
:   give the structure of `node` and all in it.

`T.structurePretty(node)`
:   pretty-print the structure of `node` and all in it.

`T.top()`
:   give all top-level structural nodes in the dataset.

`T.up(node)`
:   gives the parent of a structural node.

`T.down(node)`
:   gives the children of a structural node.
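A sketch of text and section lookup; the shape of section headings is corpus dependent:

```python
n = 1                            # the first slot
print(T.text(n))                 # its text in the default text format
print(T.sectionFromNode(n))      # heading of the section containing it
print(T.sectionTuple(n))         # the section nodes containing it

sec = T.sectionFromNode(n)       # go back from a heading to a node
print(T.nodeFromSection(sec))
```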
### S. Search (low level)

#### Preparation

`S.search(query, limit=None)`
:   query the TF dataset with a template.

`S.study(query, ...)`
:   study the query in order to set up a plan.

`S.showPlan(details=False)`
:   show the search plan resulting from the last study.

`S.relationsLegend()`
:   catalog of all relational devices in search templates.

#### Fetching results

`S.count(progress=None, limit=None)`
:   count the results, up to a limit.

`S.fetch(limit=None, ...)`
:   fetch the results, up to a limit.

`S.glean(tup)`
:   render a single result into something human readable.

#### Implementation

`S.tweakPerformance(...)`
:   set certain parameters that influence the performance of search.
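The low-level counterpart of `A.search()`: study first, then count or fetch. The node types `chapter` and `word` are hypothetical:

```python
query = '''
chapter
  word
'''
S.study(query)                        # analyse the query
S.showPlan(details=True)              # how it will be executed
S.count(progress=1000, limit=10000)   # count results, with progress messages
for tup in S.fetch(limit=3):          # fetch a few results lazily
    print(S.glean(tup))               # quick human-readable rendering
```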
### C. Computed data components

Access to pre-computed data: `Computeds`.

All components have just one useful attribute: `.data`.

`Call()`
:   all pre-computed data component names.

`Cs('ccc')`
:   identical to `C.ccc`, usable if the name of the component is variable.

`C.levels.data`
:   various statistics on node types.

`C.order.data`
:   the canonical order of the nodes (`tf.core.nodes`).

`C.rank.data`
:   the rank of the nodes in the canonical order (`tf.core.nodes`).

`C.levUp.data`
:   feeds the `Locality.u()` function.

`C.levDown.data`
:   feeds the `Locality.d()` function.

`C.boundary.data`
:   feeds the `Locality.p()` and `Locality.n()` functions.

`C.characters.data`
:   frequency list of characters in the corpus, separately for all the text formats.

`C.sections.data["sec1"]`, `C.sections.data["sec2"]`
:   feed the section part of `tf.core.text`.

`C.sections.data["seqFromNode"]`, `C.sections.data["nodeFromSeq"]`
:   map tuples of heading nodes to their corresponding tuples of sequence numbers and vice versa. Only present if there are 3 section levels.

`C.structure.data`
:   feeds the structure part of `tf.core.text`.
## TF. Dataset

### Loading

`TF = Fabric(locations=dirs, modules=subdirs, volume=None, collection=None, silent="auto")`
:   initialize the API on a work, a single volume, or a collection of a work, from explicit directories.
    Use `use()` instead wherever you can. See also `tf.about.volumes`.

`TF.isLoaded(features=None)`
:   show information about loaded features.

`TF.explore(show=True)`
:   get features by category, loaded or unloaded.

`TF.loadAll(silent="auto")`
:   load all loadable features.

`TF.load(features, add=False)`
:   load a bunch of features from scratch or additionally.

`TF.ensureLoaded(features)`
:   make sure that features are loaded.

`TF.makeAvailableIn(globals())`
:   make the members of the core API available in the global scope.

`TF.ignored`
:   which features have been overridden.

`TF.footprint()`
:   show memory footprint per feature.
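A low-level loading sketch without a corpus app; the directory and feature name are placeholders:

```python
from tf.fabric import Fabric

TF = Fabric(locations='~/corpora/mycorpus/tf', modules='1.0')
api = TF.load('lemma')            # 'lemma' is a hypothetical feature
api.makeAvailableIn(globals())    # hoist F, E, T, L, ... into the global scope
TF.footprint()                    # memory use per loaded feature
```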
### Volumes

See also `tf.about.volumes`.

`TF.getVolumes()`
:   list all volumes of this dataset.

`TF.extract(volumes, ...)`
:   export volumes based on a volume specification.

`TF.collect(volumes, ...)`
:   collect several volumes into a new collection.
### Saving and Publishing

`TF.save(nodeFeatures={}, edgeFeatures={}, metaData={}, ...)`
:   save a bunch of newly generated features to disk.

`A.publishRelease(increase, message=None, description=None, ...)`
:   commit the dataset repo, tag it, release it, and attach the complete zipped data to it.
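A sketch of saving a newly computed node feature with `TF.save()`, assuming the hoisted handles and a hypothetical slot type `word`; the feature name `islong` and its values are made up:

```python
nodeFeatures = {
    'islong': {n: 1 for n in F.otype.s('word') if len(T.text(n)) > 10},
}
metaData = {
    'islong': dict(
        valueType='int',
        description='set on words longer than 10 characters',
    ),
}
TF.save(nodeFeatures=nodeFeatures, metaData=metaData)
```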
### Export to ZIP

`cd ~/backend/org/repo`, then `tf-zipall`
:   store the complete corpus data in a file `complete.zip`.

`A.zipAll()`
:   store the complete corpus data in a file `complete.zip`.

`from tf.app import collect; collect(backend, org, repo)`
:   same as `A.zipAll()` above, assuming the data is in a GitHub clone.
### Housekeeping

`TF.version`
:   version number of TF.

`TF.clearCache()`
:   clears the cache of compiled TF data.

`from tf.clean import clean; clean()`
:   clears the cache of compiled TF data.
## Volume support

TF datasets per volume or collection of a work. See also `tf.about.volumes`.

`from tf.volumes import getVolumes; getVolumes(volumeDir)`
:   list the volumes in a directory.

`from tf.volumes import extract; extract(work, volumes, ...)`
:   extract volumes from a work.

`from tf.volumes import collect; collect(volumes, work, ...)`
:   collect several volumes into a new collection.
## Dataset Operations

`from tf.dataset import modify; modify(source, target, ...)`
:   modify a TF dataset into a new TF dataset.

`from tf.dataset import Versions; Versions(api, va, vb, slotMap)`
:   extend a slot mapping between versions of a TF dataset to a complete node mapping.
## Data Interchange

### Custom node sets for search

`from tf.lib import readSets`
`from tf.lib import writeSets`

`readSets(sourceFile)`
:   read named sets from a file.

`writeSets(sets, destFile)`
:   write named sets to a file.
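A sketch that builds a custom node set, writes it to disk, and uses it as an extra node type in a search; the set name, file name, and node type are all hypothetical:

```python
from tf.lib import readSets, writeSets

sets = {'longword': {n for n in F.otype.s('word') if len(T.text(n)) > 10}}
writeSets(sets, 'mysets.tfx')        # file name is just an illustration

sets = readSets('mysets.tfx')
results = A.search('''
longword
''', sets=sets)                      # the set name acts as a node type
```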
### Export to Excel

`A.export(results, ...)`
:   export formatted data.
### Interchange with external annotation tools

`from tf.convert.addnlp import NLPipeline; NLPipeline()`
:   generate plain text, feed it into an NLP pipeline, and ingest the results.

`from tf.convert.recorder import Recorder; Recorder()`
:   generate annotatable plain text and import annotations.
### XML / TEI import

`from tf.convert.xml import XML; X = XML(...)`
:   convert an XML source to a full-fledged TF dataset plus app but no docs; put in your own conversion code if you wish; see the Greek New Testament.

`from tf.convert.tei import TEI; T = TEI(...)`
:   convert a TEI source to a full-fledged TF dataset plus app plus docs.
### WATM export

`from tf.app import use`
`from tf.convert.watm import WATM`

`A = use(...); WA = WATM(A, ns, ...); WA.makeText(); WA.makeAnno(); WA.writeAll(); WA.testAll()`
:   convert a TF dataset to text tokens and annotations in JSON format, for consumption by TextRepo/AnnoRepo of KNAW/HuC Digital Infrastructure.
    See Mondriaan Proeftuin, Suriano Letters, TransLatin Corpus.

`from tf.convert.watm import WATMS; W = WATMS(org, repo, backend, ns, ...); W.produce()`
:   convert a series of TF datasets to WATM.
### NLP import

In order to use this, install Spacy; see `tf.tools.myspacy`.

`from tf.convert.addnlp import addTokensAndSentences; newVersion = addTokensAndSentences(A)`
:   add NLP output from Spacy to an existing TF dataset. See the docs for how this is broken down into separate steps.
### pandas export

`A.exportPandas()`
:   export the dataset as a pandas data frame.
### MQL interchange

`TF.exportMQL(mqlDb, exportDir=None)`, `A.exportMQL(mqlDb, exportDir=None)`
:   export the loaded dataset to MQL.

`from tf.convert.mql import importMQL; TF = importMQL(mqlFile, saveDir)`
:   convert an MQL file to a TF dataset.
### Walker conversion

`from tf.convert.walker import CV; cv = CV(TF)`
:   convert structured data to a TF dataset.
### Exploding

`from tf.convert.tf import explode; explode(inLocation, outLocation)`
:   explode TF feature files to straight data files without optimizations.
## TF App development

`A.reuse()`
:   reload configuration data.

`from tf.advanced.find import loadModule; mmm = loadModule("mmm", *args)`
:   load a specific module supporting the corpus app.

`~/mypath/myname/app/config.yaml`
:   settings for a TF App.
## Layered search

(These work on the command line if TF is installed.)

`tf-make {dataset} {client} ship`
:   generate a static site with a search interface in client-side JavaScript and publish it to GitHub Pages.
    If `{client}` is left out, generate all clients that are defined for this dataset.
    Clients are defined in the `app-{dataset}` repo, under `layeredsearch`.
    More commands here.

`tf-make {dataset} serve`
:   serve the search interfaces defined for `{dataset}` locally.
    More commands here.
## Annotation tools

(These work in the TF browser and in Jupyter notebooks.)

### Named Entity Annotation

`tf {org}/{repo} --tool=ner`
:   starts the TF browser for the corpus in org/repo and opens the manual annotation tool.

`NE = A.makeNer()`
:   sets up the manual annotation API for the corpus in `A`.

More info and examples in `tf.about.annotate`.
## Command-line tools

(These work on the command line if TF is installed.)

`tf {org}/{repo}`
:   starts the TF browser for the corpus in org/repo.

`tf-zipall`
:   zips the TF dataset located by the current directory, with all its additional data modules, but only the latest version, so that it can be attached to a release on GitHub / GitLab.

`tf-zip {org}/{repo}`
:   zips the TF dataset in org/repo so that it can be attached to a release on GitHub / GitLab.

`tf-nbconvert {inDirectory} {outDirectory}`
:   converts notebooks in `inDirectory` to HTML and stores them in `outDirectory`.

`tf-xmlschema analysis {schema}.xsd`
:   analyses an XML schema file and extracts meaningful information for processing the XML that adheres to that schema.

`tf-fromxml`
:   when run in a repo, finds an XML source and converts it to TF. The resulting TF data is delivered in the repo. There is a hook to put your own conversion code in.

`tf-fromtei`
:   when run in a repo, finds a TEI source and converts it to TF. The resulting TF data is delivered in the repo.

`tf-addnlp`
:   when run in the repo of a TF dataset, adds NLP output to it after running Spacy to produce it.