E/F - (Edge-)Features

Features

TF can give you information about all features it has encountered.

TF.featureSets
TF.featureSets
Description

Returns a dictionary with keys nodes, edges, configs, computeds.

Under each key there is the set of feature names in that category.

So you can easily test whether a node feature or edge feature is present in the dataset you are working with.

configs

These are config features, with metadata only, no data. E.g. otext.

computeds

These are blocks of precomputed data, available under the C. API, see below.

May be unloaded

The sets do not indicate whether a feature is loaded or not. There are other functions that give you the loaded node features (Fall()) and the loaded edge features (Eall()).
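A presence check along these lines can be sketched as follows. The dictionary below is a hand-made stand-in with the same four keys as TF.featureSets; its contents and the helper has_feature are illustrative assumptions, not part of Text-Fabric.

```python
# Hand-made stand-in shaped like TF.featureSets (hypothetical contents).
feature_sets = {
    "nodes": {"otype", "part_of_speech"},
    "edges": {"oslots", "head"},
    "configs": {"otext"},
    "computeds": set(),
}


def has_feature(sets, name, kind="nodes"):
    """Return True if `name` is listed under the category `kind`."""
    return name in sets.get(kind, set())


print(has_feature(feature_sets, "part_of_speech"))      # node feature listed?
print(has_feature(feature_sets, "head", kind="edges"))  # edge feature listed?
print(has_feature(feature_sets, "lemma"))               # absent from this stand-in
```

With a real dataset you would pass TF.featureSets instead of the stand-in. Keep in mind that being listed says nothing about being loaded.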

TF.features

A dictionary of all features that TF has found, whether loaded or not. Under each feature name is all info about that feature.

Do not print!

If a feature is loaded, its data is also in the feature info. This can be an enormous amount of information, and you can easily overwhelm your notebook if you print it.

The best use of this is to get the metadata of features:

TF.features['otype'].metaData

This works for all features that have been found, not just otype, whether the feature is loaded or not.

Use F

If the feature is loaded, use

F.otype.meta

The same works for any other loaded feature, not just otype.

Generics for features

Features are mappings

Every feature is logically a mapping from nodes to values.

A feature object provides methods that take a node and return the feature's value for that node.

It is easiest to think of all features as a dictionary keyed by nodes.

However, some features have an optimized representation, and do not have a dictionary underneath.

But you can still iterate over the data of a feature as if it were a dictionary.

F.feature.items()
F.part_of_speech.items()
E.similarity.items()

A generator that yields the items of the feature, seen as a mapping. It does not yield entries for nodes without values, so this gives you a rather efficient way to iterate over just the feature data, instead of over all nodes.

If you need this repeatedly, or you need the whole dictionary, you can store the result as follows:

data = dict(F.part_of_speech.items())
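To make the "mapping without entries for empty nodes" idea concrete, here is a minimal plain-Python stand-in. The class SparseFeature and its sample data are hypothetical; this is not how Text-Fabric stores features internally, only a model of the behaviour described above.

```python
class SparseFeature:
    """Toy node feature: only nodes that have a value are stored."""

    def __init__(self, data):
        self._data = dict(data)

    def v(self, node):
        # In this toy model, nodes without a value give None.
        return self._data.get(node)

    def items(self):
        # Yield (node, value) pairs for the stored nodes only.
        yield from self._data.items()


# Node 3 has no value, so it never shows up in items().
part_of_speech = SparseFeature({1: "noun", 2: "verb", 4: "noun"})

# Materialize the generator once if the whole mapping is needed repeatedly.
data = dict(part_of_speech.items())
print(data)
```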

Node features

F

The node features API is exposed as F (Fs) or Feature (FeatureString).

Fall() aka AllFeatures()
Fall()
AllFeatures()
Description

Returns a sorted list of all usable, loaded node feature names.

F.feature aka Feature.feature
F.part_of_speech
Feature.part_of_speech
Description

Returns a sub-api for retrieving data that is stored in node features. In this example, we assume there is a feature called part_of_speech.

Tricky feature names

If the feature name is not a valid Python identifier, you cannot use this form; use Fs instead.

Fs(feature) aka FeatureString(feature)
Fs(feature)
FeatureString(feature)
Fs('part-of-speech')
FeatureString('part-of-speech')
Description

Returns a sub-api for retrieving data that is stored in node features.

feature

In the first two lines of this example, the feature name is contained in the variable feature.

In the last two lines, we assume there is a feature called part-of-speech. Note that this is not a valid name in Python, yet we can work with features with such names.

Both methods have identical results

Suppose we have just issued feature = 'pos'. Then the results of Fs(feature) and F.pos are identical.

In most cases F works just fine, but Fs is needed in two cases:

  • if we need to work with a feature whose name is not a valid Python name;
  • if we determine the feature we work with dynamically, at run time.

Simple forms

In the sequel we'll give examples based on the simple form only.
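The relation between the attribute form and the string form can be illustrated with a toy container. The names FeatureApi and make_fs are hypothetical, and the toy F and Fs below only imitate the real ones; the sketch assumes nothing more than Python's built-in getattr and setattr.

```python
class FeatureApi:
    """Toy container whose attributes are feature objects."""


def make_fs(api):
    """Build a lookup-by-string function for the given container."""

    def fs(name):
        return getattr(api, name)

    return fs


F = FeatureApi()
# setattr accepts names that are not valid Python identifiers.
setattr(F, "pos", {1: "noun"})
setattr(F, "part-of-speech", {1: "noun"})

Fs = make_fs(F)

feature = "pos"                  # name chosen at run time
print(Fs(feature) is F.pos)      # both forms reach the same object
print(Fs("part-of-speech"))      # F.part-of-speech would not parse as intended
```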

F.feature.meta
F.part_of_speech.meta

The dictionary of metadata found at the start of the part_of_speech.tf file.

F.feature.v(node)
F.part_of_speech.v(node)
Description

Get the value of a feature, such as part_of_speech for a node.

node

The node whose value for the feature is being retrieved.

F.feature.s(value)
F.part_of_speech.s(value)
F.part_of_speech.s('noun')
Description

Returns a generator of all nodes, in the canonical order, that have a given value for a given feature. This is another way to walk through nodes than using N().

value

The test value: all nodes with this value are yielded, the others are skipped.

nouns

The second line gives you all nodes which are nouns according to the corpus.
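The interplay of .v() and .s() can be modelled in a few lines. ToyNodeFeature is a hypothetical stand-in, and ascending node number is assumed here as a substitute for the canonical order, which in a real corpus is defined by Text-Fabric itself.

```python
class ToyNodeFeature:
    """Toy model of a node feature with value lookup and value search."""

    def __init__(self, data):
        self._data = dict(data)

    def v(self, node):
        # Value of the feature for one node (None if the node has no value).
        return self._data.get(node)

    def s(self, value):
        # Assumption: ascending node number stands in for the canonical order.
        for node in sorted(self._data):
            if self._data[node] == value:
                yield node


pos = ToyNodeFeature({3: "noun", 1: "noun", 2: "verb"})

print(pos.v(2))             # value for a single node
print(list(pos.s("noun")))  # all nodes carrying the value "noun"
```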

F.feature.freqList()
F.part_of_speech.freqList(nodeTypes=None)
Description

Inspect the values of feature (in this example: part_of_speech) and see how often they occur. The result is a list of pairs (value, frequency), ordered by frequency, highest frequencies first.

nodeTypes

If you pass a set of nodeTypes, only the values for nodes within those types will be counted.
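A frequency list of this shape can be sketched with collections.Counter. The function freq_list and the two sample dictionaries are hypothetical; breaking ties by value is an assumption of this sketch, since the text above only specifies ordering by frequency.

```python
from collections import Counter


def freq_list(values_by_node, type_by_node, node_types=None):
    """Return (value, frequency) pairs, most frequent first."""
    counts = Counter(
        value
        for node, value in values_by_node.items()
        if node_types is None or type_by_node.get(node) in node_types
    )
    # Assumption: ties are broken by the string form of the value.
    return sorted(counts.items(), key=lambda pair: (-pair[1], str(pair[0])))


values = {1: "noun", 2: "verb", 3: "noun", 4: "NP"}
types = {1: "word", 2: "word", 3: "word", 4: "phrase"}

print(freq_list(values, types))                       # all node types
print(freq_list(values, types, node_types={"word"}))  # words only
```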

F.otype

otype is a special node feature and has additional capabilities.

Description
  • F.otype.slotType is the node type that can fill the slots (usually: word)
  • F.otype.maxSlot is the largest slot number
  • F.otype.maxNode is the largest node number
  • F.otype.all is a list of all otypes from big to small (from books through clauses to words)
  • F.otype.sInterval(otype) is like F.otype.s(otype), but instead of returning a range to iterate over, it gives you the first and last node of otype. This relies on the fact that the data is organized so that each node type occupies a single contiguous range of nodes.
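The contiguous-range idea behind sInterval can be sketched as follows. The table TYPE_RANGES, its node numbers, and the functions s_interval and s are hypothetical stand-ins, not Text-Fabric's own data structures.

```python
# Hypothetical layout: each node type owns one contiguous range of nodes.
TYPE_RANGES = {"word": (1, 6), "clause": (7, 9), "book": (10, 10)}


def s_interval(otype):
    """Return the (first, last) node of the given type."""
    return TYPE_RANGES[otype]


def s(otype):
    """Return an iterable over all nodes of the given type."""
    first, last = s_interval(otype)
    return range(first, last + 1)


print(s_interval("clause"))  # just the boundaries
print(list(s("clause")))     # every node in between
```

Returning only the two boundaries is cheap and is enough for membership tests such as first <= node <= last.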

Edge features

E

The edge features API is exposed as E (Es) or Edge (EdgeString).

Eall() aka AllEdges()
Eall()
AllEdges()
Description

Returns a sorted list of all usable, loaded edge feature names.

E.feature aka Edge.feature
E.head
Edge.head
Description

Returns a sub-api for retrieving data that is stored in edge features. In this example, we assume there is a feature called head.

Tricky feature names

If the feature name is not a valid Python identifier, you cannot use this form; use Es instead.

Es(feature) aka EdgeString(feature)
Es(feature)
EdgeString(feature)
Es('head')
EdgeString('head')
Description

Returns a sub-api for retrieving data that is stored in edge features.

feature

In the first two lines of this example, the feature name is contained in the variable feature.

In the last two lines, we assume there is a feature called head.

Both methods have identical results

Suppose we have just issued feature = 'head'. Then the results of Es(feature) and E.head are identical.

In most cases E works just fine, but Es is needed in two cases:

  • if we need to work with a feature whose name is not a valid Python name;
  • if we determine the feature we work with dynamically, at run time.

Simple forms

In the sequel we'll give examples based on the simple form only.

E.feature.meta
E.head.meta

The dictionary of metadata found at the start of the head.tf file.

E.feature.f(node)
E.head.f(node)
Description

Get the nodes reached by feature-edges from a certain node. These edges must be specified in feature, in this case head. The result is an ordered tuple (again, in the canonical order). The members of the result are just nodes if head describes edges without values. Otherwise the members are pairs (tuples) of a node and a value.

If there are no edges from the node, the empty tuple is returned, rather than None.

node

The node from which the edges in question start.

E.feature.t(node)
E.head.t(node)
Description

Get the nodes that have feature-edges going to a certain node. These edges must be specified in feature, in this case head. The result is an ordered tuple (again, in the canonical order). The members of the result are just nodes if head describes edges without values. Otherwise the members are pairs (tuples) of a node and a value.

If there are no edges to the node, the empty tuple is returned, rather than None.

node

The node to which the edges in question go.

E.feature.b(node)
E.head.b(node)
Description

Get the nodes connected to a certain node by a feature-edge in either direction. These edges must be specified in feature, in this case head. The result is an ordered tuple (again, in the canonical order). The members of the result are just nodes if head describes edges without values. Otherwise the members are pairs (tuples) of a node and a value.

If there are no edges from or to the node, the empty tuple is returned, rather than None.

node

The node from which the edges in question start or to which they go. Think of both, hence the b.

symmetric closure

The .b() method gives the symmetric closure of a set of edges: if there is an edge between n and m, this method will produce it, no matter the direction of the edge.

Some edge sets are semantically symmetric, for example similarity. If n is similar to m, then m is similar to n.

But if you store such an edge feature completely, half of the data is redundant. You do not have to do that: it is enough to store one of the two edges between n and m (it does not matter which one), and E.similarity.b() will nevertheless produce the complete results.

conflicting values

If your set of edges is not symmetric, and edges carry values, it might very well be the case that edges between the same pair of nodes carry different values for the two directions.

In that case, the .b() method gives precedence to the edges that depart from a node.

Suppose we have

n == value=4 ==> m
m == value=6 ==> n

then

E.feature.b(n) contains (m, 4)
E.feature.b(m) contains (n, 6)
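The behaviour of .f(), .t() and .b() for valued edges, including the precedence rule, can be modelled as below. ToyEdgeFeature is a hypothetical stand-in, and ascending node number is assumed as a substitute for the canonical order; it is a sketch of the documented behaviour, not Text-Fabric's implementation.

```python
class ToyEdgeFeature:
    """Toy valued edge feature; node number stands in for canonical order."""

    def __init__(self, edges):
        # edges: iterable of (from_node, to_node, value) triples
        self._out = {}
        self._in = {}
        for src, dst, value in edges:
            self._out.setdefault(src, {})[dst] = value
            self._in.setdefault(dst, {})[src] = value

    def f(self, node):
        # Edges leaving `node`: tuple of (target, value) pairs, () if none.
        return tuple(sorted(self._out.get(node, {}).items()))

    def t(self, node):
        # Edges arriving at `node`: tuple of (source, value) pairs, () if none.
        return tuple(sorted(self._in.get(node, {}).items()))

    def b(self, node):
        # Both directions; outgoing edges overwrite incoming ones on conflict.
        merged = dict(self._in.get(node, {}))
        merged.update(self._out.get(node, {}))
        return tuple(sorted(merged.items()))


n, m = 1, 2
edge = ToyEdgeFeature([(n, m, 4), (m, n, 6)])

print(edge.f(n), edge.t(n))  # outgoing and incoming edges of n
print(edge.b(n))             # the outgoing value 4 takes precedence
print(edge.b(m))             # the outgoing value 6 takes precedence
```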

E.feature.freqList()
E.op.freqList(nodeTypesFrom=None, nodeTypesTo=None)
Description

If the edge feature has no values, simply return the number of node pairs between which an edge of this kind exists.

If the edge feature does have values, we inspect them and see how often they occur. The result is a list of pairs (value, frequency), ordered by frequency, highest frequencies first.

nodeTypesFrom

If not None, only the values for edges that start from a node with type within nodeTypesFrom will be counted.

nodeTypesTo

If not None, only the values for edges that go to a node with type within nodeTypesTo will be counted.
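The two cases (edges without and with values) and the two type filters can be sketched like this. The function edge_freq_list, its parameter names, and the sample data are hypothetical; applying the filters to value-less edges as well is an assumption of this sketch.

```python
from collections import Counter


def edge_freq_list(edges, type_of, types_from=None, types_to=None):
    """Count edges, optionally restricted by source and target node type.

    `edges` is a set of (from_node, to_node) pairs for value-less edges,
    or a dict mapping such pairs to values for valued edges.
    """

    def keep(src, dst):
        ok_from = types_from is None or type_of[src] in types_from
        ok_to = types_to is None or type_of[dst] in types_to
        return ok_from and ok_to

    if not isinstance(edges, dict):
        # No values: report how many node pairs are connected.
        return sum(1 for src, dst in edges if keep(src, dst))

    counts = Counter(v for (src, dst), v in edges.items() if keep(src, dst))
    # Most frequent first; ties broken by the string form of the value.
    return sorted(counts.items(), key=lambda pair: (-pair[1], str(pair[0])))


type_of = {1: "word", 2: "word", 3: "phrase"}
valued = {(1, 2): "subj", (2, 1): "obj", (3, 1): "subj"}
plain = {(1, 2), (3, 1)}

print(edge_freq_list(valued, type_of))
print(edge_freq_list(valued, type_of, types_from={"word"}))
print(edge_freq_list(plain, type_of, types_from={"phrase"}))
```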

E.oslots

oslots is a special edge feature and is mainly used to construct other parts of the API. It has fewer capabilities, and you will rarely need it. It does not have .f and .t methods, but an .s method instead.

Description

E.oslots.s(node) gives the sorted list of slot numbers linked to a node; put otherwise, the slots that support that node.

node

The node whose slots are being delivered.