Module tf.core.edgefeature
Mappings from edges to values.
Every edge feature is logically a mapping from pairs of nodes to values, string or integer.
A feature object gives you methods that you can pass a node and that returns its value for that node.
It is easiest to think of all edge features as a dictionary keyed by nodes.
The values are either sets or dictionaries.
If the value is a set, then the elements are the second node in the pair
and the value is None
.
If the value is a dictionary, then the keys are the second node in the pair,
and the value is the value that the edge feature assigns to this pair.
However, some features have an optimised representation, and do not have a dictionary underneath.
But you can still iterate over the data of a feature as if it were a
dictionary: EdgeFeature.items()
Expand source code Browse git
"""
Mappings from edges to values.
Every edge feature is logically a mapping from pairs of nodes to values,
string or integer.
A feature object gives you methods that you can pass a node and that returns
its value for that node.
It is easiest to think of all edge features as a dictionary keyed by nodes.
The values are either sets or dictionaries.
If the value is a set, then the elements are the second node in the pair
and the value is `None`.
If the value is a dictionary, then the keys are the second node in the pair,
and the value is the value that the edge feature assigns to this pair.
However, some features have an optimised representation, and do not have
a dictionary underneath.
But you can still iterate over the data of a feature as if it were a
dictionary: `tf.core.edgefeature.EdgeFeature.items`
"""
import collections
from .helpers import makeInverse, makeInverseVal
class EdgeFeatures:
pass
class EdgeFeature:
"""Provides access to (edge) feature data.
For feature `fff` it is the result of `E.fff` or `Es('fff')`.
"""
def __init__(self, api, metaData, data, doValues):
self.api = api
self.meta = metaData
"""Metadata of the feature.
This is the information found in the lines starting with `@`
in the `.tf` feature file.
"""
self.doValues = doValues
if type(data) is tuple:
self.data = data[0]
self.dataInv = data[1]
else:
self.data = data
self.dataInv = (
makeInverseVal(self.data) if doValues else makeInverse(self.data)
)
def items(self):
"""A generator that yields the items of the feature, seen as a mapping.
This gives you a rather efficient way to iterate over
just the feature data.
If you need this repeatedly, or you need the whole dictionary,
you can store the result as follows:
data = dict(E.fff.items())
"""
return self.data.items()
def f(self, n):
"""Get outgoing edges *from* a node.
The edges are those pairs of nodes specified in the feature data,
whose first node is the `n`.
Parameters
----------
node: integer
The node **from** which the edges in question start.
Returns
-------
set | tuple
The nodes reached by the edges **from** a certain node.
The members of the result are just nodes, if this feature does not
assign values to edges.
Otherwise the members are tuples of the destination node and the
value that the feature assigns to this pair of nodes.
If there are no edges from the node, the empty tuple is returned,
rather than `None`.
"""
if n not in self.data:
return ()
Crank = self.api.C.rank.data
if self.doValues:
return tuple(sorted(self.data[n].items(), key=lambda mv: Crank[mv[0] - 1]))
else:
return tuple(sorted(self.data[n], key=lambda m: Crank[m - 1]))
def t(self, n):
"""Get incoming edges *to* a node.
The edges are those pairs of nodes specified in the feature data,
whose second node is the `n`.
Parameters
----------
node: integer
The node **to** which the edges in question connect.
Returns
-------
set | tuple
The nodes where the edges **to** a certain node start.
The members of the result are just nodes, if this feature does not
assign values to edges.
Otherwise the members are tuples of the start node and the
value that the feature assigns to this pair of nodes.
If there are no edges to the node, the empty tuple is returned,
rather than `None`.
"""
if n not in self.dataInv:
return ()
Crank = self.api.C.rank.data
if self.doValues:
return tuple(
sorted(self.dataInv[n].items(), key=lambda mv: Crank[mv[0] - 1])
)
else:
return tuple(sorted(self.dataInv[n], key=lambda m: Crank[m - 1]))
def b(self, n):
"""Query *both* incoming edges to, and outgoing edges from a node.
The edges are those pairs of nodes specified in the feature data,
whose first or second node is the `n`.
Parameters
----------
node: integer
The node **from** which the edges in question start or
**to** which the edges in question connect.
Returns
-------
set | dict
The nodes where the edges **to** a certain node start.
The members of the result are just nodes, if this feature does not
assign values to edges.
Otherwise the members are tuples of the start node and the
value that the feature assigns to this pair of nodes.
If there are no edges to the node, the empty tuple is returned,
rather than `None`.
Notes
-----
!!! hint "symmetric closure"
This method gives the *symmetric closure* of a set of edges:
if there is an edge between `n` and `m`, this method will deliver
its value, no matter the direction of the edge.
!!! example "symmetric edges"
Some edge sets are semantically symmetric, for example *similarity*.
If `n` is similar to `m`, then `m` is similar to `n`.
But if you store such an edge feature completely,
half of the data is redundant.
By virtue of this method you do not have to do that, you only need to store
one of the edges between `n` and `m` (it does not matter which one),
and `E.fff.b(n)` will nevertheless produce the complete results.
!!! caution "conflicting values"
If your set of edges is not symmetric, and edges carry values, it might
very well be the case that edges between the same pair of nodes carry
different values for the two directions.
In that case, this method gives precedence to the edges that
*depart* from the node to those that go *to* the node.
!!! example "conflicting values"
Suppose we have
n == value=4 ==> m
m == value=6 ==> n
then
E.b(n) = (m, 4)
E.b(m) = (n, 6)
"""
if n not in self.data and n not in self.dataInv:
return ()
Crank = self.api.C.rank.data
if self.doValues:
result = {}
if n in self.dataInv:
result.update(self.dataInv[n].items())
if n in self.data:
result.update(self.data[n].items())
return tuple(sorted(result.items(), key=lambda mv: Crank[mv[0] - 1]))
else:
result = set()
if n in self.dataInv:
result |= self.dataInv[n]
if n in self.data:
result |= self.data[n]
return tuple(sorted(result, key=lambda m: Crank[m - 1]))
def freqList(self, nodeTypesFrom=None, nodeTypesTo=None):
"""Frequency list of the values of this feature.
Inspect the values of this feature and see how often they occur.
If the feature does not assign values, return the number of node pairs
in this edge.
If the edge feature does have values, inspect them and see
how often they occur.
The result is a list of pairs `(value, frequency)`, ordered by `frequency`,
highest frequencies first.
Parameters
----------
nodeTypesFrom: set of string, optional None
If you pass a set of node types here, only the values for edges
that start *from* a node with such a type will be counted.
nodeTypesTo: set of string, optional None
If you pass a set of node types here, only the values for edges
that go *to* a node with such a type will be counted.
Returns
-------
tuple of 2-tuple
A tuple of `(value, frequency)`, items, ordered by `frequency`,
highest frequencies first.
"""
if nodeTypesFrom is None and nodeTypesTo is None:
if self.doValues:
fql = collections.Counter()
for (n, vals) in self.data.items():
for val in vals.values():
fql[val] += 1
return tuple(sorted(fql.items(), key=lambda x: (-x[1], x[0])))
else:
fql = 0
for (n, ms) in self.data.items():
fql += len(ms)
return fql
else:
fOtype = self.api.F.otype.v
if self.doValues:
fql = collections.Counter()
for (n, vals) in self.data.items():
if nodeTypesFrom is None or fOtype(n) in nodeTypesFrom:
for (m, val) in vals.items():
if nodeTypesTo is None or fOtype(m) in nodeTypesTo:
fql[val] += 1
return tuple(sorted(fql.items(), key=lambda x: (-x[1], x[0])))
else:
fql = 0
for (n, ms) in self.data.items():
if nodeTypesFrom is None or fOtype(n) in nodeTypesFrom:
for m in ms:
if nodeTypesTo is None or fOtype(m) in nodeTypesTo:
fql += len(ms)
return fql
Classes
class EdgeFeature (api, metaData, data, doValues)
-
Provides access to (edge) feature data.
For feature
fff
it is the result ofE.fff
orEs('fff')
.Expand source code Browse git
class EdgeFeature: """Provides access to (edge) feature data. For feature `fff` it is the result of `E.fff` or `Es('fff')`. """ def __init__(self, api, metaData, data, doValues): self.api = api self.meta = metaData """Metadata of the feature. This is the information found in the lines starting with `@` in the `.tf` feature file. """ self.doValues = doValues if type(data) is tuple: self.data = data[0] self.dataInv = data[1] else: self.data = data self.dataInv = ( makeInverseVal(self.data) if doValues else makeInverse(self.data) ) def items(self): """A generator that yields the items of the feature, seen as a mapping. This gives you a rather efficient way to iterate over just the feature data. If you need this repeatedly, or you need the whole dictionary, you can store the result as follows: data = dict(E.fff.items()) """ return self.data.items() def f(self, n): """Get outgoing edges *from* a node. The edges are those pairs of nodes specified in the feature data, whose first node is the `n`. Parameters ---------- node: integer The node **from** which the edges in question start. Returns ------- set | tuple The nodes reached by the edges **from** a certain node. The members of the result are just nodes, if this feature does not assign values to edges. Otherwise the members are tuples of the destination node and the value that the feature assigns to this pair of nodes. If there are no edges from the node, the empty tuple is returned, rather than `None`. """ if n not in self.data: return () Crank = self.api.C.rank.data if self.doValues: return tuple(sorted(self.data[n].items(), key=lambda mv: Crank[mv[0] - 1])) else: return tuple(sorted(self.data[n], key=lambda m: Crank[m - 1])) def t(self, n): """Get incoming edges *to* a node. The edges are those pairs of nodes specified in the feature data, whose second node is the `n`. Parameters ---------- node: integer The node **to** which the edges in question connect. Returns ------- set | tuple The nodes where the edges **to** a certain node start. The members of the result are just nodes, if this feature does not assign values to edges. Otherwise the members are tuples of the start node and the value that the feature assigns to this pair of nodes. If there are no edges to the node, the empty tuple is returned, rather than `None`. """ if n not in self.dataInv: return () Crank = self.api.C.rank.data if self.doValues: return tuple( sorted(self.dataInv[n].items(), key=lambda mv: Crank[mv[0] - 1]) ) else: return tuple(sorted(self.dataInv[n], key=lambda m: Crank[m - 1])) def b(self, n): """Query *both* incoming edges to, and outgoing edges from a node. The edges are those pairs of nodes specified in the feature data, whose first or second node is the `n`. Parameters ---------- node: integer The node **from** which the edges in question start or **to** which the edges in question connect. Returns ------- set | dict The nodes where the edges **to** a certain node start. The members of the result are just nodes, if this feature does not assign values to edges. Otherwise the members are tuples of the start node and the value that the feature assigns to this pair of nodes. If there are no edges to the node, the empty tuple is returned, rather than `None`. Notes ----- !!! hint "symmetric closure" This method gives the *symmetric closure* of a set of edges: if there is an edge between `n` and `m`, this method will deliver its value, no matter the direction of the edge. !!! example "symmetric edges" Some edge sets are semantically symmetric, for example *similarity*. If `n` is similar to `m`, then `m` is similar to `n`. But if you store such an edge feature completely, half of the data is redundant. By virtue of this method you do not have to do that, you only need to store one of the edges between `n` and `m` (it does not matter which one), and `E.fff.b(n)` will nevertheless produce the complete results. !!! caution "conflicting values" If your set of edges is not symmetric, and edges carry values, it might very well be the case that edges between the same pair of nodes carry different values for the two directions. In that case, this method gives precedence to the edges that *depart* from the node to those that go *to* the node. !!! example "conflicting values" Suppose we have n == value=4 ==> m m == value=6 ==> n then E.b(n) = (m, 4) E.b(m) = (n, 6) """ if n not in self.data and n not in self.dataInv: return () Crank = self.api.C.rank.data if self.doValues: result = {} if n in self.dataInv: result.update(self.dataInv[n].items()) if n in self.data: result.update(self.data[n].items()) return tuple(sorted(result.items(), key=lambda mv: Crank[mv[0] - 1])) else: result = set() if n in self.dataInv: result |= self.dataInv[n] if n in self.data: result |= self.data[n] return tuple(sorted(result, key=lambda m: Crank[m - 1])) def freqList(self, nodeTypesFrom=None, nodeTypesTo=None): """Frequency list of the values of this feature. Inspect the values of this feature and see how often they occur. If the feature does not assign values, return the number of node pairs in this edge. If the edge feature does have values, inspect them and see how often they occur. The result is a list of pairs `(value, frequency)`, ordered by `frequency`, highest frequencies first. Parameters ---------- nodeTypesFrom: set of string, optional None If you pass a set of node types here, only the values for edges that start *from* a node with such a type will be counted. nodeTypesTo: set of string, optional None If you pass a set of node types here, only the values for edges that go *to* a node with such a type will be counted. Returns ------- tuple of 2-tuple A tuple of `(value, frequency)`, items, ordered by `frequency`, highest frequencies first. """ if nodeTypesFrom is None and nodeTypesTo is None: if self.doValues: fql = collections.Counter() for (n, vals) in self.data.items(): for val in vals.values(): fql[val] += 1 return tuple(sorted(fql.items(), key=lambda x: (-x[1], x[0]))) else: fql = 0 for (n, ms) in self.data.items(): fql += len(ms) return fql else: fOtype = self.api.F.otype.v if self.doValues: fql = collections.Counter() for (n, vals) in self.data.items(): if nodeTypesFrom is None or fOtype(n) in nodeTypesFrom: for (m, val) in vals.items(): if nodeTypesTo is None or fOtype(m) in nodeTypesTo: fql[val] += 1 return tuple(sorted(fql.items(), key=lambda x: (-x[1], x[0]))) else: fql = 0 for (n, ms) in self.data.items(): if nodeTypesFrom is None or fOtype(n) in nodeTypesFrom: for m in ms: if nodeTypesTo is None or fOtype(m) in nodeTypesTo: fql += len(ms) return fql
Instance variables
var meta
-
Metadata of the feature.
This is the information found in the lines starting with
@
in the.tf
feature file.
Methods
def b(self, n)
-
Query both incoming edges to, and outgoing edges from a node.
The edges are those pairs of nodes specified in the feature data, whose first or second node is the
n
.Parameters
node
:integer
- The node from which the edges in question start or to which the edges in question connect.
Returns
set | dict
-
The nodes where the edges to a certain node start. The members of the result are just nodes, if this feature does not assign values to edges. Otherwise the members are tuples of the start node and the value that the feature assigns to this pair of nodes.
If there are no edges to the node, the empty tuple is returned, rather than
None
.
Notes
symmetric closure
This method gives the symmetric closure of a set of edges: if there is an edge between
n
andm
, this method will deliver its value, no matter the direction of the edge.symmetric edges
Some edge sets are semantically symmetric, for example similarity. If
n
is similar tom
, thenm
is similar ton
.But if you store such an edge feature completely, half of the data is redundant. By virtue of this method you do not have to do that, you only need to store one of the edges between
n
andm
(it does not matter which one), andE.fff.b(n)
will nevertheless produce the complete results.conflicting values
If your set of edges is not symmetric, and edges carry values, it might very well be the case that edges between the same pair of nodes carry different values for the two directions.
In that case, this method gives precedence to the edges that depart from the node to those that go to the node.
conflicting values
Suppose we have
n == value=4 ==> m m == value=6 ==> n
then
E.b(n) = (m, 4) E.b(m) = (n, 6)
def f(self, n)
-
Get outgoing edges from a node.
The edges are those pairs of nodes specified in the feature data, whose first node is the
n
.Parameters
node
:integer
- The node from which the edges in question start.
Returns
set | tuple
-
The nodes reached by the edges from a certain node. The members of the result are just nodes, if this feature does not assign values to edges. Otherwise the members are tuples of the destination node and the value that the feature assigns to this pair of nodes.
If there are no edges from the node, the empty tuple is returned, rather than
None
.
def freqList(self, nodeTypesFrom=None, nodeTypesTo=None)
-
Frequency list of the values of this feature.
Inspect the values of this feature and see how often they occur.
If the feature does not assign values, return the number of node pairs in this edge.
If the edge feature does have values, inspect them and see how often they occur. The result is a list of pairs
(value, frequency)
, ordered byfrequency
, highest frequencies first.Parameters
nodeTypesFrom
:set
ofstring
, optionalNone
- If you pass a set of node types here, only the values for edges that start from a node with such a type will be counted.
nodeTypesTo
:set
ofstring
, optionalNone
- If you pass a set of node types here, only the values for edges that go to a node with such a type will be counted.
Returns
tuple
of2-tuple
- A tuple of
(value, frequency)
, items, ordered byfrequency
, highest frequencies first.
def items(self)
-
A generator that yields the items of the feature, seen as a mapping.
This gives you a rather efficient way to iterate over just the feature data.
If you need this repeatedly, or you need the whole dictionary, you can store the result as follows:
data = dict(E.fff.items())
def t(self, n)
-
Get incoming edges to a node.
The edges are those pairs of nodes specified in the feature data, whose second node is the
n
.Parameters
node
:integer
- The node to which the edges in question connect.
Returns
set | tuple
-
The nodes where the edges to a certain node start. The members of the result are just nodes, if this feature does not assign values to edges. Otherwise the members are tuples of the start node and the value that the feature assigns to this pair of nodes.
If there are no edges to the node, the empty tuple is returned, rather than
None
.
class EdgeFeatures
-
Expand source code Browse git
class EdgeFeatures: pass