Module tf.volumes.collect
Collect
Usage
from tf.volumes import collect

collect(
    (
        location1,
        location2,
    ),
    workLocation,
)

collect(
    (
        (name1, location1),
        (name2, location2),
    ),
    workLocation,
    volumeType=None,
    volumeFeature=None,
    featureMeta=None,
    **otext,
)
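The `volumeType` and `volumeFeature` arguments in the usage above fall back on each other: if only one of them is given, its value is used for both. A minimal sketch of that rule follows; the helper name `resolveVolumeNames` is hypothetical and not part of Text-Fabric.

```python
def resolveVolumeNames(volumeType=None, volumeFeature=None):
    """Return (volumeType, volumeFeature) after mutual fallback.

    Hypothetical helper, for illustration only.
    """
    if volumeType:
        if not volumeFeature:
            # only the type was given: the feature gets the same name
            volumeFeature = volumeType
    elif volumeFeature:
        # only the feature was given: the type gets the same name
        volumeType = volumeFeature
    return (volumeType, volumeFeature)


print(resolveVolumeNames(volumeType="volume"))
print(resolveVolumeNames(volumeType="volume", volumeFeature="vol"))
print(resolveVolumeNames())
```

If neither argument is passed, both stay `None` and no volume nodes are added.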
"""
# Collect
## Usage
``` python
from tf.volumes import collect
collect(
(
location1,
location2,
),
workLocation,
)
```
``` python
collect(
(
(name1, location1),
(name2, location2),
),
workLocation,
volumeType=None,
volumeFeature=None,
featureMeta=None,
**otext,
)
```
"""
import collections
from ..parameters import OTYPE, OSLOTS, OVOLUME, OWORK, OINTERF, OINTERT, OMAP
from ..core.fabric import FabricCore
from ..core.timestamp import Timestamp, SILENT_D, DEEP
from ..core.helpers import getAllRealFeatures
from ..core.files import unexpanduser as ux, dirEmpty, dirRemove, fileNm, dirNm
DEBUG = False
GENERATED = set(
"""
writtenBy
dateWritten
version
volume
""".strip().split()
)
TM = Timestamp()
indent = TM.indent
info = TM.info
warning = TM.warning
error = TM.error
setSilent = TM.setSilent
isSilent = TM.isSilent
def collect(
volumes,
workLocation,
volumeType=None,
volumeFeature=None,
mergeTypes=None,
featureMeta=None,
silent=SILENT_D,
overwrite=False,
):
"""Creates a collection out of a number of volumes.
The volumes are individual TF datasets, the work is a new TF data set.
You may pass as many volume data sources as you want.
The work will be the union of all nodes of the volumes,
rearranged according to their types, where node types with the
same names will be merged.
The slots of the work are the concatenation of the slots of the
volumes, which must all have the same slot type.
The node and edge features will be remapped, so that they have
the same values in the work as they had in the individual
volumes.
!!! caution "inter-volume edges"
The edge features of each volume only contain edges between nodes
in that volume. But the work as a whole may have had edges
between nodes of different volumes.
These can be restored from two extra features that may exist in the
volumes: `ointerfrom` and `ointerto`.
See also `tf.volumes.extract`.
The volumes may contain a feature `owork` which maps each node in a volume
to the corresponding node in the work.
Some non-slot nodes in the work may have slots in multiple volumes.
!!! hint "Lexeme nodes"
Think of lexeme nodes that have slots for all occurrences of that lexeme.
When a work is split into volumes, the lexeme nodes map to
separate lexeme nodes in each volume where these lexemes occur.
When we collect volumes into works, we want to merge these lexeme
nodes again.
When non-slot nodes across volumes have the same value for their `owork` feature,
they will be merged into the work. That means: only one node will be created
in the work, and the slots of that node will be the union of the
slots these nodes have in the individual volumes.
!!! caution "Overlapping slots"
It is an error if volumes have overlapping slots.
Overlapping slots are those whose values of `owork` are identical.
A feature `ovolume` will be created which maps each node of the work
to the corresponding node(s) in the individual volume(s).
Optionally, nodes corresponding to the volumes themselves will be
added to the work.
Care will be taken of the metadata of the features and the contents
of the `otext.tf` feature, which consists of metadata only.
All details of the work can be steered by means of parameters.
You can use this function to recombine volumes that have been obtained
by extracting them from a work by means of `tf.volumes.extract`.
In this case, there is no need to pass `volumeType` and `volumeFeature`.
Parameters
----------
volumes: dict or tuple of (string or tuple)
You can either pass just the locations of the volumes,
or you can give them a name and pass `(name, location)` instead,
or pass them as a dictionary with names as keys and locations as values.
If you do not give names to volumes, names derived from their locations will be used.
However, names are only used if you pass `volumeType` and/or `volumeFeature`.
!!! caution "Disjointness"
A collection cannot contain the same volume more than once.
workLocation: string
The directory into which the feature files of the work will be written.
overwrite: boolean, optional False
If True, the target collection will be
created and will replace any existing collection / volume of the same name.
If None, the collection will only be created if it does not exist.
No check will be performed as to whether an existing collection
is equal to what would have been created by this call.
If False, refuses to proceed if the collection directory already exists.
volumeType, volumeFeature: string, optional None
If a string value for one of these is passed,
a new node type will be added to the work,
with one new node for each volume: the volume node.
There will also be a new feature, that assigns the name of a volume
to the node of that volume.
The name of the new node type is the value of `volumeType`
if it is a non-empty string, else it is the value of `volumeFeature`.
The name of the new feature is `volumeFeature`
if it is a non-empty string, else it is the value of `volumeType`.
!!! caution "`volumeType` must be fresh"
It is an error if the `volumeType` is a node type that already
occurs in one of the volumes.
!!! note "`volumeFeature` may exist"
The `volumeFeature` may already exist in one or more volumes.
In that case the new feature values for nodes of `volumeType` will
just be added to it.
Example
-------
collect(
dict(
banks='banks/tf/0.2',
river='banks/tf/0.4',
),
'riverbanks/tf/1.0',
volumeType='volume',
volumeFeature='vol',
)
This results in a work with nodes and features from the volumes
found at the indicated places on your file system.
After combination, the volumes are visible in the work as nodes
of type `volume`, and the feature `vol` provides the names `banks` and `river`
for those nodes.
featureMeta: dict, optional None
The meta data of the volumes involved will be merged.
If feature metadata of the same feature is encountered in different volumes,
and if volumes specify different values for the same keys,
the different values will be stored under a key with the name of
the volume appended to the key, separated by a `!`.
The special metadata field `valueType` will just be reduced
to one single value `str` if some volumes have it as `str` and others as `int`.
If the volumes assign the same value type to a feature, that value type
will be assigned to the combined feature.
If you want to assign other meta data to specific features,
or pass meta data for new features that originate from the merging process,
you can pass them in the parameter `featureMeta` as in the following example,
where we pass meta data for a feature called `level` with integer values.
The contents of the `otext.tf` features are also metadata,
and their contents will be merged in exactly the same way.
So if the section / structure specifications and the formats are not
the same for all volumes, you will see them spread out
in fields qualified by the volume name with a `!` sign between
the key and the volume.
But you can add new specifications explicitly,
as meta data of the `otext` feature,
by passing them as keyword arguments.
They will be passed directly to the combined `otext.tf` feature
and will override anything with the same key
that is already in one of the volumes.
silent: string, optional tf.core.timestamp.SILENT_D
See `tf.core.timestamp.Timestamp`
Returns
-------
boolean
Whether the creation was successful.
All features in the resulting collection will get a metadata key
`volume` whose value is the name of the collection and its component volumes.
Example
-------
collect(
dict(
banks='banks/tf/0.2',
river='banks/tf/0.4',
),
'riverbanks/tf/1.0',
featureMeta=dict(
level=dict(
valueType='int',
description='level of a section node',
),
),
)
Example
-------
collect(
dict(
banks='banks/tf/0.2',
river='banks/tf/0.4',
),
'riverbanks/tf/1.0',
featureMeta=dict(
otext=dict(
volumeType='volume',
volumeFeature='vol',
sectionTypes='volume,chapter,line',
sectionFeatures='title,number,number',
),
),
silent=SILENT_D,
)
This will give rise to something like this (assuming that `banks` and
`rivers` have some deviating material in their `otext.tf`):
@config
@compiler=Dirk Roorda
@dateWritten=2019-05-20T19:12:23Z
@fmt:line-default={letters:XXX}{terminator}
@fmt:line-term=line#{terminator}
@fmt:text-orig-extra={letters}{punc}{gap}
@fmt:text-orig-full={letters}
@fmt:text-orig-full!banks={letters}{punc}
@fmt:text-orig-full!rivers={letters}{gap}
@name=Culture quotes from Iain Banks
@purpose=exposition
@sectionFeatures=title,number,number
@sectionFeatures!banks=title,number,number
@sectionFeatures!rivers=number,number,number
@sectionTypes=volume,chapter,line
@sectionTypes!banks=book,chapter,sentence
@sectionTypes!rivers=chapter,sentence,line
@source=Good Reads
@status=with for similarities in a separate module
@structureFeatures!banks=title,number,number,number
@structureFeatures!rivers=title,number,number
@structureTypes!banks=book,chapter,sentence,line
@structureTypes!rivers=book,chapter,sentence
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@version=0.2
@writtenBy=Text-Fabric
@writtenBy=Text-Fabric
@dateWritten=2019-05-28T10:55:06Z
!!! caution "inter-version edges"
Features with names starting in `omap@` contain node maps from
older to newer versions.
These will be excluded from collection.
"""
collection = fileNm(workLocation)
loc = ux(dirNm(workLocation))
if not dirEmpty(workLocation):
proceed = True
good = True
if overwrite is None:
info(
f"Collection {collection} already exists and will not be recreated",
tm=False,
)
proceed = False
else:
if overwrite:
dirRemove(workLocation)
info(f"Collection {collection} exists and will be recreated", tm=False)
else:
good = False
proceed = False
error(
f"Collection {collection} already exists at {loc}",
tm=False,
)
if not good or not proceed:
return good
if volumeType:
if not volumeFeature:
volumeFeature = volumeType
else:
if volumeFeature:
volumeType = volumeFeature
TFs = {}
apis = {}
getOworks = {}
slotType = None
allSlots = set()
volumeMap = {}
volumeMapI = {}
maxSlotW = None
maxNodeW = None
metaData = collections.defaultdict(dict)
nodeFeatures = {}
edgeFeatures = {}
allNodeFeatures = set()
allEdgeFeatures = set()
volumeOslots = {}
fromWork = {}
good = True
if type(volumes) is not dict:
volProto = {}
volNames = set()
for loc in volumes:
if type(loc) is str:
name = fileNm(loc)
else:
(name, loc) = loc
if name in volNames:
error(f"Volume {name} is already part of the collection")
good = False
else:
volNames.add(name)
volProto[name] = loc
volumes = volProto
if not good:
return False
volumeIndex = {}
for (name, loc) in volumes.items():
seenName = volumeIndex.get(loc, None)
if seenName:
error(f"Volume {seenName} at location {loc} reoccurs as volume {name}")
good = False
volumeIndex[loc] = name
if not good:
return False
def loadVolumes():
for (name, loc) in volumes.items():
info(f"Loading volume {name:<60} from {ux(loc)} ...")
TFs[name] = FabricCore(locations=loc, silent=silent)
apis[name] = TFs[name].loadAll(silent=silent)
return True
def getMetas():
info("inspect metadata ...")
meta = collections.defaultdict(
lambda: collections.defaultdict(lambda: collections.defaultdict(set))
)
volumeStr = ",".join(volumes)
collectionMeta = f"{collection}:{volumeStr}"
for (feat, keys) in (featureMeta or {}).items():
if feat.startswith(OMAP):
continue
meta[feat]["collection"][collectionMeta] = {"+"}
for (key, value) in keys.items():
if value is not None:
meta[feat][key][value] = {""}
if volumeFeature:
meta[volumeFeature]["valueType"]["str"] = {""}
meta[volumeFeature]["description"][f"label of {volumeType}"] = {""}
for name in volumes:
allFeatures = getAllRealFeatures(apis[name])
for (feat, fObj) in TFs[name].features.items():
if feat not in allFeatures:
continue
meta[feat]["collection"][collectionMeta].add(name)
thisMeta = fObj.metaData
for (k, v) in thisMeta.items():
meta[feat][k][v].add(name)
if fObj.isConfig:
continue
if fObj.isEdge:
allEdgeFeatures.add(feat)
meta[feat]["edgeValues"][fObj.edgeValues].add(name)
else:
allNodeFeatures.add(feat)
for (feat, ks) in meta.items():
for (k, vs) in ks.items():
isGenerated = k in GENERATED
if k == "valueType":
if len(vs) > 1:
warning(
(
f"WARNING: {feat}: "
"valueType varies in volumes; will be str"
),
tm=False,
)
metaData[feat][k] = "str"
else:
metaData[feat][k] = sorted(vs)[0]
elif len(vs) == 1:
metaData[feat][k] = list(vs)[0]
else:
hasCombinedValue = False
for (v, names) in vs.items():
for name in names:
if name == "+" or name == "" and not isGenerated:
hasCombinedValue = True
key = k
elif name == "":
key = k
else:
key = f"{k}!{name}"
metaData[feat][key] = v
if not hasCombinedValue and not isGenerated:
warning(
f"WARNING: {feat}.{k} metadata varies across volumes",
tm=False,
)
metaData[OVOLUME] = {k: m for (k, m) in metaData[OTYPE].items()}
metaData[OVOLUME]["description"] = (
"mapping from a node in the work to the volume it comes from"
" and its corresponding node there"
)
metaData[OVOLUME]["valueType"] = "str"
indent(level=0)
info("metadata sorted out")
return True
def checkTypes():
nonlocal slotType
slotTypes = set()
otherTypes = set()
clashes = set()
good = True
info("check nodetypes ...")
indent(level=1, reset=True)
for name in volumes:
info(f"volume {name}", tm=False)
api = apis[name]
if not api:
good = False
continue
C = api.C
nTypeInfo = C.levels.data
for (t, (nType, av, nF, nT)) in enumerate(nTypeInfo):
if nType == volumeType:
clashes.add(name)
if t == len(nTypeInfo) - 1:
slotTypes.add(nType)
else:
otherTypes.add(nType)
if len(slotTypes) > 1:
slotRep = ", ".join(sorted(slotTypes))
error(f"Multiple slot types: {slotRep}", tm=False)
good = False
commonTypes = slotTypes & otherTypes
if len(commonTypes):
error(
"Some node types are slots in one volume and non-slots in another",
tm=False,
)
error(", ".join(sorted(commonTypes)), tm=False)
good = False
if clashes:
clashRep = ", ".join(f"{name}" for name in clashes)
error(
f"Volume type {volumeType} occurs inside volumes {clashRep}",
tm=False,
)
good = False
if good:
slotType = list(slotTypes)[0]
else:
return False
indent(level=0)
info("node types ok")
return True
def collectNodes():
info("Collect nodes from volumes ...")
indent(level=1, reset=True)
nodesByType = collections.defaultdict(list)
fromWork.clear()
sW = 0
good = True
# check whether volumes have overlapping slots
# we use the original work slots, provided by the `owork` feature
# if it exists; otherwise we cannot know whether slots are overlapping
info("Check against overlapping slots ...")
indent(level=2, reset=True)
for name in volumes:
api = apis[name]
E = api.E
getOwork = (
None
if api.isLoaded(features=OWORK, pretty=False)[OWORK] is None
else api.Fs(OWORK)
)
if getOwork:
getOwork = getOwork.v
getOworks[name] = getOwork
maxSlotI = E.oslots.maxSlot
info(f"{name:<60}: {maxSlotI:>8} slots", tm=False)
volumeOslots[name] = set(range(sW + 1, sW + 1 + maxSlotI))
overlap = 0
for sV in range(1, maxSlotI + 1):
sW += 1
nameSv = (name, sV)
if getOwork:
sOW = getOwork(sV)
if sOW:
if sOW in fromWork:
overlap += 1
fromWork[sOW] = None
allSlots.add(sW)
volumeMap[sW] = nameSv
volumeMapI[nameSv] = sW
if overlap:
error(f"Overlapping slots: {overlap}")
good = False
indent(level=1)
if overlap == 0:
info("no overlap")
if not good:
return False
info("Group non-slot nodes by type")
indent(level=2, reset=True)
for name in volumes:
api = apis[name]
E = api.E
F = api.F
fOtypeData = F.otype.data
maxSlotI = E.oslots.maxSlot
maxNodeI = E.oslots.maxNode
info(f"{name:<60}: {maxSlotI + 1:>8}-{maxNodeI:>8}", tm=False)
for nV in range(maxSlotI + 1, maxNodeI + 1):
nType = fOtypeData[nV - maxSlotI - 1]
nodesByType[nType].append((name, nV))
nW = sW
indent(level=1)
info("Mapping nodes from volume to / from work ...")
indent(level=2)
for (nType, nodes) in nodesByType.items():
startW = nW
for nameNv in nodes:
(name, nV) = nameNv
getOwork = getOworks[name]
if getOwork:
nOW = getOwork(nV)
if nOW in fromWork:
fromWork[nOW].append(nameNv)
continue
else:
fromWork[nOW] = []
nW += 1
volumeMap[nW] = nameNv
volumeMapI[nameNv] = nW
info(
f"{nType:<20}: {startW + 1:>8} - {nW:>8}",
tm=False,
)
nVolumeNodes = len(volumes) if volumeFeature else 0
nNodesW = len(volumeMap) + nVolumeNodes
indent(level=1)
info(f"The new work has {nNodesW} nodes of which {len(allSlots)} slots")
nonlocal maxSlotW
nonlocal maxNodeW
maxSlotW = sW
maxNodeW = nW
indent(level=0)
info("collection done")
return True
def remapFeatures():
info("remap features ...")
indent(level=1, reset=True)
# node features
otype = {}
nodeFeatures[OTYPE] = otype
ovolume = {}
nodeFeatures[OVOLUME] = ovolume
# edge features
oslots = {}
edgeFeatures[OSLOTS] = oslots
ointerf = {}
ointert = {}
fOtypeDatas = {}
eOslotsDatas = {}
maxSlots = {}
maxNodes = {}
nodeFeatureDatas = {}
edgeFeatureDatas = {}
fromWork.clear()
for name in volumes:
api = apis[name]
allFeatures = getAllRealFeatures(api)
for (ointer, OINTER) in ((ointerf, OINTERF), (ointert, OINTERT)):
if api.isLoaded(features=OINTER, pretty=False)[OINTER] is None:
continue
interSource = api.Fs(OINTER).data
interData = {}
ointer[name] = interData
for (nW, interEdgesStr) in interSource.items():
interEdges = interEdgesStr.split(";")
for interEdge in interEdges:
(mW, feat, doValues, isInt, val) = interEdge.split(",")
doValues = doValues == "v"
isInt = isInt == "i"
dest = interData.setdefault(feat, {}).setdefault(
nW, {} if doValues else set()
)
if doValues:
dest[mW] = int(val) if isInt else val
else:
dest.add(mW)
fOtypeDatas[name] = api.F.otype.data
eOslotsDatas[name] = api.E.oslots.data
maxSlots[name] = api.E.oslots.maxSlot
maxNodes[name] = api.E.oslots.maxNode
nodeFeatureDatas[name] = {
feat: api.Fs(feat).data
for feat in api.Fall()
if feat != OTYPE and feat in allFeatures
}
edgeFeatureDatas[name] = {
feat: (
api.Es(feat).doValues,
api.Es(feat).data,
api.Es(feat).dataInv,
)
for feat in api.Eall()
if not feat.startswith(OMAP) and feat != OSLOTS and feat in allFeatures
}
for (nW, (name, nV)) in volumeMap.items():
ovolume[nW] = f"{name},{nV}"
maxSlotI = maxSlots[name]
otype[nW] = (
slotType if nW <= maxSlotW else fOtypeDatas[name][nV - maxSlotI - 1]
)
if nW > maxSlotW:
oslots[nW] = {
volumeMapI[(name, sV)]
for sV in eOslotsDatas[name][nV - maxSlotI - 1]
}
if nW in fromWork:
for (name2, nV2) in fromWork[nW]:
oslots[nW] |= {
volumeMapI[(name2, sV)]
for sV in eOslotsDatas[name2][nV2 - maxSlots[name2] - 1]
}
for (feat, featD) in nodeFeatureDatas[name].items():
val = featD.get(nV, None)
if val is not None:
nodeFeatures.setdefault(feat, {})[nW] = val
for (feat, (doValues, featF, featT)) in edgeFeatureDatas[name].items():
valData = featF.get(nV, None)
if valData is not None:
value = {} if doValues else set()
for tV in valData:
tW = volumeMapI[(name, tV)]
if doValues:
val = valData.get(tV, None)
value[tW] = val
else:
value.add(tW)
if value:
edgeFeatures.setdefault(feat, {})[nW] = value
# add inter-volume edges
# the ointerf and ointert features have in their values a node
# from the original work.
# We have to infer the volume and the corresponding node in that volume.
# And then we can get the corresponding node in the collected work.
# If this sounds like a detour that can be cut short:
# the new collection we are making does not have to be identical
# to the original work.
# It could very well be that we have extracted all volumes from a work
# and are now collecting only certain volumes for the new work.
# Indeed, when splitting the original work into volumes,
# it might have been the case that certain top-level sections
# of the original work do not end up in one of the volumes.
getOworkWI = {}
# get the mapping from nodes in the original work
# to nodes in all volumes
for name in volumes:
getOwork = getOworks[name]
if getOwork:
maxNode = maxNodes[name]
for nV in range(1, maxNode):
nW = getOwork(nV)
getOworkWI[nW] = (name, nV)
# OUTGOING edges
for (name, interData) in ointerf.items():
for (feat, interFeatData) in interData.items():
doValues = edgeFeatureDatas[name][feat][0]
thisEdgeFeature = edgeFeatures.setdefault(feat, {})
for (fW, interValueData) in interFeatData.items():
for tOW in interValueData:
if tOW not in getOworkWI:
# edge goes outside the new work
continue
else:
(nameT, tV) = getOworkWI[tOW]
tW = volumeMapI[(nameT, tV)]
dest = thisEdgeFeature.setdefault(
fW, {} if doValues else set()
)
if doValues:
dest[tW] = interValueData[tOW]
else:
dest.add(tW)
# INCOMING edges
for (name, interData) in ointert.items():
for (feat, interFeatData) in interData.items():
doValues = edgeFeatureDatas[name][feat][0]
thisEdgeFeature = edgeFeatures.setdefault(feat, {})
for (tW, interValueData) in interFeatData.items():
for fOW in interValueData:
if fOW not in getOworkWI:
# edge comes from outside the new work
continue
else:
(nameF, fV) = getOworkWI[fOW]
fW = volumeMapI[(nameF, fV)]
dest = thisEdgeFeature.setdefault(
fW, {} if doValues else set()
)
if doValues:
dest[tW] = interValueData[fOW]
else:
dest.add(tW)
if volumeFeature:
nW = maxNodeW
for name in volumes:
nW += 1
nodeFeatures.setdefault(volumeFeature, {})[nW] = name
nodeFeatures[OTYPE][nW] = volumeType
edgeFeatures[OSLOTS][nW] = volumeOslots[name]
indent(level=0)
info("remapping done")
return True
def writeTf():
info("write work as TF data set")
TF = FabricCore(locations=workLocation, silent=DEEP)
good = TF.save(
metaData=metaData,
nodeFeatures=nodeFeatures,
edgeFeatures=edgeFeatures,
silent=SILENT_D if DEBUG else DEEP,
)
indent(level=0)
if not good:
return False
info("writing done")
return good
def process():
indent(level=0, reset=True)
if not loadVolumes():
return False
if not getMetas():
return False
if not checkTypes():
return False
if not collectNodes():
return False
if not remapFeatures():
return False
if not writeTf():
return False
info("done")
return True
wasSilent = isSilent()
setSilent(silent)
result = process()
setSilent(wasSilent)
return result
Functions
def collect(volumes, workLocation, volumeType=None, volumeFeature=None, mergeTypes=None, featureMeta=None, silent='auto', overwrite=False)
Creates a collection out of a number of volumes.
The volumes are individual TF datasets, the work is a new TF data set.
You may pass as many volume data sources as you want.
The work will be the union of all nodes of the volumes, rearranged according to their types, where node types with the same names will be merged.
The slots of the work are the concatenation of the slots of the volumes, which must all have the same slot type.
The node and edge features will be remapped, so that they have the same values in the work as they had in the individual volumes.
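The concatenation of slots and the remapping of feature values can be pictured with plain dictionaries. The sketch below is an illustration under stated assumptions, not the library code: the function name `concatSlots` and the toy data are made up, and only one node feature on slots is handled.

```python
def concatSlots(volumes):
    """Concatenate the slots of volumes and remap one node feature.

    `volumes` maps a volume name to a pair (maxSlot, featureData),
    where featureData maps volume slot numbers to values.
    Returns (volumeMapI, workFeature).
    """
    volumeMapI = {}   # (volume name, volume slot) -> work slot
    workFeature = {}  # work slot -> feature value
    sW = 0            # last work slot handed out so far
    for (name, (maxSlot, featureData)) in volumes.items():
        for sV in range(1, maxSlot + 1):
            sW += 1
            volumeMapI[(name, sV)] = sW
            val = featureData.get(sV)
            if val is not None:
                # the value travels with the node to its new number
                workFeature[sW] = val
    return (volumeMapI, workFeature)


toy = dict(
    banks=(2, {1: "Everything", 2: "about"}),
    river=(3, {1: "the", 2: "river", 3: "flows"}),
)
(volumeMapI, workFeature) = concatSlots(toy)
print(volumeMapI[("river", 1)])
print(workFeature)
```

The first slot of the second volume simply continues the numbering where the first volume stopped.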
inter-volume edges
The edge features of each volume only contain edges between nodes in that volume. But the work as a whole may have had edges between nodes of different volumes. These can be restored from two extra features that may exist in the volumes: `ointerfrom` and `ointerto`. See also `tf.volumes.extract`.
The volumes may contain a feature `owork` which maps each node in a volume to the corresponding node in the work. Some non-slot nodes in the work may have slots in multiple volumes.
Lexeme nodes
Think of lexeme nodes that have slots for all occurrences of that lexeme. When a work is split into volumes, the lexeme nodes map to separate lexeme nodes in each volume where these lexemes occur. When we collect volumes into works, we want to merge these lexeme nodes again.
When non-slot nodes across volumes have the same value for their `owork` feature, they will be merged into the work. That means: only one node will be created in the work, and the slots of that node will be the union of the slots these nodes have in the individual volumes.
Overlapping slots
It is an error if volumes have overlapping slots. Overlapping slots are those whose values of `owork` are identical.
A feature `ovolume` will be created which maps each node of the work to the corresponding node(s) in the individual volume(s).
Optionally, nodes corresponding to the volumes themselves will be added to the work.
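The merging of non-slot nodes and the `ovolume` mapping can be sketched as follows. This is a simplified illustration, not the library code: nodes are given as `(volume, node, owork value, slots)` tuples whose slots are assumed to be already expressed as work slots, the `ovolume` values are kept as lists of `"volume,node"` strings, and the helper name `mergeByOwork` is hypothetical.

```python
def mergeByOwork(nodes, firstNode):
    """Merge non-slot nodes that share an `owork` value.

    `nodes` is a list of (volumeName, volumeNode, owork, workSlots).
    Nodes with the same owork value become one work node whose slots
    are the union of the slots of the contributing volume nodes.
    Returns (oslots, ovolume), both keyed by work node.
    """
    seen = {}     # owork value -> work node
    oslots = {}   # work node -> set of work slots
    ovolume = {}  # work node -> list of "volume,node" strings
    nW = firstNode - 1
    for (name, nV, owork, slots) in nodes:
        if owork is not None and owork in seen:
            # same node of the original work: reuse the work node
            target = seen[owork]
        else:
            nW += 1
            target = nW
            if owork is not None:
                seen[owork] = target
            oslots[target] = set()
            ovolume[target] = []
        oslots[target] |= set(slots)
        ovolume[target].append(f"{name},{nV}")
    return (oslots, ovolume)


lexemes = [
    ("banks", 10, 100, [1, 2]),
    ("banks", 11, 101, [2]),
    ("river", 7, 100, [4, 5]),
]
(oslots, ovolume) = mergeByOwork(lexemes, firstNode=6)
print(oslots)
print(ovolume)
```

Here the lexeme with `owork` value 100 occurs in both volumes and ends up as a single work node that covers slots from both.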
Care will be taken of the metadata of the features and the contents of the `otext.tf` feature, which consists of metadata only.
All details of the work can be steered by means of parameters.
You can use this function to recombine volumes that have been obtained by extracting them from a work by means of `tf.volumes.extract`. In this case, there is no need to pass `volumeType` and `volumeFeature`.
Parameters
volumes: dict or tuple of (string or tuple)
You can either pass just the locations of the volumes, or you can give them a name and pass `(name, location)` instead, or pass them as a dictionary with names as keys and locations as values. If you do not give names to volumes, names derived from their locations will be used. However, names are only used if you pass `volumeType` and/or `volumeFeature`.
Disjointness
A collection cannot contain the same volume more than once.
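The three accepted forms of `volumes` and the disjointness rule can be sketched like this. The helper name `normalizeVolumes` is hypothetical, the use of `os.path.basename` to derive a name from a location is an assumption, and the sketch raises `ValueError` where the real function reports an error and returns `False`.

```python
import os


def normalizeVolumes(volumes):
    """Turn any accepted form of `volumes` into a dict name -> location.

    Raises ValueError when a name or a location occurs more than once.
    """
    if isinstance(volumes, dict):
        result = dict(volumes)
    else:
        result = {}
        for item in volumes:
            if isinstance(item, str):
                # unnamed volume: derive a name from the location (assumption)
                (name, loc) = (os.path.basename(item.rstrip("/")), item)
            else:
                (name, loc) = item
            if name in result:
                raise ValueError(f"Volume {name} is already part of the collection")
            result[name] = loc
    seen = {}  # location -> name, to detect the same volume under two names
    for (name, loc) in result.items():
        if loc in seen:
            raise ValueError(
                f"Volume {seen[loc]} at location {loc} reoccurs as volume {name}"
            )
        seen[loc] = name
    return result


print(normalizeVolumes(("data/banks", ("river", "data/r"))))
```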
workLocation: string
The directory into which the feature files of the work will be written.
overwrite: boolean, optional False
If True, the target collection will be created and will replace any existing collection / volume of the same name. If None, the collection will only be created if it does not exist. No check will be performed as to whether an existing collection is equal to what would have been created by this call. If False, refuses to proceed if the collection directory already exists.
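The three settings of `overwrite` lead to different outcomes when the target directory already has content. A small sketch of that decision, modelled on the source listing of this module; `decide` is a hypothetical helper and `exists` stands for "the collection directory is not empty".

```python
def decide(exists, overwrite=False):
    """Return (good, proceed) for the three settings of `overwrite`.

    good: the value `collect` would report if it stopped here
    proceed: whether the collection is (re)created
    """
    if not exists:
        return (True, True)
    if overwrite is None:
        # keep the existing collection; this is not an error
        return (True, False)
    if overwrite:
        # the existing collection is removed and recreated
        return (True, True)
    # refuse: the collection already exists
    return (False, False)


for setting in (True, None, False):
    print(setting, decide(True, overwrite=setting))
```

Note that with `overwrite=None` an existing collection counts as success even though nothing is written.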
volumeType, volumeFeature: string, optional None
If a string value for one of these is passed, a new node type will be added to the work, with one new node for each volume: the volume node. There will also be a new feature, that assigns the name of a volume to the node of that volume.
The name of the new node type is the value of `volumeType` if it is a non-empty string, else it is the value of `volumeFeature`.
The name of the new feature is `volumeFeature` if it is a non-empty string, else it is the value of `volumeType`.
`volumeType` must be fresh
It is an error if the `volumeType` is a node type that already occurs in one of the volumes.
`volumeFeature` may exist
The `volumeFeature` may already exist in one or more volumes. In that case the new feature values for nodes of `volumeType` will just be added to it.
Example
    collect(
        dict(
            banks='banks/tf/0.2',
            river='banks/tf/0.4',
        ),
        'riverbanks/tf/1.0',
        volumeType='volume',
        volumeFeature='vol',
    )
This results in a work with nodes and features from the volumes found at the indicated places on your file system. After combination, the volumes are visible in the work as nodes of type `volume`, and the feature `vol` provides the names `banks` and `river` for those nodes.
featureMeta: dict, optional None
The meta data of the volumes involved will be merged. If feature metadata of the same feature is encountered in different volumes, and if volumes specify different values for the same keys, the different values will be stored under a key with the name of the volume appended to the key, separated by a `!`.
The special metadata field `valueType` will just be reduced to one single value `str` if some volumes have it as `str` and others as `int`. If the volumes assign the same value type to a feature, that value type will be assigned to the combined feature.
If you want to assign other meta data to specific features, or pass meta data for new features that originate from the merging process, you can pass them in the parameter `featureMeta` as in the following example, where we pass meta data for a feature called `level` with integer values.
The contents of the `otext.tf` features are also metadata, and their contents will be merged in exactly the same way.
So if the section / structure specifications and the formats are not the same for all volumes, you will see them spread out in fields qualified by the volume name with a `!` sign between the key and the volume.
But you can add new specifications explicitly, as meta data of the `otext` feature, by passing them as keyword arguments. They will be passed directly to the combined `otext.tf` feature and will override anything with the same key that is already in one of the volumes.
silent: string, optional SILENT_D
See `tf.core.timestamp.Timestamp`
Returns
boolean
Whether the creation was successful.
All features in the resulting collection will get a metadata key `volume` whose value is the name of the collection and its component volumes.
Example
    collect(
        dict(
            banks='banks/tf/0.2',
            river='banks/tf/0.4',
        ),
        'riverbanks/tf/1.0',
        featureMeta=dict(
            level=dict(
                valueType='int',
                description='level of a section node',
            ),
        ),
    )
Example
    collect(
        dict(
            banks='banks/tf/0.2',
            river='banks/tf/0.4',
        ),
        'riverbanks/tf/1.0',
        featureMeta=dict(
            otext=dict(
                volumeType='volume',
                volumeFeature='vol',
                sectionTypes='volume,chapter,line',
                sectionFeatures='title,number,number',
            ),
        ),
        silent=SILENT_D,
    )
This will give rise to something like this (assuming that `banks` and `rivers` have some deviating material in their `otext.tf`):
    @config
    @compiler=Dirk Roorda
    @dateWritten=2019-05-20T19:12:23Z
    @fmt:line-default={letters:XXX}{terminator}
    @fmt:line-term=line#{terminator}
    @fmt:text-orig-extra={letters}{punc}{gap}
    @fmt:text-orig-full={letters}
    @fmt:text-orig-full!banks={letters}{punc}
    @fmt:text-orig-full!rivers={letters}{gap}
    @name=Culture quotes from Iain Banks
    @purpose=exposition
    @sectionFeatures=title,number,number
    @sectionFeatures!banks=title,number,number
    @sectionFeatures!rivers=number,number,number
    @sectionTypes=volume,chapter,line
    @sectionTypes!banks=book,chapter,sentence
    @sectionTypes!rivers=chapter,sentence,line
    @source=Good Reads
    @status=with for similarities in a separate module
    @structureFeatures!banks=title,number,number,number
    @structureFeatures!rivers=title,number,number
    @structureTypes!banks=book,chapter,sentence,line
    @structureTypes!rivers=book,chapter,sentence
    @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
    @version=0.2
    @writtenBy=Text-Fabric
    @writtenBy=Text-Fabric
    @dateWritten=2019-05-28T10:55:06Z
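The `key!volume` fields in the listing above follow from a simple merging rule, which can be sketched as below. This is a simplified illustration, not the library code: the helper name `mergeMeta` is hypothetical, the special treatment of `valueType` is left out, and explicitly passed values are modelled by the `override` argument.

```python
from collections import defaultdict


def mergeMeta(volumeMeta, override=None):
    """Merge per-volume metadata dicts of one feature.

    `volumeMeta` maps volume name -> {key: value}.
    `override` holds explicitly passed values; they win over volume values.
    Keys on which all volumes agree keep their plain name; otherwise
    each value is stored under `key!volume`.
    """
    byKey = defaultdict(lambda: defaultdict(list))  # key -> value -> volumes
    for (name, meta) in volumeMeta.items():
        for (k, v) in meta.items():
            byKey[k][v].append(name)
    merged = {}
    for (k, values) in byKey.items():
        if len(values) == 1:
            # a single value across the volumes: keep the plain key
            merged[k] = next(iter(values))
        else:
            for (v, names) in values.items():
                for name in names:
                    merged[f"{k}!{name}"] = v
    merged.update(override or {})
    return merged


otext = mergeMeta(
    dict(
        banks=dict(sectionTypes="book,chapter,sentence", source="Good Reads"),
        rivers=dict(sectionTypes="chapter,sentence,line", source="Good Reads"),
    ),
    override=dict(sectionTypes="volume,chapter,line"),
)
for (k, v) in sorted(otext.items()):
    print(f"@{k}={v}")
```

The agreed-upon `source` stays a single field, while the diverging `sectionTypes` values are kept per volume next to the explicitly passed one.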
inter-version edges
Features with names starting in `omap@` contain node maps from older to newer versions. These will be excluded from collection.
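The exclusion amounts to a filter on feature names. A minimal sketch, assuming the prefix is the literal string `omap@` named above; the constant `OMAP_PREFIX` and the function `collectable` are hypothetical names.

```python
OMAP_PREFIX = "omap@"  # assumption: the literal prefix named in the text


def collectable(featureNames):
    """Keep only the features that take part in the collection."""
    return [feat for feat in featureNames if not feat.startswith(OMAP_PREFIX)]


print(collectable(["otype", "omap@0.1-0.2", "letters"]))
```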