Module tf.browser.ner.settings
Corpus dependent setup of the annotation tool.
To see how this fits among all the modules of this package, see tf.browser.ner.annotate.
"""Corpus dependent setup of the annotation tool.
To see how this fits among all the modules of this package, see
`tf.browser.ner.annotate`.
"""
from ...core.helpers import console as cs
from ...core.files import readYaml, fileExists
TOOLKEY = "ner"
"""The name of this annotation tool.
This name is used
* in directory paths on the file system to find the data that is managed by this tool;
* as a key to address the in-memory data that belongs to this tool;
* as a prefix to modularize the Flask app for this tool within the encompassing
TF browser Flask app, and also its CSS files.
"""
NONE = "⌀"
"""GUI representation of an empty value.
Used to mark the fact that an occurrence does not have a value for an entity feature.
That happens when an occurrence is not part of an entity.
"""
EMPTY = "␀"
"""GUI representation of the empty string.
If an entity feature has the empty string as value, and we want to create a button for
it, this is the label we draw on that button.
"""
LIMIT_BROWSER = 100
"""Maximum number of buckets to load on one page when in the TF browser.
This is not a hard limit. We only use it if the page contains the whole corpus or
a filtered subset of it.
But as soon as we have selected a token string or an entity, we show all buckets
that contain it, no matter how many there are.
!!! note "Performance"
    We use the
    [CSS device *content-visibility*](https://developer.mozilla.org/en-US/docs/Web/CSS/content-visibility)
    to restrict rendering to the material that is visible in the viewport. However,
    this is not supported in Safari, so the performance may suffer in Safari if we load
    the whole corpus on a single page.
    In practice, even browsers that support this device are not happy with a big
    chunk of HTML on the page, since they still have to build a large DOM, including
    event listeners.
    That's why we restrict the page to a limited number of buckets.
    But when a selection has been made, showing the whole, untruncated result set
    is more important than avoiding a performance penalty.
    Moreover, a selected entity or occurrence rarely occurs in a
    very large number of buckets.
"""
LIMIT_NB = 20
"""Maximum number of buckets to load on one page when in a Jupyter notebook.
See also `LIMIT_BROWSER`.
"""
ERROR = "error"
STYLES = dict(
minus=dict(bg="#ffaaaa;"),
plus=dict(bg="#aaffaa;"),
replace=dict(bg="#ffff88;"),
free=dict(
ff="monospace",
fz="small",
fw="normal",
fg="black",
bg="white",
),
free_active=dict(
fg="black",
bg="yellow",
),
free_bordered=dict(
bg="white",
br="0.5rem",
bw="1pt",
bs="solid",
bc="white",
p="0.4rem",
m="0.1rem 0.2rem",
),
free_bordered_active=dict(
bw="1pt",
bs="solid",
bc="yellow",
),
keyword=dict(
ff="monospace",
fz="medium",
fw="bold",
fg="black",
bg="white",
),
keyword_active=dict(
fg="black",
bg="yellow",
),
keyword_bordered=dict(
bg="white",
br="0.5rem",
bw="1pt",
bs="solid",
bc="white",
p="0.3rem",
m="0.1rem 0.2rem",
),
keyword_bordered_active=dict(
bw="1pt",
bs="solid",
bc="yellow",
),
)
"""CSS style configuration for entity features.
Here we define properties of the styling of the entity features and their
values.
Since these features are defined in configuration, we cannot work with a fixed
style sheet.
We divide entity features into *keyword* features and *free* features.
The typical keyword feature is `kind`, which has a limited set of values.
The typical free feature is `eid`, which has an unbounded number of values.
As it is now, we could have expressed this in a fixed style sheet.
But if we open up to allowing for more entity features, we can use this setup
to easily configure the formatting of them.
However, we should move these definitions to the `ner.yaml` file then, so that the
only place of configuration is that YAML file, and not this file.
"""
SORTDIR_DESC = "d"
"""Value that indicates the descending sort direction."""
SORTDIR_ASC = "a"
"""Value that indicates the ascending sort direction."""
SORTDIR_DEFAULT = SORTDIR_ASC
"""Default sort direction."""
SORTKEY_DEFAULT = "freqsort"
"""Default sort key."""
SORT_DEFAULT = (SORTKEY_DEFAULT, SORTDIR_DESC)
"""Default sort key plus sort direction combination."""
SC_ALL = "a"
"""Value that indicates *all* buckets."""
SC_FILT = "f"
"""Value that indicates *filtered* buckets."""
DEFAULT_SETTINGS = """
entityType: ent
entitySet: "{entityType}-nodes"
bucketType: chunk
strFeature: str
afterFeature: after
features:
  - eid
  - kind
keywordFeatures:
  - kind
defaultValues:
  kind: PER
spaceEscaped: false
"""
class Settings:
    def __init__(self):
        """Provides configuration details.
        There is fixed configuration, that is not intended to be modifiable by users.
        These configuration values are put in variables in this module, which
        other modules can import.
        There is also customisable configuration, meant to adapt the tool to the
        specifics of a corpus.
        Those configuration values are read from a YAML file, located in a directory
        `ner` next to the `tf` data of the corpus.
        """
        # specDir is not set in this class; it must already be present on the object
        specDir = self.specDir
        nerSpec = f"{specDir}/config.yaml"
        # use the corpus-specific file if it exists, else the built-in defaults
        kwargs = (
            dict(asFile=nerSpec) if fileExists(nerSpec) else dict(text=DEFAULT_SETTINGS)
        )
        settings = readYaml(preferTuples=True, **kwargs)
        # fill in the entity type in the name of the entity set
        settings.entitySet = (settings.entitySet or "entity-nodes").format(
            entityType=settings.entityType
        )
        self.settings = settings
        features = self.settings.features
        keywordFeatures = self.settings.keywordFeatures
        # positions within features of those features that are keyword features
        self.settings.summaryIndices = tuple(
            i for i in range(len(features)) if features[i] in keywordFeatures
        )

    def console(self, msg, **kwargs):
        """Print something to the output.
        This works exactly as `tf.core.helpers.console`.
        It is handy to have this as a method on the Annotate object,
        so that we can issue temporary console statements during development
        without the need to add an `import` statement to the code.
        """
        cs(msg, **kwargs)
Global variables
var EMPTY
-
GUI representation of the empty string.
If an entity feature has the empty string as value, and we want to create a button for it, this is the label we draw on that button.
var LIMIT_BROWSER
-
Maximum number of buckets to load on one page when in the TF browser.
This is not a hard limit. We only use it if the page contains the whole corpus or a filtered subset of it.
But as soon as we have selected a token string or an entity, we show all buckets that contain it, no matter how many there are.
Performance
We use the CSS device content-visibility to restrict rendering to the material that is visible in the viewport. However, this is not supported in Safari, so the performance may suffer in Safari if we load the whole corpus on a single page.
In practice, even browsers that support this device are not happy with a big chunk of HTML on the page, since they still have to build a large DOM, including event listeners.
That's why we restrict the page to a limited number of buckets.
But when a selection has been made, showing the whole, untruncated result set is more important than avoiding a performance penalty. Moreover, a selected entity or occurrence rarely occurs in a very large number of buckets.
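The truncation rule described above can be sketched as follows. This is a minimal illustration, not the tool's own code: the helper `visibleBuckets` and its parameters are hypothetical, and only the value of `LIMIT_BROWSER` is taken from this module.

```python
LIMIT_BROWSER = 100  # same value as the module constant


def visibleBuckets(buckets, hasSelection, limit=LIMIT_BROWSER):
    """Return the buckets to render on one page (hypothetical helper).

    Without a selection the page is cut off at `limit` buckets;
    with a selection the complete result set is returned.
    """
    buckets = list(buckets)
    if hasSelection:
        return buckets
    return buckets[:limit]


corpus = range(1, 251)  # 250 dummy bucket identifiers
print(len(visibleBuckets(corpus, hasSelection=False)))  # 100
print(len(visibleBuckets(corpus, hasSelection=True)))  # 250
```

The same function could be called with `limit=20` to mimic the notebook setting.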
var LIMIT_NB
-
Maximum number of buckets to load on one page when in a Jupyter notebook.
See also LIMIT_BROWSER.
var NONE
-
GUI representation of an empty value.
Used to mark the fact that an occurrence does not have a value for an entity feature. That happens when an occurrence is not part of an entity.
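A small sketch of how the two placeholder labels NONE and EMPTY might be chosen when a feature value is displayed. The helper `valueLabel` is hypothetical, and representing a missing value as Python's `None` is an assumption made for this illustration.

```python
NONE = "⌀"  # same value as the module constant
EMPTY = "␀"  # same value as the module constant


def valueLabel(value):
    """Label to show for an entity feature value (hypothetical helper).

    Assumption: a missing value is passed in as None.
    """
    if value is None:
        return NONE
    if value == "":
        return EMPTY
    return str(value)


for value in (None, "", "PER"):
    print(repr(value), "->", valueLabel(value))
```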
var SC_ALL
-
Value that indicates all buckets.
var SC_FILT
-
Value that indicates filtered buckets.
var SORTDIR_ASC
-
Value that indicates the ascending sort direction.
var SORTDIR_DEFAULT
-
Default sort direction.
var SORTDIR_DESC
-
Value that indicates the descending sort direction.
var SORTKEY_DEFAULT
-
Default sort key.
var SORT_DEFAULT
-
Default sort key plus sort direction combination.
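The sort constants can be combined as in the sketch below. The helper `sortEntries` and the sample data are hypothetical, and reading the key `freqsort` as "sort by frequency" is an assumption based on its name.

```python
SORTDIR_DESC = "d"
SORTDIR_ASC = "a"
SORTKEY_DEFAULT = "freqsort"
SORT_DEFAULT = (SORTKEY_DEFAULT, SORTDIR_DESC)


def sortEntries(entries, sortSpec=SORT_DEFAULT):
    """Sort (label, frequency) pairs (hypothetical helper).

    Assumption: the key "freqsort" means "sort by frequency";
    any other key sorts by label.
    """
    (sortKey, sortDir) = sortSpec
    if sortKey == SORTKEY_DEFAULT:
        keyFunc = lambda entry: entry[1]  # frequency
    else:
        keyFunc = lambda entry: entry[0]  # label
    return sorted(entries, key=keyFunc, reverse=sortDir == SORTDIR_DESC)


entries = [("PER", 12), ("LOC", 30), ("ORG", 7)]
print(sortEntries(entries))
# [('LOC', 30), ('PER', 12), ('ORG', 7)]
print(sortEntries(entries, ("label", SORTDIR_ASC)))
# [('LOC', 30), ('ORG', 7), ('PER', 12)]
```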
var STYLES
-
CSS style configuration for entity features.
Here we define properties of the styling of the entity features and their values. Since these features are defined in configuration, we cannot work with a fixed style sheet.
We divide entity features into keyword features and free features. The typical keyword feature is kind, which has a limited set of values. The typical free feature is eid, which has an unbounded number of values.
As it is now, we could have expressed this in a fixed style sheet. But if we open up to allowing for more entity features, we can use this setup to easily configure the formatting of them.
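To illustrate how such a style dictionary could be turned into CSS rules, here is a minimal sketch. It is not the tool's own code: the helper `makeCss` and the table `CSS_PROPS` are hypothetical, and the meaning assigned to each abbreviated key (for example `fg` as `color`) is an assumption.

```python
# Assumed meaning of the abbreviated keys; the real tool may map them differently.
CSS_PROPS = dict(
    ff="font-family",
    fz="font-size",
    fw="font-weight",
    fg="color",
    bg="background-color",
    br="border-radius",
    bw="border-width",
    bs="border-style",
    bc="border-color",
    p="padding",
    m="margin",
)

# A small excerpt of the STYLES dictionary of this module.
STYLES = dict(
    keyword=dict(ff="monospace", fz="medium", fw="bold", fg="black", bg="white"),
    keyword_active=dict(fg="black", bg="yellow"),
)


def makeCss(styles):
    """Turn a STYLES-like dict into CSS rules (hypothetical helper)."""
    rules = []
    for name, props in styles.items():
        # Strip a trailing semicolon from the value, then add exactly one.
        body = " ".join(
            f"{CSS_PROPS[key]}: {value.rstrip(';')};" for key, value in props.items()
        )
        rules.append(f".{name} {{ {body} }}")
    return "\n".join(rules)


print(makeCss(STYLES))
```

Because the rules are generated from data, supporting an additional entity feature would only require a new entry in the dictionary.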
However, we should then move these definitions to the ner.yaml file, so that the only place of configuration is that YAML file, and not this file.
var TOOLKEY
-
The name of this annotation tool.
This name is used
- in directory paths on the file system to find the data that is managed by this tool;
- as a key to address the in-memory data that belongs to this tool;
- as a prefix to modularize the Flask app for this tool within the encompassing TF browser Flask app, and also its CSS files.
Classes
class Settings
-
Provides configuration details.
There is fixed configuration, that is not intended to be modifiable by users. These configuration values are put in variables in this module, which other modules can import.
There is also customisable configuration, meant to adapt the tool to the specifics of a corpus. Those configuration values are read from a YAML file, located in a directory ner next to the tf data of the corpus.
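The post-processing that the constructor applies to the loaded settings can be tried out in isolation. In the sketch below a `SimpleNamespace` stands in for the object returned by `readYaml`; that stand-in is an assumption made to keep the example self-contained, and the values are the defaults from this module.

```python
from types import SimpleNamespace

# Stand-in for the object that readYaml returns; values are the module defaults.
settings = SimpleNamespace(
    entityType="ent",
    entitySet="{entityType}-nodes",
    features=("eid", "kind"),
    keywordFeatures=("kind",),
)

# Fill in the entity type in the name of the entity set.
settings.entitySet = (settings.entitySet or "entity-nodes").format(
    entityType=settings.entityType
)

# Positions (within `features`) of the features that are keyword features.
settings.summaryIndices = tuple(
    i
    for (i, feat) in enumerate(settings.features)
    if feat in settings.keywordFeatures
)

print(settings.entitySet)  # ent-nodes
print(settings.summaryIndices)  # (1,)
```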
Subclasses
Methods
def console(self, msg, **kwargs)
-
Print something to the output.
This works exactly as console() from tf.core.helpers.
It is handy to have this as a method on the Annotate object, so that we can issue temporary console statements during development without the need to add an import statement to the code.