Module `tf.ner.scopes`

Scope handling

Scopes are a column in entity spreadsheets that limit the effects of entity triggers. This module defines data structures for scopes and some fundamental operations on them, such as representing them as string, parsing them from strings, comparing and sorting them.

A scope specification is a comma-separated list of interval strings.

An interval string is either a section or an interval between two sections.

A section is written as a section heading, an interval as two section headings with a - in between.

An interval is taken from the start of the first section to the end of the second section.

A section on its own is taken from its start to its end.

Examples:

3 is the whole of section 3
3-3.4 is from the start of section 3 to the end of section 3.4
3.4-3 is from the start of section 3.4 to the end of section 3
3.4-5 is from the start of section 3.4, through the rest of section 3, through the whole of section 4, till the end of section 5

A section heading is written as it appears when TF represents sections. If you browse the corpus in the TF browser you'll see how those sections are represented. Even if section headings are not numeric, TF knows which sections you mean:

Genesis 1:5-Deuteronomy 26:6 if from the start of book Genesis chapter 1 verse 5, through the rest of Genesis, through the books of Exodus and Leviticus, till the end of Deuteronomy chapter 26 verse 6.

Even if the headings given at the boundaries of an interval are numeric, TF knows the exact ordering of all sections, and will fill them in.

Suppose in a corpus you have main sections 1, 4, 3, 3b, 5 in that order, then:

1-5 is from the start of 1, through 4, 3, 3b, to the end of 5
1-4 is sections 1 and 4 only
1-3 is sections 1, 4, 3 only
3-5 is sections 3, 3b and 5 only

Implementation detail: Text-Fabric pre-computes the sequence of all sections in your corpus, and maps them onto a legal numbering system, where each section corresponds to a tuple of sequence numbers. For example, in the Hebrew Bible, Exodus 3:4 is mapped to (2, 3, 4) since Exodus is the second top level section in that corpus.

In the Suriano letters, 04@027:5 is mapped to (3, 27, 5), since the main sections start at 02, there is no 01.

The inverse mapping is also present in the pre-computed data of the corpus. These mappings are from section nodes to legal number tuples and back. The mapping from section node to heading string and back is is done by TF functions sectionStrFromNode() and nodeFromSectionStr().

Functions

def getIntvIndex(buckets, instructions, getSeqFromNode)

Expand source code Browse git

def getIntvIndex(buckets, instructions, getSeqFromNode):
    """Map buckets in the corpus on intervals in a given set of intervals.

    When we are going to look up triggers in the corpus, we do so bucket by bucket.
    The validity of triggers is constrained to their scope. Whenever we leave an
    interval and go to the next, the those scopes change.
    We need a quick way to determine for each bucket in the corpus to which
    interval it belongs.

    Parameters
    ----------
    buckets: iterable of integer
        The nodes corresponding to the lowest level sections in the corpus
    instructions: iterable of tuple of tuple of integer
        This is a sequence of intervals. You may pass the `instructions` member
        of sheet data, since that is a mapping from intervals to search
        instructions for those intervals.
        When treated as an iterbale, such a dict is a sequence of intervals.

        We will sort the intervals before computing the index.
    getSeqFromNode:
        Corpus dependent function that gives the "legal" sequence number for
        sections, passed as nodes. See `tf.ner.corpus.Corpus.getSeqFromNode()`

    Returns
    -------
    dict
        Maps bucket nodes to the interval in the sequence of intervals to which they
        belong.
    """
    intervals = _sortIntervals(x for x in instructions if x != ())

    nIntervals = len(intervals)

    if nIntervals == 0:
        return {bucket: () for bucket in buckets}

    intvIndex = {}
    i = 0
    intv = intervals[i]
    (b, e) = intv

    for bucket in buckets:
        hd = getSeqFromNode(bucket)

        assigned = False

        if _locCmp(b, hd) == 1:
            intvIndex[bucket] = ()
            continue

        while i < nIntervals:
            if _locCmp(e, hd) == -1:
                i += 1

                if i < nIntervals:
                    intv = intervals[i]
                    (b, e) = intv
                else:
                    intvIndex[bucket] = ()
                    assigned = True
                    break
            else:
                intvIndex[bucket] = () if _locCmp(b, hd) == 1 else intv
                assigned = True
                break

        if not assigned:
            intvIndex[bucket] = ()

    return intvIndex

Map buckets in the corpus on intervals in a given set of intervals.

When we are going to look up triggers in the corpus, we do so bucket by bucket. The validity of triggers is constrained to their scope. Whenever we leave an interval and go to the next, the those scopes change. We need a quick way to determine for each bucket in the corpus to which interval it belongs.

Parameters

buckets : iterable of integer

The nodes corresponding to the lowest level sections in the corpus

instructions : iterable of tuple of tuple of integer

This is a sequence of intervals. You may pass the instructions member of sheet data, since that is a mapping from intervals to search instructions for those intervals. When treated as an iterbale, such a dict is a sequence of intervals.

We will sort the intervals before computing the index.

getSeqFromNode: Corpus dependent function that gives the "legal" sequence number for sections, passed as nodes. See Corpus.getSeqFromNode

Returns

dict: Maps bucket nodes to the interval in the sequence of intervals to which they belong.

def locInScope(loc, scope)

Expand source code Browse git

def locInScope(loc, scope):
    """Whether a location (section) is contained in a scope (sequence of intervals).

    Parameters
    ----------
    loc: tuple of integer
        The location, which is a section given by its sequence numbers
    scope: tuple of tuple of tuple of integer
        The scope, given as a tuple of intervals

    Returns
    -------
    boolean
        Whether the location is contained in the scope
    """
    if not len(scope):
        return False

    return any(_locInInterval(loc, intv) for intv in scope)

Whether a location (section) is contained in a scope (sequence of intervals).

Parameters

loc : tuple of integer: The location, which is a section given by its sequence numbers
scope : tuple of tuple of tuple of integer: The scope, given as a tuple of intervals

Returns

boolean: Whether the location is contained in the scope

def partitionScopes(scopeDict)

Expand source code Browse git

def partitionScopes(scopeDict):
    """Partition a set of scopes into intervals.

    The idea is to create a set of intervals such that:

    *   there is no scope boundary within any interval;
    *   every interval has at least one scope boundary at its start and one at its end.

    Parameters
    ----------
    scopeDict: dict
        The scopes are given as a mapping from string representations of scopes to
        logical scopes, i.e. the data structures you get when you parse scopes.

    Returns
    -------
    list
        The sorted sequence of resulting intervals
    """
    scopeFromStr = {}
    strFromScope = {}
    boundaries = {}
    intervals = []

    for scopeStr, scope in scopeDict.items():
        for intv in scope:
            (b, e) = intv
            boundaries.setdefault(b, {}).setdefault("b", set()).add(intv)
            boundaries.setdefault(e, {}).setdefault("e", set()).add(intv)

            scopeFromStr.setdefault(scopeStr, set()).add(intv)
            strFromScope.setdefault(intv, set()).add(scopeStr)

    curScopes = set()
    inScope = False

    for x in _sortLocs(boundaries):
        beginScopes = boundaries[x].get("b", set())
        endScopes = boundaries[x].get("e", set())

        inScope = len(curScopes) > 0

        hasB = len(beginScopes) > 0
        hasE = len(endScopes) > 0

        if hasB and hasE:
            intervals[-1][1] = _prevLoc(x)
            if intervals[-1][1] < intervals[-1][0]:
                intervals.pop()
            curScopes |= beginScopes
            intervals.append([x, x, _sortIntervals(curScopes)])
            curScopes -= endScopes
            inScope = len(curScopes) > 0
            if inScope:
                intervals.append([_nextLoc(x), None, _sortIntervals(curScopes)])
        elif hasB:
            if inScope:
                intervals[-1][1] = _prevLoc(x)
                if intervals[-1][1] < intervals[-1][0]:
                    intervals.pop()
            curScopes |= beginScopes
            intervals.append([x, None, _sortIntervals(curScopes)])
        elif hasE:
            intervals[-1][1] = x
            curScopes -= endScopes
            inScope = len(curScopes) > 0
            if inScope:
                intervals.append([_nextLoc(x), None, _sortIntervals(curScopes)])

    if inScope:
        intervals[-1][1] = x

    for x in intervals:
        newX = []
        seen = set()

        for intv in x[2]:
            for scopeStr in strFromScope[intv]:
                if scopeStr in seen:
                    continue

                seen.add(scopeStr)
                newX.append(scopeStr)

        x[2] = tuple(newX)

    return intervals

Partition a set of scopes into intervals.

The idea is to create a set of intervals such that:

there is no scope boundary within any interval;
every interval has at least one scope boundary at its start and one at its end.

Parameters

scopeDict : dict: The scopes are given as a mapping from string representations of scopes to logical scopes, i.e. the data structures you get when you parse scopes.

Returns

list: The sorted sequence of resulting intervals

Classes

class Scopes

Expand source code Browse git

class Scopes:
    """Functions that do scope handling.

    These functions will be added as methods to the class that inherits this class.
    """

    def repLoc(self, loc):
        """Represent a location as a string.

        A location identifies a section by means of legal number.
        We now want the heading of that section, as given by the features and
        settings of the corpus. We use the corpus dependent function
        `tf.ner.corpus.Corpus.getStrFromSeq()` for that.

        Parameters
        ----------
        loc: tuple of integer
            The "legal" number of a section.

        Returns
        -------
        string
            The section heading of the corresponding section.
        """
        getStrFromSeq = self.getStrFromSeq

        return getStrFromSeq(tuple(x for x in loc if x != 0 and x != -1))

    def parseLoc(self, locStr, plain=True):
        """Parses a location string.

        A location string is a section heading in the corpus.
        Now we want the "legal" number of that section.
        We use the corpus dependent function
        `tf.ner.corpus.Corpus.getSeqFromStr()` for that.

        Parameters
        ----------
        locStr: string
            The section heading as it appears in the corpus
        plain: boolean, optional True
            Whether to return just the result or additional information as well.

        Returns
        -------
        tuple of integer or dict
            The plain result is a tuple of the numbers that make up the "legal"
            number of the section. This is returned when `plain=False` is passed.
            If there are errors, None is returned.

            But if `plain=True` is passed, a dict is returned, with keys **result**
            for the plain result; **warning** for warnings if the parsing failed; and
            **normal** for a normalized representaiton of the section.
        """
        getSeqFromStr = self.getSeqFromStr

        locStr = locStr.strip()

        if not locStr:
            result = ()
            return (
                ()
                if plain
                else dict(result=(), warning=None, normal=self.repLoc(result))
            )

        (error, result) = getSeqFromStr(locStr)

        if error:
            result = None
            normal = None
            warning = error
        else:
            normal = self.repLoc(result)
            warning = None

        return result if plain else dict(result=result, warning=warning, normal=normal)

    def repInterval(self, intv):
        """Represent an interval of sections as string.

        Parameters
        ----------
        intv: tuple of tuple of integer
            The interval given as start and end section tuples

        Returns
        -------
        string
            Either a single section heading, if the start and end section are the same,
            or `-` surrounded by the start and end section headings.
        """
        if intv is None or len(intv) == 0:
            return "()"

        (b, e) = intv

        return (
            self.repLoc(b) if _sameLoc(b, e) else f"{self.repLoc(b)}-{self.repLoc(e)}"
        )

    def repScope(self, scope):
        """Represent a scope as string.

        Parameters
        ----------
        scope: tuple of tuple of tuple of integer
            The scope given as sequence of intervals

        Returns
        -------
        string
            Either a single section heading, if the start and end section are the same,
            or `-` surrounded by the start and end section headings.
        """
        return ", ".join(self.repInterval(intv) for intv in scope)

    def parseInterval(self, intvStr, plain=True):
        """Parses an interval given as string.

        A interval string is two section headings separated by a `-`.
        We parse it into a 2-tuple of the section headings, both also parsed into
        tuples of integers.

        If the resulting section tuples have less than the maximum number of components,
        we fill them up: the start section will be filled up with `0`-s, and the
        end section will be filled up with `-1`-s. This corresponds to the
        interpretation that the start section represent its start point, and
        the end section represent its end point.

        Parameters
        ----------
        intvStr: string
            The interval string
        plain: boolean, optional True
            Whether to return just the result or additional information as well.

        Returns
        -------
        tuple of integer or dict
            The plain result is a 2-tuple of tuples of integer.
            This is returned when `plain=False` is passed.
            If there are errors, None is returned.

            But if `plain=True` is passed, a dict is returned, with keys **result**
            for the plain result; **warning** for warnings if the parsing failed; and
            **normal** for a normalized representaiton of the section.
        """
        result = None
        warnings = []

        intvStr = intvStr.strip()

        if not intvStr:
            result = ((0, 0, 0), (-1, -1, -1))

            return (
                result
                if plain
                else dict(
                    result=result, warning=warnings, normal=self.repInterval(result)
                )
            )

        parts = intvStr.split("-", 1)

        if len(parts) == 1:
            info = self.parseLoc(intvStr, plain=False)
            w = info["warning"]

            if w:
                warnings.append(w)
            else:
                s = info["result"]
                result = (
                    (s + (0, 0, 0))[0:3],
                    (s + (-1, -1, -1))[0:3],
                )
        else:
            part = parts[0].strip()
            info1 = self.parseLoc(part, plain=False)
            w1 = info1["warning"]

            part = parts[1].strip()
            info2 = self.parseLoc(part, plain=False)
            w2 = info2["warning"]

            if w1 or w2:
                if w1:
                    warnings.append(w1)
                if w2:
                    warnings.append(w2)
            else:
                s1 = info1["result"]
                s2 = info2["result"]
                result = (
                    (s1 + (0, 0, 0))[0:3],
                    (s2 + (-1, -1, -1))[0:3],
                )

        normal = None if result is None else self.repInterval(result)

        return result if plain else dict(result=result, warning=warnings, normal=normal)

    def parseScope(self, scopeStr, plain=True):
        """Parse a scope specification into logical specifiers of regions in the corpus.

        A scope specification is a comma-separated list of interval strings.

        An interval string is either a section or an interval between two sections.

        A section is written as a section heading, an interval as two section headings
        with a `-` in between.

        An interval is taken from the start of the first section to the end of the
        second section.

        A section on its own is taken from its start to its end.

        Parameters
        ----------
        scopeStr: string
            The scope string
        plain: boolean, optional True
            Whether to return just the result or additional information as well.

        Returns
        -------
        tuple of tuple of integer or dict
            The plain result is a tuple of 2-tuple of tuples of integer.
            This is returned when `plain=False` is passed.
            If there are errors, None is returned.

            But if `plain=True` is passed, a dict is returned, with keys **result**
            for the plain result; **warning** for warnings if the parsing failed; and
            **normal** for a normalized representaiton of the section.
        """

        results = []
        warnings = []

        if not scopeStr:
            return () if plain else dict(result=(), warning=warnings, normal="")

        for intvStr in scopeStr.split(","):

            if not intvStr:
                continue

            info = self.parseInterval(intvStr, plain=plain)
            result = info if plain else info["result"]

            if result is None:
                if not plain:
                    warnings.extend(info["warning"])
                continue

            results.append(result)

        results = tuple(_sortIntervals(results))

        return (
            results
            if plain
            else dict(result=results, warning=warnings, normal=self.repScope(results))
        )

    def intersectScopes(self, *scopeStrs):
        """Produce the intersection of severel scopes.

        We use this function to test whether two triggers have a region where they
        are both in scope.

        Parameters
        ----------
        scopeStrs: iterable
            Sequence of scope specifiers

        Returns
        -------
        tuple of tuple of tuple of integer
            This is a tuple of intervals, forming the intersection of all given scopes
        """
        curIntersection = [self.parseInterval("")]

        for scopeStr in scopeStrs:
            newIntersection = []
            for bLoc, eLoc in self.parseScope(scopeStr):

                for ibLoc, ieLoc in curIntersection:
                    if _locCmp(ieLoc, bLoc) == -1:
                        # ieLoc < bLoc
                        continue
                    if _locCmp(ibLoc, eLoc) == 1:
                        # ibLoc > eLoc
                        break

                    # now
                    # bLoc <= ieLoc
                    # ibLoc <= eLoc
                    newIbLoc = _locMax(ibLoc, bLoc)
                    newIeLoc = _locMin(ieLoc, eLoc)
                    newIntersection.append((newIbLoc, newIeLoc))

            curIntersection = newIntersection

        return tuple(curIntersection)

    def testPartitioning(self, scopeStrs):
        """Test the partitioning of scopes.

        Only for debugging purposes.

        Parameters
        ----------
        scopeStrs: iterable of string
            The scope specifiers

        Returns
        -------
        list
            A list of intervals that make up the resulting partition.
        """
        scopeIndex = {}

        for scopeStr in scopeStrs:
            info = self.parseScope(scopeStr, plain=False)
            warning = info["warning"]

            if len(warning):
                console(f"Errors in {scopeStr}: {'; '.join(warning)}")
            else:
                scopes = info["result"]
                normScopeStr = info["normal"]
                console(
                    f"{scopeStr} => {normScopeStr}\n"
                    f"\t{self.repScope(_sortIntervals(scopes))}"
                )
                scopeIndex[normScopeStr] = scopes

        partitionScopes(scopeIndex)

Functions that do scope handling.

These functions will be added as methods to the class that inherits this class.

Subclasses

Sheets

Methods

def intersectScopes(self, *scopeStrs)

Expand source code Browse git

def intersectScopes(self, *scopeStrs):
    """Produce the intersection of severel scopes.

    We use this function to test whether two triggers have a region where they
    are both in scope.

    Parameters
    ----------
    scopeStrs: iterable
        Sequence of scope specifiers

    Returns
    -------
    tuple of tuple of tuple of integer
        This is a tuple of intervals, forming the intersection of all given scopes
    """
    curIntersection = [self.parseInterval("")]

    for scopeStr in scopeStrs:
        newIntersection = []
        for bLoc, eLoc in self.parseScope(scopeStr):

            for ibLoc, ieLoc in curIntersection:
                if _locCmp(ieLoc, bLoc) == -1:
                    # ieLoc < bLoc
                    continue
                if _locCmp(ibLoc, eLoc) == 1:
                    # ibLoc > eLoc
                    break

                # now
                # bLoc <= ieLoc
                # ibLoc <= eLoc
                newIbLoc = _locMax(ibLoc, bLoc)
                newIeLoc = _locMin(ieLoc, eLoc)
                newIntersection.append((newIbLoc, newIeLoc))

        curIntersection = newIntersection

    return tuple(curIntersection)

Produce the intersection of severel scopes.

We use this function to test whether two triggers have a region where they are both in scope.

Parameters

scopeStrs : iterable: Sequence of scope specifiers

Returns

tuple of tuple of tuple of integer: This is a tuple of intervals, forming the intersection of all given scopes

def parseInterval(self, intvStr, plain=True)

Expand source code Browse git

def parseInterval(self, intvStr, plain=True):
    """Parses an interval given as string.

    A interval string is two section headings separated by a `-`.
    We parse it into a 2-tuple of the section headings, both also parsed into
    tuples of integers.

    If the resulting section tuples have less than the maximum number of components,
    we fill them up: the start section will be filled up with `0`-s, and the
    end section will be filled up with `-1`-s. This corresponds to the
    interpretation that the start section represent its start point, and
    the end section represent its end point.

    Parameters
    ----------
    intvStr: string
        The interval string
    plain: boolean, optional True
        Whether to return just the result or additional information as well.

    Returns
    -------
    tuple of integer or dict
        The plain result is a 2-tuple of tuples of integer.
        This is returned when `plain=False` is passed.
        If there are errors, None is returned.

        But if `plain=True` is passed, a dict is returned, with keys **result**
        for the plain result; **warning** for warnings if the parsing failed; and
        **normal** for a normalized representaiton of the section.
    """
    result = None
    warnings = []

    intvStr = intvStr.strip()

    if not intvStr:
        result = ((0, 0, 0), (-1, -1, -1))

        return (
            result
            if plain
            else dict(
                result=result, warning=warnings, normal=self.repInterval(result)
            )
        )

    parts = intvStr.split("-", 1)

    if len(parts) == 1:
        info = self.parseLoc(intvStr, plain=False)
        w = info["warning"]

        if w:
            warnings.append(w)
        else:
            s = info["result"]
            result = (
                (s + (0, 0, 0))[0:3],
                (s + (-1, -1, -1))[0:3],
            )
    else:
        part = parts[0].strip()
        info1 = self.parseLoc(part, plain=False)
        w1 = info1["warning"]

        part = parts[1].strip()
        info2 = self.parseLoc(part, plain=False)
        w2 = info2["warning"]

        if w1 or w2:
            if w1:
                warnings.append(w1)
            if w2:
                warnings.append(w2)
        else:
            s1 = info1["result"]
            s2 = info2["result"]
            result = (
                (s1 + (0, 0, 0))[0:3],
                (s2 + (-1, -1, -1))[0:3],
            )

    normal = None if result is None else self.repInterval(result)

    return result if plain else dict(result=result, warning=warnings, normal=normal)

Parses an interval given as string.

A interval string is two section headings separated by a -. We parse it into a 2-tuple of the section headings, both also parsed into tuples of integers.

If the resulting section tuples have less than the maximum number of components, we fill them up: the start section will be filled up with 0-s, and the end section will be filled up with -1-s. This corresponds to the interpretation that the start section represent its start point, and the end section represent its end point.

Parameters

intvStr : string: The interval string
plain : boolean, optional True: Whether to return just the result or additional information as well.

Returns

tuple of integer or dict

The plain result is a 2-tuple of tuples of integer. This is returned when plain=False is passed. If there are errors, None is returned.

But if plain=True is passed, a dict is returned, with keys result for the plain result; warning for warnings if the parsing failed; and normal for a normalized representaiton of the section.

def parseLoc(self, locStr, plain=True)

Expand source code Browse git

def parseLoc(self, locStr, plain=True):
    """Parses a location string.

    A location string is a section heading in the corpus.
    Now we want the "legal" number of that section.
    We use the corpus dependent function
    `tf.ner.corpus.Corpus.getSeqFromStr()` for that.

    Parameters
    ----------
    locStr: string
        The section heading as it appears in the corpus
    plain: boolean, optional True
        Whether to return just the result or additional information as well.

    Returns
    -------
    tuple of integer or dict
        The plain result is a tuple of the numbers that make up the "legal"
        number of the section. This is returned when `plain=False` is passed.
        If there are errors, None is returned.

        But if `plain=True` is passed, a dict is returned, with keys **result**
        for the plain result; **warning** for warnings if the parsing failed; and
        **normal** for a normalized representaiton of the section.
    """
    getSeqFromStr = self.getSeqFromStr

    locStr = locStr.strip()

    if not locStr:
        result = ()
        return (
            ()
            if plain
            else dict(result=(), warning=None, normal=self.repLoc(result))
        )

    (error, result) = getSeqFromStr(locStr)

    if error:
        result = None
        normal = None
        warning = error
    else:
        normal = self.repLoc(result)
        warning = None

    return result if plain else dict(result=result, warning=warning, normal=normal)

Parses a location string.

A location string is a section heading in the corpus. Now we want the "legal" number of that section. We use the corpus dependent function Corpus.getSeqFromStr for that.

Parameters

locStr : string: The section heading as it appears in the corpus
plain : boolean, optional True: Whether to return just the result or additional information as well.

Returns

tuple of integer or dict

The plain result is a tuple of the numbers that make up the "legal" number of the section. This is returned when plain=False is passed. If there are errors, None is returned.

def parseScope(self, scopeStr, plain=True)

Expand source code Browse git

def parseScope(self, scopeStr, plain=True):
    """Parse a scope specification into logical specifiers of regions in the corpus.

    A scope specification is a comma-separated list of interval strings.

    An interval string is either a section or an interval between two sections.

    A section is written as a section heading, an interval as two section headings
    with a `-` in between.

    An interval is taken from the start of the first section to the end of the
    second section.

    A section on its own is taken from its start to its end.

    Parameters
    ----------
    scopeStr: string
        The scope string
    plain: boolean, optional True
        Whether to return just the result or additional information as well.

    Returns
    -------
    tuple of tuple of integer or dict
        The plain result is a tuple of 2-tuple of tuples of integer.
        This is returned when `plain=False` is passed.
        If there are errors, None is returned.

        But if `plain=True` is passed, a dict is returned, with keys **result**
        for the plain result; **warning** for warnings if the parsing failed; and
        **normal** for a normalized representaiton of the section.
    """

    results = []
    warnings = []

    if not scopeStr:
        return () if plain else dict(result=(), warning=warnings, normal="")

    for intvStr in scopeStr.split(","):

        if not intvStr:
            continue

        info = self.parseInterval(intvStr, plain=plain)
        result = info if plain else info["result"]

        if result is None:
            if not plain:
                warnings.extend(info["warning"])
            continue

        results.append(result)

    results = tuple(_sortIntervals(results))

    return (
        results
        if plain
        else dict(result=results, warning=warnings, normal=self.repScope(results))
    )

Parse a scope specification into logical specifiers of regions in the corpus.

A scope specification is a comma-separated list of interval strings.

An interval string is either a section or an interval between two sections.

A section is written as a section heading, an interval as two section headings with a - in between.

An interval is taken from the start of the first section to the end of the second section.

A section on its own is taken from its start to its end.

Parameters

scopeStr : string: The scope string
plain : boolean, optional True: Whether to return just the result or additional information as well.

Returns

tuple of tuple of integer or dict

The plain result is a tuple of 2-tuple of tuples of integer. This is returned when plain=False is passed. If there are errors, None is returned.

def repInterval(self, intv)

Expand source code Browse git

def repInterval(self, intv):
    """Represent an interval of sections as string.

    Parameters
    ----------
    intv: tuple of tuple of integer
        The interval given as start and end section tuples

    Returns
    -------
    string
        Either a single section heading, if the start and end section are the same,
        or `-` surrounded by the start and end section headings.
    """
    if intv is None or len(intv) == 0:
        return "()"

    (b, e) = intv

    return (
        self.repLoc(b) if _sameLoc(b, e) else f"{self.repLoc(b)}-{self.repLoc(e)}"
    )

Represent an interval of sections as string.

Parameters

intv : tuple of tuple of integer: The interval given as start and end section tuples

Returns

string: Either a single section heading, if the start and end section are the same, or - surrounded by the start and end section headings.

def repLoc(self, loc)

Expand source code Browse git

def repLoc(self, loc):
    """Represent a location as a string.

    A location identifies a section by means of legal number.
    We now want the heading of that section, as given by the features and
    settings of the corpus. We use the corpus dependent function
    `tf.ner.corpus.Corpus.getStrFromSeq()` for that.

    Parameters
    ----------
    loc: tuple of integer
        The "legal" number of a section.

    Returns
    -------
    string
        The section heading of the corresponding section.
    """
    getStrFromSeq = self.getStrFromSeq

    return getStrFromSeq(tuple(x for x in loc if x != 0 and x != -1))

Represent a location as a string.

A location identifies a section by means of legal number. We now want the heading of that section, as given by the features and settings of the corpus. We use the corpus dependent function Corpus.getStrFromSeq for that.

Parameters

loc : tuple of integer: The "legal" number of a section.

Returns

string: The section heading of the corresponding section.

def repScope(self, scope)

Expand source code Browse git

def repScope(self, scope):
    """Represent a scope as string.

    Parameters
    ----------
    scope: tuple of tuple of tuple of integer
        The scope given as sequence of intervals

    Returns
    -------
    string
        Either a single section heading, if the start and end section are the same,
        or `-` surrounded by the start and end section headings.
    """
    return ", ".join(self.repInterval(intv) for intv in scope)

Represent a scope as string.

Parameters

scope : tuple of tuple of tuple of integer: The scope given as sequence of intervals

Returns

string: Either a single section heading, if the start and end section are the same, or - surrounded by the start and end section headings.

def testPartitioning(self, scopeStrs)

Expand source code Browse git

def testPartitioning(self, scopeStrs):
    """Test the partitioning of scopes.

    Only for debugging purposes.

    Parameters
    ----------
    scopeStrs: iterable of string
        The scope specifiers

    Returns
    -------
    list
        A list of intervals that make up the resulting partition.
    """
    scopeIndex = {}

    for scopeStr in scopeStrs:
        info = self.parseScope(scopeStr, plain=False)
        warning = info["warning"]

        if len(warning):
            console(f"Errors in {scopeStr}: {'; '.join(warning)}")
        else:
            scopes = info["result"]
            normScopeStr = info["normal"]
            console(
                f"{scopeStr} => {normScopeStr}\n"
                f"\t{self.repScope(_sortIntervals(scopes))}"
            )
            scopeIndex[normScopeStr] = scopes

    partitionScopes(scopeIndex)

Test the partitioning of scopes.

Only for debugging purposes.

Parameters

scopeStrs : iterable of string: The scope specifiers

Returns

list: A list of intervals that make up the resulting partition.