
API Reference

Functions

load(path) → Treebank

Load a treebank from a single file or from all files matching a glob pattern. Gzip-compressed files (.conllu.gz) are accepted, as in the second example.

tb = ts.load("corpus.conllu")
tb = ts.load("data/**/*.conllu.gz")

from_string(text) → Treebank

Create a treebank from a CoNLL-U string.

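# CoNLL-U columns are tab-separated; "\t" makes the tabs explicit.
tb = ts.from_string(
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"  # a blank line ends the sentence
)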
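To make the input format concrete, here is a minimal sketch of how a CoNLL-U string breaks down into per-word records. It is not this library's parser: parse_conllu is a hypothetical helper, and it assumes tab-separated columns and simply skips multiword-token ranges (1-2) and empty nodes (1.1). It produces both numbering schemes used in this reference: the 0-based id accepted by tree.word() and the 1-based token_id from the file, with the head column converted to 0-based (None for the root).

```python
from __future__ import annotations

from typing import Iterator

# The ten CoNLL-U columns, in file order.
COLUMNS = ("token_id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text: str) -> Iterator[tuple[dict[str, str], list[dict]]]:
    """Yield (metadata, words) for each sentence in a CoNLL-U string.

    Hypothetical illustration only; not the library's own parser.
    """
    metadata: dict[str, str] = {}
    words: list[dict] = []
    # Appending "" flushes the final sentence even without a trailing blank line.
    for line in text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the current sentence.
            if words:
                yield metadata, words
            metadata, words = {}, []
        elif line.startswith("#"):
            # "# key = value" comments become metadata entries.
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            fields = line.split("\t")
            if "-" in fields[0] or "." in fields[0]:
                continue  # skip multiword-token ranges and empty nodes
            word = dict(zip(COLUMNS, fields))
            word["token_id"] = int(fields[0])  # 1-based, as in the file
            word["id"] = len(words)            # 0-based position in the tree
            head = int(fields[6])
            # CoNLL-U head 0 means root; otherwise convert 1-based to 0-based.
            word["head"] = head - 1 if head > 0 else None
            words.append(word)
```

For the two-word example above, this yields one sentence whose metadata is {"text": "Hello world"}, with ids 0 and 1, token_ids 1 and 2, and heads None and 0.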

compile_query(query) → Pattern

Compile a query string into a reusable Pattern. Raises ValueError on syntax error.

pattern = ts.compile_query('MATCH { V [upos="VERB"]; }')

trees(source, ordered=True) → Iterator[Tree]

Read trees one at a time from a file or glob pattern. The ordered flag works as in Treebank.search().

for tree in ts.trees("corpus/*.conllu"):
    print(tree.sentence_text)

search(source, query, ordered=True) → Iterator[tuple[Tree, dict]]

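Search a file or glob pattern for matches. Each result is a (tree, match) pair, where match maps each query variable (here V) to the 0-based ID of the word it matched.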

for tree, match in ts.search("corpus.conllu", 'MATCH { V [upos="VERB"]; }'):
    verb = tree.word(match["V"])

search_trees(trees, query) → Iterator[tuple[Tree, dict]]

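Search a single Tree or an iterable of Trees for pattern matches.

pattern = ts.compile_query('MATCH { V [upos="VERB"]; }')
first = next(ts.trees("corpus.conllu"))  # search just the first sentence
for tree, match in ts.search_trees(first, pattern):
    print(tree.word(match["V"]).form)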

Treebank

Collection of trees from one or more files.

treebank.trees(ordered=True) → Iterator[Tree]

Iterate over all trees.

treebank.search(query, ordered=True) → Iterator[tuple[Tree, dict]]

Search for pattern matches. Accepts query string or compiled Pattern.

Parameters:

  • ordered: If True (default), results are yielded in corpus order. If False, results may arrive in any order, which is faster.
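The ordered trade-off is the usual one for parallel work: yielding results in input order means holding back finished results until the earlier ones are ready. The sketch below shows both strategies with the standard library's concurrent.futures. It assumes, without confirmation from this page, that the library searches files in parallel; run_all is a hypothetical helper, not part of the API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_all(func: Callable[[T], R], items: Iterable[T],
            ordered: bool = True, workers: int = 4) -> Iterator[R]:
    """Apply func to each item on a thread pool, yielding the results.

    ordered=True:  results follow input order, so one slow early item
                   delays everything after it.
    ordered=False: each result is yielded as soon as it finishes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        if ordered:
            # Executor.map yields results in the order of the inputs.
            yield from pool.map(func, items)
        else:
            futures = [pool.submit(func, item) for item in items]
            # as_completed yields futures in completion order.
            for future in as_completed(futures):
                yield future.result()
```

Both modes produce the same set of results; only the order differs, so ordered=False suits aggregate statistics (counts, frequency tables) where corpus order does not matter.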

Tree

A dependency tree (parsed sentence).

Properties

Property        Type             Description
sentence_text   str | None       Reconstructed sentence
metadata        dict[str, str]   CoNLL-U comments

Methods

  • tree.word(id) → Word - Get word by ID (0-indexed). Raises IndexError if out of range.
  • tree[id] → Word - Same as word(id)
  • len(tree) → int - Number of words
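To illustrate the Tree interface above, here is a toy pair of classes. SketchWord and SketchTree are hypothetical stand-ins, not the library's types. They mirror word(id), tree[id] and len(tree), and treat negative IDs as out of range rather than wrapping around like a Python list. The sentence_text property assumes the reconstruction joins word forms with spaces except where the MISC column has SpaceAfter=No (the usual CoNLL-U convention); this page does not say how the library actually does it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchWord:
    id: int                                   # 0-based position in the tree
    form: str
    misc: dict[str, str] = field(default_factory=dict)


@dataclass
class SketchTree:
    words: list[SketchWord]
    metadata: dict[str, str] = field(default_factory=dict)

    def word(self, word_id: int) -> SketchWord:
        # Treat negative IDs as out of range instead of Python's wrap-around.
        if not 0 <= word_id < len(self.words):
            raise IndexError(
                f"word id {word_id} out of range (tree has {len(self.words)} words)")
        return self.words[word_id]

    def __getitem__(self, word_id: int) -> SketchWord:
        return self.word(word_id)  # tree[id] is the same as tree.word(id)

    def __len__(self) -> int:
        return len(self.words)

    @property
    def sentence_text(self) -> str | None:
        # Assumed rule: a space follows each form unless MISC has SpaceAfter=No.
        if not self.words:
            return None
        parts: list[str] = []
        for w in self.words:
            parts.append(w.form)
            if w.misc.get("SpaceAfter") != "No":
                parts.append(" ")
        return "".join(parts).rstrip()
```

For example, the words Hello (with SpaceAfter=No), "," and world reconstruct to "Hello, world".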

Word

A single word in a tree.

Properties

Property       Type             Description
id             int              0-based index in tree
token_id       int              1-based CoNLL-U ID
form           str              Surface form
lemma          str              Dictionary form
upos           str              Universal POS tag
xpos           str | None       Language-specific POS
deprel         str              Dependency relation
head           int | None       Parent word ID (None for root)
children_ids   list[int]        Child word IDs
feats          dict[str, str]   Morphological features
misc           dict[str, str]   Miscellaneous annotations

Methods

  • word.parent() → Word | None - Get parent word
  • word.children() → list[Word] - Get all children
  • word.children_by_deprel(deprel) → list[Word] - Get children with specific relation
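A toy model shows how head, children_ids and the three navigation methods fit together: children_ids is the inverse of the head pointers. LinkedWord and build_words are hypothetical names, not the library's API. The sketch also shows the assumed conversion from CoNLL-U's 1-based head column (0 = root) to the 0-based head property, consistent with the id and token_id rows in the table above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LinkedWord:
    id: int                # 0-based index in the tree
    form: str
    deprel: str
    head: int | None       # 0-based parent id; None for the root
    # Shared list of all words in the sentence; excluded from repr/eq
    # to avoid infinite recursion through parent/child references.
    tree: list[LinkedWord] = field(default_factory=list, repr=False, compare=False)
    children_ids: list[int] = field(default_factory=list)

    def parent(self) -> LinkedWord | None:
        return None if self.head is None else self.tree[self.head]

    def children(self) -> list[LinkedWord]:
        return [self.tree[i] for i in self.children_ids]

    def children_by_deprel(self, deprel: str) -> list[LinkedWord]:
        return [c for c in self.children() if c.deprel == deprel]


def build_words(rows: list[tuple[str, str, int]]) -> list[LinkedWord]:
    """Build linked words from (form, deprel, head) rows.

    head uses CoNLL-U numbering: 1-based token IDs, 0 for the root.
    """
    words: list[LinkedWord] = []
    for i, (form, deprel, head) in enumerate(rows):
        # Convert the 1-based CoNLL-U head to a 0-based id (None for root).
        words.append(LinkedWord(i, form, deprel, head - 1 if head > 0 else None, words))
    # Derive children_ids from the head pointers, in word order.
    for w in words:
        if w.head is not None:
            words[w.head].children_ids.append(w.id)
    return words
```

For "She gave him a book", gave is the root, its children_ids are [0, 2, 4], and children_by_deprel("obj") returns just book.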

Pattern

Compiled query pattern (opaque). Created by compile_query(), used with search functions.

Query Language Summary

MATCH {
    # Node constraints
    V [upos="VERB"];              # by POS
    V [lemma="run"];              # by lemma
    V [upos="VERB" & lemma="run"]; # multiple (AND)
    V [feats.Tense="Past"];       # by feature
    V [];                         # any word

    # Negation
    V [upos!="VERB"];             # not a verb
    V !-[obj]-> _;                # no object

    # Edges
    V -[nsubj]-> N;               # specific relation
    V -> N;                       # any relation

    # Precedence
    V < N;                        # V immediately before N
    V << N;                       # V anywhere before N
}

See the Query Language Reference for the complete syntax.
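To pin down what the constraints in the summary mean, here is a deliberately naive brute-force matcher for a subset of them: equality constraints on node attributes, labelled and unlabelled edges, a negated outgoing edge, and both precedence operators. The pattern is written as Python data rather than query syntax. brute_force_match and the word-dict layout are hypothetical, the real engine is surely not implemented this way, and the sketch assumes that distinct variables bind distinct words (check the Query Language Reference for the actual rule).

```python
from __future__ import annotations

from collections.abc import Sequence
from itertools import permutations
from typing import Iterator


def brute_force_match(
    words: list[dict],
    nodes: dict[str, dict[str, str]],
    edges: Sequence[tuple[str, str | None, str]] = (),
    no_edges: Sequence[tuple[str, str]] = (),
    order: Sequence[tuple[str, str, str]] = (),
) -> Iterator[dict[str, int]]:
    """Yield {var: word_id} bindings that satisfy every constraint.

    nodes:    {var: {attr: value}}   like  V [upos="VERB"]  ({} = any word)
    edges:    (src, rel, dst)        like  V -[nsubj]-> N   (rel None = V -> N)
    no_edges: (var, rel)             like  V !-[obj]-> _
    order:    (a, op, b)             like  V < N  or  V << N
    Every variable used in edges/no_edges/order must appear in nodes.
    """
    names = list(nodes)
    # Try every assignment of distinct words (by 0-based id) to the variables.
    for ids in permutations(range(len(words)), len(names)):
        b = dict(zip(names, ids))
        # Node constraints: every attribute must equal the required value.
        if not all(words[b[v]].get(attr) == value
                   for v, cons in nodes.items() for attr, value in cons.items()):
            continue
        # Edges: dst's head must be src; rel None accepts any relation.
        if not all(words[b[dst]]["head"] == b[src]
                   and (rel is None or words[b[dst]]["deprel"] == rel)
                   for src, rel, dst in edges):
            continue
        # Negated edges: var must have no child with this relation.
        if any(w["head"] == b[v] and w["deprel"] == rel
               for v, rel in no_edges for w in words):
            continue
        # Precedence: "<" means immediately before, "<<" anywhere before.
        if not all((b[y] - b[x] == 1) if op == "<" else (b[x] < b[y])
                   for x, op, y in order):
            continue
        yield b
```

On "She gave him a book", V [upos="VERB"]; V -[nsubj]-> N binds V to gave and N to She, while V [upos="VERB"]; V !-[obj]-> _ finds nothing because book is the object of gave.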