API Reference
Functions
load(path) → Treebank
Load a treebank from file(s). Accepts single files or glob patterns.
```python
tb = ts.load("corpus.conllu")         # single file
tb = ts.load("data/**/*.conllu.gz")   # glob pattern
```
from_string(text) → Treebank
Create a treebank from a CoNLL-U string.
```python
# CoNLL-U word lines have ten tab-separated columns.
tb = ts.from_string("""# text = Hello world
1	Hello	hello	INTJ	_	_	0	root	_	_
2	world	world	NOUN	_	_	1	vocative	_	_
""")
```
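CoNLL-U requires the ten columns of each word line to be separated by single tab characters, and tabs are easy to lose when a sample is pasted into source code. A minimal sketch that builds an equivalent input string with explicit tabs; the helper `conllu_line` is hypothetical and not part of this library:

```python
def conllu_line(*fields: str) -> str:
    """Join the ten CoNLL-U columns with tabs (hypothetical helper)."""
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return "\t".join(fields)


text = "\n".join([
    "# text = Hello world",
    conllu_line("1", "Hello", "hello", "INTJ", "_", "_", "0", "root", "_", "_"),
    conllu_line("2", "world", "world", "NOUN", "_", "_", "1", "vocative", "_", "_"),
]) + "\n\n"  # a blank line terminates the sentence

# tb = ts.from_string(text)  # then hand the tab-separated string to the library
```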
compile_query(query) → Pattern
Compile a query string into a reusable Pattern. Raises ValueError on syntax error.
```python
pattern = ts.compile_query('MATCH { V [upos="VERB"]; }')
```
trees(source, ordered=True) → Iterator[Tree]
Read trees from one or more files. source accepts a single file path or a glob pattern.
```python
for tree in ts.trees("corpus/*.conllu"):
    print(tree.sentence_text)
```
search(source, query, ordered=True) → Iterator[tuple[Tree, dict]]
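Search one or more files for pattern matches. Each result is a (Tree, dict) pair; the dict maps each query variable name (such as V) to the ID of the matched word.
```python
for tree, match in ts.search("corpus.conllu", 'MATCH { V [upos="VERB"]; }'):
    verb = tree.word(match["V"])  # look up the word bound to variable V
    print(verb.form, verb.lemma)
```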
search_trees(trees, query) → Iterator[tuple[Tree, dict]]
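Search one Tree or an iterable of Tree objects for pattern matches. query may be a query string or a compiled Pattern.
```python
pattern = ts.compile_query('MATCH { V [upos="VERB"]; }')
first = next(ts.trees("corpus.conllu"))  # a single Tree
for tree, match in ts.search_trees(first, pattern):
    ...
```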
Treebank
Collection of trees from one or more files.
treebank.trees(ordered=True) → Iterator[Tree]
Iterate over all trees.
treebank.search(query, ordered=True) → Iterator[tuple[Tree, dict]]
Search for pattern matches. Accepts a query string or a compiled Pattern.
Parameters:
- ordered: If True (default), results are yielded in corpus order. If False, results may be yielded in any order, which can be faster.
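The ordered/unordered trade-off can be illustrated with the standard library. When work is spread over several files, yielding each result as soon as it is ready avoids waiting on a slow file, whereas corpus order forces later results to wait for earlier ones. This is a generic sketch using concurrent.futures; whether this library parallelizes work this way is an assumption, and `process_file` is a hypothetical stand-in for per-file searching.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_file(name: str, delay: float) -> tuple[str, float]:
    """Hypothetical per-file task; the sleep simulates files of different sizes."""
    time.sleep(delay)
    return name, delay


jobs = [("a.conllu", 0.3), ("b.conllu", 0.1), ("c.conllu", 0.2)]

with ThreadPoolExecutor() as pool:
    # Ordered: Executor.map yields results in submission (corpus) order,
    # so a slow first file holds back everything after it.
    ordered = [name for name, _ in pool.map(lambda job: process_file(*job), jobs)]

    # Unordered: as_completed yields each future as soon as it finishes.
    futures = [pool.submit(process_file, *job) for job in jobs]
    unordered = [f.result()[0] for f in as_completed(futures)]

print(ordered)    # ['a.conllu', 'b.conllu', 'c.conllu']
print(unordered)  # completion order, typically b, c, a
```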
Tree
A dependency tree (parsed sentence).
Properties
| Property | Type | Description |
|---|---|---|
| sentence_text | str \| None | Reconstructed sentence |
| metadata | dict[str, str] | CoNLL-U comments |
Methods
- tree.word(id) → Word: Get word by ID (0-indexed). Raises IndexError if out of range.
- tree[id] → Word: Same as word(id).
- len(tree) → int: Number of words.
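For unit tests that should not depend on a corpus file, a small stand-in following the documented Tree contract (0-based word(id) that raises IndexError, tree[id] as an alias, and len(tree)) can be handy. This is a hypothetical sketch, not the library's class; words are plain strings here to keep it short.

```python
class FakeTree:
    """Hypothetical test double mirroring the documented Tree methods."""

    def __init__(self, forms: list[str]) -> None:
        self._forms = forms

    def word(self, word_id: int) -> str:
        # Assumption: negative IDs are rejected, not counted from the end.
        if not 0 <= word_id < len(self._forms):
            raise IndexError(f"word id {word_id} out of range")
        return self._forms[word_id]

    def __getitem__(self, word_id: int) -> str:
        return self.word(word_id)  # tree[id] is the same as tree.word(id)

    def __len__(self) -> int:
        return len(self._forms)


tree = FakeTree(["Hello", "world"])
print(len(tree), tree[0], tree.word(1))  # 2 Hello world
```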
Word
A single word in a tree.
Properties
| Property | Type | Description |
|---|---|---|
| id | int | 0-based index in tree |
| token_id | int | 1-based CoNLL-U ID |
| form | str | Surface form |
| lemma | str | Dictionary form |
| upos | str | Universal POS tag |
| xpos | str \| None | Language-specific POS |
| deprel | str | Dependency relation |
| head | int \| None | Parent word ID (None for root) |
| children_ids | list[int] | Child word IDs |
| feats | dict[str, str] | Morphological features |
| misc | dict[str, str] | Miscellaneous annotations |
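To make the table concrete, here is a sketch of how one CoNLL-U word line could map onto these properties. The conversions are assumptions inferred from the table rather than documented behavior: id is token_id minus 1; head becomes the parent's 0-based id, with CoNLL-U's root value 0 becoming None; an underscore in XPOS becomes None; and FEATS/MISC are split on `|` and `=` into dicts. children_ids is left out because it needs the whole sentence. The function names are hypothetical.

```python
def parse_kv(column: str) -> dict[str, str]:
    """Split a FEATS or MISC column like 'Tense=Past|VerbForm=Fin' into a dict."""
    if column == "_":
        return {}
    # Assumes every item is key=value (always true for FEATS, usual for MISC).
    return dict(item.split("=", 1) for item in column.split("|"))


def parse_word_line(line: str) -> dict:
    """Hypothetical mapping of one CoNLL-U word line to Word-like fields."""
    cols = line.rstrip("\n").split("\t")
    token_id = int(cols[0])  # assumes a plain word line, not a range (1-2) or empty node (1.1)
    head = int(cols[6])
    return {
        "id": token_id - 1,                        # 0-based index in tree
        "token_id": token_id,                      # 1-based CoNLL-U ID
        "form": cols[1],
        "lemma": cols[2],
        "upos": cols[3],
        "xpos": None if cols[4] == "_" else cols[4],
        "feats": parse_kv(cols[5]),
        "head": None if head == 0 else head - 1,   # 0 marks the root
        "deprel": cols[7],
        "misc": parse_kv(cols[9]),
    }


line = "2\tran\trun\tVERB\tVBD\tTense=Past|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No"
word = parse_word_line(line)
print(word["id"], word["head"], word["feats"]["Tense"])  # 1 None Past
```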
Methods
- word.parent() → Word | None: Get the parent word (None for the root).
- word.children() → list[Word]: Get all children.
- word.children_by_deprel(deprel) → list[Word]: Get children attached with the given relation.
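These navigation methods follow from head and children_ids: a word's children are exactly the words whose head is its id. A hypothetical sketch over plain (form, head, deprel) tuples, with heads already 0-based and None for the root, showing how child lists and a children_by_deprel-style filter can be derived; the library's own implementation may differ.

```python
from collections import defaultdict

# (form, head, deprel) for "She ate apples": heads are 0-based word IDs, None = root.
words = [("She", 1, "nsubj"), ("ate", None, "root"), ("apples", 1, "obj")]

# Invert the head pointers once: parent id -> child ids, in word order.
children_ids: dict[int, list[int]] = defaultdict(list)
for word_id, (_, head, _) in enumerate(words):
    if head is not None:
        children_ids[head].append(word_id)


def children_by_deprel(parent_id: int, deprel: str) -> list[int]:
    """IDs of the children of parent_id attached with the given relation."""
    return [c for c in children_ids[parent_id] if words[c][2] == deprel]


print(children_ids[1])               # [0, 2]
print(children_by_deprel(1, "obj"))  # [2]
```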
Pattern
Compiled query pattern (opaque). Created by compile_query(), used with search functions.
Query Language Summary
```
MATCH {
    # Node constraints
    V [upos="VERB"];                  # by POS
    V [lemma="run"];                  # by lemma
    V [upos="VERB" & lemma="run"];    # multiple (AND)
    V [feats.Tense="Past"];           # by feature
    V [];                             # any word
    # Negation
    V [upos!="VERB"];                 # not a verb
    V !-[obj]-> _;                    # no object
    # Edges
    V -[nsubj]-> N;                   # specific relation
    V -> N;                           # any relation
    # Precedence
    V < N;                            # V immediately before N
    V << N;                           # V anywhere before N
}
```
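The comments in the summary can be read as simple predicates over word IDs and heads. The sketch below encodes that reading for the edge, negated-edge, and precedence operators, then enumerates V/N bindings the way a match dict maps variable names to word IDs. It illustrates one interpretation of the summary (for example, that `V < N` means N's ID is exactly V's ID plus one); it is not the library's matcher, and every function name is hypothetical.

```python
from typing import Optional

# Words as (upos, head, deprel), indexed by 0-based ID; head None = root.
# Sentence: "dogs chase cats"
words = [("NOUN", 1, "nsubj"), ("VERB", None, "root"), ("NOUN", 1, "obj")]


def has_edge(v: int, n: int, rel: Optional[str] = None) -> bool:
    """V -[rel]-> N, or V -> N when rel is None: N is attached to V."""
    _, head, deprel = words[n]
    return head == v and (rel is None or deprel == rel)


def lacks_edge(v: int, rel: str) -> bool:
    """V !-[rel]-> _ : no word is attached to V with this relation."""
    return not any(has_edge(v, n, rel) for n in range(len(words)))


def immediately_before(v: int, n: int) -> bool:
    """V < N"""
    return n == v + 1


def anywhere_before(v: int, n: int) -> bool:
    """V << N"""
    return v < n


# MATCH { V [upos="VERB"]; V -[nsubj]-> N; } as a brute-force search.
matches = [
    {"V": v, "N": n}
    for v in range(len(words)) if words[v][0] == "VERB"
    for n in range(len(words)) if has_edge(v, n, "nsubj")
]
print(matches)  # [{'V': 1, 'N': 0}]
```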
See Query Language Reference for complete syntax.