API Reference
Functions
load(path) → Treebank
Load a treebank from file(s). Accepts single files or glob patterns.
tb = ts.load("corpus.conllu")
tb = ts.load("data/**/*.conllu.gz")
from_string(text) → Treebank
Create a treebank from a CoNLL-U string.
tb = ts.from_string("""# text = Hello world.
1 Hello hello INTJ _ _ 0 root _ _
2 world world NOUN _ _ 1 vocative _ _
""")
compile_query(query) → Pattern
Compile a query string into a reusable Pattern. Raises ValueError on syntax error.
pattern = ts.compile_query('MATCH { V [upos="VERB"]; }')
trees(source, ordered=True) → Iterator[Tree]
Read trees from file(s).
for tree in ts.trees("corpus/*.conllu"):
print(tree.sentence_text)
search(source, query, ordered=True) → Iterator[tuple[Tree, dict]]
Search file(s) for pattern matches.
for tree, match in ts.search("corpus.conllu", 'MATCH { V [upos="VERB"]; }'):
verb = tree.word(match["V"])
search_trees(trees, query) → Iterator[tuple[Tree, dict]]
Search Tree object(s) for pattern matches.
tree = next(ts.trees("corpus.conllu"))
for tree, match in ts.search_trees(tree, pattern):
...
to_displacy(tree) → dict
Convert a Tree to displaCy format for visualization.
data = ts.to_displacy(tree)
# Returns: {'words': [...], 'arcs': [...]}
render(tree, **options) → str
Render a Tree as SVG using displaCy. Requires spaCy (pip install treesearch-ud[viz]).
svg = ts.render(tree)
ts.render(tree, jupyter=True) # Display in Jupyter
Treebank
Collection of trees from one or more files.
treebank.trees(ordered=True) → Iterator[Tree]
Iterate over all trees.
treebank.search(query, ordered=True) → Iterator[tuple[Tree, dict]]
Search for pattern matches. Accepts query string or compiled Pattern.
Parameters:
- ordered: If True (default), results in corpus order. If False, faster but unordered.
treebank.filter(query, ordered=True) → Iterator[Tree]
Find trees that have at least one match. More efficient than search() when you only need matching trees, not bindings—stops after first match per tree.
# Count trees with passive constructions
count = sum(1 for _ in treebank.filter('MATCH { V []; V -[aux:pass]-> _; }'))
# Get matching trees for further processing
for tree in treebank.filter(pattern):
print(tree.sentence_text)
Tree
A dependency tree (parsed sentence).
Properties
| Property | Type | Description |
|---|---|---|
sentence_text |
str \| None |
Reconstructed sentence |
metadata |
dict[str, str] |
CoNLL-U comments |
Methods
tree.word(id) → Word- Get word by ID (0-indexed). RaisesIndexErrorif out of range.tree[id] → Word- Same asword(id)len(tree) → int- Number of wordstree.to_displacy() → dict- Convert to displaCy formattree.render(**options) → str- Render as SVG (requires spaCy)
Word
A single word in a tree.
Properties
| Property | Type | Description |
|---|---|---|
id |
int |
0-based index in tree |
token_id |
int |
1-based CoNLL-U ID |
form |
str |
Surface form |
lemma |
str |
Dictionary form |
upos |
str |
Universal POS tag |
xpos |
str \| None |
Language-specific POS |
deprel |
str |
Dependency relation |
head |
int \| None |
Parent word ID (None for root) |
children_ids |
list[int] |
Child word IDs |
feats |
dict[str, str] |
Morphological features |
misc |
dict[str, str] |
Miscellaneous annotations |
Methods
word.parent() → Word | None- Get parent wordword.children() → list[Word]- Get all childrenword.children_by_deprel(deprel) → list[Word]- Get children with specific relation
Pattern
Compiled query pattern (opaque). Created by compile_query(), used with search functions.
Query Language Summary
MATCH {
# Node constraints
V [upos="VERB"]; # by POS
V [lemma="run"]; # by lemma (exact)
V [lemma=/run.*/]; # by regex (starts with "run")
V [upos="VERB" & lemma="run"]; # multiple (AND)
V [feats.Tense="Past"]; # by feature
V []; # any word
# Regular expressions (automatically anchored)
V [form=/.*ing/]; # ends with -ing
V [upos=/VERB|AUX/]; # VERB or AUX
V [lemma!=/(be|have)/]; # not be or have
# Negation
V [upos!="VERB"]; # not a verb
V !-[obj]-> _; # no object
# Edges
V -[nsubj]-> N; # specific relation
V -> N; # any relation
# Precedence
V < N; # V immediately before N
V << N; # V anywhere before N
}
Regex patterns use /pattern/ syntax and are automatically anchored for full-string matching. Use .* for partial matches: /run.*/ matches "run", "runs", "running".
See Query Language Reference for complete syntax.