Treesearch
High-performance dependency treebank querying for corpus linguistics.
Quick Start
pip install treesearch
import treesearch as ts
# Find all passive constructions
query = """
MATCH {
V [upos="VERB"];
V -[aux:pass]-> _;
V -[nsubj:pass]-> Subj;
}
"""
for tree, match in ts.search("corpus/*.conllu", query):
verb = tree.word(match["V"])
subj = tree.word(match["Subj"])
print(f"{subj.form} was {verb.form}: {tree.sentence_text}")
Features
- Query language for structural patterns (nodes, edges, precedence, negation)
- Exhaustive search finds all matches using CSP solving
- Automatic parallelism for multi-file processing
- Memory efficient streaming with string interning
- Transparent gzip support
Documentation
- Tutorial - Complete walkthrough from installation to advanced usage
- Query Language - Full syntax reference
- API Reference - Functions and classes
- Internals - Architecture for contributors
License
MIT