compressedtrie

package module
v0.1.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 25, 2025 License: MIT Imports: 7 Imported by: 1

README

Compressed Trie

A compressed trie (or radix tree) is a more space efficient form of a trie that retains the fast word and prefix lookup. It has the same big-O performance as a trie but in practice is faster because it uses fewer nodes and stores data in more cache-efficient ways.

Don't just take my word for it, here are some images. Side by side below are the compressed trie (left) and traditional trie (right) representations of the words romane, romanus, romulus, rubens, ruber, rubicon and rubicundus. A double circle represents the end of a word. Note that the compressed trie has 14 nodes, while the more traditional trie has 28. The longest path in the compressed trie is 4 and 10 in the traditional.

Some real world node counts

Num words Trie Nodes Compressed Trie Nodes
270 1063 377
2378 7540 3088
3447 10518 4439
5998 17695 7762
7740 22680 9743
21448 57955 25876
39182 101508 46123
87871 290779 102677
117093 368683 136145
260815 1333420 313641
595111 3182208 707806

How to use

Currently the tree only supports the minimal set of features I needed, Insert(), FindWordsWithPrefix(), Serialize() and Deserialize().

    tree := compressedtrie.NewTree()
    example_words := []string{"test", "toaster", "toasting", "slow", "slowly"}

    for _, word := range example_words {
        tree.Insert(word)
    }

    tree.FindWordsByPrefix("t") // returns []string{"test", "toaster", "toasting"}

The (de-)serialization methods enable offline tree building

    # In your offline builder
    f, err := os.Create("prefixes.ctrie")
    tree.Serialize(f)
    f.Close()
    # In your runtime
    f, err := os.Open("prefixes.ctrie")
    tree := compressedtrie.Deserialize(f)
    f.Close()

Internally Serialize() and Deserialize() use buffered I/O to minimize memory overhead while laying out the file.

Tests

go test .

Some of the tests use test files, which can be updated. Don't forget to check them in!

go test . -update

Documentation

Index

Constants

View Source
const (
	// 32-bit magic number for the serialized tree binary format
	CtreeMagic uint32 = 'C'<<24 | 'T'<<16 | 'R'<<8 | 'E'
	// File format version
	Version uint32 = 1
)

Variables

View Source
var (
	ErrUnsupportedVersion = errors.New("unsupported version of the file format")
	ErrInvalidFormat      = errors.New("invalid file format")
)

Functions

This section is empty.

Types

type Node

type Node struct {
	// contains filtered or unexported fields
}

type SerializedTreeHeader

type SerializedTreeHeader struct {
	Magic   uint32 // magic number (CtreeMagic)
	Version uint32 // file format version
	Nodes   uint32 // number of nodes in the tree
}

type Tree

type Tree struct {
	N int // The number of nodes in the tree
	// contains filtered or unexported fields
}

func DeserializeTree

func DeserializeTree(r io.Reader) (*Tree, error)

DeserializeTree returns a *Tree from an io.Reader. Returns ErrUnsupportedVersion if the serialize format is an unsupported version, ErrInvalidFormat if the file is unrecognized.

func NewTree

func NewTree() *Tree

NewTree creates an empty instance of Tree, ready for word insertion.

func (*Tree) FindWordsWithPrefix

func (t *Tree) FindWordsWithPrefix(prefix string) []string

FindWordsWithPrefix returns all the words in the tree that start with prefix.

func (*Tree) Insert

func (t *Tree) Insert(word string)

Insert adds a word into t.

func (*Tree) Serialize

func (t *Tree) Serialize(w io.Writer) error

Serialize a tree into an io.Writer. The serialized format is binary.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL