BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences

BINSEQ is a family of simple binary formats that enable high-performance parallel processing of sequencing data. The family currently comprises two formats: BINSEQ (fixed-length records) and VBINSEQ (variable-length records with optional quality scores). Here we provide libraries for reading and writing these formats (binseq and vbinseq), a command-line tool for converting existing formats into BINSEQ formats (bqtools), and a Rust wrapper of minimap2 with BINSEQ family support (mmr).

Tool Features

Simple read/write interface for BINSEQ files

High-throughput SIMD-accelerated encoding and decoding

Native parallelization with a map-reduce API

High-throughput conversions between FASTQ and BINSEQ

Generic alignment for BINSEQ formats with minimap2
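
To illustrate why fixed-length records lend themselves to a map-reduce style of parallel processing, here is a minimal sketch using only the Rust standard library. It does not use the binseq or vbinseq crates and does not reflect the actual BINSEQ on-disk layout: the 8-byte ASCII record, the RECORD_SIZE constant, and the gc_count and parallel_gc functions are all hypothetical names chosen for illustration. The idea is that when every record has the same size, a buffer can be split at record boundaries without scanning for delimiters, so each thread can map over its own block and the partial results can be reduced at the end.

```rust
use std::thread;

/// Hypothetical fixed record size in bytes (an assumption for this sketch,
/// not the real BINSEQ record layout).
const RECORD_SIZE: usize = 8;

/// Map step: count G and C bases in a single record.
fn gc_count(record: &[u8]) -> usize {
    record.iter().filter(|&&b| b == b'G' || b == b'C').count()
}

/// Split the buffer into blocks of whole records, map each block on its own
/// thread, then reduce the per-thread counts by summing them.
fn parallel_gc(data: &[u8], n_threads: usize) -> usize {
    assert!(n_threads > 0, "need at least one thread");
    assert_eq!(data.len() % RECORD_SIZE, 0, "buffer must hold whole records");

    let n_records = data.len() / RECORD_SIZE;
    // Records per thread, rounded up; at least 1 so the chunk size is nonzero.
    let per_thread = ((n_records + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        // Because records are fixed-length, block boundaries are pure arithmetic.
        let handles: Vec<_> = data
            .chunks(per_thread * RECORD_SIZE)
            .map(|block| {
                s.spawn(move || {
                    block
                        .chunks_exact(RECORD_SIZE)
                        .map(gc_count)
                        .sum::<usize>()
                })
            })
            .collect();

        // Reduce step: combine the partial counts from every thread.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<usize>()
    })
}

fn main() {
    // Four hypothetical 8-base records stored back to back.
    let data = b"ACGTACGTGGGGCCCCAAAATTTTGCGCGCGC".to_vec();
    let total = parallel_gc(&data, 2);
    println!("GC bases: {total}"); // prints "GC bases: 20"
}
```

With variable-length records, the same split would first require locating record boundaries, which is one reason a format for such records would need some additional structure to support parallel access.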