Posts

Breaking ntHash (to better fix it)

#bioinfo #hashing

ntHash is a popular method for hashing k-mers in bioinformatics, yet it has some surprising flaws. In this post, I walk through a few of them and show that they can arise naturally, without an adversarial setup. I also discuss how to fix each of them.

Median trick and sketching

#math #sketching

In this post, I'd like to give some intuition about a useful technique from statistics that has many applications in randomized and sketching algorithms: the median trick.