What is this project?
This is an open-source music search engine that operates on data provided by RISM (Répertoire International des Sources Musicales), the organization that documents musical sources from around the world. The database contains a snapshot of about 1.7 million incipits from RISM data, which can be searched by melodic and rhythmic criteria.
How does it work?
All melodies are treated as series of intervals and represented as vectors in n-dimensional space, where n is the number of intervals. Searching is performed by comparing distances between these vectors. The main advantage of the project, which distinguishes it from similar search engines, is its implementation of a locality-sensitive hashing (LSH) algorithm, which vastly improves performance for most queries. There is also a rhythmic search, but it is quite straightforward. You can read more in this article.
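The idea can be illustrated with a minimal Python sketch. This is not the project's actual code: it assumes Euclidean distance, semitone intervals computed from MIDI-style pitch numbers, and a random-projection LSH scheme. The names `to_intervals`, `distance` and `ProjectionLSH`, as well as the parameters `n_projections` and `width`, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict

MAX_INTERVALS = 12  # queries are limited to 12 intervals


def to_intervals(pitches):
    """Turn MIDI-style pitch numbers into successive semitone intervals."""
    steps = [b - a for a, b in zip(pitches, pitches[1:])]
    return steps[:MAX_INTERVALS]


def distance(u, v):
    """Euclidean distance between two interval vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class ProjectionLSH:
    """Illustrative LSH index: each vector is projected onto a few random
    directions and the projections are cut into cells of a fixed width.
    Nearby vectors tend to fall into the same cell (bucket)."""

    def __init__(self, dim, n_projections=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.directions = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_projections)
        ]
        self.offsets = [rng.uniform(0, width) for _ in range(n_projections)]
        self.buckets = defaultdict(list)

    def key(self, vec):
        """Hash key: one cell number per random direction."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(d, vec)) + off) / self.width)
            for d, off in zip(self.directions, self.offsets)
        )

    def add(self, label, vec):
        self.buckets[self.key(vec)].append((label, vec))

    def query(self, vec):
        """Compare the query only against vectors in its own bucket."""
        candidates = self.buckets.get(self.key(vec), [])
        return sorted(candidates, key=lambda item: distance(vec, item[1]))


index = ProjectionLSH(dim=4)
index.add("tune A", to_intervals([60, 62, 64, 65, 67]))
index.add("tune B", to_intervals([60, 55, 67, 48, 72]))

# A transposed copy of tune A has identical intervals, so it hashes to the
# same bucket and is found at distance 0.
hits = index.query(to_intervals([62, 64, 66, 67, 69]))
print(hits[0][0])
```

Because only one bucket is inspected per query, most records are never compared at all. The price is that a near neighbour can occasionally land in a different bucket and be missed; practical LSH setups usually reduce this risk by using several hash tables.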
Are there other incipit search engines for RISM data?
The official RISM website provides a simple search engine. There is also a powerful tool named Monochord Project, created by Utrecht University, which allows you to search for melodic variants by similarity. This is exactly what my project tries to achieve, but I'm doing it in a different way. A third project, by the Akademie der Wissenschaften und der Literatur in Mainz, combines data from RISM and two other sources and can be found here.
How does your project compare to the Monochord engine?
I compare my project with the Monochord project because both are equally ambitious and both rest on specific mathematical assumptions. Monochord is based on sequence alignment. My project depends on calculating distances in a metric space. The main disadvantage of my project is that it can only search at the beginning of incipits and is limited to a maximum of 12 intervals. Monochord can also search in the middle of incipits, has no limit on query length, and provides various rating algorithms. The advantage of my project, however, is performance. Monochord runs on a 32-core system that processes data in parallel; such a powerful system is necessary because all records are processed in every query. My system can be deployed on a potato and still maintain decent performance because data is pre-filtered by spatial hashes. Currently it runs on a virtual server with 1 core and 2 GB of RAM.
Besides, my project uses a slightly more recent snapshot of RISM data, containing more incipits.