I wonder if it is captivating simply because it syncs cool graphics to audio, like those Winamp visualization filters in the old days.
https://www.youtube.com/watch?v=cPFmcVtGnh0
edit: Several of the creator's other videos seem to go into more detail, but it looks like the birdsong-to-3D projection is (probably simplified) computing a Mel spectrogram (40 features per frame, though it's unclear exactly what they are) and passing those through PCA to get 3D vectors.
The creator discusses a real-time version here: https://www.youtube.com/watch?v=prhQqpAxrm8
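For anyone who wants to poke at the idea, here is a rough sketch of the Mel-spectrogram-then-PCA pipeline as I understand it. This is my guess, not the creator's code: I'm assuming the "40 features" are 40 log-Mel band energies per frame, the FFT size and hop length are arbitrary picks, and a synthetic chirp stands in for real birdsong. It only needs NumPy.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_frames(signal, sr, n_fft=1024, hop=256, n_mels=40):
    # Assumed framing parameters; returns one n_mels-dim vector per frame.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

def pca_project(features, n_components=3):
    # PCA via SVD of the mean-centered data; rows of vt are principal axes.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical stand-in for birdsong: a 2-second chirp sweeping 2 kHz to 6 kHz.
sr = 22050
t = np.arange(sr * 2) / sr
chirp = np.sin(2 * np.pi * (2000.0 * t + 1000.0 * t ** 2))

features = log_mel_frames(chirp, sr)   # shape: (n_frames, 40)
points = pca_project(features)         # shape: (n_frames, 3)
print(features.shape, points.shape)
```

Each row of `points` is one audio frame placed in 3D, so plotting the rows in order traces a path through the space as the sound changes. A real-time version would presumably need the PCA axes fitted in advance rather than on the whole clip, but that's speculation on my part.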