• Link: Same as an arc.
• Token: Same as a state. Tokens carry costs.
• FrameToks: A linked list that contains all tokens in a single frame.
• Adaptive Beam: Used in pruning before creating the lattice and throughout decoding.
• NEmitting Tokens: Non-emitting tokens (NEmitting tokens) are tokens that are generated from emitting token…
A Simplified Block Diagram of ASR Process in Kaldi
NGC Nvidia – Kaldi Container
Oxinabox – Kaldi Notes
KWS14 – Kaldi Lattices
• Costs: Negative log probabilities, so a higher cost means a lower probability.
• Frame: Each 10 ms of audio is turned, using MFCC, into a fixed-size vector called a frame.
• Beam: The cutoff is Best Cost + Beam (the beam is around 10 to 16).
• Cutoff: The maximum cost that all cost higher than this value will…
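The relation between costs, beam, and cutoff can be sketched in a few lines of Python. This is a minimal illustration, not Kaldi's decoder code: the function name `prune_tokens`, the state ids, and the probabilities are hypothetical. It assumes the cutoff is the best (lowest) cost plus the beam, which follows from costs being negative log probabilities.

```python
import math


def prune_tokens(token_costs, beam=13.0):
    """Keep tokens whose cost is within `beam` of the best (lowest) cost.

    token_costs maps a state id to its cost (negative log probability).
    """
    best_cost = min(token_costs.values())
    cutoff = best_cost + beam  # the maximum cost a token may have to survive
    return {state: cost for state, cost in token_costs.items() if cost <= cutoff}


# Hypothetical tokens for one frame: state id -> probability.
probs = {0: 0.5, 1: 1e-3, 2: 1e-9}

# Convert probabilities to costs: higher cost means lower probability.
costs = {state: -math.log(p) for state, p in probs.items()}

survivors = prune_tokens(costs, beam=13.0)
print(sorted(survivors))
```

With a beam of 13, the very unlikely state 2 falls above the cutoff and is dropped, while states 0 and 1 survive. A wider beam keeps more tokens at the price of more work per frame.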
Measure Microphone Latency in Linux with ALSA
The command below plays a tone signal through the speaker and records it back through the mic. Measuring the phase difference reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device that can be retrieved from arecord…
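To put such a measurement in perspective, a delay counted in frames can be converted to milliseconds. The sketch below assumes that 44100 in the command above is the sample rate in Hz, 256 is the period size in frames, and 2 is the number of periods. That reading of the arguments is an assumption, and the helper name `frames_to_ms` is made up for illustration.

```python
def frames_to_ms(frames, sample_rate):
    """Convert a delay expressed in audio frames to milliseconds."""
    return 1000.0 * frames / sample_rate


# Assumed meaning of the arguments: 44100 Hz, 256 frames per period, 2 periods.
sample_rate = 44100
period_size = 256
periods = 2

# Size of one full buffer in frames, and the delay it would account for.
buffer_frames = period_size * periods
buffer_ms = frames_to_ms(buffer_frames, sample_rate)
print(f"{buffer_frames} frames = {buffer_ms:.2f} ms")
```

A measured round-trip latency will normally be larger than this figure, since it also includes both the playback and the capture path plus any hardware delay.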
Let’s Enhance Kaldi: Here are some links along the way. It looks like YouTube has progressed a lot over the last couple of years, so here is a bunch of random videos that make up my favorite playlist for learning all the cool stuff under Kaldi’s hood. YouTube Keith Chugg (USC) – Viterbi Algorithm Lim…
Thanks to this marvelous framework, a trained model is now at our disposal, with a WER of exactly zero percent over a 10-minute continuous speech file. The final piece of this puzzle would be implementing a semi-online decoding tool using GStreamer. As always, useful links for further inspection: GStreamer – Dynamic pipelines Function that save lives!…
On the way to developing a driver for the Scarlett Solo Gen3 to harness the power of the Shure SM57 Dynamic Microphone. Useful links to preserve: Microsoft – Universal Audio Architecture: Guideline for Sound Card Without Proprietary Driver Microsoft – Introduction to Port Class Microsoft – AVStream Overview Microsoft – WDM Audio Terminology Microsoft – Kernel…
So the third year has passed. I mostly worked on developing a couple of hardware projects. Halsey music was a big passion there. Learning all the cool ML stuff is now one of my top priorities. Combine it with the emergence of Talon, a powerful C2 grammar framework by Ryan Hileman, and wav2letter, a game-changing…
Here I am, once more pursuing old-fashioned machine learning. I’ll keep it short and write down useful links.
Books
Dan Povey – HTK Book
Ian Goodfellow – Deep Learning
Papers
IEEE – Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition
YouTube
Hannes van Lier – Basic Introduction to Speech Recognition (HMM & Neural…
The combination of FMCOMMS3 and PetaLinux works only with Ubuntu 16.04 LTS, PetaLinux 2018.3, and Vivado 2018.3.
Required Packages:
sudo apt-get install -y gcc git make net-tools libncurses5-dev tftpd zlib1g-dev libssl-dev flex bison libselinux1 gnupg wget diffstat chrpath socat xterm autoconf libtool tar unzip texinfo zlib1g-dev gcc-multilib build-essential libsdl1.2-dev libglib2.0-dev zlib1g:i386 screen pax gzip
Installing…