Published on 2022-01-06 in
Speech Recognition
Published on 2021-11-29 in
Speech Recognition
• Costs: Negative log probabilities, so a higher cost means a lower probability.
• Frame: Each 10 ms step of audio, converted by MFCC extraction into a fixed-size vector called a frame.
• Beam: The cutoff is the best cost plus the beam (typically around 10 to 16).
• Cutoff: The maximum allowed cost. Tokens with a cost higher than this value are removed and not processed.
• Epsilon: The zero label in an FST is called <eps>.
• Lattices: The same as FSTs, except that each token is kept in a frame-based array called frame_toks. This way the distance in time between tokens is preserved too.
• Rescoring: A language model scoring pass applied after the final state is reached, to improve the final result by using a stronger LM than the n-gram.
• HCLG (FST): The main FST used in decoding. The ilabels in this FST are TransitionIDs.
• Model (MDL): A model used to convert sound into acoustic costs and TransitionIDs.
• TransitionIDs: Numbers that encode the state and the corresponding PDF id.
• Emitting States: States that have pdfs associated with them and emit phonemes. In other words, states whose ilabel is not zero.
• Bakis Model: An HMM in which state transitions proceed from left to right. In a Bakis HMM, no transitions go from a higher-numbered state to a lower-numbered state.
• Max Active: Used when calculating the cutoff to cap the maximum number of tokens processed in the emitting step.
• Graph Cost: The sum of the LM cost, the (weighted) transition probabilities, and any pronunciation cost.
• Acoustic Cost: The cost obtained from the decodable object.
• Acoustic Scale: A floating-point number that multiplies every log-likelihood (inside the decodable object).
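To make the cost, beam, cutoff and max active entries concrete, here is a minimal Python sketch of beam pruning over one frame of token costs. It is not Kaldi code: the function name prune_tokens and the sample probabilities are made up, and the max-active handling is simplified (it just tightens the cutoff to the cost of the max_active-th best token).

```python
import math


def prune_tokens(costs, beam=13.0, max_active=None):
    """Return (cutoff, surviving costs) for one frame of tokens.

    Costs are negative log probabilities, so lower is better.
    """
    best = min(costs)
    cutoff = best + beam  # beam cutoff: best cost plus beam
    if max_active is not None and len(costs) > max_active:
        # Tighten the cutoff to the cost of the max_active-th best token
        # (ties at the cutoff may let a few extra tokens through).
        kth_best = sorted(costs)[max_active - 1]
        cutoff = min(cutoff, kth_best)
    survivors = [c for c in costs if c <= cutoff]
    return cutoff, survivors


# Hypothetical path probabilities turned into costs with -log(p).
costs = [-math.log(p) for p in (0.5, 0.1, 1e-3, 1e-9)]

cutoff, kept = prune_tokens(costs, beam=10.0)
print(len(kept))  # 3

_, kept_capped = prune_tokens(costs, beam=10.0, max_active=2)
print(len(kept_capped))  # 2
```

The token with probability 1e-9 has a cost of about 20.7, which is above the cutoff of about 10.7, so it is dropped; with max_active=2 only the two cheapest tokens remain.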
Fig. 1. Demonstration of Finite State Automata vs Lattices, Courtesy of Peter F. Brown
- Stanford University – Speech and Language Processing Book
- IEEE ICASSP – Partial traceback and dynamic programming
Measure Microphone Latency in Linux with Alsa
The command below plays a tone through the speaker and records it back through the mic. Measuring the phase difference reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device, which can be found with arecord -l, and hw:0,0 refers to the playback device, which can be found with aplay -l.
The 44100 is the sampling rate and 256 is the buffer size. 256 works best for me: lower numbers corrupt the test and higher numbers just add more latency. I don't know exactly what the nfrags, input and output arguments are, but 2, 1 and 1 respectively work for me. I found these numbers by tinkering around; no other values worked for me.
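As a rough sanity check on the buffer size, the time covered by one buffer follows directly from the frame count and the sampling rate. The Python sketch below assumes that nfrags is the number of such buffers queued back to back, which is only a guess; the helper name buffer_ms is made up.

```python
def buffer_ms(frames, rate_hz):
    """Duration in milliseconds of a buffer holding `frames` samples."""
    return 1000.0 * frames / rate_hz


one_buffer = buffer_ms(256, 44100)
print(round(one_buffer, 2))      # 5.8

# Assumption: nfrags=2 means two buffers are in flight.
print(round(2 * one_buffer, 2))  # 11.61
```

This only accounts for software buffering; converter and driver delays come on top of it.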
My Setup
1. Focusrite Scarlett Solo Latency: 2.5ms
2. Shure SM57 Mic Latency: 2.5ms
3. Overall Delay: 14ms in non-RT mode
LoopBack
You can experiment with the effect of latency using
pactl load-module module-loopback latency_msec=15
To end the loopback mode
pactl unload-module module-loopback
As Always Useful links
PulseAudio – Latency Control
Arun Raghavan – Beamforming in PulseAudio
Arch Linux Wiki – Professional Audio, Realtime kernel
Published on 2021-08-02 in
Speech Recognition
Let’s enhance Kaldi. Here are some links along the way. It looks like YouTube has progressed a lot during the last couple of years, so here is basically a bunch of random videos that make up my favorite playlist for learning all the cool stuff under Kaldi’s hood.
YouTube
- Keith Chugg (USC) – Viterbi Algorithm
- Lim Zhi Hao (NTU) – WFST: A Nice Channel On Weighted Finite State Transducers
- Dan Povey (JHU) – ICASSP 2011 Kaldi Workshop: Dan Explaining Kaldi Basics
- Luis Serrano – The Covariance Matrix: To Understand GMM Acoustic Modeling
Kaldi
- Mehryar Mohri (NYU) – Speech Recognition with WFST: A joint work of RWTH and NYU
- Mehryar Mohri (NYU), Afshin Rostamizadeh – Foundations of Machine Learning
- George Doddington (US DoD) ICASSP 2011 – Human Assisted Speaker Recognition
- GitHub Kaldi – TED-LIUM Result: GMM, SGMM, Triple Deltas Comparison
- EE Columbia University – Speech Recognition Spring 2016
- D. Povey – Generating Lattices in the WFST: For understanding LatticeFasterDecoder
Notes
Lattices: A more complex form of FSTs. The first decoders were based on FSTs (like faster-decoder and the online decoders). For Minimum Bayes Risk calculation, using lattices gives you a better-paved path.
faster-decoder: The old decoder. Very simple, and good for understanding how the decoding process is done.
lattice-faster-decoder: The general decoder. Same as faster-decoder but outputs lattices instead of FSTs.
DecodableInterface: An interface that connects the decoder to the features. The decoder uses this Decodable object to pull log-likelihoods computed from the CMVN features.
BestPath: An FST constructed from the best path (the path with maximum likelihood) in the decoded FST.
nBestPath: An FST constructed from the top N best paths in the decoded FST.
GetLinearSymbolSequence: The final step in the recognition process. It takes a BestPath FST or Lattice and outputs the recognized words along with the path weight. CompactLattices need to be converted first using ConvertLattice.
Strongly Connected Component: A set in which every member is reachable from every other member (in both directions).
- The main function in the decoder is ProcessEmitting, which pulls the loglikelihood from the decodable object.
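The relationship between the decodable object, the acoustic scale and the acoustic cost can be sketched in a few lines of Python. This is a toy stand-in and not the Kaldi API: the class ToyDecodable, its method names and the log-likelihood values are all made up, and graph costs are ignored entirely.

```python
class ToyDecodable:
    """Hypothetical stand-in for a decodable object (not the Kaldi API)."""

    def __init__(self, loglikes, acoustic_scale=0.1):
        self.loglikes = loglikes  # loglikes[frame][pdf_id]
        self.acoustic_scale = acoustic_scale

    def num_frames(self):
        return len(self.loglikes)

    def log_likelihood(self, frame, pdf_id):
        # The acoustic scale is applied inside the decodable object.
        return self.acoustic_scale * self.loglikes[frame][pdf_id]


def cheapest_acoustic_cost(decodable, num_pdfs):
    """Sum over frames of the lowest acoustic cost, ignoring graph costs."""
    total = 0.0
    for frame in range(decodable.num_frames()):
        # Acoustic cost is the negated (scaled) log-likelihood.
        total += min(-decodable.log_likelihood(frame, pdf)
                     for pdf in range(num_pdfs))
    return total


# Made-up log-likelihoods for 2 frames and 2 pdfs.
loglikes = [[-2.0, -5.0], [-7.0, -1.0]]
dec = ToyDecodable(loglikes, acoustic_scale=0.5)
print(cheapest_acoustic_cost(dec, num_pdfs=2))  # 1.5
```

A real decoder does not pick the cheapest pdf per frame independently; it adds each acoustic cost to the graph cost of an HCLG arc and keeps the surviving tokens.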
Published on 2021-07-28 in
Speech Recognition
Thanks to this marvelous framework, a trained model is now at my disposal, with a WER of exactly zero percent over a 10-minute continuous speech file. The final piece of this puzzle is implementing a semi-online decoding tool using GStreamer. As always, useful links for further inspection:
- GStreamer – Dynamic pipelines
- Function that saves lives: gst_caps_to_string(caps)
- GStreamer – GstBufferPool
- StackOverFlow – Gstreamer gst_buffer_pool_acquire_buffer function is slow on ARM
- GitHub – Alumae: GST-Kaldi-NNet2-Online
- StackOverFlow – How to create GstBuffer
Published on 2021-05-21 in
Software,
Windows
Published on 2020-12-08 in
Speech Recognition
The combination of FMCOMMS3 and PetaLinux works only with Ubuntu 16.04 LTS, PetaLinux 2018.3 and Vivado 2018.3.
Required Packages:
sudo apt-get install -y gcc git make net-tools libncurses5-dev tftpd zlib1g-dev libssl-dev flex bison libselinux1 gnupg wget diffstat chrpath socat xterm autoconf libtool tar unzip texinfo zlib1g-dev gcc-multilib build-essential libsdl1.2-dev libglib2.0-dev zlib1g:i386 screen pax gzip
Installing PetaLinux
Create a new directory
sudo mkdir -m 755 PetaLinux
sudo chown bijan ./PetaLinux
Install PetaLinux by running the following command.
./petalinux-v2018.3-final-installer.run .
Building Vivado Project
Clone the Analog Devices HDL and meta-adi repositories
git clone https://github.com/analogdevicesinc/hdl.git
git clone https://github.com/analogdevicesinc/meta-adi.git
Make HDL Project
export PATH="$PATH:/mnt/hdd1/Vivado/Vivado/2018.3/bin"
make fmcomms2.zc702
Creating a New PetaLinux Project:
source ../settings.sh
petalinux-create --type project --template zynq --name fmcomms3_linux
Then change directory to the created project directory.
petalinux-config --get-hw-description=<hdf file directory>
Set Subsystem AUTO Hardware Settings -> Advanced bootable images storage setting -> u-boot env partition settings -> image storage media -> primary sd
Then add the meta-adi layers as user layers (Yocto Settings -> User Layers):
/home/bijan/Projects/ADI_Linux/meta-adi/meta-adi-core
/home/bijan/Projects/ADI_Linux/meta-adi/meta-adi-xilinx
Download the following files and save them to meta-adi/meta-adi-xilinx/recipes-bsp/device-tree/files
device-tree.bbappend
pl-delete-nodes-zynq-zc702-adv7511-ad9361-fmcomms2-3.dtsi
zynq-zc702-adv7511-ad9361-fmcomms2-3.dts
Build PetaLinux:
To build PetaLinux, run the following command inside the PetaLinux project directory
petalinux-build
In case of an error, remove -e from the first line of the system-user.dtsi file at build/tmp/work/plnx_zynq7-xilinx-linux-gnueabi/device-tree/xilinx+gitAUTOINC+b7466bbeee-r0/system-user.dtsi
Program ZC-702 FPGA Board Through JTAG
Install Digilent Drivers
<Vivado Install Dir>/data/xicom/cable_drivers/lin64/install_script/install_drivers/install_drivers
To program the board through the JTAG interface, first package the kernel with the following command.
petalinux-package --boot --fsbl images/linux/zynq_fsbl.elf --fpga images/linux/system.bit --u-boot --force
Then log in to the root account and run the following commands.
petalinux-package --prebuilt --fpga images/linux/system.bit --force
petalinux-boot --jtag --prebuilt 3 -v
petalinux-boot --jtag --fpga --bitstream images/linux/system.bit
Program ZC-702 FPGA Board Through SD-Card
Enable SW16.3 & SW16.4 on ZC702 Board.
Generate BOOT.BIN file by executing following command:
petalinux-package --boot --fsbl images/linux/zynq_fsbl.elf --fpga images/linux/system.bit --u-boot --force
Copy image.ub and BOOT.BIN to the SD card
Customize Username and Password
To change the username and password, open
meta-adi/meta-adi-xilinx/recipes-core/images/petalinux-user-image.bbappend
Change analog to your desired password. If you want to remove the login requirement, comment out EXTRA_USERS_PARAMS and enable debug-tweaks in petalinux-config -c rootfs.
Change UART BaudRate
To change the UART baud rate, run
petalinux-config
and go to Subsystem AUTO Hardware Settings -> Serial Settings -> System stdin/stdout baudrate
Useful Links
Analog Wiki – Building with Petalinux
Analog Wiki – HDL Releases
GitHub – Analog Device No OS
Published on 2017-08-28 in
Software,
Windows
I use AutoHotkey in almost all my applications, but I noticed that some Qt applications (like ADS) had difficulty interpreting the AHK keys correctly. The problem turned out to be related to UAC and admin rights rather than the Qt library. To solve the issue, simply add the following lines to the top of the AHK scripts that you want to apply to that specific app.
#SingleInstance Force
SetWorkingDir %A_ScriptDir%
if not A_IsAdmin
    Run *RunAs "%A_ScriptFullPath%" ; Run the script as admin
This problem arises when you open an application with admin rights while the AHK script isn't launched with the same permission level. The script above simply runs AHK with admin rights too, so whether or not the application has admin rights, the AHK script will always be able to set the key bindings.
Team Solid Squad
Published on 2017-05-27 in
Linux
One of the great features of XInput is “Mouse Wheel Emulation”, which I have been grateful for over more than 5 years. Lately I noticed it would relieve my hand if I could trigger wheel emulation with a key on the keyboard instead of a mouse button.
Unfortunately, up to this time Evdev does not support that, and pushing a mouse button in a synthetic way (like using a bash script or the Linux API) won’t trick Evdev. The reason is that Evdev wheel emulation is device specific. It is enabled per device, so it only runs if the button is pressed on the specific device that you enabled emulation on. Moreover, after triggering, wheel emulation only applies to the movement of that same device. The figure below shows two scenarios that lead to unsuccessful wheel emulation.
In a nutshell, if you generate a mouse button press from a bash script, emulation doesn’t work because the Evdev driver never sees it; the button press has to come from an event-based device and not from pure software. Additionally, the movement and the button press must come from the same device. So, as shown in the image, movement on Mouse #1 won’t scroll when emulation was triggered from Mouse #2. To recap:
- The button press should come from an event-based device
- Trigger and movement should be on same device
Solution: Uinput-Mapper
Uinput is a kernel module for creating event-based devices. Uinput-Mapper is a Python wrapper around uinput that can create a duplicate of a physical device and then manipulate it. Unfortunately, Python and Uinput-Mapper are slow, which can cause unexpected lag in mouse movement, so the idea here is to use Uinput-Mapper only when necessary.
Uinput-Mapper can be cloned from this GitHub repository. If you take a look at uinput-mapper, you will notice two files, input-read and input-create. input-read is a Python script that reads all events from an event-based device and writes them to stdout in pickle form. For those who, like me, didn’t know what pickle is: pickle is a serialization format. It can save and restore almost any type of Python object, so with pickle you can save an instance of a complex class and later restore it in the state it was saved in.
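A pickle round trip takes only a few lines of Python. The sketch below uses a made-up event record; the field names are for illustration only and are not Uinput-Mapper's actual event type.

```python
import pickle

# Hypothetical event record; the field names are made up for illustration.
event = {"type": "REL", "code": 8, "value": -1, "time": (12, 345678)}

data = pickle.dumps(event)     # Python object -> bytes
restored = pickle.loads(data)  # bytes -> an equal Python object

print(restored == event)    # True
print(type(data).__name__)  # bytes
```

For instances of your own classes, pickle stores the object's data plus a reference to its class rather than the class code itself, so the class definition has to be importable on the loading side.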
Now what is input-create? input-create is the twin of input-read: it first creates an event-based device, then reads events from stdin in pickle form and executes them on the virtual device it has just created. To summarize, input-read captures all events from an actual device and converts them to pickle, and input-create creates a duplicate device and executes the captured events coming from input-read.
So you have to pipe all the output of input-read into input-create with something like the command shown below
$ ./input-read -D /dev/input/event3 | ./input-create
The -D option in the command above means that input-read writes its output in pickle format (not in a human-readable format).
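The idea of streaming pickled events from one process to another can be sketched as follows. This is not Uinput-Mapper's code: the function names and the event tuples are made up, and an in-memory io.BytesIO stands in for the pipe so the example runs on its own. In two real scripts the writer would use sys.stdout.buffer and the reader sys.stdin.buffer.

```python
import io
import pickle


def write_events(stream, events):
    """Writer side: dump each event as its own pickle record."""
    for event in events:
        pickle.dump(event, stream)


def read_events(stream):
    """Reader side: load records until the stream is exhausted."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return


pipe = io.BytesIO()  # stands in for the stdout -> stdin pipe
write_events(pipe, [("REL_X", 3), ("REL_Y", -2), ("BTN_MIDDLE", 1)])
pipe.seek(0)         # rewind so the reader starts at the first record

received = list(read_events(pipe))
print(received)
```

Because each pickle.load call consumes exactly one record, the reader can process events one at a time as they arrive instead of waiting for the whole stream.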
Keycode Based Shortcut Trigger
Now the time has come to adapt uinput-mapper to our requirements. I think posting all the code here would clutter the post, so I will explain the basics of the modifications applied to uinput-mapper and leave the code in my GitHub Bijoux repository.
First we need a script that triggers on pressing a key on the keyboard. For the sake of simplicity, and because I have more experience with bash, I used bash rather than Python to do the key detection and then used stdio to pass it to input-read. The big picture is that we first use uinput-mapper to create a duplicate of our physical mouse. How? By running the exact command mentioned above. Then we disable the physical mouse in XInput by running
$ xinput disable $MOUSE_ID
Now instead of data passing directly to EvDev, it is piped through a synthetic event-based device and then passed to EvDev. The trick here is to write a script (called shortcut from now on) that detects a keyboard shortcut and, when it is detected, injects a synthetic event into the virtual mouse we created previously. From EvDev’s perspective, this is no different from an event generated by physical Mouse #1, so it triggers wheel emulation, and moving in the x/y direction scrolls the active window. The figure below demonstrates the foregoing concept.
shortcut script detects a key press -> sends a notification to input-read -> input-read loads a previously recorded button press -> sends the recorded event to input-create -> input-create executes the synthetic event the same way as any other event coming from mouse #1 -> …
Performance Issue
As remarked previously, Python scripts are intrinsically slow, and passing all mouse events through one is not acceptable from a performance point of view. Furthermore, in my brief test the script’s runtime created a not dramatic but noticeable lag in mouse movement. To solve this issue, the bridge solution described above should be applied if and only if wheel emulation needs to run at that moment. The following shortcut script shows how this can be done.
The script is nothing more than a program that switches between the physical mouse and the virtual mouse on the fly. This is accomplished using the enable/disable functions of XInput
#!/bin/bash
# Replace MOUSE_MODEL with the mouse name shown by `xinput list`.
MOUSE_ID=$(xinput list | grep -m 1 MOUSE_MODEL | awk -F "=" '{print $2}' | awk '{print $1}')
VIRTUAL_ID=14   # ID of the virtual mouse created by input-create
KEYBOARD_ID=12  # ID of the keyboard that sends the shortcut key
# Splitting each line into fields makes the match independent of spacing.
while read -r kind action code _
do
    case "$kind $action $code" in
        "key press 108")
            echo 1 # inform input-read
            xinput disable "$MOUSE_ID"
            xinput enable "$VIRTUAL_ID"
            ;;
        "key release 108")
            echo 0 # inform input-read
            xinput enable "$MOUSE_ID"
            xinput disable "$VIRTUAL_ID"
            ;;
    esac
done < <(xinput test "$KEYBOARD_ID")