ESP32 VoIP/RTP pager and Scream receiver
A very simple RTP pager (audio receiver) based on the ESP32 SoC. See also the ESP8266 RTP pager for a general description.
The application uses settings stored as a JSON file in the SPIFFS filesystem. This might be overkill, as only the WiFi network SSID and password are stored at the moment, but it is easy to extend and is both backward and forward compatible as long as reasonable defaults are used.
The application can receive RTP streams encoded with G.722, G.711a and G.711u. The receiving port is fixed in firmware to 4000.
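To illustrate what the receiver has to do with each datagram arriving on that port, here is a minimal sketch of RTP header parsing. It is not taken from the firmware: the names `rtp_packet_t`, `rtp_parse` and `rtp_audio_rate_hz` are hypothetical, and the payload type numbers (0, 8, 9) assume the transmitter uses the static assignments from RFC 3551 rather than dynamically negotiated ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Static RTP payload types (RFC 3551) for the three supported codecs. */
enum { RTP_PT_PCMU = 0, RTP_PT_PCMA = 8, RTP_PT_G722 = 9 };

typedef struct {                 /* hypothetical type, for illustration only */
    uint8_t        payload_type;
    uint16_t       sequence;
    uint32_t       timestamp;
    uint32_t       ssrc;
    const uint8_t *payload;      /* points into the caller's buffer */
    size_t         payload_len;
} rtp_packet_t;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the datagram is not a well-formed RTP v2 packet. */
int rtp_parse(const uint8_t *buf, size_t len, rtp_packet_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12 || (buf[0] >> 6) != 2)
        return -1;

    size_t hdr = 12 + 4u * (buf[0] & 0x0Fu);      /* fixed header + CSRC list */
    if (buf[0] & 0x10u) {                         /* header extension present */
        if (len < hdr + 4)
            return -1;
        hdr += 4 + 4u * (((size_t)buf[hdr + 2] << 8) | buf[hdr + 3]);
    }
    if (len < hdr)
        return -1;

    size_t pad = 0;
    if (buf[0] & 0x20u) {                         /* last byte holds padding length */
        pad = buf[len - 1];
        if (pad == 0 || pad > len - hdr)
            return -1;
    }

    out->payload_type = buf[1] & 0x7Fu;           /* top bit is the marker flag */
    out->sequence     = (uint16_t)((buf[2] << 8) | buf[3]);
    out->timestamp    = be32(buf + 4);
    out->ssrc         = be32(buf + 8);
    out->payload      = buf + hdr;
    out->payload_len  = len - hdr - pad;
    return 0;
}

/* Audio sampling rate in Hz for a payload type, or 0 if unsupported.
 * G.722 carries 16 kHz audio even though its RTP clock rate is 8000. */
uint32_t rtp_audio_rate_hz(uint8_t payload_type)
{
    switch (payload_type) {
    case RTP_PT_PCMU:
    case RTP_PT_PCMA: return 8000;
    case RTP_PT_G722: return 16000;
    default:          return 0;
    }
}
```

A receive loop would call `rtp_parse()` on every datagram, drop packets for which `rtp_audio_rate_hz()` returns 0, and hand the payload to the matching decoder.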
Apart from unicast address, receiver also joins multicast group at address 188.8.131.52. Most desk VoIP phones with "BLF" keys can be used as transmitters.
The tSIP softphone also works (either a WAV file or the default audio source device can be used with each programmed button):
Hardcoding the WiFi network SSID/password in source code is not practical, so I recommend using SmartConfig, an application for Android.
SmartConfig code on the ESP32 is started when the SSID/password is not set; otherwise it is not active.
Application settings can be reset to default values (empty SSID, empty password) by pulling GPIO27 low during startup. Once the SSID/password are cleared, SmartConfig is activated so they can be set again.
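The startup behaviour described above can be summarised as a small decision function. This is only a sketch of the logic, not the firmware's code: `settings_t`, `settings_reset` and `startup_decide` are hypothetical names, the buffer sizes are assumptions, and "not set" is assumed to mean an empty SSID.

```c
#include <assert.h>
#include <string.h>

typedef struct {            /* hypothetical settings layout */
    char ssid[33];
    char password[65];
} settings_t;

typedef enum { START_WIFI_CONNECT, START_SMARTCONFIG } startup_action_t;

/* Restore defaults: empty SSID, empty password. */
void settings_reset(settings_t *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}

/* reset_pin_level is the logic level read from the reset pin (GPIO27)
 * during startup; 0 means the pin is pulled low. */
startup_action_t startup_decide(settings_t *s, int reset_pin_level)
{
    assert(s != NULL);
    if (reset_pin_level == 0)
        settings_reset(s);
    /* Assumption: an empty SSID is what counts as "not set". */
    return (s->ssid[0] == '\0') ? START_SMARTCONFIG : START_WIFI_CONNECT;
}
```

On the target, the pin level would come from something like `gpio_get_level(GPIO_NUM_27)` after configuring the pin as an input with a pull-up, and the reset settings would also need to be written back to SPIFFS.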
Firmware was built with ESP-IDF 3.3. I was also using Code::Blocks for editing code, so there is a C::B project, but it can be ignored completely.
To load the firmware, either ESP-IDF can be used (e.g. idf.py -p COM4 flash from the ESP-IDF command prompt) or the ESP32 download tool, with firmware files and load addresses as shown below (or as listed in build/flasher_args.json). For the popular ESP32 devkit (I have the 30-pin variant), press Start in the application, then long-press the "BOOT" button on the PCB.
To compile, change directory to project directory from ESP-IDF command prompt and use:
Very basic version might use internal DAC:
Audio quality from the internal DAC might be unsatisfying: the noise level is pretty high, comparable to AM radio reception.
Firmware using ESP32 internal DAC: esp32_rtp_pager_dac.zip.
External I2S codec
Let's use proper I2S codec then. These PCM5102A modules cost less than $4:
There are two "S2RE" LDO regulators on the board - it can be powered from 3.3V or 5V.
PCM5102 can work with or without master clock / system clock (reducing number of connections to bit clock, word strobe clock and data).
The device starts up expecting an external SCK input, but if BCK and LRCK start correctly while SCK remains at ground level for 16 successive LRCK periods, then the internal PLL starts, automatically generating an internal SCK from the BCK reference.
SCK can be either tied to the ground manually or with solder jumper on the top side.
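As a worked example of the clocks involved when SCK is left grounded, the sketch below computes the bit clock and the time covered by the 16 LRCK periods mentioned above. The function names are hypothetical, and it assumes the I2S slot width equals the sample width (16 bits per channel, two channels).

```c
#include <assert.h>
#include <stdint.h>

/* BCK frequency: one bit clock per bit, per channel, per sample frame. */
uint32_t i2s_bck_hz(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                    uint32_t channels)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0 && channels > 0);
    return sample_rate_hz * bits_per_sample * channels;
}

/* Duration of 16 LRCK periods in microseconds (one LRCK period per frame),
 * rounded down. */
uint32_t lrck16_time_us(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)((16ull * 1000000ull) / sample_rate_hz);
}
```

For the 16 ksps firmware this gives BCK = 16000 x 16 x 2 = 512 kHz (32 x fs), and 16 LRCK periods take 1 ms, so the delay before the internal PLL takes over is negligible for a pager.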
There are four three-state solder jumpers on the bottom side. The default settings seem to be fine. The same signals are available on the goldpin header if anyone wants to change these settings dynamically, soft mute in particular:
-  FLT - Filter select : Normal latency (Low) / Low latency (High)
-  DEMP - De-emphasis control for 44.1kHz sampling rate: Off (Low) / On (High)
-  XSMT - Soft mute control: Soft mute (Low) / soft un-mute (High)
-  FMT - Audio format selection : I2S (Low) / Left justified (High)
The PCM5102 datasheet recommends a minimum output load of 1 kOhm, although there are already two 470 Ohm resistors in series with the outputs on the board, and the module seems to be able to drive low-impedance headphones directly with no issues. Audio outputs are also available on the 2.54mm header.
ESP32 I2S pins can be remapped creating nice looking 1:1 layout match with PCM5102 module header:
- D4 - BCK
- D2 - DIN
- D15 - LCK (LRCK / WS)
- GND - GND
- 3V3 - VIN
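The mapping above could be expressed with the ESP-IDF I2S driver's `i2s_pin_config_t` roughly as follows. This is a sketch, not the firmware's actual code: it assumes the devkit labels D4, D2 and D15 correspond to GPIO4, GPIO2 and GPIO15, `pcm5102_pins` is a hypothetical helper name, and the `#else` branch only provides stand-in definitions so the fragment also compiles on a PC.

```c
#include <assert.h>

#ifdef ESP_PLATFORM
#include "driver/i2s.h"
#else
/* Host-side stand-ins so the sketch compiles outside ESP-IDF. */
typedef struct {
    int bck_io_num;
    int ws_io_num;
    int data_out_num;
    int data_in_num;
} i2s_pin_config_t;
#define I2S_PIN_NO_CHANGE (-1)
#endif

/* Pin mapping matching the PCM5102 module header 1:1 (assumed GPIO numbers). */
i2s_pin_config_t pcm5102_pins(void)
{
    i2s_pin_config_t pins = {
        .bck_io_num   = 4,                  /* D4  -> BCK */
        .ws_io_num    = 15,                 /* D15 -> LCK (LRCK / WS) */
        .data_out_num = 2,                  /* D2  -> DIN */
        .data_in_num  = I2S_PIN_NO_CHANGE,  /* output only, no I2S input */
    };
    /* Sanity check: the three output signals must use distinct pins. */
    assert(pins.bck_io_num != pins.ws_io_num &&
           pins.bck_io_num != pins.data_out_num &&
           pins.ws_io_num != pins.data_out_num);
    return pins;
}
```

On the target, the returned structure would be passed to `i2s_set_pin()` after the driver has been installed with `i2s_driver_install()`.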
To avoid mistakes I would not solder a pin for SCK at all, using the solder jumper on the top side instead.
Incidentally, the D2 pin is also connected to an LED (the blue one on my board), giving a visual indication whenever something other than silence is transmitted to the codec.
The two boards could be soldered together using a single header, or connected with jumper cables or 5 jumpers, but here is something rarer, an x8 jumper block from an old network card:
Basic (still using 16ksps sampling rate) firmware with I2S output: esp32_rtp_pager_i2s.zip.
Version with larger pre-buffering time (800ms): esp32_rtp_pager_i2s_800ms_prebuffer.zip.
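To show what pre-buffering means in practice, here is a minimal sketch of the bookkeeping: playback starts only once the configured amount of audio has been queued, and falls back to waiting when the buffer runs dry. It tracks sample counts only (no actual sample storage), all names are hypothetical, and the firmware's real buffering may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t fill;        /* samples currently queued */
    size_t prebuffer;   /* samples required before playback (re)starts */
    bool   playing;
} prebuf_t;

size_t ms_to_samples(uint32_t ms, uint32_t sample_rate_hz)
{
    return (size_t)(((uint64_t)ms * sample_rate_hz) / 1000u);
}

void prebuf_init(prebuf_t *b, uint32_t prebuffer_ms, uint32_t sample_rate_hz)
{
    assert(b != NULL);
    b->fill = 0;
    b->prebuffer = ms_to_samples(prebuffer_ms, sample_rate_hz);
    b->playing = false;
}

/* Called when decoded samples arrive from the network. */
void prebuf_on_receive(prebuf_t *b, size_t samples)
{
    assert(b != NULL);
    b->fill += samples;
    if (!b->playing && b->fill >= b->prebuffer)
        b->playing = true;
}

/* Called when the output needs `wanted` samples. Returns how many queued
 * samples may be played; the caller fills the rest with silence. */
size_t prebuf_on_play(prebuf_t *b, size_t wanted)
{
    assert(b != NULL);
    if (!b->playing)
        return 0;
    size_t n = (wanted < b->fill) ? wanted : b->fill;
    b->fill -= n;
    if (b->fill == 0)
        b->playing = false;   /* drained: wait for the prebuffer again */
    return n;
}
```

At 16 ksps, 800 ms corresponds to 12800 samples, i.e. 40 packets if each packet carries 20 ms of audio (the packet duration is an assumption; it depends on the transmitter).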
Scream virtual sound card
Scream is a virtual sound card emitting samples as an uncompressed UDP stream.
I have tested version 3.3.
A few minor issues with Scream installation and running:
- installation (Install.bat) did not correctly detect that it was running as administrator on one of my PCs running Windows 7 x64; comment out or remove the check for administrator rights in this file if it wrongly complains about lacking them
- Scream was sending packets through the VirtualBox network interface instead of the "correct" one; a crude yet simple solution was to disable the VirtualBox network adapter when not in use
- Scream sound card ("speakers") was muted by default
The ESP32 is listening on port 4010, joining the 239.255.77.77 multicast group (default Scream configuration), and expects a stereo 16-bit 44100 sps stream (this was the default in my Scream installation).
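For reference, a receiver can check that an incoming stream really has the expected format by looking at the packet header. The sketch below follows the Scream 3.x packet layout as I understand it from the Scream README (5 header bytes followed by up to 1152 bytes of PCM); treat the layout as an assumption to verify against the Scream version in use. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCREAM_HEADER_SIZE  5
#define SCREAM_PAYLOAD_SIZE 1152

typedef struct {
    uint32_t sample_rate_hz;
    uint8_t  bits_per_sample;
    uint8_t  channels;
    uint16_t channel_mask;
} scream_format_t;

/* Returns 0 on success, -1 if the packet is too short or the rate is invalid. */
int scream_parse_header(const uint8_t *pkt, size_t len, scream_format_t *fmt)
{
    assert(pkt != NULL && fmt != NULL);
    if (len < SCREAM_HEADER_SIZE)
        return -1;
    /* Byte 0: bit 7 selects the base rate, bits 0..6 are a multiplier. */
    uint32_t base = (pkt[0] & 0x80u) ? 44100u : 48000u;
    uint32_t mult = pkt[0] & 0x7Fu;
    if (mult == 0)
        return -1;
    fmt->sample_rate_hz  = base * mult;
    fmt->bits_per_sample = pkt[1];
    fmt->channels        = pkt[2];
    fmt->channel_mask    = (uint16_t)(pkt[3] | (pkt[4] << 8));
    return 0;
}

/* The format this receiver expects: stereo, 16-bit, 44100 sps. */
bool scream_is_expected_format(const scream_format_t *fmt)
{
    assert(fmt != NULL);
    return fmt->sample_rate_hz == 44100 && fmt->bits_per_sample == 16 &&
           fmt->channels == 2;
}

/* Number of sample frames in one full packet, or 0 for an unusable format. */
size_t scream_frames_per_packet(const scream_format_t *fmt)
{
    assert(fmt != NULL);
    size_t frame_bytes = (size_t)fmt->channels * (fmt->bits_per_sample / 8u);
    return frame_bytes ? SCREAM_PAYLOAD_SIZE / frame_bytes : 0;
}
```

With the expected format, one packet holds 288 stereo frames, about 6.5 ms of audio, so the sender emits roughly 153 packets per second.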
Hardware: ESP32 + PCM5102 as with RTP pager above.
In my limited testing the ESP32 (or the WiFi link itself?) seems to be dropping a substantial number of packets in this application and might not be acceptable as a long-term replacement for a real sound card. Most of the time sound quality is acceptable (a single packet loss would not be significant), but once the jitter buffer gets empty there is a short but annoying break.
Packet loss seems to depend on many factors. I got fairly good results (about 1 buffer underrun event per 4 minutes) with the following setup:
- Pentagram Cerberus router/AP (my basic router from operator has most of the settings locked)
- Scream set to unicast mode (see https://tools.ietf.org/id/draft-mcbride-mboned-wifi-mcast-problem-statement-01.html)
- radio channel set to 13, probably least used in dense city block