Technical information and specifications
The Neurotechnology Voice Verification system provides advanced capabilities for speaker recognition applications, including a high-level API for all operations. The system also has certain requirements for sound settings, the recording environment and user behavior.
Voiceprint verification can be performed both on the client side, through the SDK component, and on the server side, through the Web Service component.
SDK component specifications
The Voice Verification system architecture requires the performed operations to be accounted for on the integrator's or end-user's server:
- Integrators should ensure that an encrypted connection is used for communication with the server.
- No biometric information is sent to the server by any SDK API operation that requires communication with the server. The biometric data is kept on the client side; only transaction accounting information is sent to and received from the server.
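The requirement to use an encrypted connection can be illustrated with a minimal sketch, assuming the integrator controls the client-side channel to the accounting server from Python; how the SDK itself opens that connection is not described here, and this is not an SDK API. The function name `make_client_tls_context` is hypothetical. It builds a standard-library TLS context that verifies the server certificate and hostname and refuses protocol versions older than TLS 1.2 (the TLS 1.2 floor is an assumption, not a stated requirement).

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Hypothetical helper: a TLS client context for the accounting-server link.

    Not part of the Voice Verification SDK; it only shows how a Python
    integrator might refuse unencrypted or unverified connections.
    """
    # create_default_context() turns on certificate verification
    # (CERT_REQUIRED) and hostname checking for server authentication.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Assumption: the server supports TLS 1.2 or newer; older versions are refused.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()
    # prints: True True
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A socket would then be wrapped with `ctx.wrap_socket(sock, server_hostname=...)`, using the real server's hostname; keeping verification enabled is what prevents the accounting traffic from silently falling back to an unauthenticated channel.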
The following operations are available via the high-level API of the SDK:
- Voiceprint template creation – a voice sample is captured from a microphone and the voiceprint template is extracted for later use in voiceprint verification.
- The server returns proprietary encrypted data as the result of a successfully completed enrollment transaction.
- The template may be saved to any storage (database, file, etc.) together with custom metadata (such as the person's name). Note that storage functionality is not part of the Voice Verification system, although the programming samples include an example of such an implementation.
- Voice verification – a voice sample is captured from a microphone and verified against the voiceprint template created during the voiceprint template creation operation.
- Template import – a voice sample can be imported into an application based on the Neurotechnology Voice Verification system. The resulting template can later be used for voiceprint verification in the same way as native templates produced by the voiceprint template creation operation.
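Since storage is left to the integrator, one possible approach is sketched below, assuming the template can be handled as an opaque byte string and the metadata as a JSON-serializable dictionary. The table layout and the names `open_store`, `save_template` and `load_template` are hypothetical and do not come from the SDK or its programming samples; only Python's standard `sqlite3` and `json` modules are used.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Hypothetical: open (or create) a table of templates plus custom metadata."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voiceprints ("
        " id INTEGER PRIMARY KEY,"
        " template BLOB NOT NULL,"  # template bytes, stored unmodified
        " metadata TEXT NOT NULL)"  # custom metadata serialized as JSON
    )
    return conn


def save_template(conn: sqlite3.Connection, template: bytes, metadata: dict) -> int:
    """Store one template with its metadata and return the new record id."""
    cur = conn.execute(
        "INSERT INTO voiceprints (template, metadata) VALUES (?, ?)",
        (sqlite3.Binary(template), json.dumps(metadata)),
    )
    conn.commit()
    return cur.lastrowid


def load_template(conn: sqlite3.Connection, record_id: int):
    """Return (template bytes, metadata dict) or raise KeyError if the id is unknown."""
    row = conn.execute(
        "SELECT template, metadata FROM voiceprints WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    return bytes(row[0]), json.loads(row[1])


if __name__ == "__main__":
    store = open_store()
    # Placeholder bytes stand in for a real template produced by the SDK.
    rid = save_template(store, b"\x00\x01\xff", {"name": "Jane Doe"})
    print(load_template(store, rid))
```

Treating the template as an opaque BLOB keeps the storage layer independent of the template format, so templates created natively and imported templates can share the same table.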
Web Service Component specifications
The Voice Verification system architecture requires the performed operations to be accounted for on the server side.
The following operations are available via the high-level API of the Web Service component:
- Voiceprint template creation – a voice sample is captured from a microphone and the voiceprint template is extracted for later use in voiceprint verification.
- The template is saved on the server together with custom metadata (such as the person's name).
- Voice verification – a voice sample captured from a web stream is verified against the voiceprint template created during the voiceprint template creation operation.
- Template import – a voice sample can be imported into an application based on the Neurotechnology Voice Verification system. The resulting template can later be used for voiceprint verification in the same way as native templates produced by the voiceprint template creation operation.
Basic Recommendations for voice
The speaker recognition accuracy depends on the audio quality during enrollment and identification.
- Voice samples at least 2 seconds long are recommended to ensure speaker recognition quality.
- If the speaker recognition system is used in a scenario with a unique phrase for each user, the passphrase should be kept secret and not spoken where others may hear it.
- Text-independent speaker recognition may be vulnerable to attacks using a covertly recorded phrase from the person. Passphrase verification or two-factor authentication (e.g. requiring the user to type a password) will increase overall system security.
- Microphones – there are no particular constraints on models or manufacturers when using regular PC microphones, headsets or the built-in microphones of laptops, smartphones and tablets. However, the following factors should be noted:
- The same microphone model is recommended (if possible) for both enrollment and recognition, as different models may produce different sound quality. Some models may also introduce specific noise or distortion into the audio, or may include hardware sound processing that will not be present when a different model is used. The same applies to smartphones and tablets, as different device models may alter the voice recording in different ways.
- The same microphone position and distance are recommended during enrollment and recognition. Headsets provide an optimal distance between the user and the microphone; a similar distance is recommended when non-headset microphones are used.
- Built-in webcam microphones should be used with care, as they are usually positioned rather far from the user and may provide lower sound quality. Sound quality may also be affected if users change their position relative to the webcam after enrollment.
- Sound settings:
- Sound settings must ensure clean, unmodified audio; some audio software, hardware or drivers have sound modification enabled by default. For example, Microsoft Windows usually has sound boost enabled by default.
- A sampling rate of at least 8000 Hz and a bit depth of at least 16 bits should be used for voice recording.
- Environment constraints – the speaker recognition engine is sensitive to noise and loud voices in the background, which may interfere with the user's voice and affect recognition results. A quiet environment for enrollment and recognition is usually sufficient.
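The numeric recommendations above (at least 8000 Hz, at least 16-bit depth, at least 2 seconds) can be checked before a recording is submitted. As a worked figure, 2 seconds at 8000 Hz is 16,000 samples, or 32,000 bytes for 16-bit mono audio. The sketch below assumes the voice sample is available as a WAV file (or WAV bytes in a file object) and uses Python's standard `wave` module; the function name `check_recording` is hypothetical, and the SDK may capture and validate audio in its own way. It cannot detect sound boost or background noise, which still have to be handled through the settings and environment described above.

```python
import wave
from typing import BinaryIO, List, Union

MIN_SAMPLE_RATE_HZ = 8000  # recommended minimum sampling rate
MIN_BITS_PER_SAMPLE = 16   # recommended minimum bit depth
MIN_DURATION_S = 2.0       # recommended minimum sample length


def check_recording(source: Union[str, BinaryIO]) -> List[str]:
    """Hypothetical check: return the recommendations a WAV recording violates.

    `source` is a file path or a binary file object; an empty list means the
    sampling rate, bit depth and duration all meet the recommended minimums.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8       # sample width is reported in bytes
        duration = wav.getnframes() / rate  # frames per channel / frames per second

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if bits < MIN_BITS_PER_SAMPLE:
        problems.append(f"bit depth {bits} bits is below {MIN_BITS_PER_SAMPLE} bits")
    if duration < MIN_DURATION_S:
        problems.append(f"duration {duration:.2f} s is shorter than {MIN_DURATION_S} s")
    return problems
```

Returning every violated recommendation, rather than stopping at the first, lets an application tell the user everything to fix (for example, switch to 16-bit recording and speak longer) in a single prompt.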