VoicePassport: Secure Voice Identity Verification using Vector Databases and Blockchain Technology

Sergio Sánchez Sánchez
24 min read · May 15, 2024


In today’s tech-focused world, keeping our online accounts safe is super important. But regular passwords and PINs can sometimes be easy for bad actors to guess. That’s where VoicePassport comes in! It’s a cool new way to log in using just your voice.

VoicePassport makes logging in easier because you don’t have to remember long, complicated passwords. Instead, you just speak, and it recognizes your unique voice to let you in.

What’s neat about VoicePassport is that it also uses something called blockchain to keep everything super secure. This means some details from your account are stored in a way that’s really hard for anyone to mess with.

VoicePassport is like having a super secure bouncer for your online accounts, making sure only you can get in. It’s a game-changer for staying safe and secure online!

In the rest of this article, we’ll look at how VoicePassport works, explore its features, and see how the solution was built. So, grab a cup of coffee and let’s dive in!

Unveiling the Project’s Purpose

VoicePassport is a robust and secure voice authentication system designed to ensure the authenticity of users through their unique voiceprints. Powered by Resemblyzer, VoicePassport leverages advanced voice processing technology to generate voice embeddings, which are compact numerical representations of voice characteristics. These embeddings capture the distinctive features of an individual’s voice in a highly accurate and secure manner.

Using these voice embeddings, VoicePassport employs a similarity search mechanism to authenticate users. By comparing the voice embeddings extracted from an input voice sample with those stored in its database, VoicePassport can determine the likelihood of a match, thereby verifying the identity of the user.
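As a rough illustration of the matching step, the sketch below compares a probe embedding against enrolled ones using cosine similarity and accepts the best candidate only if its score reaches a threshold. The function names, the toy three-dimensional vectors, and the 0.75 threshold are assumptions made for this example; in VoicePassport the search itself is delegated to the vector database.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    probe: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.75,  # hypothetical value, tune on real data
) -> Optional[Tuple[str, float]]:
    """Return (user_id, score) for the closest enrolled voice, or None if below threshold."""
    best: Optional[Tuple[str, float]] = None
    for user_id, stored in enrolled.items():
        score = cosine_similarity(probe, stored)
        if best is None or score > best[1]:
            best = (user_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


# Toy data: real voice embeddings have far more dimensions.
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.9]}
print(best_match([0.8, 0.2, 0.1], enrolled))
```

The threshold trades false accepts against false rejects, so it would normally be chosen from measurements on real recordings rather than fixed up front.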

VoicePassport offers a reliable and efficient means of authentication, enabling seamless user access to various applications and services while ensuring a high level of security. With its innovative approach to voice-based authentication, VoicePassport provides a convenient and dependable solution for organizations seeking robust identity verification mechanisms.

Key Components of VoicePassport: Revolutionizing Authentication with Advanced Technology

VoicePassport represents a paradigm shift in authentication systems, leveraging cutting-edge technology to ensure robust security and user-friendly experiences. Let’s delve deeper into its key components:

  • Advanced Voice Authentication: At the core of VoicePassport lies Resemblyzer technology, a state-of-the-art voice analysis tool. This innovative solution analyzes user audio samples to generate unique voice embeddings. These embeddings capture the distinct characteristics of each user’s voice with remarkable precision. By harnessing the power of Resemblyzer, VoicePassport establishes a reliable foundation for user authentication based on voice similarity.
  • Blockchain-Powered Security: VoicePassport integrates blockchain technology to fortify security and immutability in storing user authentication data. Through blockchain integration, each user’s voice authentication details are securely hashed and recorded on the blockchain. This approach establishes a tamper-proof ledger of user interactions, safeguarding authentication data against unauthorized access or manipulation.
  • Efficient Vector Database: VoicePassport utilizes a specialized vector database to store and query voice embeddings derived from user audio samples. This database employs advanced vector similarity search algorithms, enabling rapid and accurate matching of voice patterns for seamless user authentication. By optimizing the storage and retrieval of voice embeddings, VoicePassport ensures efficient and reliable authentication processes.
  • Streamlined Workflow Management: Powered by Apache Airflow, VoicePassport streamlines the authentication process with robust workflow management capabilities. Through automated task orchestration, including audio processing, embedding generation, and database integration, VoicePassport ensures smooth and dependable operation. This streamlined workflow management enhances efficiency and scalability, enabling VoicePassport to meet the authentication needs of diverse applications and services.
  • Seamless Integration: VoicePassport offers a straightforward RESTful API for easy integration with other projects and applications. This RESTful API simplifies the process of integrating voice authentication functionality into existing systems, enabling developers to seamlessly incorporate VoicePassport’s robust authentication capabilities into their applications. Whether it’s web-based platforms, mobile applications, or IoT devices, VoicePassport’s RESTful API ensures smooth integration, allowing developers to enhance the security and usability of their products with minimal effort.

Exploring the VoicePassport Architecture: Seamless Integration for Robust Voice Authentication

In the intricate ecosystem of the VoicePassport platform, a harmonious interplay of various architectural components ensures not only robust voice authentication but also efficient user management functionalities. Let’s embark on a journey to unravel the intricate web of elements that form the backbone of this cutting-edge system:

VoicePassport Architecture
  • Enrollment: At the heart of the VoicePassport platform lies the enrollment process, where users seamlessly register their unique voice profiles by providing audio samples. These samples undergo meticulous analysis by Resemblyzer, an advanced voice analysis tool, to generate distinctive voice embeddings. These embeddings encapsulate the nuanced characteristics of each user’s voice with unparalleled accuracy and security. Once created, these embeddings find their home in a dedicated database, primed for future use in the authentication process.
  • Authentication: Users speak a predetermined passphrase, and the resulting voice embedding is compared against the stored embeddings using a vector similarity search. When the similarity score meets the configured match threshold, the user is authenticated and granted access to the requested service.
  • Blockchain Verification: As a cornerstone of security, VoicePassport leverages blockchain technology to fortify the authentication process further. Authentication data is cryptographically hashed before being written to the blockchain; as described later, the on-chain record holds hashes of the user ID and of the stored voice file rather than the raw data. This immutable ledger safeguards the integrity and authenticity of user interactions within the VoicePassport ecosystem.
  • Apache Airflow Integration: Driving the orchestration of the entire authentication workflow is Apache Airflow, a robust workflow management tool. From the intricate tasks of audio processing and voice embedding generation to the seamless integration with blockchain protocols, Apache Airflow ensures the flawless execution of each step. Moreover, its centralized monitoring and management capabilities provide invaluable insights, guaranteeing the reliability and scalability of the authentication process.

This architectural approach provides a comprehensive and robust solution for voice authentication, offering an optimal balance of security, efficiency, and user-friendliness for end-users.

Technological Foundations: Building Blocks of VoicePassport Authentication

In order to bring VoicePassport to life, a sophisticated blend of cutting-edge technologies forms its backbone. Each component plays a crucial role in ensuring the seamless operation and robust security of the voice authentication system. Let’s delve into the foundational technologies that power VoicePassport and explore how they contribute to its effectiveness and reliability.

  • Resemblyzer: This advanced voice analysis tool plays a pivotal role in generating voice embeddings. By analyzing input audio samples, Resemblyzer extracts unique characteristics of each user’s voice and converts them into compact numerical representations known as voice embeddings.
  • QDrant: As a vector database, QDrant efficiently stores and manages the voice embeddings generated by Resemblyzer. Its specialized capabilities enable rapid and accurate querying of voice embeddings, facilitating seamless authentication processes based on voice similarity.
  • Flask: Serving as the web framework for building the RESTful API, Flask provides a robust foundation for exposing the functionality of VoicePassport to external applications and services. It facilitates communication between different system components and enables easy access to authentication features.
  • Web3: The web3.py Python library handles interaction with Ethereum-compatible blockchains, allowing VoicePassport to leverage blockchain technology for secure and immutable storage of authentication data. Through Web3, the system can execute transactions, interact with smart contracts, and retrieve blockchain-related information.
  • Solidity: As the programming language for writing smart contracts on the Ethereum blockchain, Solidity enables the creation of secure and autonomous authentication protocols within VoicePassport. Smart contracts written in Solidity govern the validation and execution of authentication transactions on the blockchain.
  • Polygon PoS: Serving as a scalable Ethereum sidechain, Polygon PoS offers fast and low-cost transactions for VoicePassport. By leveraging Polygon PoS, the solution ensures efficient blockchain interactions while minimizing transaction fees and latency.
  • Alchemy: This blockchain developer platform provides node (RPC) access along with dashboards for monitoring requests and transactions. In VoicePassport it gives visibility into the authentication transactions the system submits to the blockchain, which supports transparency and auditability.
  • MinIO: As an object storage service, MinIO stores voice sample files securely within the VoicePassport system. It provides scalable and efficient storage infrastructure for managing large volumes of voice data, ensuring seamless access and retrieval when needed.
  • MongoDB: This NoSQL database serves as the repository for user metadata and authentication data within VoicePassport. MongoDB offers flexibility and scalability in storing structured and unstructured data, enabling efficient management of user profiles and authentication records.
  • Apache Airflow: As a workflow management tool, Apache Airflow automates and orchestrates various tasks within the VoicePassport authentication process. From audio processing to database integration, Airflow streamlines operations, enhances reliability, and enables centralized monitoring of system activities.
  • Docker: Acting as a containerization platform, Docker packages VoicePassport application components into portable and scalable containers. Docker simplifies deployment, ensures consistency across different environments, and enables seamless scaling of the VoicePassport solution.
  • HAProxy: Serving as a load balancer, HAProxy distributes incoming traffic across multiple Docker containers hosting VoicePassport components. It ensures high availability, fault tolerance, and optimal resource utilization, enhancing the scalability and performance of the overall system.

Unraveling the Importance of Blockchain Verification

Blockchain verification is pivotal in ensuring the security, integrity, and transparency of the voice authentication system. Here’s why it’s essential, especially considering the implementation of the VoiceIDVerifier DApp:

  1. Immutable Record: By recording user authentication data on the blockchain via the VoiceIDVerifier DApp, the system creates an immutable and tamper-proof record of all authentication transactions. This ensures that once authentication data is stored, it cannot be altered or deleted, providing a reliable audit trail of user interactions.
  2. Enhanced Security: Through the VoiceIDVerifier DApp, user authentication data, including the hash of the user ID and the hash of the voice audio stored in MinIO, is cryptographically hashed and securely recorded on the blockchain. This robust security measure ensures that sensitive information remains protected from unauthorized access or tampering.
  3. Transparency and Auditability: The decentralized nature of blockchain technology, facilitated by the VoiceIDVerifier DApp, enables transparent and auditable verification of user authentication data. Stakeholders can easily access and verify the authenticity of recorded transactions, fostering trust and transparency in the authentication process.
  4. Decentralized Trust: The VoiceIDVerifier DApp eliminates the need for centralized authorities or intermediaries to verify user authentication data. Instead, trust is distributed across the network, with consensus mechanisms ensuring the accuracy and validity of recorded transactions. This decentralized trust model enhances the reliability and resilience of the authentication system.

By leveraging the capabilities of the VoiceIDVerifier DApp and blockchain technology, the voice authentication system achieves heightened security, transparency, and trustworthiness, ensuring a robust and reliable mechanism for authenticating user identities.
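To make the hashing step concrete, here is a minimal sketch that derives SHA-256 digests (the hash the article names for the on-chain record) for a user ID and for the bytes of a voice sample, using only Python's standard library. The helper names, the dictionary keys, and the sample values are hypothetical; the article does not specify how the digests are encoded on-chain.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_voice_id_record(user_id: str, audio_bytes: bytes) -> dict:
    """Build the pair of digests that would be recorded instead of raw data."""
    return {
        "user_id_hash": sha256_hex(user_id.encode("utf-8")),
        "voice_file_hash": sha256_hex(audio_bytes),
    }


# Placeholder inputs: in practice the audio bytes would come from object storage.
record = build_voice_id_record("user-123", b"fake-wav-bytes")
print(record)
```

Because only digests are recorded, the raw audio and the plain user ID stay off-chain, while anyone holding the original file can recompute the digest and compare it with the recorded value.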

Understanding the VoiceIDVerifier DApp Deployment on Polygon PoS

The UML diagram provides an overview of the VoiceIDVerifier decentralized application (DApp) deployed on the Polygon Proof of Stake (PoS) blockchain network. This diagram illustrates the key components, interactions, and workflows involved in the authentication process within the DApp.

The Vector Database: A Core Element, Why QDrant?

The vector database plays a crucial role in the voice authentication system, and choosing QDrant as the platform for its implementation offers several significant advantages. Below are some key reasons why QDrant is the ideal choice for managing the vector database in our system:

  1. Scalability and Performance: QDrant is designed to handle large volumes of data and provide exceptional performance in high-load environments. Its distributed architecture and parallel processing capabilities ensure optimal scalability, enabling efficient management of large amounts of voice vectors without compromising system performance.
  2. Advanced Similarity Search: QDrant offers powerful similarity search capabilities that are essential for the voice authentication process. Its vector-based similarity search algorithm ensures accurate and efficient results, allowing for quick and effective comparison of input voice vectors with those stored in the database.
  3. Security and Privacy: QDrant prioritizes data security and privacy, offering robust security measures to protect the integrity and confidentiality of stored voice vectors. Its advanced security features, such as data encryption and granular access controls, ensure that user data is effectively protected against external threats.
  4. Integration with Voice Technologies: QDrant is model-agnostic: it stores any fixed-length vector, so the embeddings produced by Resemblyzer can be stored and searched without conversion. This makes it straightforward to generate, store, and search voice vectors across the tools and components of the voice authentication system.

In summary, QDrant provides a comprehensive and highly efficient solution for managing the vector database in our voice authentication system. Its scalability, performance, security, and integration capabilities make it the ideal choice to meet the storage and search needs of voice vectors in a robust and secure voice authentication environment.

User voice embeddings collection

Benefits of the System’s Design

The design of VoicePassport offers a multitude of advantages that enhance security, efficiency, and user experience. Let’s delve into some key benefits:

  1. Enhanced Security: VoicePassport integrates advanced technologies like Resemblyzer for voice analysis and blockchain for data immutability, ensuring robust security measures. By leveraging blockchain, user authentication data remains tamper-proof, providing an added layer of protection against unauthorized access.
  2. Seamless User Experience: With a focus on user convenience, VoicePassport eliminates the need for complex passwords and PINs. Users can effortlessly authenticate themselves using their voice, simplifying the login process and reducing friction for end-users.
  3. Reliable Authentication: The system’s architecture employs sophisticated algorithms for voice analysis and vector similarity search, enabling accurate and reliable authentication. By comparing voice embeddings with stored profiles, VoicePassport ensures precise identification of users, minimizing false positives and negatives.
  4. Scalability and Efficiency: VoicePassport’s design is built to scale, accommodating a growing user base and increasing authentication demands. With components like Apache Airflow and Docker, the system can efficiently handle audio processing tasks and workflow orchestration, ensuring optimal performance even under high loads.
  5. Flexible Integration: The system’s RESTful API and modular architecture facilitate seamless integration with other applications and services. Organizations can easily incorporate VoicePassport into their existing authentication workflows, enhancing security without disrupting user operations.
  6. Cost-Effectiveness: By leveraging technologies like MinIO for object storage and Polygon PoS for blockchain transactions, VoicePassport optimizes resource utilization and reduces operational costs. The efficient utilization of resources translates into a cost-effective solution for organizations seeking reliable authentication mechanisms.
  7. Compliance and Auditability: VoicePassport’s adherence to industry standards and regulatory requirements ensures compliance with data protection regulations. With blockchain-based transaction logging and audit trails, organizations can demonstrate compliance and accountability, fostering trust among users and stakeholders.

In summary, the thoughtful design of VoicePassport brings a host of benefits, ranging from enhanced security and seamless user experience to scalability, flexibility, and cost-effectiveness. By prioritizing security, efficiency, and user-centric design principles, VoicePassport offers a compelling solution for modern authentication challenges.

Comprehensive Explanation of the DAG and Its Operators in VoicePassport

In this section, we’ll provide a detailed breakdown of the Directed Acyclic Graph (DAG) and its constituent operators within the VoicePassport architecture. The DAG serves as the backbone of the workflow orchestration system, enabling seamless coordination and execution of tasks essential for voice authentication and user management.

DAG and Its Operators in VoicePassport

By delving into the intricacies of the DAG and its operators, we aim to shed light on the underlying mechanisms that power VoicePassport’s authentication workflow. From audio processing to blockchain integration, each operator plays a pivotal role in ensuring the system’s reliability, security, and efficiency.

Setting up Voice Identity Registration Workflow

In this implementation, we build the voice identity registration workflow with Apache Airflow, a platform for orchestrating complex computational tasks. Encapsulating each step of the registration process in its own task keeps the workflow modular, maintainable, and easy to integrate with other systems.

Initialization: We begin by defining default arguments for our Directed Acyclic Graph (DAG), the fundamental structure in Airflow representing our workflow. These arguments include ownership details, start date, number of retries, and logging level, ensuring smooth execution and monitoring of tasks.

Operator Imports: Next, we dynamically import operator classes from external modules using Python’s importlib module. These operators encapsulate specific actions within our workflow, such as generating voice embeddings, managing data in a database, interacting with external services like QDrant, and processing results via webhooks.
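The two setup steps described above might look roughly like the following sketch. The values in `default_args` and the `load_class` helper are assumptions for illustration. The logging level is left out because the article does not say under which key it is passed, and the dynamic import is demonstrated with a standard-library class so the snippet runs without the project's operator modules.

```python
import importlib
from datetime import datetime, timedelta

# Hypothetical defaults covering the owner, start date and retries mentioned above.
default_args = {
    "owner": "voicepassport",
    "start_date": datetime(2024, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}


def load_class(module_path: str, class_name: str):
    """Import module_path at runtime and return the attribute named class_name."""
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in target: a DAG file would name the project's operator modules here.
OrderedDict = load_class("collections", "OrderedDict")
print(OrderedDict.__name__)
```

Resolving classes by name keeps the DAG file decoupled from the operator packages, at the cost of moving import errors from load time to the moment `load_class` is called.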

Task Definitions: Each task within our workflow corresponds to a distinct operation in the voice identity registration process. For instance:

  • Generating Voice Embeddings: This task involves extracting meaningful features from audio files, a crucial step in voice-based identity verification.
  • Upserting into QDrant: Here, we manage the insertion or updating of voice embeddings in QDrant, a database optimized for similarity searches.
  • Registering VoiceID: Utilizing a Smart Contract, this task facilitates the secure registration of voice identities, ensuring integrity and authenticity.
  • Processing Results via Webhook: Finally, we process the outcome of the registration process and dispatch it to a webhook for further action.
Voice Identity Registration DAG
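The task sequence listed above could be wired together as in the sketch below, assuming Airflow 2.4 or later and assuming the four tasks run strictly one after another. The DAG ID and task IDs are hypothetical, and `EmptyOperator` placeholders stand in for the project's custom operators, whose constructor arguments the article does not show.

```python
from datetime import datetime

# Hypothetical task IDs for the four steps of the registration workflow.
TASK_IDS = [
    "generate_voice_embeddings",
    "upsert_qdrant_embeddings",
    "register_voice_id",
    "process_result_webhook",
]


def build_registration_dag():
    """Build the DAG; Airflow is imported lazily so this module loads without it."""
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    with DAG(
        dag_id="voice_identity_registration",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # assumed to be triggered on demand, not on a timetable
        catchup=False,
    ) as dag:
        # Placeholders: the real DAG would instantiate the custom operators.
        tasks = [EmptyOperator(task_id=task_id) for task_id in TASK_IDS]
        # Chain the tasks so each runs only after the previous one succeeds.
        for upstream, downstream in zip(tasks, tasks[1:]):
            upstream >> downstream
    return dag
```

In a real DAG file the result would be assigned to a module-level variable, for example `dag = build_registration_dag()`, so that the scheduler can discover it.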

How the GenerateVoiceEmbeddingsOperator Simplifies Voice Embedding Generation in Apache Airflow

The GenerateVoiceEmbeddingsOperator is a specialized Apache Airflow operator crafted to extract voice embeddings from audio files within Airflow workflows. Let's delve into its functionality:

Custom Initialization: The operator extends BaseCustomOperator, the project’s foundational class for custom Airflow operators. This shared base ensures that the operator is ready for execution within Airflow environments.

Audio Processing and Embedding Generation: A pivotal aspect of the operator’s functionality lies in its ability to preprocess audio files and derive voice embeddings from them. Leveraging the preprocess_wav function from the Resemblyzer library, the operator prepares the audio data for embedding generation. Subsequently, using the VoiceEncoder class, it extracts embeddings from the preprocessed audio data.

Execution Handling: During execution, the operator meticulously orchestrates the process of generating embeddings for the provided audio file. It accesses the execution context passed by Airflow, allowing seamless integration with other tasks and workflows. Notably, the operator logs pertinent details of its execution, providing insights into the process for monitoring and troubleshooting purposes.

Integration with External Resources: The operator seamlessly integrates with external resources, such as MongoDB for logging execution details and MinIO for accessing audio files. It fetches the audio file specified in the execution context from MinIO, ensuring that the necessary data is readily available for processing.

In essence, the GenerateVoiceEmbeddingsOperator streamlines the process of extracting voice embeddings from audio files within Apache Airflow workflows. Its robust functionality, coupled with seamless integration capabilities with external resources, makes it an invaluable tool for voice data processing pipelines. Through meticulous execution handling and logging, it facilitates transparent monitoring and efficient management of embedding generation tasks.
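The embedding step at the core of this operator can be sketched with Resemblyzer's public API (`preprocess_wav` and `VoiceEncoder.embed_utterance`). This is a simplified stand-alone function rather than the operator itself: it assumes the audio file has already been downloaded from MinIO to a local path, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

EMBEDDING_SIZE = 256  # length of Resemblyzer's utterance embeddings


def check_embedding(vector: Sequence[float]) -> List[float]:
    """Validate the vector length and convert the values to plain floats."""
    if len(vector) != EMBEDDING_SIZE:
        raise ValueError(f"expected {EMBEDDING_SIZE} values, got {len(vector)}")
    return [float(value) for value in vector]


def generate_voice_embedding(audio_path: str) -> List[float]:
    """Return the speaker embedding for one local audio file."""
    # Imported lazily: Resemblyzer depends on PyTorch and loads a pretrained model.
    from resemblyzer import VoiceEncoder, preprocess_wav

    wav = preprocess_wav(Path(audio_path))    # resample, normalize volume, trim silences
    encoder = VoiceEncoder()                  # loads the pretrained speaker encoder
    embedding = encoder.embed_utterance(wav)  # NumPy vector of EMBEDDING_SIZE values
    return check_embedding(embedding)
```

Returning a plain list of floats rather than a NumPy array makes the result easy to pass between Airflow tasks and to store in the vector database.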

Exploring the QDrantEmbeddingsOperator: Enabling Effortless Integration with QDrant Vector Database for Voice Embeddings Upsertion

The QDrantEmbeddingsOperator is a custom Apache Airflow operator tailored for seamless integration with the QDrant vector database. Its primary function is to facilitate the upsertion of voice embeddings into designated collections within the QDrant service.

Custom Initialization: Upon initialization, the operator sets up essential parameters such as the QDrant URI, API key for authentication, and the target collection where the embeddings will be stored. This initialization process ensures seamless interaction with the QDrant service throughout the operator’s execution.

QDrant Client Initialization: A crucial step in the operator’s workflow involves the initialization of a QDrant client. This client serves as the interface for communication with the QDrant service, allowing for operations such as collection creation and embeddings upsertion.

Collection Management: The operator takes charge of managing collections within the QDrant database. It verifies the existence of the specified collection and creates it if not already present. This ensures that the embeddings have a designated space for storage within QDrant.

Embeddings Upsertion: With the collection in place, the operator proceeds to upsert the voice embeddings into QDrant. Leveraging the initialized QDrant client, it efficiently inserts the embeddings into the designated collection, ensuring seamless integration with the QDrant vector database.

Execution Handling: During execution, the operator meticulously handles various scenarios, such as error conditions or missing parameters. It logs relevant information at each step of the process, allowing for transparent monitoring and troubleshooting.

In essence, the QDrantEmbeddingsOperator acts as a bridge between Apache Airflow workflows and the QDrant vector database, simplifying the process of managing and integrating voice embeddings. Its robust functionality, coupled with meticulous error handling and logging, makes it an indispensable component for voice data processing pipelines.
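A minimal sketch of the collection check and upsert described above, using the `qdrant-client` package, could look like this. The collection name, the payload field, the cosine distance metric, and the idea of deriving the point ID from the user ID are all assumptions for illustration; the article does not state how the operator configures the collection or chooses IDs.

```python
import uuid
from typing import List

COLLECTION = "voice_embeddings"  # hypothetical collection name
VECTOR_SIZE = 256                # length of a Resemblyzer embedding


def point_id_for(user_id: str) -> str:
    """Derive a stable UUID so re-enrolling a user overwrites the same point."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"voicepassport:{user_id}"))


def upsert_embedding(qdrant_uri: str, api_key: str, user_id: str, vector: List[float]) -> str:
    """Create the collection if needed, then insert or update the user's embedding."""
    # Imported lazily so the helper above works without the client installed.
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(url=qdrant_uri, api_key=api_key)

    existing = {collection.name for collection in client.get_collections().collections}
    if COLLECTION not in existing:
        client.create_collection(
            collection_name=COLLECTION,
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
        )

    point_id = point_id_for(user_id)
    client.upsert(
        collection_name=COLLECTION,
        points=[PointStruct(id=point_id, vector=vector, payload={"user_id": user_id})],
    )
    return point_id
```

Point IDs in this database must be unsigned integers or UUIDs, which is why the sketch maps the user ID to a deterministic UUID instead of using it directly.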

Unveiling the Capabilities of the RegisterVoiceIDOperator: Streamlining Voice ID Registration in Blockchain Smart Contracts with Apache Airflow

The RegisterVoiceIDOperator is a specialized Apache Airflow operator designed to facilitate the registration of voice IDs within a blockchain smart contract. Let's delve into its functionality:

Initialization: Upon initialization, the operator inherits and extends functionality from the BaseWeb3CustomOperator, a foundational class for custom operators interfacing with Web3 providers. This initialization process ensures that the operator is properly configured for interaction with blockchain networks.

Voice ID Registration: At the heart of its functionality lies the capability to register voice IDs on a blockchain smart contract. Leveraging the provided Web3 provider and contract instance, the operator constructs and executes a transaction to register voice ID verification data on the blockchain. This transaction includes essential information such as the SHA256 hashes of the user ID and voice file ID, ensuring data integrity and security.

Transaction Handling: During execution, the operator meticulously handles the transaction lifecycle. It connects to the Web3 provider, retrieves necessary parameters such as chain ID and nonce, and interacts with the smart contract to register voice IDs. Subsequently, it waits for the transaction receipt, providing confirmation of transaction execution and logging relevant details for monitoring and auditing purposes.

Integration with External Resources: The operator seamlessly integrates with external resources, such as MongoDB for logging execution details and storing contract ABIs. It fetches the voice file ID from the execution context, derives its SHA256 hash, and retrieves user information associated with the voice ID for registration. This integration ensures a cohesive workflow for voice ID registration within Airflow environments.

In essence, the RegisterVoiceIDOperator serves as a vital component in voice authentication pipelines, enabling the secure and transparent registration of voice IDs on blockchain networks. Its robust functionality, coupled with seamless integration capabilities with external resources and blockchain networks, makes it an indispensable tool for identity management and verification workflows. Through meticulous transaction handling and logging, it facilitates transparent monitoring and auditing of voice ID registration operations within Airflow workflows.
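The transaction lifecycle described above can be sketched with web3.py (version 6 or later is assumed). The contract function name `registerVoiceID`, its two `bytes32` arguments, and the way the signing key is supplied are placeholders; the real names and types come from the deployed contract's ABI, which is not shown in the article.

```python
import hashlib


def sha256_digest(text: str) -> bytes:
    """32-byte SHA-256 digest, sized to fit a Solidity bytes32 argument."""
    return hashlib.sha256(text.encode("utf-8")).digest()


def register_voice_id(rpc_url, contract_address, abi, private_key, user_id, voice_file_id):
    """Build, sign and send the registration transaction, then wait for its receipt."""
    # Imported lazily so the hashing helper works without web3.py installed.
    from web3 import Web3

    w3 = Web3(Web3.HTTPProvider(rpc_url))
    account = w3.eth.account.from_key(private_key)
    contract = w3.eth.contract(address=contract_address, abi=abi)

    # "registerVoiceID" is a placeholder: use the function defined in the real ABI.
    tx = contract.functions.registerVoiceID(
        sha256_digest(user_id),
        sha256_digest(voice_file_id),
    ).build_transaction({
        "chainId": w3.eth.chain_id,
        "from": account.address,
        "nonce": w3.eth.get_transaction_count(account.address),
    })

    signed = w3.eth.account.sign_transaction(tx, private_key=private_key)
    # The attribute is raw_transaction in web3.py v7 and rawTransaction in v6.
    raw = getattr(signed, "raw_transaction", None) or signed.rawTransaction
    tx_hash = w3.eth.send_raw_transaction(raw)
    return w3.eth.wait_for_transaction_receipt(tx_hash)
```

The contract address must be in checksum format, which `Web3.to_checksum_address` can produce. The receipt's `status` field shows whether the transaction succeeded (1) or reverted (0), and is worth logging alongside the transaction hash.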

Delving into the Functionality of the Process Result Webhook Operator: Enabling Seamless Execution of Result Processing and Webhook Dispatch in Apache Airflow

The ProcessResultWebhookOperator is a custom Apache Airflow operator that processes result data and sends it to a predefined webhook. Let's dive into its functionality:

Initialization: Upon initialization, the operator inherits from the BaseCustomOperator, providing a foundation for custom operator functionality. This initialization ensures that the operator is configured to execute within Airflow environments seamlessly.

Task Execution: At the core of its functionality lies the execute method, responsible for orchestrating the execution logic. It begins by logging the start of the execution, indicating the initiation of result data processing.

Result Data Retrieval: The operator retrieves the result webhook URL from the DAG run configuration, ensuring flexibility in defining the destination for result data. It then retrieves result data from specified tasks, combining and validating the data for further processing.

Webhook Interaction: With the result data aggregated and validated, the operator proceeds to interact with the specified webhook. It constructs a POST request containing the result data and sends it to the webhook endpoint. Exception handling mechanisms are in place to address any errors encountered during the POST request, ensuring robustness and reliability.

Logging and Completion: Throughout the execution lifecycle, the operator diligently logs relevant details, including the result data before making the POST request and the outcome of the request. Upon successful execution or in the event of errors, comprehensive logging ensures transparency and traceability.

In summary, the ProcessResultWebhookOperator serves as a pivotal component in Airflow workflows, facilitating the seamless processing and transmission of result data to external endpoints via webhooks. Its robust functionality, coupled with comprehensive logging and error handling mechanisms, ensures the reliability and integrity of result data processing operations within Airflow environments. Through its execution, it enables seamless integration with external systems, empowering users to automate and streamline result data workflows effectively.
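The aggregation and dispatch logic could be approximated with the standard library alone, as in the sketch below. The function names and the merge rule (later task results override earlier keys) are assumptions; the article does not describe how the operator combines or validates the data.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, List


def combine_results(parts: List[dict]) -> dict:
    """Merge per-task result dicts; keys from later tasks override earlier ones."""
    combined: Dict[str, object] = {}
    for part in parts:
        if not isinstance(part, dict):
            raise ValueError("each task result must be a dict")
        combined.update(part)
    if not combined:
        raise ValueError("no result data to send")
    return combined


def post_to_webhook(url: str, payload: dict, timeout: float = 10.0) -> bool:
    """POST the payload as JSON; return True on a 2xx response, False on a network or HTTP error."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError) as exc:
        # HTTP 4xx/5xx responses arrive here as HTTPError, a URLError subclass.
        print(f"webhook delivery failed: {exc}")
        return False
```

Inside an operator's `execute` method, the webhook URL would typically be read from `context["dag_run"].conf` and the per-task results pulled with `context["ti"].xcom_pull(task_ids=[...])` before being passed to these helpers.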

Configuring Voice Identity State Changes Workflow in Apache Airflow

In this setup, we’re establishing a workflow within Apache Airflow specifically designed to handle changes in the state of voice identities. This workflow is crucial for systems where the verification state of voice identities needs to be dynamically managed and updated.

Initialization: We start by defining default arguments for our Directed Acyclic Graph (DAG), which serves as the backbone of our workflow. These arguments include details such as ownership, start date, retry attempts, and logging level, ensuring the reliability and smooth execution of our tasks.

Operator Imports: To execute various operations within our workflow, we import operator classes dynamically from external modules using Python’s importlib module. These operators encapsulate specific actions required for managing voice identity state changes and processing results.

Task Definitions: Our workflow consists of two main tasks, each responsible for a distinct operation:

  • Change Voice ID Verification State Task: This task handles the process of changing the verification state of a voice identity. It interacts with external systems and smart contracts to update the state securely and efficiently.
  • Process Result via Webhook Task: After successfully changing the verification state, this task processes the result and sends it to a designated webhook. This enables further actions or notifications based on the outcome of the state change operation.

Voice ID change state DAG

Unveiling the Functionality of the Change Voice Id Verification State Operator: A Key Component for Modifying Voice ID Verification States in Blockchain Smart Contracts

The ChangeVoiceIdVerificationStateOperator is a critical component within Airflow tailored for altering the verification state of a voice ID in a blockchain smart contract. Below, we delve into its operational intricacies:

Initialization: The operator subclasses BaseWeb3CustomOperator, which supplies the Web3 functionality it needs. This is what allows it to run blockchain-related operations inside Airflow workflows.

Task Execution: The execute method drives the whole operation. It starts by logging that execution has begun, so each run can be followed in the task logs.

Parameter Retrieval: The operator retrieves essential parameters, such as the user ID and verification state, from the DAG run configuration. These parameters serve as inputs for determining the action to be taken on the smart contract.

Validation: Before proceeding further, the operator validates the retrieved parameters to ensure data integrity and adherence to specified constraints. It raises exceptions if any parameter is missing or does not meet the required format.
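The retrieval and validation steps could look roughly like the sketch below. The configuration keys `user_id` and `verification_state`, the expected types, and the use of `ValueError` are assumptions; the real operator may use different names, formats, and exception classes.

```python
def validate_state_change_conf(conf):
    """Return (user_id, verification_state) or raise ValueError.

    `conf` stands in for the DAG run configuration dictionary.
    """
    conf = conf or {}
    user_id = conf.get("user_id")               # assumed key name
    state = conf.get("verification_state")      # assumed key name
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValueError("'user_id' is required and must be a non-empty string")
    if not isinstance(state, bool):
        raise ValueError("'verification_state' is required and must be a boolean")
    return user_id.strip(), state
```

Failing fast here means no transaction is built or paid for when the input is malformed.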

Blockchain Interaction: With parameters validated, the operator establishes a connection to the Web3 provider, retrieves the contract ABI, and obtains an instance of the smart contract. It then constructs the necessary contract function call based on the provided parameters and builds the transaction for execution on the blockchain.

Transaction Execution: The operator signs and sends the transaction to the blockchain network, awaiting confirmation. Upon transaction receipt, it logs relevant transaction details and concludes the execution process.
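The build, sign, send, and wait sequence can be sketched as follows, assuming a web3.py `Web3` instance and contract object are passed in. The helper name, its parameters, and the shape of the returned dictionary are hypothetical, and the contract function to call is left as a parameter because the article does not name it.

```python
def send_state_change(w3, contract, function_name, args, caller_address, private_key):
    """Build, sign and send one contract transaction, then wait for its receipt."""
    # The nonce orders transactions sent from the caller's account.
    nonce = w3.eth.get_transaction_count(caller_address)
    contract_function = getattr(contract.functions, function_name)(*args)
    tx = contract_function.build_transaction({"from": caller_address, "nonce": nonce})
    signed = w3.eth.account.sign_transaction(tx, private_key)
    # The attribute is `raw_transaction` in newer web3.py releases and
    # `rawTransaction` in older ones, so accept either.
    raw = getattr(signed, "raw_transaction", None) or getattr(signed, "rawTransaction")
    tx_hash = w3.eth.send_raw_transaction(raw)
    receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
    return {"tx_hash": tx_hash.hex(), "status": receipt["status"]}
```

A receipt status of 1 means the transaction was mined successfully, while 0 means it reverted.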

Result Reporting: The operator logs each step as it runs. On completion, it returns a structured dictionary describing the executed operation, including the user ID and the outcome of the transaction.

In summary, the ChangeVoiceIdVerificationStateOperator lets an Airflow workflow modify the verification state of a voice ID in a blockchain smart contract. Parameter validation and logging at each step keep these state changes reliable and traceable, and the operator turns a blockchain interaction into an ordinary task within the workflow.

Voice Authentication DAG: Workflow Overview and Task Descriptions

In this setup, we’re establishing a workflow within Apache Airflow specifically designed to handle voice authentication processes. This workflow is essential for systems where verifying voice identities is a critical component.

To initialize our workflow, we define default arguments for our Directed Acyclic Graph (DAG). These arguments set the groundwork for reliable task execution, including ownership, start date, retry attempts, and logging level.

For executing operations within our workflow, we dynamically import operator classes from external modules using Python’s importlib module. These operators encapsulate essential actions required for various stages of the voice authentication process.

Our workflow comprises several tasks, each assigned a specific responsibility:

  1. Generate Voice Embeddings Task: This task is responsible for extracting voice embeddings from audio files. It retrieves necessary parameters such as MongoDB and MinIO access details from environment variables.
  2. Find Most Similar Voice Embeddings Task: Here, the task finds the most similar voice embeddings to a given sample. It also fetches MongoDB and MinIO access information from environment variables, along with additional parameters such as the Qdrant connection details.
  3. Verify Voice Identity Task: This task performs the actual verification process using voice authentication. It accesses a range of environment variables for MongoDB, MinIO, HTTP provider, caller address, private key, contract address, contract ABI, JWT secret, and JWT duration hours.
  4. Process Result via Webhook Task: Following successful verification, this task processes the result and dispatches it to a designated webhook for further actions or notifications. Like other tasks, it retrieves MongoDB and MinIO access details from environment variables.

The tasks are interlinked in sequence to ensure a smooth execution flow. Each subsequent task depends on the successful completion of the preceding one, guaranteeing the integrity of the voice authentication process.

Voice Authentication DAG

Exploring the Functionality of the FindMostSimilarVoiceOperator

The FindMostSimilarVoiceOperator is like a special tool in Apache Airflow that helps find voices that sound alike. It’s kind of like a detective searching through a big library of voice recordings to find the one that sounds most similar to a specific recording.

Before it starts looking, it needs a few things: the web address and a secret code to access the library, and the name of the section in the library where it should search. These details help it know where to start looking.

Once it’s all set up, the operator gets to work. It connects to the library and starts searching. First, it retrieves from Airflow the information about the voice recording it needs to compare. Then, it compares this recording with all the others in the library.

As it searches, it keeps track of which recording sounds the most like the one it’s comparing. Once it’s looked at all of them, it picks the one that’s the closest match.

After all the searching, the operator finishes its job. It writes down what it found, like which recording is the closest match and how similar it is. This way, it helps Airflow figure out which voice recordings are most alike.
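To show what "closest match" means in practice, here is a brute-force sketch in plain Python. In the real system the vector database performs this search; the use of cosine similarity and the function names below are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_most_similar(query, stored):
    """Return (voice_id, score) for the stored embedding closest to `query`.

    `stored` maps a voice ID to its embedding; returns (None, -1.0) if empty.
    """
    best_id, best_score = None, -1.0
    for voice_id, embedding in stored.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = voice_id, score
    return best_id, best_score
```

A vector database avoids comparing against every stored recording by using an index, but the result it returns has the same shape: an identifier and a similarity score.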

Understanding the VerifyVoiceIdOperator: Verifying User Identity on the Ethereum Blockchain

The VerifyVoiceIdOperator is a custom Airflow operator tailored to verify a user's identity using a Smart Contract on the Ethereum blockchain. Here's how it works:

Initialization and Setup: When initializing the operator, two parameters are required: jwt_secret, the secret key used to sign JSON Web Tokens (JWTs), and jwt_duration_hours, the number of hours for which these tokens remain valid.

JWT Token Generation: Before initiating the identity verification process, the operator generates a JWT with the user’s ID as a claim. The token is signed with the provided secret key, and its expiration time is the current time plus the specified duration.

Execution of the Operator: During execution, the operator first retrieves the user ID associated with the most similar voice ID found in the previous stage of the process. If no voice ID is found, an error is logged, and a failed authentication result is returned.

Connection to Web3 Provider: The operator establishes a connection to the Web3 provider, essential for interacting with the Ethereum blockchain.

Loading Contract ABI: The operator loads the Contract Application Binary Interface (ABI) of the Ethereum smart contract. The ABI is crucial for interacting with smart contracts on the blockchain.

Execution of Smart Contract: Using the user ID and voice ID, corresponding hashes are generated, and the verifyVoiceID function of the smart contract is invoked. This function verifies whether the combination of user ID and voice ID is valid according to the rules defined in the smart contract.

Session Token Generation: If the verification is successful, a session JWT token is generated using the verified user ID. This token will be used to authorize subsequent user requests.

Logging and Result Return: The operator logs the start and completion of its execution and returns a result indicating whether authentication succeeded. On success, the result includes the session token; on failure, it contains only the failure indicator.

Acknowledgements

I would like to extend my heartfelt gratitude to Karan Shingde for his insightful article published on Medium titled “Build an Audio-driven Speaker Recognition System Using Open Source Technologies: resemblyzer and pyAudioAnalysis”. His comprehensive guide served as a significant source of inspiration and a crucial starting point for developing the VoicePassport Architecture project. Karan’s expertise and dedication have been instrumental in shaping my understanding and implementation of speaker recognition technologies. I am truly thankful for his invaluable contribution to the field and for sharing his knowledge with the community.

This is it. I have really enjoyed developing and documenting this little project. Thanks for reading it. I hope this is the first of many. Special thanks to the open-source community and the contributors who have made this project possible.

If you are interested in the complete code, here is the link to the public repository:

Sergio Sánchez Sánchez

Mobile Developer (Android, IOS, Flutter, Ionic) and Backend Developer (Spring, J2EE, Laravel, NodeJS). Computer Security Enthusiast.