Follow My Reading
Overview
Follow My Reading helps people who struggle with pronunciation while reading aloud. It provides an API service through which users upload an image of a text and an audio recording of their reading session; the service then checks the audio for pronunciation mistakes.
Here is how it works: a user takes a photo of the text and records themselves reading it aloud on their device. The platform compares the audio against the text in the image and reports the areas that need improvement. Several deep neural network models are used to detect mispronounced words.
The "Follow My Reading" project is designed to be customizable and easy to configure for different users and applications. The system administrator can add or remove models for audio and image processing as needed.
Models are added or removed through the plugin system, which lets an administrator create custom audio or image processing plugins. With plugins, an administrator can add custom models, extend the functionality of existing models, integrate third-party models, or even train their own processing models.
Documentation Overview
- Overview: This section introduces the product and its features and summarizes the content and structure of the documentation.
- Installation: This section covers the steps required to install the product, including prerequisites, system requirements, installation options, and troubleshooting tips. It may also include information on how to update or uninstall the product.
- Deployment: This section provides guidance on deploying the product in different environments or scenarios, such as on-premise, cloud, or hybrid deployments. It may cover topics such as scaling, fault tolerance, security, and monitoring.
- Plugins: This section explains how to add and manage plugins, which provide additional functionality to the product. It includes information on how to create or customize plugins, as well as best practices for using plugins effectively.
- Algorithms: This section explains the algorithms the product relies on. It includes descriptions of the algorithms, what they accept, and what they return.
- API: This section documents the product's API and provides guidance on how to use it. This may include information on supported protocols, authentication, rate limiting, and error handling. Sample code snippets and use cases may also be provided.
- Advanced: This section covers more advanced topics, such as performance optimization, customization, integration with other systems, and troubleshooting complex issues. It also includes an explanation of the task system.
Full list of Features
- ✅ Image and audio upload
- ✅ Audio Processing
- ✅ Splitting audio by words or by phrases
- ✅ Image Processing
- ✅ Reporting text coordinates on the image
- ✅ Comparing audio and image
- ✅ Comparing audio and text
- ✅ Extracting audio by given phrases
- ✅ Plugin Support
- ✅ Distributed computing using Task System
- ✅ Authentication
Installation
Before using Follow My Reading locally, install the following prerequisites.
Prerequisites
1. Install Python 3.10
Python is required for the installation of Follow My Reading. If Python is not already installed on your device, download and install it from the official Python website.
2. Install pip
Pip is a package manager for Python packages. It allows you to install and manage additional packages that are not included with Python by default. To install Pip, follow the instructions below:
- For Windows:
py -m ensurepip --upgrade
- For Linux/MacOS:
python -m ensurepip --upgrade
3. Install Poetry
Poetry is a Python packaging and dependency management tool. You can install it by running the following command:
pip install poetry
4. Get the source code
Once you have access to the repository, clone it with the following command:
git clone https://gitlab.pg.innopolis.university/a.kudryavtsev/follow-my-reading.git
5. Install project dependencies
To install all project dependencies, use the following command:
poetry install
These steps will ensure that you have everything required to be able to install and use Follow My Reading.
6. [Optional] Model Dependencies
Several models require additional steps to set up.
Tesseract
Deployment
Follow My Reading can be deployed in several ways depending on your requirements. Below are instructions for deploying Follow My Reading in different ways.
Stand-alone
If you want to run Follow My Reading as a stand-alone Docker container, you can run the following command:
make standalone
This will build and run the Follow My Reading Docker container.
Launch
If you want to run Follow My Reading locally with Redis and Huey, you need to run the following commands:
- Run the Redis server:
redis-server
- Run the Huey consumer:
huey_consumer.py core.task_system.scheduler -n -k thread
- Run the server:
uvicorn main:app
Scalability
Follow My Reading can be scaled horizontally by running multiple Huey consumers with the following command:
huey_consumer.py core.task_system.scheduler -n -k thread -w NUMBER
Where NUMBER is the number of workers you want to run. You can run this command on multiple machines to run workers on each of them, as long as they are all connected to Redis.
Note: executing tasks on multiple machines is currently unstable.
Plugins
Quick Start
First Plugin
Plugins in our system are described as Python files in the /plugins directory. There are several requirements for the format of these plugins. To implement a new plugin, create a file with a name that ends in _plugin.py. In this file, you should include the following imports:
For Image processing models:
from core.plugins import (
    ImageProcessingResult,
    ImageTextBox,
    Point,
    Rectangle,
    register_plugin,
)
For Audio processing models:
from core.plugins import AudioChunk, AudioProcessingResult, register_plugin
The register_plugin function is a decorator that registers your custom plugin. It takes a single parameter: the class of your plugin.
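To make the decorator idea concrete, here is a minimal sketch of how a class-registering decorator can work. It is not the project's implementation: the names `PLUGIN_REGISTRY`, `register`, and `EchoPlugin` are hypothetical, and the real `core.plugins.register_plugin` may store or validate plugins differently.

```python
from typing import Dict, Type

# Hypothetical registry; the real core.plugins module may keep plugins elsewhere.
PLUGIN_REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    """Store the class under its `name` attribute and return it unchanged."""
    PLUGIN_REGISTRY[cls.name] = cls
    return cls


@register
class EchoPlugin:
    # Illustrative plugin that performs no real processing
    name = "echo"
    languages = ["en"]
    description = "Returns the file name it was given"

    @staticmethod
    def process_audio(filename: str) -> str:
        return filename
```

Because the decorator returns the class itself, the decorated class can still be used directly, while the registry allows a lookup by plugin name.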
Image Processing Example
Here is an example of how to create and register a custom plugin for image processing:
import easyocr

from core.plugins import (
    ImageProcessingResult,
    ImageTextBox,
    Point,
    Rectangle,
    register_plugin,
)


@register_plugin
class EnArEasyOCRPlugin:
    name = "en_ar_easyocr"
    description = (
        "An open source library for certain languages and alphabets, "
        "mainly used for working with text on an image"
    )
    # List of supported languages can be found here: https://www.jaided.ai/easyocr/
    languages = ["en", "ar"]
    # The reader is created once, when the plugin class is loaded
    reader = easyocr.Reader(languages, gpu=False)

    @staticmethod
    def process_image(filename: str) -> ImageProcessingResult:
        # readtext returns (bounding box, text, confidence) triples
        model_response = EnArEasyOCRPlugin.reader.readtext(filename)
        boxes = []
        for coordinates, text, _ in model_response:
            # The bounding box is given as four corner points
            lt, rt, rb, lb = coordinates
            boxes.append(
                ImageTextBox(
                    text=text,
                    coordinates=Rectangle(
                        left_top=Point(x=lt[0], y=lt[1]),
                        right_top=Point(x=rt[0], y=rt[1]),
                        right_bottom=Point(x=rb[0], y=rb[1]),
                        left_bottom=Point(x=lb[0], y=lb[1]),
                    ),
                )
            )
        # Join the text of all boxes into the total extracted text
        result_text = " ".join(text for _, text, _ in model_response)
        return ImageProcessingResult(text=result_text, boxes=boxes)
Audio Processing Example
And here is an example of how to create and register a custom plugin for audio processing:
import whisper

from core.plugins import AudioChunk, AudioProcessingResult, register_plugin


@register_plugin
class WhisperPlugin:
    name = "whisper"
    languages = ["en", "ru", "ar"]
    description = "Robust Speech Recognition via Large-Scale Weak Supervision By OpenAI"
    # The model is loaded once, when the plugin class is loaded
    model = whisper.load_model("base")  # large-v2

    @staticmethod
    def process_audio(filename: str) -> AudioProcessingResult:
        model_response = WhisperPlugin.model.transcribe(filename)
        # Convert each Whisper segment into an AudioChunk
        chunks = [
            AudioChunk(start=seg["start"], end=seg["end"], text=seg["text"])
            for seg in model_response["segments"]
        ]
        return AudioProcessingResult(text=model_response["text"], segments=chunks)
Requirements
In our system, each plugin file must be named in the format *_plugin.py and located in the /plugins directory. Each plugin must also contain the following static variables:
- name: A string that specifies the name of the plugin.
- languages: A list of strings specifying the natural languages that the plugin can process.
- description: A string that provides a description of the plugin.
Additionally, each plugin must implement one of the following static methods:
- process_audio(filename: str): Must accept an argument of type string and return an object of type AudioProcessingResult.
- process_image(filename: str): Must accept an argument of type string and return an object of type ImageProcessingResult.
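The requirements above can be checked mechanically before a plugin is deployed. The following is a rough sketch of such a check; `plugin_problems` and both example classes are hypothetical names used for illustration, and the project itself may validate plugins in another way or not at all.

```python
from typing import List

REQUIRED_ATTRIBUTES = ("name", "languages", "description")
PROCESSING_METHODS = ("process_audio", "process_image")


def plugin_problems(cls: type) -> List[str]:
    """Return human-readable problems; an empty list means the class looks valid."""
    problems = [
        f"missing attribute: {attr}"
        for attr in REQUIRED_ATTRIBUTES
        if not hasattr(cls, attr)
    ]
    # At least one of the processing methods must be present and callable
    if not any(callable(getattr(cls, m, None)) for m in PROCESSING_METHODS):
        problems.append("missing process_audio or process_image")
    return problems


class GoodPlugin:
    name = "good"
    languages = ["en"]
    description = "Example plugin with all required members"

    @staticmethod
    def process_audio(filename: str) -> None:
        return None


class BadPlugin:
    name = "bad"
```

Note that the sketch only checks that the members exist; it does not verify their types or the return type of the processing method.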
Algorithms
Audio Algorithms
dbfs_to_fraction
The dbfs_to_fraction function accepts a decibel value relative to full scale (dBFS) and returns the corresponding fraction of the maximum volume as a float.
fraction_to_dbfs
The fraction_to_dbfs function accepts a fraction of the maximum volume and returns the corresponding decibels relative to full scale (dBFS) as a float.
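The two conversions are inverses of each other. A minimal sketch is shown below, assuming the amplitude convention (a factor of 20 in the logarithm); the project's own functions may instead use the power convention (a factor of 10), so treat the constants as an assumption.

```python
import math


def dbfs_to_fraction(dbfs: float) -> float:
    """Convert dBFS to an amplitude fraction of full scale (0 dBFS gives 1.0)."""
    return 10 ** (dbfs / 20)


def fraction_to_dbfs(fraction: float) -> float:
    """Convert an amplitude fraction of full scale to dBFS; the fraction must be positive."""
    if fraction <= 0:
        raise ValueError("fraction must be greater than 0")
    return 20 * math.log10(fraction)
```

Under this convention, halving the amplitude lowers the level by roughly 6 dB, and a fraction of 0 has no finite dBFS value, which is why the sketch rejects it.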
split_audio
The split_audio function accepts the path to an audio file or a pydub AudioSegment object and a list of tuples representing the timestamps for the beginning and end of each desired segment (in seconds). The function returns the UUIDs of the cut-up files in the order they appeared in the intervals.
split_silence
The split_silence function accepts the path to an audio file, the maximum length of a desired segment (in seconds), and the percentage of the maximum volume at which a segment is considered "silent". The function cuts the file only by silence, not by words, and adds a 50 ms buffer around each segment. The function returns a list of the UUIDs of all the cut-up segments and the intervals at which they were cut.
Text Algorithms
match_words
The match_words function accepts two texts and returns the list of changes needed to turn the first text into the second. The comparison is made on whole words. Each change is a tuple of three items: the index in the first text where the difference was found, the segment of the first text to be removed, and the segment of the second text to be substituted in.
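A word-level diff of this shape can be sketched with the standard library's `difflib`. This is an illustration rather than the project's algorithm: `match_words_sketch` is a hypothetical name, and the sketch assumes the reported index is a word index, whereas the real function may report a character index.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def match_words_sketch(first: str, second: str) -> List[Tuple[int, str, str]]:
    """Return (word index in first, words to remove, words to substitute) per difference."""
    a, b = first.split(), second.split()
    changes = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" spans need no change; replace, delete, and insert spans do
        if tag != "equal":
            changes.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

An insertion shows up as a change with an empty removed segment, and a deletion as a change with an empty substituted segment.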
match_phrases
match_phrases is a function that takes two arguments, phrases and text. phrases is a list of phrases or string fragments to be checked against text, which is the correct text. It returns a list of error tuples for each phrase in the phrases list, indicating the index at which the error occurred, the incorrect phrase, and the correct phrase.
The function first prepares the input texts by ignoring capital letters and non-letter symbols. It then uses the Levenshtein distance to calculate the full answer between the phrases and text. Finally, it cross-references the indices in the full answer to distribute the errors by phrase.
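The first two steps can be illustrated with a short sketch: a normalization that drops case and non-letter symbols, followed by the classic dynamic-programming Levenshtein distance. The names `normalize` and `levenshtein` are hypothetical, and the sketch only computes the distance; it does not reproduce how the project distributes errors across phrases.

```python
import re
from typing import List, Sequence


def normalize(text: str) -> List[str]:
    """Lowercase the text and keep only runs of letters, returned as a list of words."""
    return re.findall(r"[^\W\d_]+", text.lower())


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete item_a
                    current[j - 1] + 1,      # insert item_b
                    previous[j - 1] + cost,  # keep or substitute
                )
            )
        previous = current
    return previous[-1]
```

Because the function works on any sequence, it can compare lists of words (word-level errors) as well as plain strings (character-level errors).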
find_phrases
find_phrases is a function that takes three arguments: phrases, to_find, and margin (default 1.05). phrases is a list of phrases; to_find is the piece of text to be found within the phrases. It returns a list of the indices of the phrases in which the text appears.
The function first prepares to_find and phrases by calling the helper function __prep_text, which makes the comparison ignore repeated spaces and non-letter symbols. It then computes the size of a window to compare to the text and finds the window that best fits the string via the __match_symbols helper function.
The function then trims the window to exclude unnecessary symbols (trimming on full word boundaries) and maps the indices from the prepared text back to the initial text. Lastly, it iterates through the phrases to compute the final answer.
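The sliding-window idea can be sketched as follows. This is a simplified illustration under several assumptions: the window size is taken to be the length of to_find multiplied by margin, similarity is measured with `difflib` instead of the project's __match_symbols helper, phrases are assumed to be joined by single spaces, and the trimming step is omitted. The names `best_window` and `phrases_in_window` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def best_window(text: str, to_find: str, margin: float = 1.05) -> Tuple[int, int]:
    """Return (start, end) of the window of `text` most similar to `to_find`."""
    size = min(len(text), max(1, round(len(to_find) * margin)))
    best_start, best_score = 0, -1.0
    for start in range(len(text) - size + 1):
        window = text[start:start + size]
        score = SequenceMatcher(None, window, to_find, autojunk=False).ratio()
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + size


def phrases_in_window(phrases: Sequence[str], window: Tuple[int, int]) -> List[int]:
    """Return indices of phrases whose character span overlaps the window."""
    start, end = window
    hits, offset = [], 0
    for index, phrase in enumerate(phrases):
        phrase_end = offset + len(phrase)
        if offset < end and start < phrase_end:
            hits.append(index)
        offset = phrase_end + 1  # skip the single joining space
    return hits
```

A typical call would join the phrases with spaces, pass the joined text to `best_window`, and then map the resulting window back to phrase indices with `phrases_in_window`.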
API
FastAPI v0.1.0
Authentication
- OAuth2 authentication
- Flow: password
- Token URL: /v1/auth/token
Scope | Scope Description |
---|---|
audio
The endpoint /upload allows clients to upload audio files and returns a unique file ID.
Code samples
POST /v1/audio/upload
The endpoint validates the file based on the MIME types specification.
The endpoint converts the audio file into .mp3 format.
Parameters:
- upload_file: The audio file to upload
List of the most important allowed extensions:
- .aac
- .mp3
- .m4a
- .oga, .ogv
- .ogg
- .opus
- .wav
Body parameter
upload_file: string
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | Body_upload_audio_file_v1_audio_upload_post | true | none |
» upload_file | body | string(binary) | true | none |
Example responses
200 Response
{
"file_id": "8a0cfb4f-ddc9-436d-91bb-75133c583767"
}
422 Response
{
"detail": "Only audio files uploads are allowed"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | The file is uploaded successfully | UploadFileResponse |
422 | Unprocessable Entity | The file was not sent or the file has a disallowed extension | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /download allows downloading an audio file by the given UUID.
Code samples
GET /v1/audio/download
The endpoint /download takes a file UUID as input, checks whether the file exists in the audio directory, and returns the file as bytes (.mp3 format). If the file does not exist, it returns a 404 HTTP response code.
Responses:
- 200, file bytes (.mp3 format)
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
file | query | string(uuid) | true | none |
Example responses
404 Response
{
"detail": "File not found"
}
422 Response
{
"detail": [
{
"loc": [
"string"
],
"msg": "string",
"type": "string"
}
]
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | None |
404 | Not Found | The specified file was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /models returns available (loaded) audio models.
Code samples
GET /v1/audio/models
Returns the list of models that are loaded into the worker and available for use.
Example responses
200 Response
{
"models": [
{
"name": "string",
"languages": [
"string"
],
"description": "string"
}
]
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | List of available models | ModelsDataReponse |
The endpoint /process/task creates an audio processing task based on the given request parameters.
Code samples
POST /v1/audio/process/task
Parameters:
- audio_file: the UUID of the file to process
- audio_model: the name of an audio processing model (check '/models' for available models)
Responses:
- 404, No such audio file available
- 404, No such audio model available
Body parameter
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"audio_model": "string"
}
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | AudioProcessingRequest | true | none |
» audio_file | body | string(uuid) | true | none |
» audio_model | body | string | true | none |
Example responses
200 Response
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb"
}
404 Response
{
"detail": "No such audio file available"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Task was successfully created and scheduled | TaskCreateResponse |
404 | Not Found | The specified file or model was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
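As an illustration of the request described above, the sketch below builds (but does not send) the POST request using only the standard library. The base URL is an assumption (the default address of a local uvicorn server), `build_audio_task_request` is a hypothetical helper name, and the token is expected to come from /v1/auth/token.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local uvicorn default address


def build_audio_task_request(token: str, audio_file: str, audio_model: str) -> Request:
    """Build the JSON POST request that creates an audio processing task."""
    body = json.dumps({"audio_file": audio_file, "audio_model": audio_model})
    return Request(
        f"{BASE_URL}/v1/audio/process/task",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # OAuth2 bearer token
            "Content-Type": "application/json",
        },
    )
```

The request can then be sent with `urllib.request.urlopen`, and the `task_id` read from the JSON body of a 200 response.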
The endpoint /process/result retrieves the result of an audio processing task from the task system and returns it.
Code samples
GET /v1/audio/process/result
Responses:
- 200, returns a processing result in the format:
{
"text": "string", // total extracted text
"segments": [ // list of audio segments
{
"start": 0.0, // absolute timecode (in seconds) of the beginning of the segment
"end": 10.0, // absolute timecode (in seconds) of the end of the segment
"text": "string", // text, which was extracted from the segment
"file": "3fa85f64-5717-4562-b3fc-2c963f66afa6" // file uuid of the audio segment (for downloading)
}
]
}
- 406, it is impossible to get the task result (the task does not exist or has not finished yet).
- 422, if the task was not created as an audio processing task
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{
"text": "string",
"segments": [
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
]
}
406 Response
{
"detail": "The job is non-existent or not done"
}
422 Response
{
"detail": "There is no such audio processing task"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | AudioProcessingResponse |
406 | Not Acceptable | It is impossible to get task result (task does not exist or it has not finished yet). | None |
422 | Unprocessable Entity | The specified task is not audio processing task. | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
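Since a 406 response can mean that the task has not finished yet, a client usually polls this endpoint. Below is a minimal polling sketch; `wait_for_result` is a hypothetical helper, and `fetch` stands for any callable that performs the GET request and returns a (status code, parsed body) pair.

```python
import time
from typing import Any, Callable, Tuple


def wait_for_result(
    fetch: Callable[[], Tuple[int, Any]],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Any:
    """Call `fetch` until it stops returning 406, then return the body of a 200 response."""
    deadline = time.monotonic() + timeout
    while True:
        status, body = fetch()
        if status == 200:
            return body
        if status != 406:
            # Any other status (for example 422) will not change by waiting
            raise RuntimeError(f"unexpected status {status}: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        time.sleep(interval)
```

Because 406 is also returned for a task that does not exist, the timeout is what stops the loop when a wrong task ID is polled.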
The endpoint /extract/task extracts the specified phrases from the given audio file using the specified audio model.
Code samples
POST /v1/audio/extract/task
Parameters:
- audio_file: the UUID of the file to process
- audio_model: the name of an audio processing model (check '/models' for available models)
Responses:
- 404, No such audio file available
- 404, No such audio model available
Body parameter
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"audio_model": "string",
"phrases": [
"string"
]
}
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | AudioExtractPhrasesRequest | true | none |
» audio_file | body | string(uuid) | true | none |
» audio_model | body | string | true | none |
» phrases | body | [string] | true | none |
Example responses
200 Response
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb"
}
404 Response
{
"detail": "No such audio file available"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Task was successfully created and scheduled | TaskCreateResponse |
404 | Not Found | The specified file or model was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /extract/result retrieves the result of an audio extraction task from the task system and returns it.
Code samples
GET /v1/audio/extract/result
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{
"data": [
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"found": true,
"phrase": "string"
}
]
}
406 Response
{
"detail": "The job is non-existent or not done"
}
422 Response
{
"detail": "There is no such audio extraction task"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | AudioExtractPhrasesResponse |
406 | Not Acceptable | It is impossible to get task result (task does not exist or it has not finished yet). | None |
422 | Unprocessable Entity | The specified task is not audio extraction task. | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
image
The endpoint /upload allows clients to upload image files and returns a unique file ID.
Code samples
POST /v1/image/upload
The endpoint validates file based on MIME types specification. The endpoint converts image file into .png format.
Parameters:
- upload_file: The file to upload
Allowed extensions:
- .avif
- .bmp
- .gif
- .ico
- .jpeg, .jpg
- .png
- .svg
- .tif, .tiff
- .webp
Body parameter
upload_file: string
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | Body_upload_image_v1_image_upload_post | true | none |
» upload_file | body | string(binary) | true | none |
Example responses
200 Response
{
"file_id": "8a0cfb4f-ddc9-436d-91bb-75133c583767"
}
422 Response
{
"detail": "Only image files uploads are allowed"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | The file is uploaded successfully | UploadFileResponse |
422 | Unprocessable Entity | The file was not sent or the file has a disallowed extension | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /download allows downloading an image file by the given UUID.
Code samples
GET /v1/image/download
The endpoint /download takes a file UUID as input, checks whether the file exists in the image directory, and returns the file as bytes. If the file does not exist, it returns a 404 HTTP response code.
Responses:
- 200, file bytes
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
file | query | string(uuid) | true | none |
Example responses
404 Response
{
"detail": "File not found"
}
422 Response
{
"detail": [
{
"loc": [
"string"
],
"msg": "string",
"type": "string"
}
]
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | None |
404 | Not Found | The specified file was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /models returns available (loaded) image models.
Code samples
GET /v1/image/models
Returns the list of models that are loaded into the worker and available for use.
Example responses
200 Response
{
"models": [
{
"name": "string",
"languages": [
"string"
],
"description": "string"
}
]
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | List of available models | ModelsDataReponse |
The endpoint /process/task creates an image processing task based on the given request parameters.
Code samples
POST /v1/image/process/task
Parameters:
- image_file: the UUID of the file to process
- image_model: the name of an image processing model (check '/models' for available models)
Responses:
- 404, No such image file available
- 404, No such image model available
Body parameter
{
"image_file": "89f23c23-fe12-4935-b746-3bbc447c7a72",
"image_model": "string"
}
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | ImageProcessingRequest | true | none |
» image_file | body | string(uuid) | true | none |
» image_model | body | string | true | none |
Example responses
200 Response
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb"
}
404 Response
{
"detail": "No such image file available"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Task was successfully created and scheduled | TaskCreateResponse |
404 | Not Found | The specified file or model was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /process/result retrieves the result of an image processing task from the task system and returns it.
Code samples
GET /v1/image/process/result
Responses:
- 200, returns a processing result in the format:
{
"text": "string", // total extracted text
"boxes": [ // list of boxes with text
{
"text": "string", // text, which was extracted from the box
"coordinates": { // coordinates of the box on image
"left_top": { // four points defining the rectangle
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
]
}
- 406, it is impossible to get the task result (the task does not exist or has not finished yet).
- 422, if the task was not created as an image processing task
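A client that wants to highlight recognized text usually needs an axis-aligned rectangle rather than four corner points. The sketch below derives one from a single entry of the "boxes" list in the format shown above; `box_bounds` is a hypothetical helper, and it assumes image coordinates in which y grows downward, so the smallest y is the top edge.

```python
from typing import Any, Dict, Tuple


def box_bounds(box: Dict[str, Any]) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of the rectangle enclosing a text box."""
    points = box["coordinates"].values()
    xs = [point["x"] for point in points]
    ys = [point["y"] for point in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)
```

Taking the minimum and maximum over all four points keeps the result correct even when the detected box is slightly rotated.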
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{
"text": "string",
"boxes": [
{
"text": "string",
"coordinates": {
"left_top": {
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
]
}
406 Response
{
"detail": "The job is non-existent or not done"
}
422 Response
{
"detail": "There is no such image processing task"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | ImageProcessingResponse |
406 | Not Acceptable | It is impossible to get task result (task does not exist or it has not finished yet). | None |
422 | Unprocessable Entity | The specified task is not image processing task. | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
auth
The endpoint /register registers a new user by storing their username, password, email, and full name in a Redis database.
Code samples
PUT /v1/auth/register
Parameters:
- username: a string representing the username of the user being registered
- password: a string representing the user's password
- email: an optional string representing the user's email address
- full_name: an optional string representing the user's full name
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
username | query | string | true | none |
password | query | string | true | none |
email | query | string | false | none |
full_name | query | string | false | none |
Example responses
200 Response
{
"text": "string"
}
422 Response
{
"detail": "Username is already taken"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | RegisterResponse |
422 | Unprocessable Entity | The specified username is already taken | None |
Response Schema
This operation does not require authentication
The endpoint /token handles the login process and returns an access token for the authenticated user.
Code samples
POST /v1/auth/token
Parameters:
- username - unique username, which the client has provided while registering
- password - client's password
Responses:
- 401, incorrect username or password
- 200, token
Body parameter
grant_type: string
username: string
password: string
scope: ""
client_id: string
client_secret: string
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | Body_login_for_access_token_v1_auth_token_post | true | none |
» grant_type | body | string | false | none |
» username | body | string | true | none |
» password | body | string | true | none |
» scope | body | string | false | none |
» client_id | body | string | false | none |
» client_secret | body | string | false | none |
Example responses
200 Response
{
"access_token": "string",
"token_type": "string"
}
401 Response
{
"detail": "Incorrect username or password"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | Token |
401 | Unauthorized | Incorrect username or password. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
This operation does not require authentication
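The login step can be sketched with the standard library as follows. The request is built but not sent; `build_token_request` and `bearer_header` are hypothetical helper names, the base URL is supplied by the caller, and the form-encoded body follows the OAuth2 password flow listed in the body parameters above.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, username: str, password: str) -> Request:
    """Build the form-encoded POST request for the OAuth2 password flow."""
    form = urlencode(
        {"grant_type": "password", "username": username, "password": password}
    )
    return Request(
        f"{base_url}/v1/auth/token",
        data=form.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_header(token_response: Dict[str, str]) -> Dict[str, str]:
    """Turn the JSON body of a 200 response into an Authorization header."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

The header returned by `bearer_header` can be attached to every request that requires OAuth2PasswordBearer authentication.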
The endpoint /users/me returns the current user.
Code samples
GET /v1/auth/users/me
Example responses
200 Response
{
"username": "string",
"email": "string",
"full_name": "string",
"disabled": true
}
400 Response
{
"detail": "Inactive user"
}
401 Response
{
"detail": "Could not validate credentials"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | User |
400 | Bad Request | User is inactive | None |
401 | Unauthorized | Could not validate credentials | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
comparison
The endpoint /audio/image/task creates a task to compare an audio file against an image file using the specified models and returns the task ID.
Code samples
POST /v1/comparison/audio/image/task
Parameters:
- audio_file: the UUID of the audio file to process
- audio_model: the name of an audio processing model (check '/audio/models' for available models)
- image_file: the UUID of the image file to process
- image_model: the name of an image processing model (check '/image/models' for available models)
Responses:
- 200, Task created
- 404, No such audio file available
- 404, No such audio model available
- 404, No such image file available
- 404, No such image model available
Body parameter
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"image_file": "89f23c23-fe12-4935-b746-3bbc447c7a72",
"audio_model": "string",
"image_model": "string"
}
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | AudioToImageComparisonRequest | true | none |
» audio_file | body | string(uuid) | true | none |
» image_file | body | string(uuid) | true | none |
» audio_model | body | string | true | none |
» image_model | body | string | true | none |
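The request body described above could be assembled as in the following sketch. The function name `build_audio_image_task_body` and the model names in the usage line are hypothetical placeholders; real model names come from the /models endpoints.

```python
import json
import uuid


def build_audio_image_task_body(audio_file, image_file, audio_model, image_model):
    """Return the JSON body for POST /v1/comparison/audio/image/task.

    File IDs are parsed as UUIDs first, so a malformed ID fails locally
    with ValueError before any request is made.
    """
    return json.dumps({
        "audio_file": str(uuid.UUID(audio_file)),
        "image_file": str(uuid.UUID(image_file)),
        "audio_model": audio_model,
        "image_model": image_model,
    })


body = build_audio_image_task_body(
    "732b10bd-0006-4780-8f48-4319d2791290",
    "89f23c23-fe12-4935-b746-3bbc447c7a72",
    "example-audio-model",  # placeholder, not a real model name
    "example-image-model",  # placeholder, not a real model name
)
print(body)
```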
Example responses
200 Response
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb"
}
404 Response
{
"detail": "No such image model available"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Task was successfully created and scheduled | TaskCreateResponse |
404 | Not Found | The specified file or model was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /audio/image/result retrieves the results of the task with the given task ID and returns them.
Code samples
GET /v1/comparison/audio/image/result
Parameters:
- task_id: the UUID of the task to fetch results of
Responses:
- 200, job results in the format
{
"image": { // image processing result
"text": "string", // total extracted text
"boxes": [ // list of boxes with text
{
"text": "string", // text extracted from the box
"coordinates": { // coordinates of the box on the image
"left_top": { // four points defining a rectangle
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
]
},
"audio": { // audio processing results
"text": "string", // total extracted text
"segments": [ // audio segments, that were processed
{
"start": 0, // absolute time code of the beginning of the segment
"end": 0, // absolute time code of the ending of the segment
"text": "string", // text extracted from the segment
"file": "3fa85f64-5717-4562-b3fc-2c963f66afa6" // audio segment
}
]
},
"errors": [ // results of comparing
{
"audio_segment": { // audio segment where error was made
"start": 0,
"end": 0,
"text": "string",
"file": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
},
"at_char": 0, // char at which the error starts
"found": "string", // found word (based on audio)
"expected": "string" // expected word (suggestion for improvement based on image)
}
]
}
- 406, Results are not ready yet or no task with such ID exists
- 422, There is no such audio processing task
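As an illustration of consuming the format above, the sketch below turns the `errors` list of a decoded 200 response into readable lines. The function name `describe_errors` and the sample values are assumptions; only the field names come from the format shown.

```python
def describe_errors(result):
    """Return one readable line per entry in the result's "errors" list."""
    lines = []
    for error in result.get("errors", []):
        segment = error["audio_segment"]
        lines.append(
            f'{segment["start"]}-{segment["end"]}: '
            f'heard "{error["found"]}", expected "{error["expected"]}" '
            f'(char {error["at_char"]})'
        )
    return lines


# Made-up sample shaped like the documented response.
sample = {
    "errors": [
        {
            "audio_segment": {
                "start": 1.5,
                "end": 3.0,
                "text": "the cat sad",
                "file": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
            },
            "at_char": 8,
            "found": "sad",
            "expected": "sat",
        }
    ]
}
for line in describe_errors(sample):
    print(line)
```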
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{
"image": {
"text": "string",
"boxes": [
{
"text": "string",
"coordinates": {
"left_top": {
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
]
},
"audio": {
"text": "string",
"segments": [
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
]
},
"errors": [
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"at_char": 0,
"found": "string",
"expected": "string"
}
]
}
406 Response
{
"detail": "Results are not ready yet or no task with such id exist"
}
422 Response
{
"detail": "There is no such task consists of the both image and audio"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | AudioImageComparisonResultsResponse |
406 | Not Acceptable | It is impossible to get task result (task does not exist or it has not finished yet). | None |
422 | Unprocessable Entity | No task with this ID consists of both image and audio. | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /audio/text/task creates a task to compare audio against text from user input using the specified models and returns the task ID.
Code samples
POST /v1/comparison/audio/text/task
Parameters:
- audio_file: the UUID of the audio file to process
- audio_model: an audio processing model name (check '/audio/models' for available models)
- text: a list of strings to compare audio against
Responses:
- 200, Task created
- 404, No such audio file available
- 404, No such audio model available
Body parameter
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"text": [
"string"
],
"audio_model": "string"
}
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | AudioToTextComparisonRequest | true | none |
» audio_file | body | string(uuid) | true | none |
» text | body | [string] | true | none |
» audio_model | body | string | true | none |
Example responses
200 Response
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb"
}
404 Response
{
"detail": "No such audio model available"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Task was successfully created and scheduled | TaskCreateResponse |
404 | Not Found | The specified file or model was not found. | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
The endpoint /audio/text/result retrieves the results of the task with the given task ID and returns them.
Code samples
GET /v1/comparison/audio/text/result
Parameters:
- task_id: the UUID of the task to fetch results of
Responses:
- 200, job results in the format
{
"audio": { // audio processing results
"text": "string", // total extracted text
"segments": [ // audio segments, that were processed
{
"start": 0, // absolute time code of the beginning of the segment
"end": 0, // absolute time code of the ending of the segment
"text": "string", // text extracted from the segment
"file": "3fa85f64-5717-4562-b3fc-2c963f66afa6" // audio segment
}
]
},
"errors": [ // results of comparing
{
"audio_segment": { // audio segment where error was made
"start": 0,
"end": 0,
"text": "string",
"file": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
},
"at_char": 0, // char at which the error starts
"found": "string", // found word (based on audio)
"expected": "string" // expected word (suggestion for improvement based on text)
}
]
}
- 406, Results are not ready yet or no task with such ID exists
- 422, There is no such audio processing task
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{
"audio": {
"text": "string",
"segments": [
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
]
},
"errors": [
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"at_char": 0,
"found": "string",
"expected": "string"
}
]
}
406 Response
{
"detail": "Results are not ready yet or no task with such id exist"
}
422 Response
{
"detail": "There is no such task consists of the both audio and text"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | AudioTextComparisonResultsResponse |
406 | Not Acceptable | It is impossible to get task result (task does not exist or it has not finished yet). | None |
422 | Unprocessable Entity | No task with this ID consists of both audio and text. | None |
Response Schema
To perform this operation, you must be authenticated by means of one of the following methods: OAuth2PasswordBearer
task
The endpoint /status returns the status of a task identified by its task_id.
Code samples
GET /v1/task/status
Parameters:
- task_id: the UUID of the task to fetch the status of
Responses:
- 200, Job status
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb",
"status": "string",
"ready": true
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | TaskStatusResponse |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
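Since tasks run asynchronously, a client typically polls this endpoint until `ready` is true. The sketch below shows one way to do that; `wait_until_ready`, its parameters, and the stand-in `fake_fetch` are hypothetical, and a real `fetch_status` would perform the GET request and decode the JSON body.

```python
import time


def wait_until_ready(fetch_status, task_id, interval=1.0, max_attempts=30):
    """Poll until the task reports ready=True or attempts run out.

    fetch_status is any callable that takes a task_id and returns the
    decoded TaskStatusResponse as a dict.
    """
    for attempt in range(max_attempts):
        status = fetch_status(task_id)
        if status["ready"]:
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"task {task_id} not ready after {max_attempts} attempts")


# Stand-in for a real HTTP call: reports ready on the third poll.
_responses = iter([False, False, True])


def fake_fetch(task_id):
    return {"task_id": task_id, "status": "string", "ready": next(_responses)}


final = wait_until_ready(fake_fetch, "736fde4d-9029-4915-8189-01353d6982cb", interval=0)
print(final["ready"])
```

Passing the fetch function in as an argument keeps the polling logic independent of the HTTP client in use.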
The endpoint /result retrieves the results of the task with the given task ID and returns them.
Code samples
GET /v1/task/result
Parameters:
- task_id: the UUID of the task to fetch results of
Responses:
- 200, job results
- 406, Results are not ready yet or no task with such ID exists
Parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
task_id | query | string(uuid) | true | none |
Example responses
200 Response
{}
406 Response
{
"detail": "Results are not ready yet or no task with such id exist"
}
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Successful Response | Inline |
406 | Not Acceptable | It is impossible to get task result (task does not exist or it has not finished yet). | None |
422 | Unprocessable Entity | Validation Error | HTTPValidationError |
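A client needs to treat 406 differently from other failures, because it can mean the task is simply still running. The following sketch separates the cases; the exception class `TaskResultNotReady` and the function `parse_result_response` are hypothetical names, not part of the API.

```python
import json


class TaskResultNotReady(Exception):
    """Raised for a 406 reply: the task is unknown or still running."""


def parse_result_response(status_code, body):
    """Decode a /v1/task/result reply given its status code and raw body."""
    if status_code == 200:
        return json.loads(body)
    if status_code == 406:
        raise TaskResultNotReady(json.loads(body)["detail"])
    raise RuntimeError(f"unexpected status {status_code}")


print(parse_result_response(200, "{}"))
```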
Response Schema
Status Code 200
Response Get Job Result V1 Task Result Get
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Schemas
AudioChunk
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
AudioChunk
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
start | number | true | none | none |
end | number | true | none | none |
text | string | true | none | none |
file | string(uuid) | true | none | none |
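On the client side, this schema maps naturally onto a small typed record. The dataclass below is an illustrative sketch, not code from the project; the `from_dict` helper and the `length` property are assumptions added for convenience.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class AudioChunk:
    start: float
    end: float
    text: str
    file: UUID

    @classmethod
    def from_dict(cls, data):
        """Build a chunk from the decoded JSON object."""
        return cls(float(data["start"]), float(data["end"]), data["text"], UUID(data["file"]))

    @property
    def length(self):
        """Span between the two time codes, in the API's own units."""
        return self.end - self.start


chunk = AudioChunk.from_dict(
    {"start": 2, "end": 5, "text": "string", "file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"}
)
print(chunk.length)
```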
AudioExtractPhrasesRequest
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"audio_model": "string",
"phrases": [
"string"
]
}
AudioExtractPhrasesRequest
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio_file | string(uuid) | true | none | none |
audio_model | string | true | none | none |
phrases | [string] | true | none | none |
AudioExtractPhrasesResponse
{
"data": [
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"found": true,
"phrase": "string"
}
]
}
AudioExtractPhrasesResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
data | [AudioPhrase] | true | none | none |
AudioImageComparisonResultsResponse
{
"image": {
"text": "string",
"boxes": [
{
"text": "string",
"coordinates": {
"left_top": {
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
]
},
"audio": {
"text": "string",
"segments": [
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
]
},
"errors": [
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"at_char": 0,
"found": "string",
"expected": "string"
}
]
}
AudioImageComparisonResultsResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
image | ImageProcessingResponse | true | none | none |
audio | AudioProcessingResponse | true | none | none |
errors | [TextDiff] | true | none | none |
AudioPhrase
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"found": true,
"phrase": "string"
}
AudioPhrase
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio_segment | AudioChunk | false | none | none |
found | boolean | true | none | none |
phrase | string | true | none | none |
AudioProcessingRequest
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"audio_model": "string"
}
AudioProcessingRequest
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio_file | string(uuid) | true | none | none |
audio_model | string | true | none | none |
AudioProcessingResponse
{
"text": "string",
"segments": [
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
]
}
AudioProcessingResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
text | string | true | none | none |
segments | [AudioChunk] | true | none | none |
AudioTextComparisonResultsResponse
{
"audio": {
"text": "string",
"segments": [
{
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
}
]
},
"errors": [
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"at_char": 0,
"found": "string",
"expected": "string"
}
]
}
AudioTextComparisonResultsResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio | AudioProcessingResponse | true | none | none |
errors | [TextDiff] | true | none | none |
AudioToImageComparisonRequest
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"image_file": "89f23c23-fe12-4935-b746-3bbc447c7a72",
"audio_model": "string",
"image_model": "string"
}
AudioToImageComparisonRequest
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio_file | string(uuid) | true | none | none |
image_file | string(uuid) | true | none | none |
audio_model | string | true | none | none |
image_model | string | true | none | none |
AudioToTextComparisonRequest
{
"audio_file": "732b10bd-0006-4780-8f48-4319d2791290",
"text": [
"string"
],
"audio_model": "string"
}
AudioToTextComparisonRequest
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio_file | string(uuid) | true | none | none |
text | [string] | true | none | none |
audio_model | string | true | none | none |
Body_login_for_access_token_v1_auth_token_post
{
"grant_type": "string",
"username": "string",
"password": "string",
"scope": "",
"client_id": "string",
"client_secret": "string"
}
Body_login_for_access_token_v1_auth_token_post
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
grant_type | string | false | none | none |
username | string | true | none | none |
password | string | true | none | none |
scope | string | false | none | none |
client_id | string | false | none | none |
client_secret | string | false | none | none |
Body_upload_audio_file_v1_audio_upload_post
{
"upload_file": "string"
}
Body_upload_audio_file_v1_audio_upload_post
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
upload_file | string(binary) | true | none | none |
Body_upload_image_v1_image_upload_post
{
"upload_file": "string"
}
Body_upload_image_v1_image_upload_post
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
upload_file | string(binary) | true | none | none |
HTTPValidationError
{
"detail": [
{
"loc": [
"string"
],
"msg": "string",
"type": "string"
}
]
}
HTTPValidationError
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
detail | [ValidationError] | false | none | none |
IPRPoint
{
"x": 0,
"y": 0
}
IPRPoint
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
x | integer | true | none | none |
y | integer | true | none | none |
IPRRectangle
{
"left_top": {
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
IPRRectangle
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
left_top | IPRPoint | true | none | none |
right_top | IPRPoint | true | none | none |
left_bottom | IPRPoint | true | none | none |
right_bottom | IPRPoint | true | none | none |
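Because the rectangle is given as four independent points, it may not be axis-aligned (for example, if the recognized text is tilted; this is an assumption, the schema does not say). A sketch of reducing it to an axis-aligned bounding box, with the hypothetical helper name `bounding_box`:

```python
def bounding_box(rect):
    """Return (min_x, min_y, max_x, max_y) covering all four corners."""
    corners = [rect[name] for name in ("left_top", "right_top", "left_bottom", "right_bottom")]
    xs = [point["x"] for point in corners]
    ys = [point["y"] for point in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up, slightly skewed rectangle.
rect = {
    "left_top": {"x": 10, "y": 20},
    "right_top": {"x": 110, "y": 25},
    "left_bottom": {"x": 8, "y": 60},
    "right_bottom": {"x": 108, "y": 65},
}
print(bounding_box(rect))
```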
IPRTextBox
{
"text": "string",
"coordinates": {
"left_top": {
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
IPRTextBox
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
text | string | true | none | none |
coordinates | IPRRectangle | true | none | none |
ImageProcessingRequest
{
"image_file": "89f23c23-fe12-4935-b746-3bbc447c7a72",
"image_model": "string"
}
ImageProcessingRequest
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
image_file | string(uuid) | true | none | none |
image_model | string | true | none | none |
ImageProcessingResponse
{
"text": "string",
"boxes": [
{
"text": "string",
"coordinates": {
"left_top": {
"x": 0,
"y": 0
},
"right_top": {
"x": 0,
"y": 0
},
"left_bottom": {
"x": 0,
"y": 0
},
"right_bottom": {
"x": 0,
"y": 0
}
}
}
]
}
ImageProcessingResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
text | string | true | none | none |
boxes | [IPRTextBox] | true | none | none |
ModelData
{
"name": "string",
"languages": [
"string"
],
"description": "string"
}
ModelData
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
name | string | true | none | none |
languages | [string] | true | none | none |
description | string | true | none | none |
ModelsDataReponse
{
"models": [
{
"name": "string",
"languages": [
"string"
],
"description": "string"
}
]
}
ModelsDataReponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
models | [ModelData] | true | none | none |
RegisterResponse
{
"text": "string"
}
RegisterResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
text | string | true | none | none |
TaskCreateResponse
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb"
}
TaskCreateResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
task_id | string(uuid) | true | none | none |
TaskStatusResponse
{
"task_id": "736fde4d-9029-4915-8189-01353d6982cb",
"status": "string",
"ready": true
}
TaskStatusResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
task_id | string(uuid) | true | none | none |
status | string | true | none | none |
ready | boolean | true | none | none |
TextDiff
{
"audio_segment": {
"start": 0,
"end": 0,
"text": "string",
"file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"
},
"at_char": 0,
"found": "string",
"expected": "string"
}
TextDiff
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
audio_segment | AudioChunk | true | none | none |
at_char | integer | true | none | none |
found | string | true | none | none |
expected | string | true | none | none |
Token
{
"access_token": "string",
"token_type": "string"
}
Token
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
access_token | string | true | none | none |
token_type | string | true | none | none |
UploadFileResponse
{
"file_id": "8a0cfb4f-ddc9-436d-91bb-75133c583767"
}
UploadFileResponse
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
file_id | string(uuid) | true | none | none |
User
{
"username": "string",
"email": "string",
"full_name": "string",
"disabled": true
}
User
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
username | string | true | none | none |
email | string | false | none | none |
full_name | string | false | none | none |
disabled | boolean | false | none | none |
ValidationError
{
"loc": [
"string"
],
"msg": "string",
"type": "string"
}
ValidationError
Properties
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
loc | [anyOf] | true | none | none |
anyOf
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
» anonymous | string | false | none | none |
or
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
» anonymous | integer | false | none | none |
continued
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
msg | string | true | none | none |
type | string | true | none | none |
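Since `loc` mixes strings and integers, a client has to convert each part before joining them into a path. A minimal sketch of formatting an HTTPValidationError body; the function name and the sample error values are illustrative assumptions:

```python
def format_validation_errors(payload):
    """Turn an HTTPValidationError body into one readable line per error."""
    lines = []
    for error in payload.get("detail", []):
        location = ".".join(str(part) for part in error["loc"])
        lines.append(f'{location}: {error["msg"]} ({error["type"]})')
    return lines


# Made-up payload shaped like the schema above.
payload = {
    "detail": [
        {"loc": ["body", "text", 0], "msg": "field required", "type": "value_error.missing"}
    ]
}
for line in format_validation_errors(payload):
    print(line)
```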
Advanced
Audio
Audio Conversion
Our system uses the pydub Python package to work with audio files. pydub is a high-level audio library that simplifies audio file manipulation. It relies on FFmpeg, a multimedia framework that handles a wide range of audio and video file formats.
Together, pydub and FFmpeg support various audio file formats, including MP3, WAV, FLAC, and M4A, among others. However, uploads to our system are restricted to the most common audio formats specified by MIME. This ensures convenience and prevents errors when processing the uploaded files.
List of the most important allowed extensions:
- .aac
- .mp3
- .m4a
- .oga, .ogv
- .ogg
- .opus
- .wav
For a comprehensive list of the supported formats, please refer to: Full list of FFmpeg supported formats
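Before uploading, a client could run a rough pre-check on the file name. The sketch below uses Python's standard `mimetypes` module to see whether the guessed MIME type is an audio type. This is an assumption about a convenient client-side check, not the validation the server performs, and the result depends on the MIME table available on the local machine.

```python
import mimetypes


def looks_like_audio(filename):
    """Return True if the filename's guessed MIME type is audio/*."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type is not None and mime_type.startswith("audio/")


for name in ("reading.mp3", "reading.wav", "notes.txt"):
    print(name, looks_like_audio(name))
```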
Audio Models
Our system fetches audio models from a worker that loads plugins. The worker is responsible for loading audio processing plugins, which include machine learning models for audio analysis and related functionality.
To get the models, our system sends a request to the worker, which returns the list of loaded plugins that are ready to use. This helps ensure that the audio models used in the system are up-to-date.
Image
Image Models
Our system fetches image models from a worker that loads plugins. The worker is responsible for loading image processing plugins, which include machine learning models for image analysis and related functionality.
To get the models, our system sends a request to the worker, which returns the list of loaded plugins that are ready to use. This helps ensure that the image models used in the system are up-to-date.
Task System
This is a set of functions and methods used in the Follow My Reading task system.
_plugin_class_method_call
_plugin_class_method_call() is a helper function that searches each plugin for the class_name object. If the object is found, it loads the function from it and calls it with the filepath argument. It returns the result of the function.
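The lookup described here can be sketched with `getattr`. This is not the project's implementation: the function name `call_plugin_class_method`, the use of plain namespace objects as plugins, and raising `LookupError` when nothing is found are all assumptions made for illustration.

```python
from types import SimpleNamespace


def call_plugin_class_method(plugins, class_name, function, filepath):
    """Find class_name in the first plugin that defines it and call function(filepath)."""
    for plugin in plugins:
        target = getattr(plugin, class_name, None)
        if target is not None:
            method = getattr(target, function)
            return method(filepath)
    # Assumed behavior: the source does not say what happens on a miss.
    raise LookupError(f"no plugin provides {class_name}")


class EchoPlugin:
    """Hypothetical plugin class used only for this example."""

    @staticmethod
    def process(filepath):
        return f"processed {filepath}"


plugins = [SimpleNamespace(), SimpleNamespace(EchoPlugin=EchoPlugin)]
print(call_plugin_class_method(plugins, "EchoPlugin", "process", "audio.wav"))
```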
dynamic_plugin_call
dynamic_plugin_call() is a scheduled job that accepts class_name, function, and filepath as parameters. It calls _plugin_class_method_call() with these parameters.
load_plugins_into_memories
load_plugins_into_memories() is a startup function that loads plugins.
audio_processing_call
audio_processing_call() is a scheduled job that accepts audio_class, audio_function, and audio_path as parameters. It calls _audio_process() with these parameters.
image_processing_call
image_processing_call() is a scheduled job that accepts image_class, image_function, and image_path as parameters. It calls _image_process() with these parameters.
compare_audio_image
compare_audio_image() is a scheduled job that accepts audio_class, audio_function, audio_path, image_class, image_function, and image_path as parameters. It calls _audio_process() and _image_process() with these parameters, matches the resulting texts, and returns the difference.
compare_audio_text
compare_audio_text() is a scheduled job that accepts audio_class, audio_function, audio_path, and text as parameters. It calls _audio_process() with the audio parameters, matches the resulting text against the given text, and returns the difference.
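The matching algorithm itself is not described here. As one possible illustration of "match two texts and return the difference", the sketch below compares word lists with `difflib.SequenceMatcher` from the standard library. The function name `word_differences` and the choice of a case-insensitive, word-level comparison are assumptions, not the project's method.

```python
import difflib


def word_differences(heard, expected):
    """Return (found, expected) pairs for word runs that differ."""
    heard_words = heard.lower().split()
    expected_words = expected.lower().split()
    matcher = difflib.SequenceMatcher(a=heard_words, b=expected_words, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(heard_words[a_start:a_end]), " ".join(expected_words[b_start:b_end]))
            )
    return differences


print(word_differences("the cat sad on the mat", "The cat sat on the mat"))
```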
_get_audio_plugins
_get_audio_plugins() is a scheduled job that returns information about loaded audio plugins.
_get_image_plugins
_get_image_plugins() is a scheduled job that returns information about loaded image plugins.
_extact_phrases_from_audio
_extact_phrases_from_audio() is a helper function that extracts text from audio and searches for each phrase. It splits the audio by non-none intervals and assigns the resulting split files. It returns the result of the audio phrase extraction.
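The search step can be illustrated on already-transcribed segments. The sketch below assumes segments shaped like AudioChunk and uses a case-insensitive substring match, which is an assumption; it returns entries shaped like the AudioPhrase schema. The name `find_phrases` is hypothetical, and the audio splitting itself is not shown.

```python
def find_phrases(segments, phrases):
    """Match each phrase to the first segment whose text contains it."""
    results = []
    for phrase in phrases:
        match = next(
            (segment for segment in segments if phrase.lower() in segment["text"].lower()),
            None,
        )
        results.append({"audio_segment": match, "found": match is not None, "phrase": phrase})
    return results


# Made-up transcribed segments.
segments = [
    {"start": 0, "end": 2, "text": "Once upon a time", "file": "00bd29cf-1ab3-4825-b15f-d80a4a0e1cbb"},
    {"start": 2, "end": 4, "text": "there was a fox", "file": "3fa85f64-5717-4562-b3fc-2c963f66afa6"},
]
for entry in find_phrases(segments, ["a fox", "dragon"]):
    print(entry["phrase"], entry["found"])
```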
extact_phrases_from_audio
extact_phrases_from_audio() is a scheduled job that accepts audio_class, audio_path, and phrases as parameters