========================== Making Transcripts with WhisperX ========================== 1. Enter the following code into the terminal: :code:`pip install whisperx`. 2. In that same respository, add a folder of your source files. 3. Enter the following prompt into the terminal: :code:`whisperx {filepath} --model turbo --output_dir output` You may need more commands depending on what you want from WhisperX. ------------------ Special prompts: ------------------ If the audio is in a language other than English, append the following code to your command prompt: :code:`--language {Language_Name}` To add diarization, append the following code: :code:`--diarize` To help diarization, append the following code: :code:`--min_speakers {Number} --max_speakers {Number}` Keep in mind that commercials, announcements, and narrators are also considered speakers. --------------- WhisperX output --------------- As output, five files will be generated: .json, .srt, .tsv, .txt, and .vtt To move the json and vtt to `whisper-editor `_, make sure you have the following add-to-json.py downloaded. .. code:: python from csv import DictReader import os import json import shutil json_file_data = "/Users/{Username}/whisper-reviewer/public/test_for_whisper_transcript_online.json" all_data = [] with open(json_file_data, "r") as my_json: audio_file_data = json.load(my_json) with open("mycsv.csv", "r") as my_file: reader = DictReader(my_file) for row in reader: if row['Title'] != "": all_data.append( { "uin": row['UIN'], "title": row['Title'], "audio": row['Audio'] } ) for path, directories, files in os.walk("output"): for file in files: if ".json" in file: identifier = file.split('.json')[0] name = [item["title"] for item in all_data if item['uin'] == identifier][0] full_name = f"{name} whisperx" audio_file_location = [item["audio"] for item in all_data if item['uin'] == identifier][0] audio_file_data.append( { "audio": audio_file_location, "url": f"./transcripts/{file}", "name": full_name } ) shutil.move(f"{path}/{file}", "/Users/{Username}/whisper-reviewer/public/transcripts") with open(json_file_data, "w") as f: json.dump(audio_file_data, f, indent=4) elif '.vtt' in file: shutil.move(f"{path}/{file}", "/Users/{Username}/whisper-reviewer/vtts/{output-folder}")