Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 10:08:46

Good morning everyone. This chat is for the mass transcription project.

Ryan (ryan@themedialab.agency)
2025-08-01 10:11:12

@Ryan has joined the conversation

Dustin Surwill (dsurwill@shield-legal.com)
2025-08-01 10:21:07

@Dustin Surwill has joined the conversation

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 10:23:25

Hey @Nick Ward I got your email and wanted clarification on the Law Ruler status criteria. When you say 'All', do you mean every single status in the Law Ruler system, or all of the specific statuses you listed (already represented, declined, etc.)? I want to make sure we pull the right data set. I remember our talk, I am just making 100% sure.

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 11:05:53

Update on the joining criteria,

James, Dustin & I talked about this yesterday at different times. Here is what we came up with.

This is how the files are joined and found because this is how they are saved in the sl-five9-recordings google cloud bucket: Alizabeth Burge-2025-02-14-14-44-01 6103687090.mp3 aka {agent_name}-{year}-{month}-{day}-{hour}-{minute}-{second} {phone}.mp3
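
A minimal sketch of splitting a recording name into its join keys, assuming every file follows the pattern above exactly. The `Recording` type and `parse_recording_name` helper are hypothetical names for illustration, not part of the existing pipeline script.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Recording(NamedTuple):
    """Join keys recovered from a recording file name (hypothetical type)."""
    agent_name: str
    recorded_at: datetime
    phone: str


# {agent_name}-{year}-{month}-{day}-{hour}-{minute}-{second} {phone}.mp3
# The greedy agent group lets names that contain hyphens still parse.
NAME_PATTERN = re.compile(
    r"^(?P<agent>.+)-(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}) (?P<phone>\d+)\.mp3$"
)


def parse_recording_name(name: str) -> Recording:
    """Parse one file name; raise ValueError if it does not fit the pattern."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected recording name: {name!r}")
    recorded_at = datetime.strptime(match["ts"], "%Y-%m-%d-%H-%M-%S")
    return Recording(match["agent"], recorded_at, match["phone"])


rec = parse_recording_name("Alizabeth Burge-2025-02-14-14-44-01 6103687090.mp3")
print(rec.agent_name, rec.recorded_at.isoformat(), rec.phone)
```

Names that do not match raise an error instead of being skipped silently, so any variation in the bucket shows up early.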

I have a script we use for the legal compliance pipeline that I will alter to download and transcribe files from the sl-five9-recordings google cloud bucket.

We're going to be uploading the .json files with the transcriptions sent back by assembly.ai into the google cloud bucket called sl-five9-recordings with the file name being Alizabeth Burge-2025-02-14-14-44-01 6103687090.json aka {agent_name}-{year}-{month}-{day}-{hour}-{minute}-{second} {phone}.json
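
A small sketch of deriving the transcript name from the recording name, so the two files sit side by side in the bucket under the same key. `transcript_name_for` is a hypothetical helper, and the sketch assumes the only change is the extension.

```python
from pathlib import PurePosixPath


def transcript_name_for(recording_name: str) -> str:
    """Return the .json object name that pairs with an .mp3 recording name."""
    path = PurePosixPath(recording_name)
    if path.suffix.lower() != ".mp3":
        raise ValueError(f"not an mp3 recording: {recording_name!r}")
    # Only the final extension is swapped; agent, timestamp and phone stay intact.
    return str(path.with_suffix(".json"))


print(transcript_name_for("Alizabeth Burge-2025-02-14-14-44-01 6103687090.mp3"))
```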

Once this is completed, in the future, when you want specific criteria, we can find the phone number in the database, match it to the criteria/statuses, and pull our already transcribed and stored .json transcribed data. We can avoid double storing data into postgres doing it this way and still have the ability to get certain calls.

Currently in my legal pipeline we look for a minimum of 120 seconds. Out of 600 audio files we pull for legal use we still end up with about 30 voicemail files. So 45 seconds won't be long enough.

As for cost, estimated total bucket: ~$9,826 (36,392 hours x $0.27 per hour) which is around 4.5m+ audio files. With the removal of calls under 120 seconds, we should cut that down, so this is a high end estimate.
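
The arithmetic behind that estimate, as a quick check. The hours and the per-hour rate are the figures quoted above; `estimate_cost` is a hypothetical helper, and `Decimal` is used only to keep the cents exact.

```python
from decimal import Decimal


def estimate_cost(audio_hours: str, rate_per_hour: str) -> Decimal:
    """Multiply total audio hours by the per-hour transcription rate."""
    return Decimal(audio_hours) * Decimal(rate_per_hour)


total = estimate_cost("36392", "0.27")
print(f"${total:,.2f}")  # 36,392 h x $0.27/h
```

This gives $9,825.84, which rounds to the ~$9,826 quoted above.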

Processing time with assembly.ai, speech-to-text (pre-recorded): 200 concurrent transcriptions with automatic queuing for overflow, and a limit of 30 requests per minute (account-specific). Requests fail with a 429 error when that limit is exceeded.
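
A rough sketch of staying under a per-minute request limit on the client side so submissions do not hit 429s. It does not call assembly.ai at all. `MinuteRateLimiter` is a hypothetical class, and the 30-per-60-seconds default is taken from the figure above.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most max_calls per period seconds."""

    def __init__(
        self,
        max_calls: int = 30,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until one more request fits inside the current window."""
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self._period:
                self._stamps.popleft()
            if len(self._stamps) < self._max_calls:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self._period - (now - self._stamps[0]))


limiter = MinuteRateLimiter()
limiter.acquire()  # call once before each transcription request is submitted
```

The clock and sleep functions are injectable so the limiter can be tested without real waiting.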

Once all of this is completed, we can filter down criteria, pull the .json files we want from the sl-five9-recordings google cloud bucket, convert them however SimpleTalk needs them, and then fire them off via API or however they accept the files.

I've included a flow chart for visual aid.

👍 James Turner, James Scott
👍:skin_tone_4: Ryan
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 11:06:11
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 11:28:01
Nick Ward (nicholas@tortintakeprofessionals.com)
2025-08-01 11:29:28

@Chris Krecicki Right now we want all leads/calls that are in this campaign. The lead count is ~4.5K. Once we have the payload, I/we can evaluate whether or not to filter down the list to specific statuses before sending to the folks who will work with the data. I just want to be sure we catch all of the leads in this campaign (not vertical, not campaign type, just this specific campaign), with the current status labeled in the file for future filtering. Glad to chat if this needs more explanation. Thanks!

👍 James Turner
🙏:skin_tone_4: Ryan
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 11:30:56

OK so we're only downloading and transcribing those 4.5K -- what is the campaign? I'll make updates in the code and run a test, expect it by EOD or Monday. @Nick Ward

👍 James Turner
👍:skin_tone_4: Ryan
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 11:44:55

@Nick Ward confirming the campaign is: Depo-Provera - DL - Flatirons - Shield Legal aka case_type 1923

:1000: Nick Ward
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 12:40:14

@Nick Ward @James Turner -- please confirm the sample sent via email

Nick Ward (nicholas@tortintakeprofessionals.com)
2025-08-01 13:47:33

Correct @Chris Krecicki, 1923 is the LR # for the campaign

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-01 14:29:51

Please check email chain everyone https://github.com/shield-legal/mass-transcription-simpletalk/blob/master/prod_data/depo_provera_filenames.json -- I have these ready to go. I just need the approval. @Joe Santana

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-04 13:19:21

give me a couple days .. learning that time stamps are 7+ hours ahead in the bucket due to how they are uploaded and file names vary .. give me a bit to sort this out -- unexpected things
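
One way to picture the offset problem: shift the call time by a fixed number of hours before building the timestamp part of the bucket name. This is only a sketch. The 7-hour default is an assumption based on the rough figure above, the real offset may differ or vary, and `to_bucket_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def to_bucket_timestamp(call_time: datetime, offset_hours: int = 7) -> str:
    """Shift a call time forward and format it like the bucket file names."""
    shifted = call_time + timedelta(hours=offset_hours)
    return shifted.strftime("%Y-%m-%d-%H-%M-%S")


print(to_bucket_timestamp(datetime(2025, 2, 14, 14, 44, 1)))
```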

👍 James Turner
👍:skin_tone_4: Ryan
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-04 14:54:26

Aight boys, this is done. Spent a few hours sorting out those caveats. I am waiting for Dustin to get back to add permissions so I can upload the .json file with the transcription to the call bucket. But it is done. When do you all want to set up a meeting to review all this before we move forward?

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-04 14:57:04

Shoot me a calendar invite when you get a time together

Ryan (ryan@themedialab.agency)
2025-08-04 15:43:01

Thanks @Chris Krecicki

🙏 Chris Krecicki
Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-07 10:16:46

I'll run this starting tomorrow. Once they get the DB migrated we can just take our local DB inserts and import them to prod so prod can catch up.

Ryan (ryan@themedialab.agency)
2025-08-07 10:35:33

Ok. Let's move the demo meeting to tomorrow, get everything ready. FYI @James Scott

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-07 10:37:32

@Ryan what I was talking about here is for the mass transcription project -- we should still do a demo today of the tort finder we were talking about in the ai-development chat

Chris Krecicki (ckrecicki@shield-legal.com)
2025-08-07 10:48:39

@Ryan check our thread in the other chat we have together

Ryan (ryan@themedialab.agency)
2025-08-07 10:54:10

Ok. Will move it back.

✅ Chris Krecicki