Chris Krecicki [2:13 PM] Setup for transcription project.
Steps for Admin:
Cost runs approximately $0.006 per 15-second increment (very low cost for testing)
and/or I just need an audio file to run a pilot test between two different speech-to-text diarization options
my github https://github.com/krecicki
https://dev.to/shayy/postgres-is-too-good-and-why-thats-actually-a-problem-4imc
https://www.bytebase.com/blog/features-i-wish-mysql-had-but-postgres-already-has/
looks like I need to run: echo "sk-proj-kgC52qr0bQyRPKjOXN65hj5oUlYk1c8SC8FaRoEYdB3FGMF3W47r46VEJyCHu0pXC86Vao_OT3BlbkFJCXeoAflB--MhCIgU7m3tYbTr2fDjl1fmlJifo9MWFZuPDNWKMNbe9km8FY4RrrkURQYfdLCzAA" | gcloud secrets create openai-api-key --data-file=-
or input manually here? https://console.cloud.google.com/security/secret-manager?inv=1&invt=Ab1Kcw&project=shield-legal-tools
OK. So the middleware is blocking me, internal_tools.applications table. The auto-discovery system registered the transcription app when I added the code, but I don't have permissions to access it. Can you add transcription permissions to my user account in the auth system?
There's a bug in auth/middleware.py line 73. The audit logging is trying to JSON serialize request.body which is bytes. It needs to be decoded to string first.
Change line 73 from:
```python
json.dumps({'body': request.body})
```
To:
```python
json.dumps({'body': request.body.decode('utf-8', errors='replace') if request.body else None})
```
This will safely decode the bytes to string for JSON serialization. Can I make this change without affecting anything else?
It happens when I try to load the .json file of the chat log into the chat window
I hate how python does not default to UTF-8 on windows but does on all other systems
evil windows, i've spent half my time setting up env and auth stuff
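A minimal sketch of sidestepping the Windows default encoding, assuming the chat log is a plain UTF-8 .json file (the path/name here are just examples):
```python
import json
import sys

# Windows defaults open() to the locale codepage (e.g. cp1252) unless PYTHONUTF8=1 is set,
# so pass the encoding explicitly when reading the chat log.
def load_chat_log(path: str):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    log = load_chat_log(sys.argv[1] if len(sys.argv) > 1 else "chat_log.json")
    print(type(log), "loaded")
```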
The JSON serialization issue is fixed, but now there's a database permissions problem. My user doesn't have permission to insert into the internal_tools.audit_log table. The error is:
permission denied for sequence audit_log_id_seq
Can you grant INSERT permissions on the audit_log table and its sequence to my database user?
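For reference, a sketch of the grants that should cover it, assuming the sequence lives in the same internal_tools schema; the role name is a placeholder:
```sql
-- Placeholder role name; swap in the actual database user
GRANT INSERT ON internal_tools.audit_log TO transcription_user;
-- nextval() on the id sequence needs USAGE (SELECT is handy for currval)
GRANT USAGE, SELECT ON SEQUENCE internal_tools.audit_log_id_seq TO transcription_user;
```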
lets not include empty files or installers in the repo
Also your html file does not extend from html/base.html
hmm what did you see? i only added two files -- yes the head in html, will fix
Next project, I don't need it asap. But how can I see who was hired within the last 60 days and match them to their recordings, so I can grab 10 random intakes spaced over the different campaigns? I'll then run them through transcription, ask it the 3 questions legal wants to know, and then it'll email them a report.
Here is what he wants; SOLUTION: In a perfect world, you would be able to write a script for me that would auto-select intake conversations from people hired within 60 days. It would select 10 random intakes from each person on that list, spaced evenly over the different campaigns. It would then take each of the selected interviews, run them through the transcription, and check if the three issues above were correctly/incorrectly mentioned. Then the report would pop out and we would see how many issues there were. We would then run the script 2x a month(?) depending on the cost and time.
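Very rough SQL sketch of the selection step just to show the shape; the hires / intakes tables and column names are made up and would need to be mapped to the real schema:
```sql
-- Hypothetical tables: hires(agent_id, hire_date), intakes(agent_id, campaign, recording_path, ...)
WITH recent_hires AS (
    SELECT agent_id
    FROM hires
    WHERE hire_date >= CURRENT_DATE - INTERVAL '60 days'
),
ranked AS (
    SELECT i.*,
           ROW_NUMBER() OVER (
               PARTITION BY i.agent_id, i.campaign
               ORDER BY random()
           ) AS rn
    FROM intakes i
    JOIN recent_hires h USING (agent_id)
)
SELECT *
FROM ranked
WHERE rn <= 2   -- e.g. 2 per campaign, so ~10 per agent across 5 campaigns
ORDER BY agent_id, campaign;
```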
we should have a github repo with utility files like this connector file you sent me
do i have permission to do so, so I don't have to keep it locally for the future?
You can start it. I will add more helpers tomorrow
Figured you might be on .. https://github.com/shield-legal/sandbox-toolbox/tree/main
we should also consider a github repo for the integrations guys so they can share and keep track of templates, changes etc
My phone is always on. There is a repo for integrations, that's where I grabbed the cloud postgres file from
seems tyson is making a new toolbox called sl-common-utilities? Looks like he's adding a lot of the same files ... should I abandon my sandbox toolbox and let him take over it?
ah cool -- should i delete my repo i was doing it in?
sorry i left a bit early, was in pain
If you want, or you can leave it as your sandbox for some scripts...
OK cool. Sounds good. I didn't want to do double work when I saw it.
I spun up a project locally called untitled that is my local sandbox
All the tools in sl-common-utilities are currently the sync versions, not the async versions. The imap.py file came from the integrations project
It looks like your repo has the async versions since it was probably based on the website
We will probably roll the async versions in at some point but I want to talk to @Tyson Green about it first
I know some people have an account, not sure if its shared or a team/org thing. why?
figured i could use it to send legal a report without waiting for sendgrid
use zapier to send a gmail with the report
or a script locally on your machine with the application default creds to send as you?
i didnt want to have it scheduled running off my laptop
at least until next week. some things should have calmed down by then
hehe ill just send it by hand once he gives me feedback on what I sent him earlier, i think a lot of people are mentally already on vacation or are trying to close up work before tomorrow
I think Nick was more meaning the data from LawRuler / the answers to the questionnaires
such as standardizing address format...
i don't believe I have access to that database .. only postgres table five9_bulk_call_data_tabularv2
did I give you the ERD?
@Chris Krecicki , Lexi is getting the error shown in the below image when trying to transcribe either a .wav or .mp3 file
> Alexia O'Brien🦂 [9:04 AM]
> 769655,
request.ctx.user. It contains the following: {'iss': 'https://accounts.google.com', 'azp': '...', 'aud': '...', 'sub': '...', 'hd': '..domain..', 'email': '..email..', 'email_verified': True, 'at_hash': '...', 'name': '.', 'picture': '..url..', 'given_name': '.', 'family_name': '.', 'iat': 000000, 'exp': 000000}
I suggest just using the id, example: request.ctx.user['id']
For the tort identifier project cam wants. I'm going to use PostgreSQL instead of LanceDB. As usual, you're right - it is better. I need the following extensions installed to use it as a vector database:
• uuid-ossp
• pg_trgm
• btree_gin
Can we do this? I didn't want to mess with the DB since I don't think I have superuser privileges.
Attached is the table schema I need created - it includes the CREATE EXTENSION statements and everything else needed. It's just one new table called legal_cases that won't affect any existing tables.
OK. For Cameron's tort opportunities. I'm going to use PostgreSQL instead of LanceDB. As usual, you're right - it is better. I need the following extensions installed to use it as a vector database:
• uuid-ossp
• pg_trgm
• btree_gin
Can we do this? I didn't want to mess with the DB since I don't think I have superuser privileges.
Attached is the table schema I need created - it includes the CREATE EXTENSION statements and everything else needed. It's just one new table called new_tort_opportunities that won't affect any existing tables. Attached is the SQL file with everything that needs to be done.
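For reference, the extension installs themselves are just these one-liners (they need superuser or explicit CREATE privileges on the database, which is why I'm asking):
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";  -- uuid_generate_v4() for primary keys
CREATE EXTENSION IF NOT EXISTS pg_trgm;      -- trigram similarity for fuzzy text matching
CREATE EXTENSION IF NOT EXISTS btree_gin;    -- lets btree-typed columns share GIN indexes
```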
https://hub.docker.com/_/postgres#-via-docker-compose
https://youtu.be/T17bpGItqXw?si=xC7e33Bf345-mYz0
boss we need one of these in the office with us
Outside of permission I just dont know
We've reviewed: network log comparisons from a successful computer and an unsuccessful one, we've compared logs from GCP, and we went over the middleware files for auth. The only thing that makes sense is a firewall or permissions error.
All my work on this so far to scroll through https://claude.ai/share/04b96ea8-b8ee-42e8-b658-0adb1c3308a0
That new link still has 92k rows without a leadid, only 799 rows have a leadid
Sorry, my mistake, I was looking at the wrong column
I got you. I'm working out the migration stuff right now
```-- Find leads with final/declined status that have NO attorney emails -- This identifies true "System Failure" cases
SELECT l.id AS leadidnoattorneyemails, ct.name AS campaignname, ls.name AS statusname, l.createdate, 'System Failure - No Attorney Emails' AS issuetype FROM lead l JOIN casetype ct ON l.casetypeid = ct.id JOIN leadstatus ls ON l.statusid = ls.id WHERE ls.id IN (1074, 1075) -- Final or declined status AND l.createdate >= '2022-01-01' -- Recent data AND l.id NOT IN ( -- Exclude leads that DO have attorney emails SELECT DISTINCT lh.leadid FROM lead_history lh WHERE ( -- Check for ANY of the 110 attorney domains lh.username LIKE '%@800goldlaw.com%' OR lh.username LIKE '%@aaronlawgroup.com%' OR lh.username LIKE '%@actslaw.com%' OR lh.username LIKE '%@openjar.com' OR lh.username LIKE '%@alexwalshlaw.com%' OR lh.username LIKE '%@amirianlawgroup.com%' OR lh.username LIKE '%@anapolweiss.com%' OR lh.username LIKE '%@antheminjurylaw.com%' OR lh.username LIKE '%@asilpc.com%' OR lh.username LIKE '%@askllp.com%' OR lh.username LIKE '%@aswtlawyers.com%' OR lh.username LIKE '%@attorney4life.com%' OR lh.username LIKE '%@attorneyrobertwoods.com%' OR lh.username LIKE '%@awkolaw.com%' OR lh.username LIKE '%@babinlaws.com%' OR lh.username LIKE '%@baileyglasser.com%' OR lh.username LIKE '%@bbtrial.com%' OR lh.username LIKE '%@beasleyallen.com%' OR lh.username LIKE '%@bencrump.com%' OR lh.username LIKE '%@benefitshealth.com%' OR lh.username LIKE '%@bighornlaw.com%' OR lh.username LIKE '%@bowersoxlaw.com%' OR lh.username LIKE '%@bradleygrombacher.com%' OR lh.username LIKE '%@bradmorrislawfirm.com%' OR lh.username LIKE '%@burgsimpson.com%' OR lh.username LIKE '%@christophergolden.org%' OR lh.username LIKE '%@cochrantexas.com%' OR lh.username LIKE '%@collinslaw.com%' OR lh.username LIKE '%@coopermasterman.com%' OR lh.username LIKE '%@cpialaw.com%' OR lh.username LIKE '%@dawsonmedlocklaw.com%' OR lh.username LIKE '%@douglasandlondon.com%' OR lh.username LIKE '%@dsbcllc.com%' OR lh.username LIKE '%@elglaw.com%' OR lh.username LIKE '%@extralegalhelp.com%' OR lh.username LIKE '%@fabianlawfirm.com%' OR lh.username LIKE '%@federmanlaw.com%' OR lh.username LIKE '%@fitzpatrickfirm.com%' OR lh.username LIKE '%@fleischnerlawfirm.com%' OR lh.username LIKE '%@flknjlaw.com%' OR lh.username LIKE '%@fortlauderdaletriallaw.com%' OR lh.username LIKE '%@fraserlawfirmllc.com%' OR lh.username LIKE '%@goldbergkohn.com%' OR lh.username LIKE '%@goldeninjurylaw.com%' OR lh.username LIKE '%@greene-phillips.com%' OR lh.username LIKE '%@guyratner.com%' OR lh.username LIKE '%@herrmanlaw.com%' OR lh.username LIKE '%@hrnjlaw.com%' OR lh.username LIKE '%@injurylaw.com%' OR lh.username LIKE '%@injurylawyers.com%' OR lh.username LIKE '%@jeanlawfirm.com%' OR lh.username LIKE '%@jllawfirm.net%' OR lh.username LIKE '%@johnbales.com%' OR lh.username LIKE '%@josephryancpa.com%' OR lh.username LIKE '%@kennedylaw.net%' OR lh.username LIKE '%@kline-specter.com%' OR lh.username LIKE '%@koenigfirm.com%' OR lh.username LIKE '%@kornbluthlaw.com%' OR lh.username LIKE '%@landaverde-law.com%' OR lh.username LIKE '%@law-lls.com%' OR lh.username LIKE '%@law-rm.com%' OR lh.username LIKE '%@lawfirm4you.com%' OR lh.username LIKE '%@lawkb.com%' OR lh.username LIKE '%@lawrenceadler.com%' OR lh.username LIKE '%@lawyerseattle.com%' OR lh.username LIKE '%@lexlaw.com%' OR lh.username LIKE '%@lieffcabraser.com%' OR lh.username LIKE '%@lopezlaw.com%' OR lh.username LIKE '%@lpfllp.com%' OR lh.username LIKE '%@marlerclark.com%' OR lh.username LIKE '%@martindale.com%' OR lh.username LIKE '%@mctlaw.com%' OR lh.username LIKE '%@miaminursinghomelaw.com%' OR lh.username LIKE 
'%@mmalaw.com%' OR lh.username LIKE '%@mnoglaw.com%' OR lh.username LIKE '%@morganlaw.com%' OR lh.username LIKE '%@morganmorganpa.com%' OR lh.username LIKE '%@moseleycollins.com%' OR lh.username LIKE '%@mundylaw.com%' OR lh.username LIKE '%@napolilaw.com%' OR lh.username LIKE '%@nelsonlaw.com%' OR lh.username LIKE '%@ohioemploymentlaw.com%' OR lh.username LIKE '%@onderlaw.com%' OR lh.username LIKE '%@owenslaw.com%' OR lh.username LIKE '%@parmelelaw.com%' OR lh.username LIKE '%@peckarlaw.com%' OR lh.username LIKE '%@philipslaw.com%' OR lh.username LIKE '%@purcellkrug.com%' OR lh.username LIKE '%@rheingoldlaw.com%' OR lh.username LIKE '%@robinslawoffices.com%' OR lh.username LIKE '%@ryanlaw.com%' OR lh.username LIKE '%@sairamlaw.com%' OR lh.username LIKE '%@sgghlaw.com%' OR lh.username LIKE '%@simmonsfirm.com%' OR lh.username LIKE '%@strausslaw.com%' OR lh.username LIKE '%@stuartandstuart.com%' OR lh.username LIKE '%@tampanursinghomelaw.com%' OR lh.username LIKE '%@teichmanlaw.com%' OR lh.username LIKE '%@thaddeusculley.com%' OR lh.username LIKE '%@theblumfirm.com%' OR lh.username LIKE '%@thelegaladvocate.com%' OR lh.username LIKE '%@theschifferlaw.com%' OR lh.username LIKE '%@turleylawfirm.com%' OR lh.username LIKE '%@waldmancerny.com%' OR lh.username LIKE '%@walterclark.com%' OR lh.username LIKE '%@weitzlux.com%' OR lh.username LIKE '%@wilsonlaw.com%' OR lh.username LIKE '%@yourlawyer.com%' OR lh.username LIKE '%@yourvaccinelawyer.com%' OR lh.username LIKE '%@zimmreed.com%' ) ) ORDER BY ct.name, l.createdate DESC;```
It works. I have a handful of people using it now.
i have about 12 people on the floor using it
going to be showing brooke ghannon and a few others soon
So I was thinking about the issue with the automation for pulling these "did not send emails" -- we could have a new table called attorney_domains -- we can get a list of all the automation, etc. emails, and have an automation that looks for new emails and adds them if they do NOT LIKE an existing domain.
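A sketch of the shape I'm picturing, reusing the lead / lead_history names from the query above; the attorney_domains table itself is new:
```sql
-- New lookup table: one row per known attorney email domain
CREATE TABLE IF NOT EXISTS attorney_domains (
    id     SERIAL PRIMARY KEY,
    domain TEXT NOT NULL UNIQUE
);

-- "System failure" leads: no lead_history entry from any known attorney domain
SELECT l.id
FROM lead l
WHERE NOT EXISTS (
    SELECT 1
    FROM lead_history lh
    JOIN attorney_domains ad
      ON lh.username LIKE '%@' || ad.domain || '%'
    WHERE lh.leadid = l.id
);
```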
Lets stop sending Github links unless you know that everyone is on github and has access to that repo. Just send the file. If you expect to update the file later then use Google Drive
Also most people do not require 2 pings (email + slack)...
Amidayne Nelsen/2025/07/29/16-52-15 5594102415.mp3: you need a private key to sign credentials.the credentials you are currently using <class 'google.oauth2.credentials.Credentials'> just contains a token. see https://googleapis.dev/python/google-api-core/latest/auth.html#setting-up-a-service-account for more details.
OK. Everything is working, finally. The only issue is I don't have permissions to upload these transcribed json files to the bucket; they have the same filename as the audio, just with .json at the end
Permission given. Can we get a SQL query that shows the record/transcript path/url to make as a view in the DB?
yeah we can do that what table? or should i make a new one? What should we use as a key for the table? I just got in I'll see you soon
CREATE TABLE IF NOT EXISTS transcriptions (
id SERIAL PRIMARY KEY,
audio_file_path TEXT NOT NULL,
transcription_json_path TEXT,
bucket_json_path TEXT,
status VARCHAR(20),
created_at TIMESTAMP DEFAULT NOW(),
completed_at TIMESTAMP,
file_size BIGINT,
duration_seconds INTEGER,
error_message TEXT,
assemblyai_id TEXT
);
also it looks like we're uploading to gcs
That query only creates the table, what about one that generates the paths using SQL dynamically?
To dynamically generate the full GCS URLs? And not just the paths?
CREATE VIEW transcription_file_paths AS
SELECT
id,
audio_file_path,
'
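If it helps, here is roughly where that view was going; the bucket name is a placeholder, and the transcript path just reuses the audio filename with .json appended like the upload script does:
```sql
CREATE VIEW transcription_file_paths AS
SELECT
    id,
    audio_file_path,
    'gs://your-bucket-name/' || audio_file_path            AS audio_gcs_url,       -- placeholder bucket
    'gs://your-bucket-name/' || audio_file_path || '.json' AS transcript_gcs_url,  -- same name + .json
    status,
    created_at
FROM transcriptions;
```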
But the transcriptions table is a local table to you. Can this work from the five9.five9_bulk_call_data_tabularv2?
i can add these columns to the five9.five9_bulk_call_data_tabularv2 and the views. I do not believe I have write access. Only read.
i just made this local db for testing
what code did you use to generate the audio_file_path?
everything works perfectly -- finds files correctly etc
this is the file doing all the work once I have the depoproverafilenames.json
check this out! https://openai.com/index/introducing-gpt-oss/
first time since GPT-2 they did this
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers -- we can fine tune it too
how does your code handle when there are multiple recordings for the same call? ex: row['recordings']='10:33:32(0:35) 10:34:16(0:31) 10:34:54(0:21)'
it is based on the time stamp so it shouldn't interfere
what do you think? that's my imagination considering things
```
# Extract time from recordings field (format: "HH:MM:SS(duration)")
recordings_time = row['recordings']
if recordings_time and isinstance(recordings_time, str):
    time_match = recordings_time.split('(')[0]  # Get "HH:MM:SS" part
    hour, minute, second = time_match.split(':')

    # Add 7 hours to match actual filename timezone
    hour_int = int(hour) + 7
    if hour_int >= 24:
        hour_int -= 24
        # If we roll over to next day, increment the date
        base_timestamp += pd.Timedelta(days=1)
else:
    # Fallback to timestamp_millisecond if recordings field is missing
    timestamp = base_timestamp + pd.Timedelta(hours=7)
    hour = f"{timestamp.hour:02d}"
    minute = f"{timestamp.minute:02d}"
    second = f"{timestamp.second:02d}"
    hour_int = timestamp.hour
```
recordings_time can hold multiple like I showed yesterday
yes but when I look at the filenames for download and someones name shows multiple times
"Nastashia Friday/2025/07/25/21-35-57 4243394108.mp3",
"Nastashia Friday/2025/07/25/20-30-23 5013924661.mp3",
"Nastashia Friday/2025/07/25/20-19-58 4793525654.mp3",
if you look at depoproverafilenames.json
I was thinking about converting the generate_filename function to SQL. I cannot find the code that gets the rows from five9.five9... . If we use the rows from the table then we need to account for the fact that 1 call record can turn into multiple recordings (sketch below)
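Something like this would cover the multi-recording case: split the recordings field on whitespace first, then apply the same +7 hour shift (hypothetical helper, not what's in the script today):
```python
import pandas as pd

def recording_times(row, base_timestamp):
    """Yield one (hour, minute, second) per recording, e.g.
    '10:33:32(0:35) 10:34:16(0:31) 10:34:54(0:21)' -> three entries."""
    recordings = row.get('recordings')
    if not recordings or not isinstance(recordings, str):
        # Fallback: single entry from the call timestamp, shifted like the main script
        ts = base_timestamp + pd.Timedelta(hours=7)
        yield ts.hour, ts.minute, ts.second
        return
    for chunk in recordings.split():                # one chunk per recording
        hh, mm, ss = chunk.split('(')[0].split(':')
        yield (int(hh) + 7) % 24, int(mm), int(ss)  # same timezone shift as before
```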
everything works perfectly right now
run the query, makes a csv, i run a file to build file paths, the script uses those file paths to download and transcribe, insert to db table i made, uploads to gcs
i just did something for josh to auto fill questions with an LLM -- I can make a new script to globally do dummy data fills. The reason I took a hardcoded approach in the file you were referring to regarding lr_question.csv is it would cost 100's of API calls and we only have 10,000 per day. We do have gpt-oss now though and structured output support and we could do it locally. We need a local server or computer running a reverse proxy so we can hit its endpoint running on ollama. It would be a huge win for us.
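Rough sketch of what hitting gpt-oss through a local ollama could look like; the model tag and the one-field JSON shape are assumptions:
```python
import json
import requests

# Assumes ollama is running on its default port with the model pulled, e.g. `ollama pull gpt-oss:20b`
def fill_answer(question: str) -> dict:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gpt-oss:20b",
            "messages": [
                {"role": "system", "content": 'Reply only with a JSON object like {"answer": "..."}'},
                {"role": "user", "content": question},
            ],
            "format": "json",   # ollama's constrained-JSON output mode
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["message"]["content"])
```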
Consider. I'll approach Joe about it.
Or we can just do it ourselves. But that would be hella expensive considering the container costs you were showing me and we need well over 16GB of RAM
*Thread Reply:* Can we find a smaller local model that can do it?
*Thread Reply:* thats the smallest model that openai has open sourced that was trained to do structured data -- josh is looking into this topic as we speak
Stop bugging joe I will look soon
Did the medilens stuff come from @James Scott?
*Thread Reply:* If so, I feel like it should have been a separate branch/PR with the inital code he gave you then any changes
*Thread Reply:* its all in my branch
*Thread Reply:* well he only gave me a chopped up version of your app with a ton of technical debt, leftover files, using an env, didn't async and load bigquery data later .. the list goes on
*Thread Reply:* i spent the entire day taking what he had and fixing all of that and making it work for our actual sanic setup
*Thread Reply:* it was not worth sharing his code
*Thread Reply:* he has it in his own github if you want me to pull it and send it to you
Please remove the tort-finder route from the db; it is called Predator now. When I pushed to main in the past, it looks like it updated the db with that route, and now it shows in the nav.
The folder is still called tort-finder in the code on your branch. Until that is updated it will constantly come back
i know, i havent pushed the next updates yet with that change, but it is done locally
once it is updated in github, I will fix the table
i will never push to main again, i apologize again
https://youtu.be/N5xhOqlvRh4?si=g2I5Qgg7X3TE6T
yeah llama.cpp the guy who made that is physicist and made it in his free time
its a pretty incredible project ... microsoft actually ripped him off and made a 1-bit version
really annoying to see they did that but that is their history right?
oh man that really took a dive on performance splitting the load didn't expect that, i never had watched one of these cluster videos but i didn't expect it to be like this
so single vertically scaled versus clustering is the way to go with these
i wonder why he couldnt even get the big models to load without rpc approach
i still think back propagation is the most interesting part of all of this and updating embedding dimensions (weights)
https://www.infoq.com/news/2025/08/google-langextract-python/
@Dustin Surwill google has to catch up .. neat package i am reading over it
I've tested these locally. If anything does come up, it'll be a simple change. I don't foresee any issues though.
Can you make an agent that takes a payload.json file and checks for some of the logic in this file? https://docs.google.com/document/d/18orS2h5giBLjlnZopWlt7IfcUJ60QiV2/edit
You can give comments on the google doc file to make it better for an agent
yes sir, let me finish up this defendant cluster tab in predator
AND document_response NOT ILIKE ANY(ARRAY['%test-%','%tester-%','%mock-%','%-test%','%-mock%'])
postgres.query('SELECT lawrulerfield, answer, leadid FROM lead_question_lawruler WHERE leadid=ANY(%s)', (lead_ids,))
https://github.com/shield-legal/internal-tools-site/pull/16 -- After two hours of dealing with asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress while making these concurrent, I found this issue MagicStack/asyncpg#738 - turns out you can't run multiple queries concurrently on a single PostgreSQL connection due to the protocol itself. I've optimized the main bottlenecks (control panel, chart data, chat assistant) by using separate connections from the pool for concurrent queries, but kept the rest sequential to avoid overcomplicating things.
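For anyone who hits the same error, the fix boils down to one pooled connection per concurrent query instead of sharing a single connection (placeholder queries below):
```python
import asyncio
import asyncpg

async def fetch_concurrently(pool: asyncpg.Pool):
    # Each coroutine acquires its own connection from the pool, so the queries
    # can truly overlap; sharing one connection raises the InterfaceError above.
    async def run(sql: str):
        async with pool.acquire() as conn:
            return await conn.fetch(sql)

    control_panel, chart_data = await asyncio.gather(
        run("SELECT 1 AS control_panel"),   # placeholder queries
        run("SELECT 2 AS chart_data"),
    )
    return control_panel, chart_data
```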
How did you get the leadids for the contactids in the data import
i matched the contactids with the leadids
i am still not sure how this small subset got messed up
Like for this one newcontactid,leadid,oldcontactid 642216,576856,642238
in the last file upload (had to split them up)
LeadID 576856 also shows 642238 for the contactID
where do you see new_contactid 642216
For that contact id of 642216 I have leadid 750465
I do see where address1 got the shifted values
Whats so weird is it only did it to the last file it looks like
this is the file I used -- only the last file had this issue with the columns, the others are fine
Can you spot check the lead id / contact ID in the file i sent in the other chat then import it?
Yes. Also, it looks like when it hit line 84736 the address was 5800 Dr .. the script must have got confused: existing_df = pd.read_csv(output_file)  # reads corrupted structure; new_df = pd.DataFrame(batch_results)  # new data in wrong format; combined_df = pd.concat([existing_df, new_df]); combined_df.to_csv(output_file, index=False)  # saves broken structure
diff.csv just shows which had bad lead ids (almost all)
ill do a spot check on these
Sorry dustin, ill make sure to not assume next time, what a weird bug
never crossed my mind that would happen
I had James put this together. I wanted to run it by you first https://docs.google.com/spreadsheets/d/1uAIh7MQWneMQY7i-ECSUHc6b_6-80Bq-qw5qdEeQA1E/edit?gid=0#gid=0
I dont think you need the cost, five9 data, leadspedia data (planned to be postgres). The only table you might need if you need the related firms per case type is the pctid_v2
I suggest starting with the smallest number of tables / columns, then slowly adding as necessary
I need access to the specific BigQuery table tort-intake-professionals.tip_prod_application.io_tip_lr_status_rates. I can access BigQuery but get permission denied on this table.
I am using this. I know its async. I'll strip it out later.
Error occurred: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/tort-intake-professionals/queries?prettyPrint=false: Access Denied: Table tort-intake-professionals:tip_prod_application.io_tip_lr_status_rates: User does not have permission to query table tort-intake-professionals:tip_prod_application.io_tip_lr_status_rates, or perhaps it does not exist.
What do you see here? https://console.cloud.google.com/bigquery?project=tort-intake-professionals&ws=!1m4!1m3!3m2!1stort-intake-professionals!2stip_prod_application
Almost there ... I need the BigQuery Job User role (or BigQuery User role) on the tort-intake-professionals project to run queries. Currently I can see the data but can't execute queries because I'm missing the bigquery.jobs.create permission.
Let me know when I have this .. still doesn't look like it works, same error -- when you have time
You should use the same logic that is in @James Turner code. (The SQL above)
It looks like theyre using
-- ------------------------------------------------------------
-- Latest status snapshot
-- ------------------------------------------------------------
latest_status AS (
SELECT
leadid,
tostatus,
ROW_NUMBER() OVER (PARTITION BY leadid ORDER BY status_change_timestamp_PST DESC) AS rn
FROM lead_status_history
),
They use CURRENT status to determine if billable, not earliest. Can you confirm? That's what it looks like in James' queries.
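For reference, that CTE only becomes the current status once it's filtered to rn = 1, something like:
```sql
-- One row per lead: its most recent status change
SELECT leadid, tostatus
FROM latest_status
WHERE rn = 1;
```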
Ryan just changed his mind again on how this will work. I guess we're just doing this for hard validation for now, versus checking whether the questions actually make sense with the answers.
check this out https://groq.com
im going to give this a whirl, i can use the same llama3.1:8b model but it does inference at 800+ t/s
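Groq exposes an OpenAI-compatible endpoint, so the swap is mostly a base URL plus model name; the model id below is my best guess, check their model list:
```python
import os
from openai import OpenAI

# Assumes GROQ_API_KEY is set; "llama-3.1-8b-instant" is a guess at the hosted model id
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```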
Yes? In Ward's office...
In reply to your email last night, we can update the questions via the LR API...
If we do that, will update the same answer field though or can we move the old answer to a new column called "old answer"
same field. can mostly use lead_history for old answer
yeah i hit him up for his info and table names
bigquery does not require his credentials, just yours or a service account
Just delete both of them and Ill do it over with the correct naming
Ryan gave you admin. LMK once you copy those tables over. I'll get to work on refactoring bigquery out and updating it with postgres queries
lmk when that bigquery stuff is done .. by Tuesday plz
I am only seeing tables in the drug_model dataset. All 119 tables?
you know what i can't even see the tables in the schemas
drug_model.integ_test_drug_ranking_llm_recommendation --- drug_model.production_clinical_trial --- kestra_adverse_events_ranking --- jupyter_adverse_events_ranking
jupyter_adverse_events_ranking imported to jupyter_adverse_events_ranking: 4,884 rows (3 sec, 854 ms, 1628 rows/s)
production_clinical_trial imported to production_clinical_trial: 10,177 rows (15 sec, 461 ms, 678 rows/s)
kestra_adverse_events_ranking imported to kestra_adverse_events_ranking: 10,059 rows (4 sec, 131 ms, 2515 rows/s)
integ_test_drug_ranking_llm_recommendation imported to integ_test_drug_ranking_llm_recommendation: 4,714 rows (8 sec, 444 ms, 589 rows/s)
To properly fix the MediLens application, you need to address these specific issues:
• score_results column contains malformed BigQuery export format
• Column name truncations break queries and make maintenance difficult
• Inconsistent naming conventions (hyphens vs underscores) require special quoting