Currently waiting on Google Cloud Composer to create my environment for the Vertex AI cron job
Waiting on the 2 PM scheduled Vertex AI cron job
TODO: combinedcsvfile value needs to be changed to "../drugadverseeventdatacombined_datasets.csv"
Add unit tests for chunkified_get_case_entries_for_each_brand_name that check whether chunk_data resets properly (sketch below)
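A minimal pytest sketch of that check. The module path, function signature, and entry shape here are assumptions, not the real code:

# Sketch only: the real module path, signature, and return shape of
# chunkified_get_case_entries_for_each_brand_name() are assumptions.
from adverse_events_pipeline import chunkified_get_case_entries_for_each_brand_name


def test_chunk_data_resets_between_brand_names():
    brand_names = ["brand_a", "brand_b"]
    chunks = list(chunkified_get_case_entries_for_each_brand_name(brand_names))

    # One chunk per brand name, and no entries carried over from the previous
    # brand (i.e. chunk_data was reset between iterations).
    assert len(chunks) == len(brand_names)
    for chunk, brand in zip(chunks, brand_names):
        assert all(entry["brand_name"] == brand for entry in chunk)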
(When you get a moment, I'd like to discuss my solution with you and confirm that it doesn't compromise the training process.)
My code finished after 30 minutes and upserted 17,729 entries to integ_test_integ_test_adverse_events_ranking_logs. The number of entries matches the count of brand_names without cases.
ssh-keygen -t rsa -f gcp-shield-legal -C dev-cronjob -b 2048
// Input was from my own integ_test_data_partial.csv
// response from bedrock api call using clincial_extraction.ipynb
bedrock api res: <response>
{
"results_conclusions": "SH-SY5Y-derived neurons express functional GR and MR, allowing the study of corticosteroid-induced transcription. Cortisol, dexamethasone, and aldosterone induced similar transcriptomic effects. Spironolactone mildly attenuated dexamethasone effects in some genes. Transcriptomic alterations were concordant with iPSC-derived neurons. Gene expression in these neurons showed stronger negative correlations with PTSD signatures than MDD signatures in postmortem brain samples.",
"score_results": [],
"injury": "Post-traumatic stress disorder (PTSD) and major depressive disorder (MDD)",
"remedy_type": "in vitro model",
"remedy_name": "SH-SY5Y-derived neurons",
"remedy_length": "Not specified",
"icd_chapter": "V: Mental and behavioural disorders",
"icd_code": "F43",
"icd10_name": "Reaction to severe stress, and adjustment disorders",
"risk_assessment": "neutral"
}
// my bedrock api response:
-- bedrock api res: {"ResponseMetadata": {"RequestId": "4c7d4758-6cdb-4e83-8c97-51c0380a59e3", "HTTPStatusCode": 200, "HTTPHeaders": {"date": "Tue, 25 Mar 2025 01:59:54 GMT", "content-type": "application/json", "content-length": "1008", "connection": "keep-alive", "x-amzn-requestid": "4c7d4758-6cdb-4e83-8c97-51c0380a59e3", "x-amzn-bedrock-invocation-latency": "5234", "x-amzn-bedrock-output-token-count": "219", "x-amzn-bedrock-input-token-count": "11364"}, "RetryAttempts": 0}, "contentType": "application/json", "body": "eyJpZCI6Im1zZ19iZHJrXzAxWGZMcjNuUHA2MVNDOWpSb2lMZjlITCIsInR5cGUiOiJtZXNzYWdlIiwicm9sZSI6ImFzc2lzdGFudCIsIm1vZGVsIjoiY2xhdWRlLTMtNS1zb25uZXQtMjAyNDA2MjAiLCJjb250ZW50IjpbeyJ0eXBlIjoidGV4dCIsInRleHQiOiI8cmVzcG9uc2U+XG57XG4gIFwicmVzdWx0c19jb25jbHVzaW9uc1wiOiBcIlRoZSBjYXNlIHJlcG9ydCBkZW1vbnN0cmF0ZXMgdGhlIGVmZmVjdGl2ZW5lc3Mgb2YgQlJBRiBhbmQgTUVLIGluaGliaXRvcnMgKGRhYnJhZmVuaWIgYW5kIHRyYW1ldGluaWIpIGluIHRyZWF0aW5nIHBsZW9tb3JwaGljIGNhcmNpbm9tYSB3aXRoIEJSQUYgbXV0YXRpb25zLCBldmVuIGluIGVsZGVybHkgcGF0aWVudHMuIFRoZSB0cmVhdG1lbnQgbGVkIHRvIHNpZ25pZmljYW50IGltcHJvdmVtZW50IGluIHN5bXB0b21zIGFuZCByYWRpb2xvZ2ljYWwgZmluZGluZ3MsIG1haW50YWluaW5nIHRoZSBwYXRpZW50J3MgY29uZGl0aW9uIGZvciBhYm91dCBuaW5lIG1vbnRocy5cIixcbiAgXCJzY29yZV9yZXN1bHRzXCI6IFtdLFxuICBcImluanVyeVwiOiBcIlBsZW9tb3JwaGljIGNhcmNpbm9tYSBvZiB0aGUgbHVuZ1wiLFxuICBcInJlbWVkeV90eXBlXCI6IFwibWVkaWNhdGlvblwiLFxuICBcInJlbWVkeV9uYW1lXCI6IFwiRGFicmFmZW5pYiBhbmQgVHJhbWV0aW5pYlwiLFxuICBcInJlbWVkeV9sZW5ndGhcIjogXCIxMSBtb250aHNcIixcbiAgXCJpY2RfY2hhcHRlclwiOiBcIklJOiBOZW9wbGFzbXNcIixcbiAgXCJpY2RfY29kZVwiOiBcIkMzNFwiLFxuICBcImljZDEwX25hbWVcIjogXCJNYWxpZ25hbnQgbmVvcGxhc20gb2YgYnJvbmNodXMgYW5kIGx1bmdcIixcbiAgXCJyaXNrX2Fzc2Vzc21lbnRcIjogXCJkZWNyZWFzZVwiXG59XG48L3Jlc3BvbnNlPiJ9XSwic3RvcF9yZWFzb24iOiJlbmRfdHVybiIsInN0b3Bfc2VxdWVuY2UiOm51bGwsInVzYWdlIjp7ImlucHV0X3Rva2VucyI6MTEzNjQsIm91dHB1dF90b2tlbnMiOjIxOX19"}
Something fishy going on here... why is my response so different from the notebook's?
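Best guess at the difference (needs confirming): the notebook prints the payload after decoding and unwrapping the body, while my log is the raw invoke_model response where "body" is still the base64-encoded Anthropic message JSON (with boto3 directly, response["body"] is a stream you'd .read() first). A minimal unwrap sketch, helper name is mine:

import base64
import json


def unwrap_bedrock_body(raw_response: dict) -> dict:
    # Assumption: raw_response is the dict logged above, with "body" as a
    # base64 string holding the Anthropic message JSON.
    payload = json.loads(base64.b64decode(raw_response["body"]))
    # The model text sits at content[0]["text"], wrapped in <response> tags,
    # which is what the notebook has already stripped before its printout.
    text = payload["content"][0]["text"]
    inner = text.split("<response>", 1)[1].split("</response>", 1)[0]
    return json.loads(inner)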
// the notebook's values in the validator()
-- cls: <class '__main__.QAResponseModel2'>
-- values: {'results_conclusions': "Loss of AKR1D1 promotes accumulation of iso-LCA through gut microbiome dysregulation, im
// my values in validator():
!!! values: ValidationInfo(config={'title': 'QAResponseModel2', 'extra_fields_behavior': 'allow'}, context=None, data=None, field_name=None)
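One candidate explanation for the mismatch (unverified): the ValidationInfo printout is what a Pydantic v2 @field_validator receives, whereas the notebook's plain dict looks like the v1-style validator signature. A side-by-side sketch, assuming QAResponseModel2 has at least these fields:

from pydantic import BaseModel, ValidationInfo, field_validator


class QAResponseModel2(BaseModel):
    results_conclusions: str
    icd_code: str

    # Pydantic v1 style (matches the notebook's printout of a plain dict):
    #   @validator("icd_code")
    #   def check_icd(cls, v, values):   # values == {'results_conclusions': ...}
    #       ...

    # Pydantic v2 style (matches my ValidationInfo printout):
    @field_validator("icd_code")
    @classmethod
    def check_icd(cls, v: str, info: ValidationInfo) -> str:
        # Previously validated fields now live on info.data, not in `values`.
        print(info.data)
        return v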
I've set up an integration test with James's notebook code, but I need his help to verify that it's a valid test (the data seems good, yet it errors out on NaN data; see the NaN check sketch below)
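Before pinging James, a quick NaN sanity check on the input might narrow it down. A sketch, assuming integ_test_data_partial.csv is the test input; the required-column names are placeholders, not the real schema:

import pandas as pd

df = pd.read_csv("integ_test_data_partial.csv")

# Placeholder column names - swap in whatever the pipeline actually requires.
required = ["brand_name", "results_conclusions"]
nan_rows = df[df[required].isna().any(axis=1)]
print(f"{len(nan_rows)} of {len(df)} rows have NaN in required columns")

# Option: drop them up front so the test exercises the happy path,
# then add a separate test that feeds the NaN rows in on purpose.
clean = df.dropna(subset=required)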
I need faster turnarounds to debug properly
I understand the overall goal of the rank being joined, BUT:
• why does the RankNet produce -1 ranks for the sample data?
• why does the RankNet produce 87k results when the sample data is only 1k rows?
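A guess at the 87k number, not checked against the actual RankNet code: pairwise learning-to-rank expands each group of n items into roughly n*(n-1) training pairs, so 1k rows can easily blow up after the join (and -1 might just be a fill value for rows that got no pair/rank, which is the part to confirm). A toy sketch of that expansion; the column names are made up:

import pandas as pd

# Tiny stand-in for the 1k-row sample; "group_id" is a hypothetical column name.
sample = pd.DataFrame(
    {"group_id": [1, 1, 1, 2, 2], "item": ["a", "b", "c", "d", "e"]}
)

# Pairwise self-join within each group: n rows per group -> n*(n-1) pairs.
pairs = sample.merge(sample, on="group_id", suffixes=("_i", "_j"))
pairs = pairs[pairs["item_i"] != pairs["item_j"]]
print(len(sample), "rows ->", len(pairs), "pairs")  # 5 rows -> 8 pairs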
TODO (reminder, tomorrow 8:30 AM): Double-check step 8's geticdretainedlabelsdata() function and how many times the notebook code invokes it
Current concerns:
• The step 9 integration test comparing input/output data lengths still FAILS
• Dashboard rank table: the query looks off because the source table has rank 1 rows but the dashboard table displays no rank 1's
SELECT *
FROM `tort-intake-professionals.lr_data.lead` as a
JOIN `tort-intake-professionals.lr_data.lead_question` as b ON a.id = b.leadid
WHERE a.id = 678018
username: jjosue@shield-legal.com
pw: lMZYWD35@&
tortintakeprofessionals.lawruler.com
Question | Custom ID | Mapping
Mailing Address 2 | <<Default9>> | Contact/Address/0/Address2
State | <<Default11>> | 11
Middle Name | <<Default3>> | Contact/MiddleName
Date of Birth (Value must be a date) | <<Default23>> | 23
Marital Status | <<Default89>> | Contact/MaritalStatus
Primary Email | <<Default16>> | 16
First Name | <<Default2>> | Contact/Firstname
Suffix | <<Default5>> | Contact/Suffix
Business Phone | <<Default14>> | Contact/WorkPhone
Sex (Value must be M or F) | <<Default22>> | Contact/Gender
Date of Death (Value must be a date) | <<Default91>> | Contact/DateOfDeath
City | <<Default10>> | 10
Zip | <<Default12>> | 12
Full Name | <<Default1>> | 1
Last Name | <<Default4>> | Contact/Lastname
Mailing Address 1 | <<Default8>> | Contact/Address/0/Address1
Home Phone | <<Default13>> | Contact/HomePhone
Cell Phone | <<Default15>> | 15
SSN# | <<Default20>> | 20
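For reference while debugging the audit conversion, the mapping above re-expressed as a lookup. This is only the table in code form, not the real audit_team_data_service.convert_audit_question_id_to_integrations_friendly_format() implementation; the helper name is mine:

# Table above as a lookup: <<DefaultN>> custom ID -> integrations-friendly field.
DEFAULT_ID_TO_INTEGRATIONS_FIELD = {
    "Default2": "Contact/Firstname",
    "Default3": "Contact/MiddleName",
    "Default4": "Contact/Lastname",
    "Default5": "Contact/Suffix",
    "Default8": "Contact/Address/0/Address1",
    "Default9": "Contact/Address/0/Address2",
    "Default13": "Contact/HomePhone",
    "Default14": "Contact/WorkPhone",
    "Default22": "Contact/Gender",
    "Default89": "Contact/MaritalStatus",
    "Default91": "Contact/DateOfDeath",
    # IDs whose mapping is just the numeric field id:
    "Default1": "1", "Default10": "10", "Default11": "11", "Default12": "12",
    "Default15": "15", "Default16": "16", "Default20": "20", "Default23": "23",
}


def to_integrations_friendly(custom_id: str) -> str:
    # "<<Default9>>" -> "Contact/Address/0/Address2"
    return DEFAULT_ID_TO_INTEGRATIONS_FIELD[custom_id.strip("<>")]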
Audit files with default columns look like this:
2025-06-10 14:22:18,224 - INFO - Found lawrulerfield: 23 for question: Injured party's date of birth: - [23]
POI: audit_team_data_service.convert_audit_question_id_to_integrations_friendly_format()
Print out the question_and_ids from defensureupdategcpentrywithanynewaudit_data()
CHATGPT_API_KEY=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
This is a problem if q_id is c-91 and the loop comes across c-9123 first:
def _search_for_custom_q_id_and_answer(self, q_id: str, gcp_entry: dict[str, str]) -> str:
    res = ""
    for gcp_q_verbiage in gcp_entry:
        # Substring check: "c-91" also matches "c-9123", which is the bug noted above.
        if q_id in gcp_q_verbiage:
            res = gcp_q_verbiage
            break
    return res
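One possible fix for that prefix collision: match q_id as a whole token instead of a raw substring. A sketch, assuming the id shows up as a standalone token like "c-91" inside the verbiage keys (that assumption needs checking against real gcp_entry data):

import re


def _search_for_custom_q_id_and_answer(self, q_id: str, gcp_entry: dict[str, str]) -> str:
    # Only match q_id when it is not embedded in a longer id,
    # so "c-91" no longer matches "c-9123".
    pattern = re.compile(rf"(?<![\w-]){re.escape(q_id)}(?![\w-])")
    for gcp_q_verbiage in gcp_entry:
        if pattern.search(gcp_q_verbiage):
            return gcp_q_verbiage
    return ""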
Is the spreadsheet the source of truth for whether the SA occurred during the strip search?
Nick said they'll work with Doug on a blanket statement for 6, 7, and 8. If the spreadsheet says yes, use the blanket statement.
If #4 can't be derived from the docs:
Ask Abe whether the #4 "why" question will have a blanket statement
Where do I get more sample docs to run through the tests?
Verified that field 4 is NOT guaranteed to answer the "why?"
Still need to create tests for: 248570, 251474, 252113
https://nlp.stanford.edu/projects/glove/
https://huggingface.co/docs/inference-providers/en/guides/structured-output