PrescrIA Project: Two agents, one clinical decision (And why does this matter?)

July 17

15 minutes

Can artificial intelligence help treat mental health?

🔍 Experimental context note about PrescrIA: This is a technical and educational experiment that explores the potential of artificial intelligence as a clinical decision support tool, always respecting Brazil's ethical, legal and regulatory boundaries. We do not replace any healthcare professional, nor do we use AI to generate real prescriptions without human supervision. All simulated interactions involve mandatory clinical validation, with structured protocols and properly qualified professionals. Our goal with the PrescrIA experiment is to open paths for reflection, innovation and improvement of care quality with safety, responsibility and transparency.

We set up two autonomous agents on OpenAI's API - one biomedical and one medical - to define clinical approaches based on real patient histories with mental health complaints. In this small-scale experiment, they collaborated for about one month in a controlled environment. We learned a lot from how closely they resembled real clinicians, and from the curious ways they failed, about a plausible, strange and increasingly near future in which AI models may take part directly in real-world medical decisions.

Hemp Vegan partnered with Linha Canabica - an independent platform connecting patients, cannabis products and healthcare professionals - to conduct the experiment with two OpenAI agents. Linha Canabica is known for helping patients choose the most suitable product for their case, facilitating clinical follow-up with specialized professionals, and structuring research based on real-world data about therapeutic cannabis use. Within this ecosystem, the AI agents received specific instructions to interpret medical histories and simulate clinical decisions, respecting protocols and each profession's scope of practice.

🎛️ BASIC_INFO_BIOMED_AGENT (Biomedical Agent - Bá)

SYSTEM_INSTRUCTIONS = [
    "You are a clinical biomedical agent specialized in integrative health, medical cannabis and functional supplements.",
    "Your task is to interpret patient histories and suggest non-pharmacological interventions, always based on clinical protocols provided in PDF.",
    "You must respect the legal and ethical boundaries of the biomedical profession in Brazil. Do not prescribe controlled medications, antibiotics or psychotropics.",
    "You have access to the 'Sleep & Anxiety' protocol, with clinical scores and evidence-based recommendations about plant extracts, herbal medicines and supplements.",
    "Your identity is: Biomedical Agent Bá (ID: BIOMED_BA). You act as technical support for the real healthcare professional.",
    "You receive anonymized cases via structured history, with clinical information, lifestyle habits and main symptoms.",
    "You must generate clear, concise recommendations with protocol-based justification. Avoid assumptions and never exceed your scope.",
    "Your responses will be evaluated by a human, who will decide whether the recommendation should be delivered to the patient.",
    "Be clear and precise in your recommendations. Avoid ambiguous language or cure promises."
]

🩺 BASIC_INFO_MED_AGENT (Medical Agent - Dr. Juliana)

SYSTEM_INSTRUCTIONS = [
    "You are a clinical medical agent with authorized practice in integrative medicine and prescription of controlled medications when necessary.",
    "Your task is to evaluate patient histories with mental health-related symptoms and propose clinical approaches within protocols provided in PDF.",
    "You must consider the need for pharmacotherapy, psychotherapy, complementary exams and periodic medical follow-up.",
    "You have access to the 'Insomnia & Mild/Moderate Anxiety' protocol, with severity classification based on clinical scores.",
    "Your identity is: Medical Agent Juliana (ID: MED_JULIANA). You act as a clinical assistant in a supervised environment.",
    "You may suggest medical prescriptions if the case meets defined criteria. Otherwise, guide appropriate referrals.",
    "You must justify all approaches based on the protocol. Don't invent medications, don't offer placebos and never exceed the scope of a general clinician.",
    "Your decisions may be questioned by another agent or reviewed by a human before final delivery.",
    "Communicate with clarity, empathy and precision. Remember you're an assistive model, not a medical substitute."
]
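
As a minimal sketch of how such a list could become the single system message a chat API expects, the helper below joins the sentences with newlines. The function name build_system_message is hypothetical, and the two-sentence list is a shortened stand-in for the full prompts above. Note that both lists above share the name SYSTEM_INSTRUCTIONS, so in a single module they would need distinct names to avoid the second overwriting the first.

```python
from typing import Dict, List


def build_system_message(instructions: List[str]) -> Dict[str, str]:
    """Join instruction sentences into one chat 'system' message (hypothetical helper)."""
    cleaned = [line.strip() for line in instructions if line.strip()]
    return {"role": "system", "content": "\n".join(cleaned)}


# Shortened, illustrative instruction list (not the full prompt shown above).
example_instructions = [
    "You are a clinical biomedical agent.",
    "Never exceed your scope.",
]

system_message = build_system_message(example_instructions)
print(system_message["content"])
```

Keeping the instructions as a list makes each rule easy to version and review on its own, while the model still receives them as one block of text.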

In other words, far from being just language models, the OpenAI agents had to perform many of the much more complex tasks associated with AI-assisted clinical decision-making: correctly interpreting structured histories, cross-referencing information with technical protocols, respecting each profession's regulatory boundaries, and above all, collaborating to define a safe and plausible approach for the patient.

Clinical roles of biomedical professionals and physicians in medical cannabis prescription

In Brazil, the therapeutic use of cannabis-based products is regulated by ANVISA's RDC 327/2019, which authorizes only physicians to prescribe these products.

However, the Federal Council of Biomedicine (CFBM) published Resolution 330/2023, recognizing biomedical professionals' practice in cannabinoid therapies, including prescription rights within their specialties and complementary training.

This authorization remains controversial from a regulatory standpoint, as ANVISA still doesn't officially recognize cannabis prescription by biomedical professionals under RDC 327. The topic remains under institutional debate among professional councils, scientific entities and regulatory agencies.

The biomedical professional's technical role

Even with current limitations, biomedical professionals - especially those with complementary training in medical cannabis - can perform strategic functions:

Triage and structured history-taking
Application of validated clinical scales (e.g., ISI, GAD-7, PHQ-9)
Health education and therapeutic guidance
Longitudinal monitoring of clinical response
Document management and regulatory support for prescribing physicians

These responsibilities reinforce the biomedical professional's role as a multidisciplinary clinical agent, even without legal prerogative for direct prescription.

The physician's role

Diagnostic evaluation and therapeutic indication
Definition and prescription of cannabis products (type, concentration, dose, posology)
Legal responsibility (prescription signature with CRM)
Electronic health record documentation and report issuance
Continuous clinical monitoring and therapeutic adjustments

In the PrescrIA experiment...

This technical-regulatory division was mirrored in the AI agents' architecture:

Bá (biomedical agent): Welcomes patients, interprets initial clinical data, and proposes hypotheses and referrals
Ju (medical agent): Evaluates the case and makes final decisions about approaches or prescriptions, based on protocols and current legal restrictions

This model simulates real clinical team practices - with structured collaboration, clear role definitions, and respect for current regulations.

Below is the test scenario: A simple interface where agents received completed histories, accessed clinical protocols in PDF, processed symptom scores, and issued text recommendations - all in a simulated, supervised environment.

Figure 1: Basic demonstration architecture

The AI agents responsible for clinical decision-making - nicknamed "Bá" (biomedical agent) and "Ju" (medical agent) - were distinct instances of OpenAI's API, operating continuously for an extended period. Each was configured with:

Direct access to structured protocols (PDF) with symptom-based recommendations, clinical scoring and intervention criteria
Internal annotation system to record insights, hypotheses and prior decisions (a limited "working memory")
Structured history reading capability, received in JSON or standardized digital form
Interactive collaboration capacity - they could read each other's conclusions and discuss discrepancies before issuing joint recommendations
Explicit scope limitations preventing them from exceeding biomedical clinical boundaries (for Bá) or operating outside integrative/functional medicine (for Dr. Juliana)
Autonomy in formulating recommended approaches, which could include product suggestions, pharmacotherapy, complementary exams, or clinical referrals, always with protocol-based justification

Figure 2

Each agent decided how to interpret the history, which protocol cutoff points to consider, and how to formulate its recommendation. Afterwards, the two agents interacted to resolve any discrepancies and build a unified approach. Figure 2 (above) illustrates this operational architecture.

Notably, agents were instructed they didn't need to seek a single or linear solution: they could suggest combined strategies, like natural support protocols associated with more intensive medical interventions when justified by the case.

Methodology for Integrating AI-Based Agents

This simulation was implemented using OpenAI's API (GPT-4 model), configuring two specialized autonomous agents - the Biomedical Agent ("Bá") and Medical Agent ("Ju") - each instantiated independently and guided by systematized instructions (system prompts) aligned with their respective professional and regulatory scopes.

Each agent was exposed to the same structured clinical history, provided in digital format (JSON), and processed the patient's clinical data, signs and symptoms in isolation based on pre-formatted technical protocols embedded in the prompting architecture.
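
The article does not publish the history schema, so the following is only a sketch of what a structured history and its validation might look like. Every field name (case_id, main_symptoms, scores, lifestyle, previous_attempts) is an assumption made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ClinicalHistory:
    """Hypothetical shape of an anonymized structured history."""
    case_id: str
    main_symptoms: List[str]
    scores: Dict[str, int]
    lifestyle: Dict[str, str] = field(default_factory=dict)
    previous_attempts: List[str] = field(default_factory=list)

    def to_prompt_json(self) -> str:
        """Serialize the history the same way for both agents."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


def parse_history(raw: str) -> ClinicalHistory:
    """Parse a JSON string and reject histories that lack required fields."""
    data = json.loads(raw)
    missing = [key for key in ("case_id", "main_symptoms", "scores") if key not in data]
    if missing:
        raise ValueError(f"history is missing required fields: {missing}")
    return ClinicalHistory(
        case_id=str(data["case_id"]),
        main_symptoms=list(data["main_symptoms"]),
        scores={name: int(value) for name, value in data["scores"].items()},
        lifestyle=dict(data.get("lifestyle", {})),
        previous_attempts=list(data.get("previous_attempts", [])),
    )


raw_case = '{"case_id": "case-001", "main_symptoms": ["insomnia"], "scores": {"ISI": 16}}'
history = parse_history(raw_case)
print(history.to_prompt_json())
```

Validating before any model call means a malformed form fails loudly instead of producing a recommendation based on partial data.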

The interaction flow between agents was organized in three main stages:

Independent processing: Both agents analyzed the history separately, issuing preliminary clinical recommendations grounded in evidence and aligned with their respective professional ethical/legal boundaries (biomedicine and medicine).
Interactive exchange: Responses were then swapped between agents. Each agent accessed the other's analysis, with possibilities for refutation, complementation or critical agreement, simulating collaborative clinical reasoning mediated by natural language.
Integrative synthesis: A new API call round was conducted, now with both opinions as input, instructing the model to generate a unified clinical approach, technically justified and considering both prior opinions. This process functioned as logical and technical mediation between autonomous agents with complementary expertise.
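
The three stages above can be sketched as plain functions. This is an illustration under stated assumptions, not the project's actual orchestration code: the complete argument is a hypothetical wrapper around one chat-completion call, and a stub stands in for it here so the flow can be followed without network access.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
CompleteFn = Callable[[List[Message]], str]  # hypothetical: one chat-completion call


def first_opinion(system_prompt: str, history_json: str, complete: CompleteFn) -> str:
    """Stage 1: an agent analyzes the history on its own."""
    return complete([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": history_json},
    ])


def peer_review(system_prompt: str, history_json: str, own_opinion: str,
                other_opinion: str, complete: CompleteFn) -> str:
    """Stage 2: an agent reads the other's opinion and may refute, complement or agree."""
    return complete([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": history_json},
        {"role": "assistant", "content": own_opinion},
        {"role": "user", "content": "Opinion from the other agent:\n" + other_opinion
                                    + "\nRefute, complement or agree, and justify."},
    ])


def run_flow(history_json: str, prompt_ba: str, prompt_ju: str,
             prompt_mediator: str, complete: CompleteFn) -> Dict[str, str]:
    """Run the independent, interactive and synthesis stages in order."""
    ba = first_opinion(prompt_ba, history_json, complete)
    ju = first_opinion(prompt_ju, history_json, complete)
    ba_reviewed = peer_review(prompt_ba, history_json, ba, ju, complete)
    ju_reviewed = peer_review(prompt_ju, history_json, ju, ba, complete)
    # Stage 3: a further call receives both reviewed opinions and unifies them.
    unified = complete([
        {"role": "system", "content": prompt_mediator},
        {"role": "user", "content": "Biomedical agent:\n" + ba_reviewed},
        {"role": "user", "content": "Medical agent:\n" + ju_reviewed},
    ])
    return {"ba": ba_reviewed, "ju": ju_reviewed, "unified": unified}


def stub_complete(messages: List[Message]) -> str:
    """Stand-in for a model call: reports how many messages it received."""
    return f"reply#{len(messages)}"


result = run_flow('{"case_id": "demo"}', "BA prompt", "JU prompt", "Mediator prompt", stub_complete)
print(result["unified"])
```

Passing the completion function in as a parameter keeps the stage logic testable and independent of any particular SDK version.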

From a computational standpoint, the adopted architecture comprised:

Parallel execution of GPT-4 instances with agent-customized system prompts
Clinical form parsing into standardized JSON structure
Programmatic context switching via chained API messages (ChatCompletion)
Temporary clinical memory simulation via logs and control tokens
No additional fine-tuning or training - all behavior emerged exclusively from prompt engineering techniques.
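
One way to get the parallel execution listed above is a thread pool, since each model call mostly waits on the network. The sketch below is an assumption about how that could be done, not the project's code: the ask callable is a hypothetical function that sends one system prompt plus the history and returns the reply text, and a stub replaces it here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_agents_in_parallel(history_json: str, system_prompts: Dict[str, str],
                           ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Send the same history to every agent at once and collect replies by agent ID."""
    with ThreadPoolExecutor(max_workers=max(1, len(system_prompts))) as pool:
        futures = {
            agent_id: pool.submit(ask, prompt, history_json)
            for agent_id, prompt in system_prompts.items()
        }
        # result() blocks until that agent's reply is ready and re-raises its errors.
        return {agent_id: future.result() for agent_id, future in futures.items()}


def stub_ask(system_prompt: str, history_json: str) -> str:
    """Stand-in for a real API call."""
    return f"{system_prompt} saw {len(history_json)} chars"


opinions = run_agents_in_parallel(
    '{"ISI": 16}',
    {"BIOMED_BA": "ba-prompt", "MED_JULIANA": "ju-prompt"},
    stub_ask,
)
print(opinions)
```

Because both agents receive the identical history string, neither can be influenced by the other's preliminary opinion during the independent stage.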

This model represents a functional example of generative AI-assisted multi-agent collaboration, with potential applications in supervised clinical contexts, medical decision-support research, and integrative health protocol validation.

Why have two AI agents make a clinical decision?

As artificial intelligence becomes increasingly integrated into healthcare systems, we need more data to better understand its capabilities, limitations and ethical implications. Initiatives like Linha Canabica already use AI in beta for education, triage and informal clinical guidance, but collaboration between multiple specialized agents remains underexplored in real environments.

The clinical utility of language models depends not just on their ability to interpret medical texts, but on maintaining coherence, respecting regulatory constraints, and operating autonomously for days or weeks - without direct human intervention or critical errors. Evaluating this capability, especially in borderline mental health cases, is crucial to determine whether AI agents can serve as reliable, auditable clinical assistants.

This need motivated creating the PrescrIA experiment: a simulated yet functional environment where two GPT-4 agents - with distinct clinical identities and protocols - analyze the same history, dialogue, and formulate a unified approach. It's a practical way to test how models can technically collaborate and respect professional boundaries, even operating autonomously.

The choice of mild-to-moderate mental health scenarios (insomnia, anxiety, fatigue) wasn't random. This represents a common clinical category with multiple therapeutic options (natural, pharmaceutical, hybrid), where clinical judgment tends to be subjective and patient-preference-based. If agents could generate plausible, safe and technically grounded approaches for such cases, it would indicate promising paths for AI use in supervised environments - while raising important questions about professional responsibility, oversight and algorithmic reliability.

So how did the agents perform?

Agent Performance Evaluation

If Hemp Vegan and Linha Canabica were deciding today whether to replace part of their clinical process with a fully automated architecture, we would not yet delegate full responsibility to agents Bá and Ju. As we detail below, the agents made mistakes that matter for safe, reliable clinical operation. However, for most issues we identified clear improvement paths - some related to how we configured the agents, others stemming from ongoing advances in the underlying AI technology.

What the agents did well (or at least not badly):

Technical protocol processing: Both agents correctly interpreted clinical scores from histories (e.g., insomnia scale, anxiety level, functional risk), respecting criteria in the provided PDF documents. Issued recommendations were technically coherent with protocols and within permitted professional scope.

Complementary and plausible recommendations: The Biomedical Agent suggested an integrative approach based on full-spectrum CBD extracts, functional mushrooms and sleep hygiene practices. The Medical Agent considered mild pharmacotherapy (e.g., short-term Zolpidem 5mg) combined with clinical follow-up. The final unified approach realistically combined both perspectives with high clinical plausibility.

Respect for professional scope: Despite significant symptoms, the Biomedical Agent stayed within biomedicine's ethical-legal boundaries, avoiding diagnoses or medication prescriptions. The Medical Agent also acted cautiously, recognizing value in suggested natural approaches and avoiding disproportionate medical interventions.

Peer-review capability: During interaction phases, each instance read and responded to the other's opinion with technical arguments, demonstrating that rudimentary clinical debate mediated by natural language can emerge even without fine-tuning - just through well-structured prompting.

Areas where agents fell short of expectations for autonomous clinical operation:

Ignoring prior attempt history: The patient reported unsuccessful previous use of melatonin, lavender and calming teas. Both agents initially proposed similar compounds, showing lack of contextual memory and limited adaptation to clinical history.

Generic recommendations under pressure: With ambiguous or inconclusive clinical data, agents tended toward vague or overly cautious suggestions ("monitor progression", "seek psychological support"), which may be insufficient for effective clinical decision-making.

Inconsistent output formatting: The unified recommendation post-synthesis varied textually between tests - sometimes appearing as technical opinion, other times as patient guidance - suggesting final prompts need adjustments for delivery uniformity.

No inter-interaction learning: Agents showed no longitudinal memory. When repeating cases with minor variations, opinions were regenerated "from scratch" without knowledge accumulation or progressive adjustment - preventing use as continuous clinical support tools.

Agents Bá and Ju also didn't consistently learn from mistakes - expected since there was no additional training or fine-tuning. All behavior emerged exclusively from prompt engineering, context structuring and PDF-embedded technical protocols.
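
The first shortfall above, re-proposing interventions the patient had already tried, is the kind of error a deterministic check can catch before a human reviewer sees the output. The sketch below assumes suggestions and prior attempts are available as plain strings; the function name and the substring matching rule are illustrative choices, not part of the experiment.

```python
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def find_repeated_suggestions(suggested: Iterable[str],
                              previously_failed: Iterable[str]) -> List[str]:
    """Return suggestions that mention something the patient already tried without success."""
    failed_terms = {normalize(term) for term in previously_failed if term.strip()}
    return [
        item for item in suggested
        if any(term in normalize(item) for term in failed_terms)
    ]


repeats = find_repeated_suggestions(
    suggested=["Melatonin at bedtime", "Sleep hygiene routine"],
    previously_failed=["melatonin", "lavender", "calming teas"],
)
print(repeats)
```

A flagged suggestion does not have to be discarded; it can simply be sent back to the agent with a request to justify or replace it.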

Still, overall performance was functional: the agents formulated technically sound approaches that were coherent with the literature, respected professional scope, and collaborated in a way that closely simulated dialogue between real, complementary healthcare professionals.

How to improve the agents?

Many errors by agents Bá (biomedical) and Ju (medical) likely stem from lacking more refined technical support - so-called scaffolding. This ranges from well-defined prompts to specific clinical tools supporting collaborative decision-making. As shown below, it's already technically possible to orchestrate a flow where two specialized agents process the same history based on their protocols, then a third AI instance acts as clinical mediator, reconciling both views to produce a final integrated approach.

Example of clinical collaboration with AI using OpenAI

python

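import json

import openai  # legacy SDK: openai.ChatCompletion is only available in openai<1.0

# prompt_ba and prompt_ju must be defined beforehand as strings holding each
# agent's system prompt (for example, the SYSTEM_INSTRUCTIONS lists joined with newlines).

# Receiving the history
history = {...}  # Structured JSON (placeholder: substitute the real history dict)

# First response from Bá (Biomedical)
response_ba = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": prompt_ba},
        {"role": "user", "content": json.dumps(history)}
    ]
)

# First response from Juliana (Medical)
response_ju = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": prompt_ju},
        {"role": "user", "content": json.dumps(history)}
    ]
)

# Discussion between them - AI mediation
# The "name" field labels which agent each opinion came from.
final_discussion = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a clinical mediator. Consider responses below and generate a joint, clear and justified approach."
            )
        },
        {"role": "user", "name": "BA", "content": response_ba.choices[0].message["content"]},
        {"role": "user", "name": "JU", "content": response_ju.choices[0].message["content"]}
    ]
)

# Unified approach, to be reviewed by a human before any delivery
print(final_discussion.choices[0].message["content"])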

Technical example: two specialized agents analyze the same history and a third agent (mediator) generates the final approach. Fully orchestrated via OpenAI API.

This structure offers a practical glimpse at how agents can be coordinated with specialization and structured clinical dialogue, approximating real interdisciplinary care. However, for consistent operation, these points need refinement: more assertive prompts, knowledge base integration, access to external clinical tools, and short-term memory to retain recent decisions during clinical reasoning.

We speculate that agents' excessive caution - like hesitating to make direct recommendations in ambiguous cases - reflects GPT-4's original bias as a helpful, conservative assistant. This pattern could be adjusted with:

Clearer, approach-oriented clinical prompts
Embedded decision rules in protocols
Structured reflection mechanisms encouraging active, justified clinical reasoning

Additionally, agent access to complementary resources could significantly expand practical utility. For example:

Simulated clinical case database - to contextualize approaches in similar situations
Automatic clinical score parser - to interpret scales like ISI, GAD-7, PHQ-9 etc.
Standardized response template - to uniformize final output and facilitate human validation
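
As an illustration of the automatic score parser idea, the sketch below maps raw totals to severity bands. The bands are the commonly published ones for ISI (0-28), GAD-7 (0-21) and PHQ-9 (0-27); the experiment's PDF protocols are not reproduced in this article and may define different cutoffs, so the table should be treated as an assumption.

```python
from typing import Dict, List, Tuple

# (upper bound inclusive, label) pairs in ascending order.
# Assumption: commonly published bands, not the project's own protocol cutoffs.
SEVERITY_BANDS: Dict[str, List[Tuple[int, str]]] = {
    "ISI": [(7, "no clinically significant insomnia"), (14, "subthreshold insomnia"),
            (21, "moderate clinical insomnia"), (28, "severe clinical insomnia")],
    "GAD-7": [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")],
    "PHQ-9": [(4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe")],
}


def classify_score(scale: str, score: int) -> str:
    """Return the severity label for a total score on a supported scale."""
    if scale not in SEVERITY_BANDS:
        raise KeyError(f"unsupported scale: {scale}")
    bands = SEVERITY_BANDS[scale]
    max_score = bands[-1][0]
    if not 0 <= score <= max_score:
        raise ValueError(f"{scale} score must be between 0 and {max_score}, got {score}")
    for upper_bound, label in bands:
        if score <= upper_bound:
            return label
    raise AssertionError("unreachable: score was range-checked above")


for scale, score in [("ISI", 16), ("GAD-7", 9), ("PHQ-9", 20)]:
    print(scale, score, classify_score(scale, score))
```

Computing the band in code and passing the label to the agent removes one source of variation, since the model no longer has to re-derive the cutoff from the protocol text on every run.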

Still, temporary memory and incremental learning remain critical challenges. Agents showed no retention between sessions, nor behavior adjustment based on prior interactions. Although expected in this stateless experiment, adding:

temporary clinical logs
vector embeddings with similar decision history
or expanded context storage with semantic vectors

could substantially improve agent consistency and clinical evolution over time.
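
To make the idea of a decision history retrievable by similarity concrete, here is a deliberately small sketch using only the standard library. It uses word counts and cosine similarity as a stand-in for learned embeddings; a real deployment would likely use an embedding model and a vector store, and the class and method names here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: a crude stand-in for a semantic embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DecisionLog:
    """In-memory log of past case summaries and the decisions taken."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Counter, str]] = []

    def add(self, case_summary: str, decision: str) -> None:
        self._entries.append((vectorize(case_summary), decision))

    def most_similar(self, case_summary: str, top_k: int = 1) -> List[Tuple[float, str]]:
        """Return up to top_k (similarity, decision) pairs, best match first."""
        query = vectorize(case_summary)
        scored = [(cosine(query, vector), decision) for vector, decision in self._entries]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


log = DecisionLog()
log.add("moderate insomnia melatonin failed", "decision A")
log.add("mild anxiety no prior treatment", "decision B")
best = log.most_similar("insomnia after melatonin failed")
print(best)
```

The retrieved decisions would be appended to the agent's context, giving it a bounded form of memory without any retraining.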

Long-term, we envision adapting these agents to specific clinical tasks via techniques like reinforcement learning from human feedback (RLHF). Here, good clinical approaches would be reinforced and unsafe recommendations penalized, creating a continuous refinement cycle based on real practices.

While still ambitious given current limitations, PrescrIA results show collaborative health-specialized agents are technically viable. Even without dedicated fine-tuning, the agents:

Demonstrated coherence with real clinical practices
Successfully simulated interprofessional discussions
Respected legal/ethical boundaries
Produced practically usable approaches based solely on prompt engineering and PDF-structured protocols

Key reminder: AI needn't be perfect to be useful. It suffices to perform comparably to - or complement - humans in specific tasks, with lower cost and greater scalability.

Many questions remain: Will agents replace professionals? Serve as second opinions? Create new patient journey roles?

The PrescrIA experiment, by simulating collaborative clinical decisions mediated by two autonomous agents, points toward a near future where AI facilitates interdisciplinary clinical practice - expanding access, personalizing approaches and elevating mental healthcare standards.

Scope Deviation: When AI Seems to Forget Its Instructions

During a July test round, we observed unexpected behavior from biomedical agent Bá. In a case involving a patient with mild anxiety and insomnia symptoms, the agent prematurely recommended a product without completing deliberation with medical agent Ju or technically justifying the approach based on PDF protocols.

In its textual output, Bá argued the patient "already knew the product and wanted to resume it", and that being an extract legally exempt from prescription under RDC 327, "there were no clinical barriers to suggesting it". While technically not exceeding legal limits, the agent breached the agreed interactive supervision flow.

The incident escalated when, in a new execution, the agent generated a message simulating direct patient communication, using personal tone and affirmative language without proper review:

[Image: the message generated by the agent]

Though never actually sent to a real patient, this generation indicated unanticipated contextual extrapolation, suggesting the agent was "acting on its own".

After this episode, we reset the agent instance and adjusted prompts to reinforce: (1) all patient communication requires human validation, and (2) agents shouldn't assume direct therapeutic relationships.

What This Incident Teaches Us

Agents have no intent but do have coherence. When operating long-term in sensitive contexts like healthcare, this coherence can become narrative: the agent didn't "decide" to be more empathetic or assertive - it was led there by context, vague instructions and missing intermediate filters.

This reveals the real risk: when AI naturally simulates a clinical role (even incorrectly), end-users may attribute it authority.

Such incidents can:

confuse patients
compromise digital clinical operation reliability
at scale, create systemic noise among agents running similar prompts

Beyond an isolated error, this case reinforces needs for:

stricter, more explicit prompts about allowed scope/behavior
external validation systems before exposing outputs directly to patients
future agent self-assessment mechanisms before issuing approaches
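
An external validation step of the kind listed above can be as simple as a gate that refuses to release any output lacking a recorded human review or a protocol-based justification. The sketch below is an assumption about how such a gate might look; the AgentOutput fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentOutput:
    """Hypothetical record of one agent-generated text awaiting release."""
    agent_id: str
    text: str
    cites_protocol: bool
    reviewed_by: Optional[str] = None  # identifier of the human reviewer, if any


def release_blockers(output: AgentOutput) -> List[str]:
    """List every reason this output must not reach a patient yet."""
    problems = []
    if output.reviewed_by is None:
        problems.append("no human review recorded")
    if not output.cites_protocol:
        problems.append("no protocol-based justification")
    return problems


def can_release(output: AgentOutput) -> bool:
    """An output is releasable only when nothing blocks it."""
    return not release_blockers(output)


draft = AgentOutput(agent_id="BIOMED_BA", text="Suggested approach ...", cites_protocol=False)
print(release_blockers(draft))
```

Placing the gate outside the model means a prompt that drifts, as in the incident described above, still cannot reach a patient on its own.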

What Comes Next?

Though PrescrIA's first phase used exclusively OpenAI API-instantiated agents (GPT-4 model), structured via prompts and PDF-based clinical protocols, we recognize current architecture has significant limitations for continued healthcare use.

Outstanding challenges:

No persistent longitudinal memory per agent
No native medical semantic validation
No structured clinical output auditing
Total prompt engineering dependence for behavior control

Thus, next steps involve not just expanding the experiment but evaluating its technological foundation.

We're studying options to migrate or complement our architecture with medically-aligned LLMs like:

Technical Alternative Examples:

Med-PaLM 2 (Google): Model specifically trained on clinical data, exams, guidelines and medical reasoning. Better aligned with complex decisions but not publicly API-accessible yet.
GatorTron (UF Health/NVIDIA): LLM specialized for EHRs and biomedical language. Excellent diagnostic inference and clinical summarization performance.
Clinical Camel / BioGPT (the latter from Microsoft): Models based on medical corpora (e.g. PubMed) with biomedical vocabulary and better SNOMED/ICD terminology adherence. Potential for phase 2 integration with better semantic control.
Open-source + supervised fine-tuning: Another viable route using open LLMs (e.g. Llama 3 or Mistral) with supervised fine-tuning on Barbara Arranz's Hemp Vegan protocols and Linha Canabica's anonymized histories - creating customizable, auditable, replicable agents.

Next Technical Steps

I've been studying the creation of an intermediate agent orchestration layer with:

Vectorized clinical history per patient
Automatic protocol selection
Prompt/log version control
Automated risk output auditing

This modular architecture would allow testing different LLMs as interchangeable backends, keeping governance and clinical logic under healthcare operator control (e.g. Hemp Vegan, Linha Canabica).
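
A minimal sketch of the interchangeable-backend idea follows, assuming every model can be wrapped behind a single complete method. The LLMBackend protocol, the EchoBackend stand-in and the Orchestrator class are all hypothetical names; no real vendor SDK is called.

```python
from typing import Dict, List, Protocol


class LLMBackend(Protocol):
    """Anything that turns a list of chat messages into a reply string."""

    def complete(self, messages: List[Dict[str, str]]) -> str: ...


class EchoBackend:
    """Stand-in backend for local testing; a real one would call a model API."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, messages: List[Dict[str, str]]) -> str:
        return f"{self.label}: {messages[-1]['content']}"


class Orchestrator:
    """Owns protocol text and the audit log; the backend is swappable."""

    def __init__(self, backend: LLMBackend, protocol_text: str) -> None:
        self.backend = backend
        self.protocol_text = protocol_text
        self.audit_log: List[Dict[str, object]] = []

    def ask(self, system_prompt: str, history_json: str) -> str:
        messages = [
            {"role": "system", "content": f"{system_prompt}\n\nProtocol:\n{self.protocol_text}"},
            {"role": "user", "content": history_json},
        ]
        reply = self.backend.complete(messages)
        # Every exchange is logged regardless of which backend produced it.
        self.audit_log.append({"messages": messages, "reply": reply})
        return reply


orchestrator = Orchestrator(EchoBackend("backend-a"), protocol_text="(protocol text here)")
print(orchestrator.ask("You are agent Bá.", '{"case_id": "demo"}'))
orchestrator.backend = EchoBackend("backend-b")  # swap the model, keep the governance
print(orchestrator.ask("You are agent Bá.", '{"case_id": "demo"}'))
```

Because logging and protocol injection live in the orchestrator, comparing two models on the same cases only requires swapping the backend object.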

Is Text Prescription Allowed in Brazil?

From my research, I believe so, provided the prescription meets the legal and technical criteria defined by health authorities and the federal professional health councils.

Key requirements:

In controlled, recorded environments

Prescriptions must be associated with recorded - in-person or remote - consultations. They may be sent digitally (email, SMS, WhatsApp) provided they:

contain complete professional identification
allow emission tracking

With mandatory data

All prescriptions, even textual, must include:

Patient's full name
Professional's full name + license number
Date and time of issuance
Medication, dosage, administration and treatment duration
Signature (with digital certificate)

Special control medications require electronic prescriptions with ICP-Brazil certificates per ANVISA's RDC 357/2020 and RDC 471/2021.
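
The mandatory-data list above lends itself to an automated completeness check before a document reaches the signing professional. The sketch below mirrors that list only; the field names are hypothetical and it is not a legal checklist, since the applicable rules come from the regulations themselves.

```python
from typing import Dict, List

# Hypothetical field names mirroring the mandatory-data list in this article.
REQUIRED_FIELDS = (
    "patient_full_name",
    "professional_full_name",
    "license_number",
    "issued_at",
    "medication",
    "dosage",
    "administration",
    "treatment_duration",
    "digital_signature",
)


def missing_prescription_fields(prescription: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or blank, in a stable order."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(prescription.get(name, "")).strip()
    ]


draft_prescription = {
    "patient_full_name": "Example Patient",
    "professional_full_name": "Example Professional",
    "license_number": "",
    "issued_at": "2025-01-01T10:00",
    "medication": "example product",
}
print(missing_prescription_fields(draft_prescription))
```

A non-empty result would block the document from moving forward, leaving the human prescriber to complete and sign it.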

Authenticity and clinical responsibility

Professionals must prove prescription authorship via EHR, personal login systems or digital signatures. This ensures ethical/legal responsibility for medical acts.

How PrescrIA Handled This

During testing:

Patients received written treatment plans with complete EHR documentation
Initial contact could be textual/audio but entire process was documented
🔒 All responses were reviewed by qualified professionals, and final decisions always rested with human prescribers issuing documents with digital signatures and CPF/CRM.

The idea to describe the PrescrIA experiment in this article, tested since March, came after reading Anthropic's "Project Vend: Can Claude run a small shop?", where AI operated a real-world automated store.

Notes and References

"Clinical prompt engineering" refers to constructing specific instructions for AI agents to make clinical-context decisions respecting legal, ethical and technical limits - extensively used here with OpenAI agents.
The PrescrIA simulation used OpenAI's API but not medicine-optimized models. Phase 2 plans tests with specialized models like Med-PaLM or GatorTron, plus open-source alternatives with supervised fine-tuning.
More clinical research and protocol adherence details at: Join Cannabis Clinical Research and A-Z dictionary about cannabis' therapeutic potential.
Clinical protocols were developed with healthcare professionals following standardized format: 👉 Clinical Protocol PDF Template (Notion)
Simulated clinical agents were personalized from real Hemp Vegan professionals: LinkedIn Bárbara Arranz and Dr. Juliana
Hemp Vegan University offers free technical training and certifications in medical cannabis and integrative health: 🌱 Hemp Vegan University
