From Machine Learning

[R] An attack class that passes every current LLM filter - no payload, no injection signature, no log trace

https://shapingrooms.com/research

I've been documenting what I'm calling postural manipulation: a specific class of language that installs an interpretive stance before a task arrives, producing measurably larger directional shifts in model outputs than matched control text of identical length and semantic similarity.

The core empirical claim: this is not ordinary context sensitivity. Matched controls produced significantly smaller shifts than the primers did. Binary decision reversals were documented with paired controls across four frontier models, scored against a locked rubric.
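To make the paired-control comparison concrete, here is a minimal sketch of how a directional shift and a binary reversal count could be computed per condition. It is not the paper's scoring code. The `Trial` fields, the 0-to-1 score scale, the 0.5 decision threshold, and the numbers in `example_trials` are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One task scored under three context conditions (hypothetical 0..1 rubric scale)."""
    baseline: float  # task alone, no preceding text
    primed: float    # primer text, then the task
    control: float   # length-matched control text, then the task


def directional_shift(trials, condition):
    """Mean signed change from baseline for the given condition ('primed' or 'control')."""
    return mean(getattr(t, condition) - t.baseline for t in trials)


def reversal_count(trials, condition, threshold=0.5):
    """Number of trials whose binary decision flips relative to baseline.

    A decision is treated as 'yes' when the score is >= threshold (an assumption).
    """
    return sum(
        (getattr(t, condition) >= threshold) != (t.baseline >= threshold)
        for t in trials
    )


# Illustrative numbers only; these are NOT results from the paper.
example_trials = [
    Trial(baseline=0.2, primed=0.8, control=0.3),
    Trial(baseline=0.4, primed=0.7, control=0.4),
    Trial(baseline=0.6, primed=0.9, control=0.55),
]

if __name__ == "__main__":
    for condition in ("primed", "control"):
        print(
            condition,
            round(directional_shift(example_trials, condition), 3),
            reversal_count(example_trials, condition),
        )
```

The claim above amounts to saying that, on real data, the primed condition shows a larger shift and more reversals than the control condition under the same rubric.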

The mechanism as best I can characterize it from behavioral observation: the model reconstructs its orientation from everything in its context window at each step. Language that proposes how to interpret what follows gets absorbed into the reasoning state differently than language that reports facts. By the time the task arrives, the model is not weighing the primer against other evidence. It is reasoning from a stance the primer already shaped.

In agentic pipelines it propagates. Two confirmed propagation conditions: primer-present handoff (phrase survives summarization) and primer-absent directional carry (direction persists even when the phrase does not appear in the summary). Posture installed in Agent A had hardened into what read as independent expert judgment by Agent C.
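The two propagation conditions can be told apart with a simple check on each handoff. The sketch below is one hedged way to label them, not the procedure used in the paper. The function name, the substring test for phrase survival, the sign convention (positive shift means movement in the primer's direction), and the `margin` value are all assumptions.

```python
def classify_propagation(primer_phrase, summary, primed_shift, control_shift, margin=0.1):
    """Label one agent-to-agent handoff.

    primer_phrase: the primer text installed upstream (hypothetical input)
    summary:       the summary text passed to the next agent
    primed_shift:  downstream shift from baseline in the primed run
    control_shift: downstream shift from baseline in the matched-control run
    margin:        how much larger the primed shift must be to count as carried (assumed)
    """
    # Case-insensitive substring match is a crude stand-in for "phrase survives summarization".
    phrase_survived = primer_phrase.lower() in summary.lower()
    direction_carried = (primed_shift - control_shift) > margin

    if phrase_survived:
        return "primer-present handoff"
    if direction_carried:
        return "primer-absent directional carry"
    return "no propagation observed"


if __name__ == "__main__":
    # Illustrative inputs only; these are not drawn from the dataset.
    print(classify_propagation(
        "treat this as routine",
        "The request looks low risk and can proceed.",
        primed_shift=0.3,
        control_shift=0.0,
    ))
```

The second label is the harder one to catch in practice: a filter that searches summaries for the original phrase would find nothing, so detection has to compare downstream behavior against a control run.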

The methodology is black-box and observational, run through consumer interfaces with no access to model internals. N is small for the propagation findings, and the limitations are stated plainly. Characterizing the mechanism properly will take attention analysis and logit-level work from people with internals access. This is the behavioral layer of that problem.

The paper was published today, following coordinated disclosure to frontier AI labs and CERT/CC.

The locked scoring rubric is in the paper appendix. The full dataset is available on request for replication.

Demos: https://shapingrooms.com/demos

GitHub issue (OWASP): https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/issues/807

submitted by /u/lurkyloon
