Bruce Schneier recently highlighted research about using large language models (LLMs) to unredact text. The concept is simple: there are many situations in which text is redacted from documents for legal or sensitivity reasons.
LLMs are trained on enormous amounts of text and are very good at modeling language patterns. So why couldn't an LLM take the surrounding language and use probability to predict which words best fit the spot where something was removed? How accurate could these systems become with the right training, and could the results ever be accurate enough to be used as evidence?
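To make the core idea concrete, here is a minimal sketch using a masked language model via the Hugging Face transformers library. This is a toy illustration of probability-based word prediction, not the researchers' actual method, and it assumes a single-word redaction; real redactions usually span multiple words of unknown length, which makes the problem much harder.

```python
from transformers import pipeline

# Treat the redacted span as a masked token and ask a
# masked language model for its most probable fillers.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical redacted sentence: the black bar is replaced with [MASK].
sentence = "The meeting was held at [MASK] headquarters on Tuesday."

# Print the top candidate words with the model's confidence scores.
for guess in fill(sentence):
    print(f"{guess['token_str']}: {guess['score']:.3f}")
```

Even this simple approach returns a ranked list of plausible words with probabilities, which hints at why better-trained models on the right context could narrow down a redacted name or place.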
See Bruce’s post HERE and the research he pointed out from Rohan Pandey HERE. Imagine this technology being perfected so the average person could ask a trained LLM to unredact various public documents. An example would be pulling up old declassified documents about UFOs that became publicly accessible after enough time passed for a declassification clause to take effect. I’ve seen examples of such documents with tons of black lines. Could this type of LLM change how redaction is applied?