In this article from the December 2022 edition of the Privacy Law Bulletin, expert author Dr Jonathon Cohen discusses Australia’s rapidly changing privacy law framework. He also argues that lawyers and their clients would benefit from more clarity around the extent to which amendments to the Privacy Act might impact corporate governance requirements.
It covers three main questions. What are the key privacy risks associated with artificial intelligence and machine learning models? What is privacy leakage, and what are its implications? And what does the future of consumer control over data hold, and what needs to change to shape it?
Artificial intelligence and machine learning models introduce novel privacy risks, including leakage of personal information, and challenges in removing customers’ data from a model. We provide an overview for lawyers of how artificial intelligence and machine learning models are constructed and used. We also set out what privacy issues arise and how these characteristics may relate to proposed changes to the Privacy Act 1988 (Cth).
Introduction and context
The Australian Government’s Discussion Paper, released as part of the government’s review of the Privacy Act, puts forward detailed reform options in pursuit of its aim to “ensure privacy settings empower consumers, protect their data and best serve the Australian economy”.
This article discusses how and why the proposed reforms may impact organisations that use artificial intelligence and machine learning algorithms and models. We provide an overview of how these algorithms are constructed and used, where privacy issues arise and how these characteristics may relate to proposed changes in the Discussion Paper. It is written from the perspective of a data science expert, with a focus on subtle technical issues that have a particular bearing on privacy and that may not be widely recognised outside of the artificial intelligence and machine learning research community.
Artificial intelligence and machine learning models
“Artificial Intelligence” refers to the ability of computers to perform tasks that we would normally associate with human intelligence. The current main approach to achieving this is to create systems based on a class of algorithms known as “machine learning”. These infer patterns of behaviour from data and differ from the previous generation of systems, which relied on painstaking construction of deductive rules and hand encoding of human knowledge. The acceleration of computational power in recent years has allowed machine learning approaches to make rapid strides in domains such as understanding language and recognising images.
In practical terms within commercial and government contexts, “artificial intelligence” and “machine learning” are often used interchangeably. Both terms are used to refer to a group of related algorithmic techniques that identify patterns in historical data that can be generalised to current and future populations and systems.
Most often, the objective is to create a model that uses known information about an individual to infer a quantity of interest. For example, a model may seek to predict the recidivism risk of a defendant using data from historical cases where a defendant did or did not re-offend, and the characteristics of the defendant in each case, such as the type of crime and the defendant’s demographics and criminal history. The data used in constructing the model is known as the training data.
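To make this concrete, the sketch below trains a simple classifier of this kind on a small, entirely hypothetical table of historical cases. The column names, the data and the choice of scikit-learn are assumptions made for illustration only; they do not describe any real recidivism model.

```python
# Illustrative sketch only: fitting a predictive model to hypothetical
# historical case data. All values, column names and the library choice
# are invented for the example.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: characteristics of past defendants and
# whether each one re-offended (the observed outcome).
training_data = pd.DataFrame({
    "age":              [19, 34, 45, 23, 52, 31],
    "prior_offences":   [2, 0, 5, 1, 0, 3],
    "offence_severity": [3, 1, 4, 2, 1, 3],
    "reoffended":       [1, 0, 1, 0, 0, 1],
})

features = training_data[["age", "prior_offences", "offence_severity"]]
outcome = training_data["reoffended"]

# The model infers a general pattern from the historical cases ...
model = LogisticRegression().fit(features, outcome)

# ... which can then be applied to a new individual to infer the quantity
# of interest, here a predicted probability of re-offending.
new_defendant = pd.DataFrame(
    [{"age": 28, "prior_offences": 1, "offence_severity": 2}]
)
print(model.predict_proba(new_defendant)[0][1])
```

The point to note for privacy purposes is that the model is derived entirely from the training data: everything it “knows” has been learned from records about identifiable individuals.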
Models such as these are pervasive across modern business and government. They predict how likely you are to purchase an item at a particular price, assess the likelihood of an insurance claim being fraudulent, and estimate the quantum of future social welfare benefits that an individual may receive over their lifetime.
Models are used for many purposes. They may be used directly in the operations of an organisation, such as recommending grocery items during online checkout, triaging injured people to the most appropriate model of care, or providing input to sentencing decisions in a courtroom. Or they may be used to support internal decision-making, such as informing government policy or organisational strategy.
Privacy questions relating to models
Privacy questions arise at several stages in the construction, storage and use of models: when personal information is collected and used as training data, when the model itself retains traces of that information, and when the model is applied to produce inferences about individuals.
Of the proposed changes in the Discussion Paper, two areas have particular relevance for models and associated algorithms: amending the definition of personal information, and providing consumers with more control over how their data is used. We consider each in turn.
Amending the definition of personal information
The Discussion Paper proposes to amend the definition of personal information to make clear that it includes inferred personal information, which it defines as “information collected from a number of sources which reveals something new about an individual”. This cuts to the core of a model’s primary purpose: inferring new information from available data.
If adopted, it is plausible that most inferences produced by models would fall under this revised definition of personal information. This would place additional governance requirements on organisations that currently apply different levels of risk control to different types of data, depending on the associated privacy risks.
One question that arises is whether models themselves (rather than the inferences they produce) would also attract additional governance requirements under a definition that includes inferred information, given that a model provides the functionality for producing inferred information.
The concept of privacy leakage
More subtly, the models themselves may continue to carry personal information from the training datasets, a phenomenon known as “privacy leakage”. In this scenario, a user might be able to recover information from the training data given only access to the model and limited information about an individual of interest.
Representative examples of privacy leakage include the following:
• a drug dosage prediction model from which researchers were able to recover sensitive genetic information about patients in the training data, given only access to the model and basic demographic details about those patients
• large language models that memorise personal information contained in their training data and can reproduce it in response to suitably crafted queries
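To illustrate how such leakage can manifest, the sketch below shows one simple signal that leakage attacks exploit: a flexible model that has effectively memorised its training records tends to be measurably more confident on those records than on records it has never seen. The data, model and library used here are hypothetical and purely illustrative, and are not drawn from the research described above.

```python
# Illustrative sketch only: measuring the confidence gap between records a
# model was trained on and records it has never seen. A persistent gap is
# the signal that so-called membership inference attacks exploit.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: 200 individuals, 5 attributes, noisy binary outcome.
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

# A flexible model fitted to noisy data will memorise much of its training set.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def top_confidence(record):
    """The model's confidence in its most likely prediction for one individual."""
    return model.predict_proba(record.reshape(1, -1)).max()

# Average confidence on individuals who WERE in the training data ...
member_conf = np.mean([top_confidence(x) for x in X_train[:50]])
# ... versus individuals who were NOT.
non_member_conf = np.mean([top_confidence(rng.normal(size=5)) for _ in range(50)])

print(f"average confidence on training-set members: {member_conf:.2f}")
print(f"average confidence on unseen individuals:   {non_member_conf:.2f}")
```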
Protecting against the sorts of privacy leakage described above is surprisingly challenging. For instance, research indicates that memorisation by a model of its training data can occur, and that sophisticated approaches to reduce this direct but unintended risk are often ineffective.
One alternative is a “differential privacy” approach that adds noise to the training data to provide strong privacy guarantees. However, this approach requires a trade-off: “high-noise” models typically increase privacy protection but may reduce model utility to the point of futility, while “low-noise” models preserve more utility but fail to substantially reduce the risk of privacy leakage. The drug dosage prediction model research discussed above also demonstrated how differential privacy techniques applied to protect genetic privacy substantially interfered with the main purpose of the model, increasing the risk of negative patient outcomes such as strokes, bleeding events and mortality beyond acceptable levels.
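The trade-off can be seen even in the simplest differentially private computation. The sketch below applies the standard Laplace mechanism to a single count at several noise levels; it is a hypothetical illustration of the general principle, not the training-data perturbation used in the research discussed above, and the dataset and epsilon values are invented for the example.

```python
# Illustrative sketch only: the privacy/utility trade-off in differential
# privacy, shown with the Laplace mechanism on a simple count. A smaller
# epsilon means more noise and stronger privacy, but a less useful answer.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sensitive dataset: 1 = the individual has the attribute of interest.
data = rng.integers(0, 2, size=1000)
true_count = int(data.sum())

def dp_count(values, epsilon):
    """Release a count with Laplace noise calibrated to a sensitivity of 1."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return values.sum() + noise

print(f"true count: {true_count}")
for epsilon in (10.0, 1.0, 0.1, 0.01):
    print(f"epsilon={epsilon:>5}: released count = {dp_count(data, epsilon):.1f}")
# "Low-noise" releases (large epsilon) stay close to the true value but offer
# weaker protection; "high-noise" releases (small epsilon) protect individuals
# strongly but can drift far from the truth. The same tension arises when
# noise is injected into model training data.
```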
These observations indicate that privacy leakage is likely to be a risk in many models, particularly as the use of complex machine learning methods such as deep learning models with up to billions of internal parameters continues to grow.
Providing more control to consumers in how their data is used
The Discussion Paper lists several proposed changes that would provide consumers with increased control over their personal information, including a right to request the erasure of their personal information and a right to object to, or withdraw consent for, the collection and use of that information.
It is likely that these changes would require organisations to develop procedures for removing customers’ information from models, or risk penalties. For example, in March 2022 the United States Federal Trade Commission ordered WW International (formerly known as Weight Watchers) to destroy models and algorithms built using personal information from children as young as eight, which had been collected without parental consent.
Removing a customer’s or a group of customers’ data from a model would typically require rebuilding the model on a new set of training data that excludes the relevant customers’ data. There are many practical challenges with this, including the following:
• identifying which models an individual’s data contributed to, which requires detailed records linking training datasets to deployed models
• the computational cost and time involved in retraining, particularly for large or frequently refreshed models
• the possibility that the retrained model behaves differently from the original, affecting downstream systems and decisions that relied on its outputs
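As a rough illustration of what honouring an erasure request can involve under current techniques, the sketch below filters the withdrawn individuals out of a hypothetical training table and refits the model from scratch. The customer identifiers, column names and use of scikit-learn are assumptions for the example only.

```python
# Illustrative sketch only: removing customers' data from a model by
# retraining it without their records. All data and identifiers are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

training_data = pd.DataFrame({
    "customer_id":  [101, 102, 103, 104, 105, 106],
    "age":          [25, 41, 37, 29, 55, 33],
    "tenure_years": [1, 8, 5, 2, 12, 4],
    "churned":      [1, 0, 0, 1, 0, 1],
})

FEATURES = ["age", "tenure_years"]

def fit_model(df):
    return LogisticRegression().fit(df[FEATURES], df["churned"])

original_model = fit_model(training_data)

# Customers who have withdrawn consent or requested erasure (hypothetical IDs).
# Honouring the request means (a) knowing exactly which individuals' records
# contributed to the model and (b) refitting on the remaining records.
erasure_requests = {102, 105}
remaining = training_data[~training_data["customer_id"].isin(erasure_requests)]
retrained_model = fit_model(remaining)

# The retrained model may behave differently from the original, so downstream
# systems and decisions that relied on the original model may also need review.
```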
These overheads and considerations mean that organisations will likely require time to adjust to any new privacy-related requirements. They may need to rebuild their relevant infrastructure, including modelling processes and data systems, so that they appropriately capture, categorise and apply individual customer permissions with sufficient efficiency.
Conclusion
Artificial intelligence and machine learning models introduce novel privacy risks, including leakage of personal information, and challenges in removing customers’ data from a model.
Lawyers and their clients alike would benefit from clarity around the extent to which amendments to the Privacy Act will impact governance requirements, and from sufficient time to review and amend their processes to meet those requirements.