Like many people in the Higher Education sector, I’ve been watching Artificial Intelligence (AI) develop with interest and concern. AI can help significantly with the research process, but new complications come alongside those benefits. Some of the less-discussed issues are copyright-related: we need to consider what we choose to input into AI as well as what it outputs.

It’s important to remember that when you input information into a chatbot, that data can often be used to “train” the AI. ChatGPT explicitly states that one of the three main sources for the language model is information provided by human trainers or users. At the moment, under UK copyright law, inputting copyrighted material into an AI dataset would count as making a “copy” of that work and would be in breach of copyright law. The recent government consultations on AI have discussed changing this so that data mining would be permitted to harvest copyright-protected content. However, this is not presently codified in law as an exception. As such, we need to continue under the assumption that any material which remains in copyright cannot be added to an AI dataset. To take an example, you could ask the AI to summarise a piece of text for you. If that text is still in copyright, inputting it could be a copyright issue.

So when inputting data into an AI, make sure it’s not copyrighted. That includes your own work that you intend to publish under copyright: if you’re publishing a journal article and input it into an AI bot, that would count as a copy. For instance, one standard prompt engineering technique is using the AI to analyse a piece of writing. If you were to use the AI to analyse a piece of your own writing and then seek publication in a journal, the publishers might consider copies stored within an AI training set as going against any embargoes they have in place.

You also need to consider whether any data you’re inputting into the AI is sensitive or personal. This was an issue recently in Italy, where ChatGPT was temporarily suspended because the Italian data protection authority felt that the AI training data was making use of personal data which its users hadn’t knowingly and fully agreed to share. To take a more university-focused example, inputting the results of a research study into an AI might allow us to process the data more quickly and look for trends, but we also have no control over where that data goes after we’ve finished using the AI.

On a completely different tack, AI can also create data out of nowhere: this is known as “hallucinating”. It can recommend journal articles or books which don’t exist, or attribute to an academic views or quotations which they never expressed. As a result, you should never rely on an AI for facts. It’s better to consider it a starting point for research.

So what can we do about this as researchers who want to take advantage of the new technology? I don’t think stopping using AI is an option. Like it or not, this technology is here to stay and has some fantastic capabilities which can help us. We just have to think carefully about what uses we’re putting AI to and what happens to any data we choose to share.

A few top tips:

  • First, always read the Terms and Conditions of the AI software you’re using. They will give you a clearer idea of what the AI is doing with your data and any privacy implications for your work.
  • Second, it’s best to work on the assumption that any data you input is being stored as part of the AI’s training data and act accordingly. Don’t input sensitive data. Check publishers’ views on AI having had access to a version of your manuscript.
  • Third, when using AI for research purposes, cross-check any data it outputs to make sure it’s correct and not a “hallucination” from its dataset.

Finally, if you’re unsure, don’t be afraid to come to the Library for advice! AI is a rapidly changing field and we want to help you keep abreast of any changes which will impact your work. FAQs are available with more advice and the Academic Support Team are the best initial port of call for any AI queries. For more general worries about AI and academic misconduct, take a look at our Academic Integrity web pages. If you’re more concerned with AI and copyright, please contact the Copyright Clearance Team at digitisation@leedsbeckett.ac.uk
