Research with Generative AI: Overview

7 Tips for Using Generative AI

  1. There’s more than ChatGPT: Many more GenAI tools exist, including ones designed for academic research.
  2. Probability, not accuracy: GenAI predicts probable responses. It doesn’t evaluate for credibility or accuracy.
  3. Prompts matter: Quality of the response directly relates to your prompt. Be specific and provide context.
  4. Not inherently neutral: GenAI is only as good as the data used to train it, which includes biases and omissions.
  5. No shortcuts to skills: Overreliance on GenAI can limit your own critical thinking and learning.
  6. Expect fakes and hallucinations: Deep fakes are easy to make. Cross-check answers with credible sources. Go beyond GenAI.
  7. Ethical Concerns: Consider environmental, privacy and labor impacts before deciding how to use GenAI.

Things to Consider

Lack of transparency and bias with datasets

Large language models (LLMs) are trained on a wide variety of datasets, and their developers aren't always transparent about which datasets are included or excluded. As a researcher, it is important to continuously critique the quality of generated content for bias and inclusivity.

Fake citations

GenAI can combine results from its existing datasets into citations that don't actually exist. This is called a hallucination. As a researcher, check generated citations for credibility and correctness before sharing or using them.

Plagiarism 

Using information generated by LLMs without stating so is plagiarism. Since LLMs aren't always transparent about their sources, researchers must be careful not to present someone else's work without providing proper credit and acknowledgment.

Ethical Concerns

Environmental Cost

Generative AI is a heavy consumer of water and energy. Data centers, which house the servers that run generative AI technology, use fresh water and rely on community power grids to keep everything running. Data centers have also been found to release large amounts of CO2 into the atmosphere. Texas currently has over 400 data center facilities in five regions, with 90 more under construction.

Houston Advanced Research Center. (2025). Powering Texas’ digital economy: Data centers and the future of the grid [Policy brief]. https://harcresearch.org/research/powering-texas-digital-economy-data-centers-and-the-future-of-the-grid/ 

Privacy

Information you input into generative AI tools becomes the property of the platform and can be used to train LLMs or for other purposes. AI companies aren't required to disclose how they use the data gathered through their tools, so there is no definitive answer about how inputs are used or whether they're sold to or shared with third parties. Never input personally identifiable information or original research ideas into any AI platform.

Labor Impacts

AI companies require human labor to sort through the discriminatory or illegal data used to train LLMs. This labor is often outsourced to other countries or assigned to people who are paid low wages and are not offered benefits. In 2023, OpenAI, the company that owns ChatGPT, used a Kenyan firm whose workers filtered content for between $1.32 and $2 an hour. The content was so extreme that many employees reported trauma, causing the firm to end its contract with OpenAI eight months earlier than scheduled.

Perrigo, B. (2023, January). OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/

What is Generative AI?

Generative AI (GenAI) is a type of artificial intelligence that produces content such as text, images, or music. GenAI does not actually understand the content it produces. Instead, it makes predictions about the relationships between words, images, and sounds. 

Text-generating GenAI tools are built on large language models, which are trained on massive text datasets that teach them how grammar, vocabulary, and style contribute to text. They mimic the language structures learned from that data to create coherent sentences.

Machine learning makes it possible for computers to learn from large datasets without being explicitly programmed to do so. This means that performance is continually improved through more data exposure.
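The idea of predicting rather than understanding can be sketched at toy scale. The short Python example below (a hypothetical illustration, not how any real LLM works) counts which word most often follows another in a tiny corpus and "predicts" accordingly — statistics with no comprehension:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the massive datasets real models use.
corpus = (
    "the model predicts the next word "
    "the model mimics language patterns "
    "the model predicts probable words"
).split()

# Count how often each word follows each other word (a simple bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word; no understanding involved."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("model"))   # "predicts" — it follows "model" most often here
```

Real LLMs do the same thing at vastly larger scale over fragments of words, which is why their output is fluent yet can still be confidently wrong.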
