INLG 2025 Tutorial: Large Language Models in Social Science: Methods, Applications, and Ethics

Half day tutorial on Oct 29 (afternoon)

This half-day tutorial introduces Large Language Models (LLMs) for the social sciences, combining hands-on experience (no coding or advanced maths needed) with critical discussion of methodological opportunities, limitations, and ethical concerns.

Organizers

Dr. Sree Ganesh Thottempudi
Centre for Augmented Intelligence and Data Science (CAIDS), UNISA
Prof. Dr. Ernest Mnkandla
Centre for Augmented Intelligence and Data Science (CAIDS), UNISA

Target Audience

Social scientists (faculty, researchers, graduate students) from disciplines such as political science, sociology, anthropology, communication, economics, and related fields. No prior programming experience required, but familiarity with social science research methods is assumed.

Workshop Overview

Large Language Models (LLMs) might be just the tool you need! This workshop introduces LLMs for social science research, offering hands-on experience with no coding or advanced maths needed.

You will explore the basics of Natural Language Processing (NLP) and LLMs through real-world applications such as text classification, topic modeling, and text generation. We will use Python and Google Colab; no prior programming experience is needed, and we will guide you every step of the way.

The focus is on practical skills you can apply directly to your research. Need to analyze large, diverse text collections such as social media posts, interview transcripts, or news articles? Want to detect emerging trends, automate qualitative coding, or generate synthetic survey responses to test hypotheses? LLMs are transforming text analysis in the social sciences, and this is just the start.


Recent advancements in LLMs—such as OpenAI’s GPT, Google’s Gemini, and Meta’s LLaMA—are transforming how social science research can be conducted. These models provide new tools for data collection, analysis, and theory-building, enabling researchers to work with text, language, and human behavior in innovative ways.


This workshop introduces participants to the practical and theoretical uses of LLMs in social science research. It combines hands-on sessions with critical discussion around methodological opportunities, limitations, and ethical concerns.

Workshop Objectives

Participants will:

  1. Grasp the fundamental capabilities and limitations of LLMs.
  2. Explore applications of LLMs, including:
    • Text generation and summarization
    • Sentiment and discourse analysis
    • Simulating human subjects and interviews
    • Coding and classification in qualitative research
  3. Gain hands-on experience with open-source LLM tools (e.g., Hugging Face) and commercial APIs (e.g., OpenAI).
  4. Critically assess the validity, biases, and ethical considerations of employing LLMs in social science research.

Tentative Agenda

Part 1: Foundations of Python and Intro to LLMs

  • Python coding and Google Colab
  • Foundation models on Hugging Face and proprietary models (OpenAI, DeepSeek, Gemini)
  • LLM workflow: data collection, prep, modelling, evaluation, improvement
  • Case study: analysing a text dataset, from loading through tasks such as sentiment analysis, classification, summarisation, and question answering (QA)
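The workflow above (data collection, prep, modelling, evaluation) can be sketched in a few lines of plain Python. A trivial keyword scorer stands in for a real model call (in the workshop this would be, e.g., a Hugging Face sentiment pipeline), so the cell runs anywhere; all function names here are our own illustrative choices:

```python
# Sketch of the LLM workflow: load -> prepare -> model -> evaluate.
# classify_sentiment is a toy stand-in for an actual LLM call.

def prepare(texts):
    """Minimal text prep: lowercase and strip whitespace."""
    return [t.lower().strip() for t in texts]

def classify_sentiment(text):
    """Stand-in 'model': counts sentiment-laden keywords."""
    positive = {"good", "great", "helpful", "excellent"}
    negative = {"bad", "poor", "useless", "terrible"}
    tokens = text.split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def evaluate(predictions, gold):
    """Accuracy against hand-coded labels, as in the evaluation step."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Data collection step: a toy corpus of survey-style responses.
corpus = ["The new policy is great ", "Useless advice and poor support", "It happened on Tuesday"]
gold = ["positive", "negative", "neutral"]

preds = [classify_sentiment(t) for t in prepare(corpus)]
print(preds)                  # ['positive', 'negative', 'neutral']
print(evaluate(preds, gold))  # 1.0
```

Swapping the stand-in for a real model changes only the modelling step; the surrounding workflow stays the same.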

Part 2: LLMs in Social Sciences

  • How LLMs work, use in social science research, and evaluating results
  • Word embeddings and sentence transformers for social science
  • Limitations of pre-trained models, ways to improve results
  • Ethics, data, training, and use considerations
  • Case: data loading, tokenisation, embeddings, vector databases, inference, cost-cutting, results evaluation, insights
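The embedding-and-retrieval steps in this case study boil down to comparing vectors. A minimal sketch with hand-made toy vectors (in the workshop, real embeddings would come from a sentence-transformers model and be stored in a vector database; the document names here are invented for illustration):

```python
# Rank documents by cosine similarity to a query embedding,
# as a vector database would do during retrieval.
import math

def cosine(u, v):
    """Cosine similarity: the standard relevance score for embedding search."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "embeddings" for three documents (real ones have hundreds of dimensions).
docs = {
    "protest coverage": [0.9, 0.1, 0.0],
    "election results": [0.7, 0.6, 0.1],
    "recipe blog":      [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # embedding of, say, "news about demonstrations"

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # 'protest coverage'
```

The same similarity score underlies inference-time retrieval and many cost-cutting tricks (e.g., caching nearest neighbours instead of re-querying a model).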

Part 3: Improving LLM Results

  • Prompt engineering: designing effective prompts
  • Fine-tuning and parameter-efficient tuning
  • Retrieval-Augmented Generation (RAG)
  • Reinforcement Learning from Human Feedback (RLHF)
  • Case: Customizing LLMs for domain-specific content, comparing approaches, and ethical considerations
  • Wrap-up: project feedback, resources, and future collaborations
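To make the Retrieval-Augmented Generation (RAG) idea above concrete, a minimal sketch: retrieve the passages most relevant to a question, then assemble them into the prompt sent to the model. Word-overlap scoring stands in for an embedding search, and `build_prompt` is our own illustrative helper, not a library function:

```python
# RAG in miniature: retrieve relevant passages, paste them into the prompt,
# and let the model answer from your corpus rather than from memory alone.

passages = [
    "Interview 12: respondents cited housing costs as their main concern.",
    "Interview 7: several participants mentioned commute times.",
    "Field note: the meeting ended early due to rain.",
]

def retrieve(query, passages, k=2):
    """Score passages by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    scored = sorted(passages,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, context):
    """Assemble the augmented prompt that would be sent to an LLM API."""
    joined = "\n".join(f"- {p}" for p in context)
    return (f"Answer using only the context below.\n"
            f"Context:\n{joined}\n"
            f"Question: {query}")

question = "What concerns did respondents raise?"
prompt = build_prompt(question, retrieve(question, passages))
print(prompt)
```

Comparing this prompt-level grounding against fine-tuning the model on the same corpus is exactly the kind of approach comparison the Part 3 case study covers.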

Learning Outcomes

By the end of the workshop, participants will be able to:

  • Identify appropriate LLM tools for their research questions
  • Construct meaningful prompts for data analysis or generation
  • Critically assess LLM outputs in light of social science standards
  • Navigate emerging ethical and methodological frameworks

Technical Requirements

  • Laptop with internet access
  • Access to an LLM platform (e.g., OpenAI, Claude, Hugging Face)
  • Optional: Jupyter notebooks or Google Colab (for hands-on session)

Additional Information

  • Supplementary readings and tutorials will be provided
  • Optional follow-up consultation for research design involving LLMs

Recommended Readings

Hugging Face official Getting Started guide

https://huggingface.co/learn/

Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural Language Processing with Transformers. O'Reilly Media.

https://learning.oreilly.com/library/view/natural-language-processing/9781098136789/