GenAI Risks and Benefits Blog

Image source: Feng et al.

This seminar focused on understanding the potential risks and benefits of advances in Generative Artificial Intelligence and Large Language Models. It was a research-focused seminar in which student groups each planned two week-long modules, selecting required readings, writing discussion questions, and leading the resulting discussions.

Topics Included:

  1. Introduction to Foundational Models
  2. Alignment (planned by my group)
  3. Prompting and Bias
  4. Capabilities of LLMs
  5. Hallucination
  6. Machine Translation
  7. Interpretability (planned by my group)
  8. GANs and DeepFakes
  9. Data Selection for LLMs
  10. Watermarking on Generative Models
  11. Multi-modal Models
  12. Economic Impacts of AI

Example Syllabus from Interpretability:

Monday 23 October


  1. Chaszczewicz highlights shared challenges in XAI development across different data types (e.g. image, textual, graph data) and explanation units (e.g. saliency, attention, graph-type explainers). What are some potential similarities or differences in addressing these issues?

  2. In cases where models produce accurate results but lack transparency, should the lack of explainability be a significant concern? How should organizations/developers balance the tradeoffs between explainability and accuracy?

  3. How could XAI tools be used to improve adversarial attacks?

  4. In Attention is not not Explanation, the authors dispute a previous paper’s definition of explanation. Whose view do you find most convincing and why?
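For question 4, a hypothetical toy example (not from either paper's code) illustrates one intuition behind the "attention is not explanation" argument: when two tokens happen to have identical value vectors, attention mass can be shifted freely between them without changing the model's output, so the attention distribution is not a unique explanation.

```python
import numpy as np

# Value vectors for three tokens; tokens A and B are deliberately identical.
V = np.array([
    [1.0, 0.0],   # token A
    [1.0, 0.0],   # token B: same value vector as A
    [0.0, 1.0],   # token C
])

a = np.array([0.7, 0.1, 0.2])   # the model's "explanatory" attention
b = np.array([0.1, 0.7, 0.2])   # an adversarial alternative distribution

out_a = a @ V   # attention-weighted sum of values
out_b = b @ V

# Very different attention, identical output.
print(np.allclose(out_a, out_b))  # True
```

Real transformer values are rarely exactly identical, but near-collinear values make near-equivalent adversarial attention distributions easy to find, which is the crux of the debate.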

Wednesday 25 October

Required Readings


  1. (Softmax Linear Units) Elhage et al. present the Superposition Hypothesis, which argues that networks attempt to learn more features than they have neurons. By delegating multiple features to a single node, interpreting the significance of the node becomes challenging. Do you believe this hypothesis based on their explanation, or do you suspect there is some separate obstacle here, such as the counter-argument that nodes could represent standalone features that are difficult to infer but often obvious once discovered?

  2. (Softmax Linear Units) Do you see any difference between SLU and ELU coupled with batch-norm/layer-norm? How does this relate to the reasons the LLM community shifted from ReLU (or variants like ELU) to GeLU?

  3. (Towards Monosemanticity) Could the identification of these “interpretable” features enable training (via distillation, or other ways) smaller models that still preserve interpretability?

  4. (Towards Monosemanticity) Toying around with the visualization seems to show good identification of relevant positive tokens for concepts, but negative concepts do not seem to be very insightful. Try the explorer out for a few concepts and see if these observations align with what you see. What do you think might be happening here? Could it be solved by changing the auto-encoder training pipeline, or by structural changes like SLU? Are there other interesting elements or patterns you see?
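For question 1, a hypothetical numpy sketch (my own illustration, not from Elhage et al.) shows the core idea of superposition: more sparse features than neurons can be stored along random, nearly orthogonal directions and still be approximately read out, even though no single neuron corresponds to one feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 8, 4  # more features than neurons

# One random unit direction per feature; random directions in moderate
# dimension are nearly orthogonal on average.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse input: only feature 2 is active.
x = np.zeros(n_features)
x[2] = 1.0

h = x @ W          # compressed 4-d representation ("neurons")
scores = h @ W.T   # approximate readout of all 8 features

# The active feature is recovered despite the compression, but each
# neuron in h mixes several features, so none is interpretable alone.
print(np.argmax(scores))  # 2
```

The readout only works because the features are sparse; if many features were active at once, the interference terms between non-orthogonal directions would swamp the signal, which is why superposition is tied to feature sparsity.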

Kasra Lekan
Master’s Student

My research interests include natural language processing, human-AI interaction, and modelling complex systems.