CIS 3500

Introduction to Software Engineering
or how to JOYFULLY BUILD with a TEAM

Prof. Jérémie Lumbroso, Ph.D.

lumbroso@seas.upenn.edu

 

LECTURE 4:
Code Archeology

⚠️➡️ RIGHT NOW, go to https://sli.do

Code #23232323 

"Programming Archeology"

A sci-fi author imagines how a civilization a thousand years from now might read the code we write today

  • Unix Epoch (start): 00:00:00 UTC on January 1, 1970

  • Purpose: Acts as the reference point for time in Unix-like systems

  • Measurement: Time stored as the number of seconds elapsed since this epoch

  • Moon Landing: On July 20, 1969, Neil Armstrong and Buzz Aldrin became the first people to walk on the Moon (Apollo 11 mission)
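To make the "seconds since the epoch" idea concrete, here is a minimal Python sketch. The helper names are illustrative, and using midnight UTC for the Moon landing date is an assumption made only to keep the arithmetic simple. Note that dates before the epoch simply become negative numbers.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: 00:00:00 UTC on January 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(moment: datetime) -> int:
    """Return whole seconds elapsed since the epoch (negative if earlier)."""
    return int((moment - EPOCH).total_seconds())


def from_unix_seconds(seconds: int) -> datetime:
    """Turn a count of seconds back into a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# Midnight UTC on the day of the Apollo 11 landing (time of day is assumed).
moon_landing_day = datetime(1969, 7, 20, tzinfo=timezone.utc)

print(to_unix_seconds(EPOCH))             # 0
print(to_unix_seconds(moon_landing_day))  # -14256000 (165 days before the epoch)
print(from_unix_seconds(86_400))          # 1970-01-02 00:00:00+00:00
```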

Vernor Vinge

C.S. professor (died March 2024) and author of many sci-fi novels, who popularized the notion of the technological singularity.

  • The earliest ideas resembling the singularity date back to John von Neumann in the 1950s.
    • Stanisław Ulam, in 1958, recalled von Neumann discussing how technological progress was accelerating toward something incomprehensible.
  • I. J. Good (1965) introduced the concept of an “intelligence explosion,” suggesting that a sufficiently advanced AI could recursively improve itself, leading to superintelligence.
  • In 1983, Vinge explicitly used the term "singularity" in his essay "First Word" (Omni magazine).
  • His 1993 essay, "The Coming Technological Singularity," outlined the idea in a structured way, connecting AI, intelligence explosion, and runaway progress.

HW1

Diving Into A Codebase

HW1 Content

  • You are introduced to a static website generator called Hugo
    • This is several generations after WordPress
    • Similar tools: Jekyll, Pelican, Gatsby, etc.
  • Static website generators create websites from content (in Markdown, a simplified document language) and themes (predefined)
  • You will first learn by doing 2-3 tutorial starter exercises, including a Hello World to understand how Hugo works
  • The main exercise is modifying an existing landing page

HW1 Hugo Theme's Example Site

[Screenshots: the original example site, followed by five examples of student work]

Sample HW1 From Spring 2024

Homework 1 – The "Tricycle" Version of Coding

  • Goal: Use Hugo (a static site generator) and an existing theme to create a fake landing page for a fake product.
  • This is a form of Code Archaeology—you're parachuting into an existing, somewhat unknown codebase (the Hugo theme).
  • You'll practice:
    1. Looking up the technology (Hugo basics).
    2. Doing easy tutorials or the official docs to see how everything works.
    3. Building up from an example or starter (the theme's exampleSite).
    4. Troubleshooting typical problems in static site compilation and theme usage.

Code Archaeology: Case Study Building A Hugo Website

"Programming Archaeology is about diving into an existing codebase—where everything's in place, but you don't know exactly how."

What's Special About Hugo and a Static Site?

  • It doesn't "run" like a normal app—instead it compiles Markdown + templates into static HTML/CSS/JS.
  • The theme (template) you use might require certain Node.js dependencies or an SCSS pipeline.
  • You can still break things: Typos in template partials, missing front-matter, or failing to install required dependencies can cause "compilation errors."
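The "compile" idea can be shown in miniature. The sketch below is a toy, not how Hugo is implemented: it splits a page into front matter and body, then substitutes both into a layout. The page text, the layout, and the build function are all made up for illustration, and no Markdown rendering is attempted.

```python
from string import Template

# A hypothetical page: front matter between '---' lines, then the body.
PAGE = """---
title: Hello World
---
Welcome to my fake product."""

# A hypothetical layout; $title and $content are filled in at build time.
LAYOUT = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$content</p></body></html>"
)


def build(page: str, layout: Template) -> str:
    """Split front matter from body and substitute both into the layout."""
    _, front, body = page.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # substitute() raises KeyError when the layout needs a field the page
    # does not define: the toy equivalent of a failed build.
    return layout.substitute(content=body.strip(), **meta)


html = build(PAGE, LAYOUT)
print(html)
```

Deleting the title line from PAGE makes build raise a KeyError, which mirrors how missing front matter can break a real build.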

Why Hugo (and Why Now)?

  • Safe environment: "Tricycle version" of taking on unknown code. It's simpler than a large, multi-language codebase (like a big monolith in Java/Python).
  • No complicated runtime: The worst that happens is a failed build or a broken layout. No real "segfaults" or "service outages."
  • Fast feedback loop: hugo server can live-reload changes—perfect for exploring code step by step.
  • Realistic scenario: In the real world, new devs often join an existing project. You jump into a codebase where architecture isn't always clear.

Don't Worry!

  • Although we are starting with a static website generator, we will examine more difficult projects
  • You will explore a Chrome extension from past semesters and learn how to make changes
  • You will also do a code analysis of a Hackathon project
  • You will also write tests for an existing codebase
  • ... but let's start small!

Learning Goals

  1. Develop a Working Model: Figure out how Hugo organizes content vs. layouts.
  2. Configure an IDE or Editor to help navigate the code quickly (search across files, syntax highlighting for Hugo templates, etc.).
  3. Static & Dynamic Strategies:
    • Static: Searching for where certain CSS classes or partials are defined.
    • "Dynamic": Using hugo server to see live results of your changes.
  4. Troubleshooting: Identify typical pitfalls (missing dependencies, front-matter errors, node-sass issues, etc.).

Key Archaeological Principles

When diving into any codebase, especially for the first time, keep these principles in mind:

  1. You Won’t Understand It All

    • Accept upfront that no one truly understands every detail of a non-trivial codebase. Aim to build a mental map of the parts you do need to change or configure.
  2. Be Systematic

    • Don’t randomly click around or guess. Start with documentation (official or in the project’s README.md), then systematically investigate files and directories.
  3. Work Iteratively

    • Make one change at a time, see what happens. If something breaks, revert and re-check. This approach keeps you from being overwhelmed.
  4. Document as You Go

    • Take notes about what “partials” do, how _index.md differs from a typical content file, or how front matter influences build behavior.
    • Write down paths and filenames. It’s your personal “archaeologist’s field journal.”

How to Tackle This Hugo Codebase

We will see that the principles here apply to any codebase.

1. Leverage Previous Experience

  • If you've done front-end dev or theming before, your instincts help (HTML/CSS, placeholders, partials).

2. Consult Documentation

  • Hugo's docs: https://gohugo.io
  • The "Building With Hugo" book chapter.
  • Your theme's README and exampleSite.

3. Use an IDE / Search Tools

  • Tools like Visual Studio Code or grep or ag to find relevant pieces.
  • Example: "Where is the string 'Hugo Bootstrap Theme' in the theme?" => searching for it can lead you to the partial for the features section.
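If grep or ag is not available, the same kind of search can be sketched in a few lines of Python. The function name and the "themes" folder below are assumptions for illustration; point it at wherever your theme actually lives.

```python
from pathlib import Path


def find_string(root: str, needle: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip images, fonts, and other unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # "themes" is an assumed location for the theme source.
    for file, number, line in find_string("themes", "Hugo Bootstrap Theme"):
        print(f"{file}:{number}: {line}")
```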

4. Trial & Error

  • hugo server -D to see draft posts, watch for errors.
  • Modify your content or layout in small steps, see if it breaks or compiles.

The "Big Pile of Code" – Hugo Theme Edition

  • A Hugo website typically includes:
    • config.toml or .yaml with site configs.
    • themes/hugo-bootstrap-theme/ (or whichever theme you chose).
    • Example content files, partials, SCSS, JS, images.
  • The code structure can feel overwhelming at first: "Where does the real layout come from?!"

Key Tip: Try to see how "content" maps to "layouts" by looking at the theme's partials and searching for references to them in your _index.md or config.toml.

"You Will Never Understand the Entire System!"

  • This is especially true if you peek into themes/hugo-bootstrap-theme and see hundreds of SCSS, partials, shortcodes, etc.
  • Your goal is not to fully master the entire theme. Instead:
    1. Focus on the entry points relevant to your landing page.
    2. Find the partials or pages that produce the "hero banner," "features list," "footer," etc.

Software is Massively Redundant

  • Often you can find a snippet of code or front matter in the exampleSite directory.
  • Copy/paste as a starting point—just rename or tweak images, text, etc.

"Code Must Run to Do Stuff"

  • In a static-site world, "run" means "compile or build successfully."
  • If something breaks, you get a build error in the console (like a snippet of Go template syntax or missing variable).
  • Keep an eye on your terminal output when you call hugo server.

Entry Points

  • Hugo's main "entry" is the hugo command, which reads config.toml and scans content/.
  • The theme's partials get called from the layout files.
  • If you get lost in a partial, search the code to see who references it.
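One way to answer "who references this partial?" is to scan the layout files for partial calls and build a reverse index. This is a rough sketch under stated assumptions: the regular expression only catches the common form with a quoted name (for example {{ partial "hero.html" . }}), and the function name is made up.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches calls such as: partial "hero.html" .   or   partialCached "nav.html" .
PARTIAL_CALL = re.compile(r'\bpartial(?:Cached)?\s+"([^"]+)"')


def partial_references(layouts_dir: str) -> dict[str, list[str]]:
    """Map each partial name to the sorted list of layout files that call it."""
    callers = defaultdict(set)
    for path in Path(layouts_dir).rglob("*.html"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for name in PARTIAL_CALL.findall(text):
            callers[name].add(path.relative_to(layouts_dir).as_posix())
    return {name: sorted(files) for name, files in callers.items()}
```

Calling partial_references on a theme's layouts folder returns a dictionary you can query by partial name, which is a handy entry for your field journal.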

Code Must Exist (But Where?)

  • Binaries or Node modules might be in node_modules/ if the theme uses them.
  • Actual site code is in your content/ folder and layouts/.
  • The theme's library code is in themes/hugo-bootstrap-theme/.
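Because the same layout path can exist both in your site's layouts/ and in the theme, it helps to know which copy wins: a file in the project's layouts/ takes precedence over the theme's copy. The sketch below is a deliberately simplified model of that rule (Hugo's real lookup order has more cases); the function name and arguments are illustrative.

```python
from pathlib import Path
from typing import Optional


def resolve_layout(site_root: str, theme: str, relative: str) -> Optional[Path]:
    """Return the file that wins for a layout path, or None if neither exists.

    Simplified model: the project's own layouts/ is checked before the theme's.
    """
    candidates = [
        Path(site_root) / "layouts" / relative,
        Path(site_root) / "themes" / theme / "layouts" / relative,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

For example, resolve_layout(".", "hugo-bootstrap-theme", "partials/footer.html") would return the theme's footer until you create your own layouts/partials/footer.html.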

Probing Code Transparency

  • Transparent (local) vs. Translucent (precompiled theme code) vs. Opaque (some minified CSS or JS).
  • In your assignment, you mostly deal with transparent code because you have the theme source.

Strategy: How to Tackle Your Hugo Theme

  1. Start with the Official Docs

    • Hugo has thorough documentation and an active community.
    • Checking the docs can clarify how front matter or partials work, so you know what to search for in the code.
  2. Peek at the Project’s README.md

    • Some themes or code samples have mini guides. They might detail how to compile assets or reference special partials.
  3. Cloning, Building, and Running

    • git clone <the-hugo-theme> and see if there’s a package.json that requires npm install.
    • Try hugo server or npm run dev—does everything build? If not, read any build errors carefully.
    • Fix environment issues: Do you have the right Node.js version? The extended version of Hugo?
  4. Look for Where to Insert Content

    • Usually, _index.md drives the homepage content.
    • Layouts might be in themes/theme-name/layouts/. The _default/ folder often has baseof.html, single.html, list.html.
    • Hugo partials (like header.html, footer.html) are in partials/.
  5. Use Your IDE and Text Search

    • Searching for a relevant keyword (like nav, header, hero) across the code helps locate the partial that controls the top banner.
    • Searching for the string “title” may help you see how the site title is injected. This is especially useful if you suspect a naming or config discrepancy.
  6. Make Small Changes

    • Add a console log in the JS, or an extra <p> in a partial to confirm you’ve found the correct place.
    • Rebuild the site. If something breaks, revert quickly or review your commits.
  7. Review the “Front Matter” Requirements

    • Hugo uses front matter to define data (e.g., draft = true, title, tags).
    • If you need new fields (like landingPagePromoImage), you can add them to front matter and reference them in your layouts.
  8. Document Findings

    • You may want to keep a short wiki or notes doc: “To change the hero banner, edit themes/mytheme/layouts/partials/hero.html.”
    • This practice cements your learning and helps if you or your teammates revisit the code later.
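As an illustration of step 7, the sketch below reads '---'-delimited front matter and looks up a custom field such as landingPagePromoImage. It is a hand-rolled approximation that only handles flat key: value lines, not a real YAML or TOML parser, and the sample page and function name are made up.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Return (fields, body) for a page with '---'-delimited front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with a '---' line")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("closing '---' line not found")
    fields = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"cannot parse front matter line: {line!r}")
        fields[key.strip()] = value.strip().strip('"')
    return fields, "\n".join(lines[end + 1:])


# A hypothetical content file with one custom field.
page = """---
title: "Rocket Skates"
draft: true
landingPagePromoImage: images/promo.png
---
Body text"""

fields, body = split_front_matter(page)
# Fall back to a placeholder when the custom field is absent.
print(fields.get("landingPagePromoImage", "images/default.png"))  # images/promo.png
```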

Steps to Understand the New Hugo Codebase

  1. Look at the README (or theme docs).
  2. Clone or copy the exampleSite config into your root to see working defaults.
  3. Build & run with hugo server; confirm no errors.
  4. Search for text or images you want to replace.
  5. Edit content or partials in small increments, checking the browser each time.

A.2 – Experts Solve Simpler Problems First


from Experts Keep It Simple

"Experts do not try to think about everything at once. When faced with a complex problem, experts often solve a simpler problem first, one that addresses the same core issue in a more straightforward form. In doing so, they can generate candidate solutions that are incomplete, but provide insight for solving the more complex problem that they actually have to solve."

How Does This Apply?

  • Context: You're parachuting into a large Hugo theme. It can be overwhelming to see dozens of partials, layouts, SCSS files, etc.
  • Simplify: Instead of trying to overhaul everything at once, modify or customize just one small piece—for example, the header or a single partial.
  • Why It Helps:
    • You gain quick wins: See how the theme is structured.
    • You avoid confusion: Focusing on a small part clarifies how changes propagate, building confidence.

Extra Thoughts & Practical Tips

  • Tip 1: Start by editing text in one .md file, confirm your build, then proceed to more complex tasks (like altering SCSS or partial templates).

  • Tip 2: If you're stuck, define an even smaller subproblem—like changing just the site title—before tackling the entire homepage layout.

  • Recommended Tools: Markdown preview (VS Code), hugo server --verbose (or --logLevel info on newer Hugo releases) to see immediate feedback.

C.13 – Experts Prefer Solutions That They Know Work


from Experts Borrow

"Experts have no desire to ‘reinvent the wheel’. If they have a solution that works, or know of one elsewhere, they will adopt that solution and move on to other parts of the design task. Of course, they know to re-assess the existing solution within the context of the current project, to make sure that it actually fits. As long as it does, and as long as it is legally and ethically allowed, they choose to borrow rather than build, reuse rather than re-implement, and copy rather than draft."

How Does This Apply?

  • Context: The Hugo theme's exampleSite folder likely has references, partials, and standard content structures. Reusing them can save time.
  • Practical Application:
    • If you see a "features" section that's close to what you need, copy and adapt that block for your own new section.
    • If there's existing CSS for a hero banner, don't re-implement; extend or tweak it.

Extra Thoughts & Practical Tips

  • Tip 1: Use grep (or ag) or your IDE's "Find All References" to see if the code snippet you're about to copy is widely used or specialized.
  • Tip 2: Check licenses or theme docs to ensure you can replicate that snippet. (Usually fine, but good habit!)
  • Tool Highlight: Git's blame can reveal who wrote a snippet, and sometimes why.

IDE & Code Navigation

  • Recommended: VS Code with the Hugo extension.
  • Search Across Files:
    • In VS Code, use Ctrl+Shift+F (Windows) or Cmd+Shift+F (macOS) to find where a snippet occurs.
    • Exclude themes/* if you want to focus on your local site files first; or include it if you suspect the logic is in the theme.
  • Explorer: Expand and collapse directories to see how partials, layouts, and content are organized.

Observation: Tools & Techniques

  • Static Gathering:
    • Searching the theme directory for strings or partial names.
    • Reading the top-level config.
  • Dynamic Gathering:
    • hugo server -v (verbose; newer Hugo releases use --logLevel info instead) if you need logs.
    • Reload the site in your browser, open DevTools for console errors, etc.

C.14 – Experts Look Around


from Experts Borrow

"In the same way that architects walk cities to examine and take inspiration from existing buildings, software experts examine the designs of other software to ‘see how they did it’. They frequently do so in response to a particular challenge they face, but they often also spend time looking around just to add to their repertoire of possible design solutions to draw upon in future."

How Does This Apply?

  • Context: With Hugo, you're exploring a new static site theme—possibly large or unfamiliar.
  • Practical Application:
    • Search across the codebase to observe how the theme authors structure partials or front matter.
    • Compare your project with the projects of students from previous terms, or of other users of this theme (you may find some through the theme's forks on GitHub, or by searching GitHub for the theme's name).

Extra Thoughts & Practical Tips

  • Tip 1: In your local clone, open multiple .html partials side by side to compare patterns for different pages (like "site-header.html" vs. "site-footer.html").
  • Tip 2: Check out official Hugo Themes to see how others approach the same problem. This broadens your "archeological" perspective.
  • Tool Highlight: Use code browsing tools such as Visual Studio Code's "Go to Definition" to jump between references efficiently, or paste the code into an LLM and ask for an explanation.

Software is Full of Patterns

  • Hugo has conventions:
    • layouts/_default/single.html
    • layouts/partials/ for small reusable chunks
    • The archetypes/ folder for default front matter if you run hugo new commands

Observation: If your theme has a pattern, replicate it for new features (e.g., if it uses shortcodes for call-to-action banners, copy the existing pattern for your new "feature highlight").

Hugo Site Structure: A Quick Refresher

A typical Hugo project may look like this:

└── my-portfolio
    ├── config.toml
    ├── content
    │   ├── _index.md
    │   ├── about.md
    │   └── blog
    │       └── first-post.md
    ├── layouts
    │   └── ...
    ├── static
    │   ├── images
    │   └── css
    └── themes
        └── my-hugo-theme
            ├── layouts
            ├── assets
            ├── static
            └── ...
  • config.toml: Site-wide configuration (e.g., site title, baseURL).
  • content/: Markdown files that hold actual site text.
  • layouts/ and themes/: The architectural heart. Houses templates, partials, and the scaffolding that stitches content to rendered pages.
  • static/: Unprocessed assets (images, CSS, JS).

In your homework, you’ll have to figure out where to make changes so that the site’s final design (like your “fake landing page” for your “fake product”) matches specs.

E.24 – Experts Externalize Their Thoughts


from Experts Sketch

"Experts sketch when they think. They sketch when alone. They sketch in meetings with colleagues or clients. They sketch when they have no apparent need to sketch. They sketch on paper, on whiteboards, on napkins, in the air. Experts know that sketching is a way to interact with their own thoughts, an opportunity to externalize, examine, and advance what they have in their minds."

How Does This Apply?

  • Context: A Hugo project has many interlinked config files, layouts, partials, and content directories.
  • What This Means:
    • Draw a small diagram of the site structure or how a particular partial references a given SCSS file.
    • Keep a simple "site map" or "architecture sketch" in your notes (or in NOTES.md in your repo).
    • By turning intangible connections into visible sketches, you catch oversights or repeated patterns.

Extra Thoughts & Practical Tips

  • Recommended Tools:
    • A literal pen & paper or whiteboard to outline folder structures.
    • Online mind-mapping/diagramming (Miro, Draw.io, etc.).
  • Key Benefit: You can revisit your sketch to see if it still matches reality when the theme evolves (or your changes become more extensive).
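If you want a starting point for a folder sketch, a short script can print an outline that you then annotate by hand. This is only a convenience sketch; the function name and the default depth of 2 are arbitrary choices.

```python
from pathlib import Path


def sketch_tree(root: str, max_depth: int = 2) -> list[str]:
    """Return an indented outline of the folders and files under root."""
    lines = [Path(root).resolve().name + "/"]

    def walk(folder: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Folders first, then files, each group in alphabetical order.
        entries = sorted(folder.iterdir(), key=lambda p: (p.is_file(), p.name))
        for entry in entries:
            suffix = "/" if entry.is_dir() else ""
            lines.append("    " * depth + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(Path(root), 1)
    return lines


if __name__ == "__main__":
    print("\n".join(sketch_tree(".")))
```

Pasting the output into NOTES.md and adding a one-line comment next to each folder is often enough of a "site map" to get going.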

Keeping everything in your mind is hard!

From Andrew Lindesay - Software & Data Engineering

Sample Diagram Explaining Hugo Rendering Workflow

Document & Share Findings

  • Keep notes in your repository's README or a separate NOTES.md.
  • Summarize what you learned about partials, what file controls the homepage, etc.
  • This helps you (and future you) to remember how you set up or modified the theme.

Common Pitfalls & Observations

  1. Theme Dependencies:

    • Some themes (including the one we use here) require precompilation
      • I.e., with Node.js, you need to run npm install (for SCSS, PostCSS, etc.).
    • If you skip this, your site might have missing CSS or fail to build.
  2. Base URL / config.toml:

    • A wrong baseURL can lead to broken images.
    • Double-check your site is accessible locally.
  3. Front Matter:

    • Missing or malformed front matter in a Markdown file can cause blank pages or errors.
    • Check for the triple-dash lines (---) or correct parameters.
  4. Search & Replace Mistakes:

    • Be cautious when globally renaming placeholders.
    • ALWAYS check commits individually.
  5. Node-sass / SCSS Issues:

    • If your theme is old or pinned to a certain node-sass version, you might see compilation errors.
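The first pitfall (forgetting npm install) is easy to check for mechanically. The sketch below only looks for a package.json without a node_modules/ folder next to it; the function name is made up, and it says nothing about whether the installed versions are the right ones.

```python
from pathlib import Path


def needs_npm_install(project_dir: str) -> bool:
    """True when a package.json is present but node_modules/ is not."""
    project = Path(project_dir)
    has_manifest = (project / "package.json").is_file()
    has_modules = (project / "node_modules").is_dir()
    return has_manifest and not has_modules


if __name__ == "__main__":
    if needs_npm_install("."):
        print("package.json found but no node_modules/: try running npm install")
```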

More Common Pain Points in Hugo

  • Theme Asset Compilation

    • If your theme uses Node-based tooling (e.g., Webpack, Babel, Sass), you might see instructions like npm run build or gulp build.
    • A typical error: “cannot find module” or “invalid command” because you forgot to npm install.
  • Partial Overriding

    • If your custom partial or layout shares the same name/path as the theme’s partial, your local version will override the theme’s.
    • This can be helpful or confusing if you’re not aware you’re overriding a default partial.
  • Content Not Showing

    • Possibly the front matter draft is set to true. Or your _index.md file is in the wrong folder (like content/index.md instead of content/_index.md).
    • Another possibility is that the “archetype” for your content type is not recognized, or you have a mismatch in folder naming.
  • Permalink or URL Issues

    • You might see broken links if your baseURL in config.toml is incorrect.
    • Or you might have an old link from a template referencing an absolute path like http://localhost:1313/... when you want relative links.
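The permalink issues above often come down to how relative and absolute paths are resolved against a base URL. The sketch below uses Python's standard urljoin, which applies the usual URL resolution rules, to show the difference; example.org and /mysite/ are placeholders, not values from the homework.

```python
from urllib.parse import urljoin

# A site served from a sub-path rather than from the domain root.
base = "https://example.org/mysite/"

# A relative link stays under the sub-path.
print(urljoin(base, "images/logo.png"))
# https://example.org/mysite/images/logo.png

# A link starting with "/" jumps to the domain root and loses "/mysite/".
print(urljoin(base, "/images/logo.png"))
# https://example.org/images/logo.png

# Without the trailing slash, "mysite" is treated as a file name and replaced.
print(urljoin("https://example.org/mysite", "images/logo.png"))
# https://example.org/images/logo.png
```

This is why a site can look fine at http://localhost:1313/ and then lose its images once it is served from a sub-path.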

F.32 – Experts See Error as Opportunity


from Experts Embrace Error

"Design regularly involves error: things that 'go amiss’, misunderstanding, obstacles, wrong turns, emergent issues. Rather than fearing error, experts embrace error as opportunity. They accept it as an inherent part of design and take time to explore both the failure and the context around it. Understanding what happened often reveals insights about the problem – or about the solution – such as assumptions, misconceptions, misalignments, and emergent properties."

How Does This Apply?

  • Context: Editing a Hugo partial or config can trigger error messages in the console, or a broken layout in the browser.
  • What to Do:
    • The point of the homework is to learn in a structured environment. Every failure is useful and should be learned from.
    • Investigate each broken build: the error message often reveals which partial or front matter is missing or mismatched.
    • If a layout doesn't display, it might highlight an unseen dependency, forcing you to look deeper into how the theme is structured.

Extra Thoughts & Practical Tips

  • Tip 1: Keep a note of each compilation error you hit. Later, you can form a "common pitfalls" doc or FAQ to help others.
  • Tip 2: Don't panic—breakage is a normal part of learning new code. Embrace each bug as a lens to see how the system truly works.
  • Tip 3: When seeking help (from TAs, on Slack, etc.), come prepared with a clear account of what you have tried, and be ready to learn what you are missing. The goal is not to finish the homework; the goal is to become independent.
  • Recommended Tools: The Hugo --debug or -v flag can add detail to your error logs (newer Hugo releases deprecate both in favor of --logLevel debug or --logLevel info). LLMs such as ChatGPT, Claude or DeepSeek are very good at making an error message explicit if you find one cryptic.

Closing: Goals of HW1

  1. Your job: Create a landing page using an unfamiliar theme.
  2. Archaeology angle: Skim, search, hypothesize how the theme is structured, then experiment.
  3. By the end, you'll have a functioning site. Next homework, you'll see how to deploy it automatically.

Tips for Future Code Archeology

  1. You’ll Never Master Every File: That’s OK. Focus on the parts that matter for your immediate tasks.
  2. Value of Good Documentation: When you tweak code, leave behind helpful comments or short docs (like a NOTES.md) for yourself and future students.
  3. Learn to Investigate: Don’t be afraid of building from local docs, or looking at the official Hugo docs. Searching is fundamental.
  4. Stay Curious: Always wonder, “Why is it done this way? Where is that variable coming from? Is it in the config or front matter?” This curiosity fosters deeper understanding.

The COBOL Story

A Real Case Study In Software Archeology

Software Archaeology & COBOL

  • COBOL (1959): Among the oldest high-level languages still in use, primarily in finance, government, and administration.

  • Why It Matters: Despite its age, COBOL underpins critical systems worldwide—but finding experts to maintain it is getting harder.

COBOL's Staying Power

  • History & Design
    • Created in 1959, influenced by computing pioneer Grace Hopper.
    • Tailored for business operations (easy-to-read, data-centric).
  • Widespread Adoption
    • Banking & Finance: Manages transactions at scale.
    • Government Agencies: Processes taxes, social security, and benefits.
    • Insurance: Policy management, claims processing.
  • Why It’s Still Around
    • Proven reliability over decades.
    • Migration costs to newer tech are huge.
    • Regulatory requirements favor stable, well-tested systems.

COBOL systems are deeply ingrained in critical infrastructure and remain functional, leading many organizations to keep them rather than replace them.

The COBOL Programmer Shortage

  • Demographic Shifts
    • Many experienced COBOL developers are retiring.
    • Few new programmers are choosing to learn COBOL.
  • Educational Gaps
    • University curricula rarely include COBOL or mainframe topics.
    • Training mostly happens on-the-job or through specialized programs.
  • Economic Implications
    • Limited talent drives up costs for maintenance and modernization.
    • Outages or slow updates can cause severe financial and reputational damage.

As COBOL experts retire, a looming skills gap endangers the stability of crucial applications running on legacy systems.

Unemployment System Crisis (2020)

  • Context
    • Early in the COVID-19 pandemic, unemployment claims skyrocketed.
    • Many state unemployment systems (some built in COBOL decades ago) couldn’t handle the surge.
  • Problems Faced
    • System Overload: Outdated COBOL mainframes buckled under unprecedented demands.
    • Payment Delays: People waited weeks or even months for benefits.
    • Public Outcry: Governments faced pressure to fix these failing systems quickly.
  • Emergency Response
    • State governments issued pleas for retired COBOL programmers to come back.
    • Quick patches and scaling attempts were made to handle mounting claims.
  • Lessons Learned
    • Dependency on Legacy Systems: Vital public services depend on old tech.
    • Need for Preparedness: Systems must be ready for sudden spikes in usage.
    • Importance of Skills Retention: A shortage of COBOL talent made crisis recovery harder and slower.

The 2020 unemployment surge highlighted how vulnerable essential services can be when they rely on decades-old, under-maintained COBOL systems.

Looking Ahead – Possible Solutions

  • Training & Education
    • Corporate and university initiatives to teach COBOL basics.
    • Mentoring programs pairing new developers with retiring experts.
  • Modernization Paths
    • Replatforming: Move COBOL apps to modern hardware or cloud environments without fully rewriting them.
    • Incremental Refactoring: Gradual transition of core components to newer languages or microservices.
    • API Wrapping: Expose COBOL functions via APIs, allowing modern front ends to interact with legacy back ends.
  • Strategic Planning
    • Retaining retirees as consultants to transfer knowledge.
    • Documenting systems thoroughly to reduce reliance on individual expertise.
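The "API wrapping" path can be sketched in a few lines. Everything below is hypothetical: the record layout (8-character account id, 10-digit balance in cents, 2-character status), the sample data, and the function names. A Python function stands in for the legacy routine, and the wrapper translates its fixed-width record into a structure a modern front end could consume.

```python
import json


def legacy_balance_lookup(account_id: str) -> str:
    """Stand-in for a legacy routine that answers with a fixed-width record."""
    balances = {"00001234": 1050}  # made-up data, in cents
    cents = balances.get(account_id)
    if cents is None:
        return f"{account_id:<8}{0:010d}NF"
    return f"{account_id:<8}{cents:010d}OK"


def get_balance(account_id: str) -> dict:
    """Modern wrapper: call the legacy routine and decode its record."""
    record = legacy_balance_lookup(account_id.zfill(8))
    return {
        "account": record[0:8].strip(),
        "balance_cents": int(record[8:18]),
        "found": record[18:20] == "OK",
    }


print(json.dumps(get_balance("1234")))
# {"account": "00001234", "balance_cents": 1050, "found": true}
```

The legacy code is untouched; only the thin translation layer is new, which is what makes this path attractive when a rewrite is too costly.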

Organizations can either modernize legacy systems or invest in COBOL training—but doing nothing increases the risk of critical failures.

Conclusion

  • COBOL’s persistence underscores how “old” doesn’t always mean “obsolete.”

  • It highlights the importance of software archaeology—understanding and maintaining legacy code to ensure the continued stability of society’s most essential services.

THANK YOU

Questions?

CIS 3500: Lecture 4

By Jérémie Lumbroso
