
Quantifying Shortfalls in Students’ AI-Supported Programming Practices

Generative AI assistants are permeating programming classrooms, yet little is known about whether students adopt sound usage habits. Fifty undergraduates in CS/CIS courses rated, on a 1–5 scale (5 = ideal "Always"), how often they performed 11 recommended behaviors spanning query planning, prompt iteration, verification, explanation seeking, and AI-human code integration. Gap scores (5 − rating) exposed the largest deficits in checking whether the AI has already solved a similar problem (mean gap = 3.14), combining AI code with personal code (mean gap = 2.62), and rewriting prompts after poor answers (mean gap = 2.34). Verification steps, such as running, testing, and error-checking AI output, also lagged (mean gap ≈ 2.0), whereas self-reliant actions, such as understanding the problem first (mean gap = 0.94), were relatively strong. Students who frequently applied AI to complex programs (> 50 lines of code, n = 9) showed significantly smaller gaps on seven of the eleven behaviors than peers who used AI mainly on simpler tasks (n = 41). Mann-Whitney tests confirmed significantly smaller gaps in code integration and all verification items (p ≤ .002), prompt rewriting (p = .011), and even the top-ranked forethought behavior of scouting for similar AI solutions (p = .001). However, advanced users still left a sizeable deficit in that scouting step and tended to persist with unhelpful AI responses. To close the observed gaps, five low-overhead scaffolds are proposed: pre-flight AI scans, split-merge coding tasks, prompt-engineering sprints, unit-test checkpoints, and a "three-prompt" reflection rule. Although limited by self-report data and sample size, the gap framework offers actionable metrics for improving AI literacy and invites replication with interaction logs and larger cohorts.
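
For readers who want to see how the gap-score framework and the group comparison could be computed, a minimal Python sketch follows. The ratings below are invented placeholders (not the study's data), and scipy.stats.mannwhitneyu is used as a stand-in for the paper's Mann-Whitney comparison of the two usage groups.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical 1-5 self-ratings for one behavior ("check whether the AI has
# already solved a similar problem"), split by usage group.
# These values are illustrative only, not the study's data.
complex_users = np.array([4, 5, 3, 4, 4, 5, 3, 4, 4])               # n = 9
simple_users = np.array([1, 2, 1, 3, 2, 1, 2, 2, 1, 3] * 4 + [2])   # n = 41

# Gap score = 5 - rating; larger gaps mean further from the ideal "Always".
gap_complex = 5 - complex_users
gap_simple = 5 - simple_users

print(f"Mean gap (complex-task users): {gap_complex.mean():.2f}")
print(f"Mean gap (simple-task users):  {gap_simple.mean():.2f}")

# Two-sided Mann-Whitney U test comparing the gap scores of the two groups.
stat, p = mannwhitneyu(gap_complex, gap_simple, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```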

Pratibha Menon
Pennsylvania Western University
United States
menon@pennwest.edu