Quantifying Shortfalls in Students’ AI-Supported Programming Practices
Generative AI assistants are permeating programming classrooms, yet little is known about whether students adopt sound usage habits. Fifty undergraduates in CS/CIS courses rated, on a 1–5 scale (5 = the ideal “Always”), how often they performed 11 recommended behaviors spanning query planning, prompt iteration, verification, explanation seeking, and AI-human code integration. Gap scores (5 − rating) exposed the largest deficits in checking whether the AI had already solved a similar problem (mean gap = 3.14), combining AI code with personal code (mean gap = 2.62), and rewriting prompts after poor answers (mean gap = 2.34). Verification steps, such as running, testing, and error-checking AI output, also lagged (mean gap ≈ 2.0), whereas self-reliant actions, such as understanding the problem before prompting, were relatively strong (mean gap = 0.94). Students who frequently applied AI to complex programs (> 50 lines of code, n = 9) showed significantly smaller gaps on seven of the eleven behaviors than peers who used AI mainly on simpler tasks (n = 41). Mann-Whitney tests confirmed these advantages for code integration and all verification items (p ≤ .002), prompt rewriting (p = .011), and the top-ranked forethought behavior of scouting for similar AI solutions (p = .001). Even so, advanced users still showed a sizeable deficit in that scouting step and tended to persist with unhelpful AI output. To close the observed gaps, five low-overhead scaffolds are proposed: pre-flight AI scans, split-merge coding tasks, prompt-engineering sprints, unit-test checkpoints, and a “three-prompt” reflection rule. Although the study is limited by self-report data and sample size, the gap framework offers actionable metrics for improving AI literacy and invites replication with interaction logs and larger cohorts.
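The sketch below illustrates the gap-score construction (5 − rating) and the per-behavior group comparison described above. It is not the study's analysis code: the ratings and the group flag are hypothetical stand-ins for the survey responses, and only the use of a two-sided Mann-Whitney U test follows the abstract.

```python
# Minimal sketch of the gap-score and group-comparison analysis.
# Ratings and group membership are synthetic placeholders, not the study data.
import numpy as np
from scipy.stats import mannwhitneyu

IDEAL = 5  # "Always" anchor on the 1-5 frequency scale

# Hypothetical ratings: 50 students x 11 recommended behaviors, each rated 1-5.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(50, 11))

# Hypothetical flag: True = frequently applies AI to >50-line programs (n = 9).
complex_ai_user = np.array([True] * 9 + [False] * 41)

# Gap score per student per behavior: distance from the ideal "Always" rating.
gaps = IDEAL - ratings

for b in range(gaps.shape[1]):
    advanced = gaps[complex_ai_user, b]    # advanced-use group
    simpler = gaps[~complex_ai_user, b]    # simpler-task group
    stat, p = mannwhitneyu(advanced, simpler, alternative="two-sided")
    print(f"behavior {b + 1:2d}: mean gap advanced={advanced.mean():.2f}, "
          f"simpler={simpler.mean():.2f}, U={stat:.1f}, p={p:.3f}")
```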