Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!mailrus!ncar!ico!vail!rcd
From: rcd@ico.ISC.COM (Dick Dunn)
Newsgroups: comp.software-eng
Subject: Re: C source lines in file
Summary: lines don't count...and Two Rules
Message-ID: <16018@vail.ICO.ISC.COM>
Date: 17 Aug 89 18:10:24 GMT
References: <35120@ccicpg.UUCP>
Organization: Interactive Systems Corp, Boulder, CO
Lines: 107

swonk@ccicpg.UUCP (Glen Swonk) writes:
> Does anyone have a program or a method of determining
> the number of C source lines in a source file?
> My assumption is that comments don't count as source
> lines unless the comment is on a line with code.

If you're on a UNIX system or have comparable tools, a simple awk script
can do this much. However, you don't learn much from it. In particular,
given the question:

> Are there any other tools to measure the complexity
> of a source file?

it's clear you're off on the wrong foot. A count of source lines is NOT a
useful measure of program size or complexity. Incidentally, be careful
about the difference between size and complexity!

As noted by flint@gistdev.UUCP (Flint Pellett):
> Comment lines don't count? What are you going to use the count for when you
> get it? ... If you're counting for purposes of
> measuring productivity, then comment lines certainly do count, otherwise
> you're going to be encouraging people to not document their code.

Pellett is correct about the effect of not counting comment lines. However,
if you go off counting lines as a measure of work, you'll see a useful
comment like:

    /* lexcom - scan (a piece of) a comment
     *    Return either T_COM if end of comment found or T_NULL if end of
     *    line found first.
     *    Also handles instate and comment counting.
     */

turn into a baroque display like:

    /************************************************************************/
    /*                                                                      */
    /*  FUNCTION NAME:  lexcom                                              */
    /*                                                                      */
    /*  RESULT TYPE:    int                                                 */
    /*                                                                      */
    /*  ARGUMENTS:      (none)                                              */
    /*                                                                      */
    /*  PURPOSE:        blather babble...                                   */
    /*                                                                      */
    [etc., ad nauseam...no sense wasting netbandwidth on it...]
    /*                                                                      */
    /************************************************************************/

The same thing will happen if you associate some reward or figure of merit
with source-line count, or identifier length, etc...you'll see:

    for (p = s; *p; p++) {
        [stuff]
    }

turn into:

    for (string_search_pointer = target_string;
         *string_search_pointer != STRING_TERMINATOR;
         string_search_pointer++)
    {
        [stuff]
    }

When I've tried to measure C source-file size and complexity, I've used a
program which does a simple analysis of the source but gives several
measures, including the following:

    blank lines
    lines containing only comment text
    lines containing only code
    lines containing comment and code
    average comment length or histogram of lengths
    average number of tokens per line, per nonblank line
    average identifier length or histogram of lengths
    average nesting level (requires tedious explanation)
    count of occurrences of each keyword
    count of occurrences of literal constants, by type

The result, of course, does NOT reduce program size or complexity to a
single number. The token count is far more useful than a line count if you
want to know "how much code" you've got, but it's still woefully inadequate.

I offer two rules about measuring program size/complexity:

 1. Any variant of "source line count" is useless as a measure of the
    program. I've heard countless times the rationalization that "Well,
    it may not be good, but it's the best we can do." This is WRONG!
    It's worse than no measure at all. It implies that you have
    information you don't really have. If it's used as a measure of
    productivity, it's particularly bad, because there are obvious ways
    to pervert any obvious measures--and all of them make for worse
    programs.

 2. Programs are supposed to be good, not big. A program should be
    measured against what it is supposed to do. Sheer size is often
    unrelated to apparent complexity, and both may be unrelated to actual
    complexity (in terms of programming effort).

Talking among various people I know, we've all come up with a joke about
"negative productivity". You start the day with, say, a thousand lines of
crappy code and end the day with 300 lines of clean code--thereby having
produced -700 lines of code for the day.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?
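
For illustration only, the first four measures named in the post (blank
lines, comment-only lines, code-only lines, and lines with both) can be
sketched in C as a small state machine. This is not the program the post
refers to. The names line_counts, count_lines and finish_line are
hypothetical, and the sketch assumes that only /* ... */ comments exist and
that trigraphs and backslash-newline splicing can be ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names throughout; nothing here comes from the original post. */
struct line_counts {
    long blank;         /* only whitespace */
    long comment_only;  /* comment text, no code */
    long code_only;     /* code, no comment text */
    long mixed;         /* both code and comment text */
};

enum scan_state { IN_CODE, IN_COMMENT, IN_STRING, IN_CHAR };

/* Add one finished line to the bucket selected by the two flags. */
static void finish_line(struct line_counts *out, int has_code, int has_comment)
{
    if (has_code && has_comment) out->mixed++;
    else if (has_code)           out->code_only++;
    else if (has_comment)        out->comment_only++;
    else                         out->blank++;
}

/* Classify every line of the NUL-terminated text src.
 * Assumptions: only slash-star comments, no trigraphs, and no
 * backslash-newline splicing. */
void count_lines(const char *src, struct line_counts *out)
{
    enum scan_state state = IN_CODE;
    int has_code = 0, has_comment = 0;
    size_t i;

    assert(src != NULL && out != NULL);
    out->blank = out->comment_only = out->code_only = out->mixed = 0;

    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];

        if (c == '\n') {
            finish_line(out, has_code, has_comment);
            has_code = has_comment = 0;
            /* A comment may span lines; a string or char literal may not. */
            if (state != IN_COMMENT)
                state = IN_CODE;
            continue;
        }
        switch (state) {
        case IN_COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                has_comment = 1;
                state = IN_CODE;
                i++;                    /* consume the '/' */
            } else if (!isspace((unsigned char)c)) {
                has_comment = 1;
            }
            break;
        case IN_STRING:
        case IN_CHAR:
            has_code = 1;
            if (c == '\\' && src[i + 1] != '\0' && src[i + 1] != '\n') {
                i++;                    /* skip the escaped character */
            } else if ((state == IN_STRING && c == '"') ||
                       (state == IN_CHAR && c == '\'')) {
                state = IN_CODE;
            }
            break;
        case IN_CODE:
            if (c == '/' && src[i + 1] == '*') {
                has_comment = 1;
                state = IN_COMMENT;
                i++;                    /* consume the '*' */
            } else if (!isspace((unsigned char)c)) {
                has_code = 1;
                if (c == '"')       state = IN_STRING;
                else if (c == '\'') state = IN_CHAR;
            }
            break;
        }
    }
    /* Count a final non-blank line that lacks a trailing newline. */
    if (has_code || has_comment)
        finish_line(out, has_code, has_comment);
}
```

Passing a file's contents to count_lines fills in the four counters. In
keeping with the post's argument, the result describes how the text is laid
out and should not be read as a measure of size or complexity.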