code_stats

About

This is a fast, customizable and fairly accurate stats generator for programming projects, in the form of a CLI executable, written in Rust, with minimal dependencies. It is used for counting total lines, code lines, and user-defined keywords like classes.

Table of contents

  • How To Run
  • Details
  • Supported Languages
  • Accuracy and Limitations
  • Performance
  • Similar Projects

How To Run

You can run the project directly by downloading the "executable/release" folder, which contains the executable and the necessary "data" folder. Alternatively, you can build the project yourself with: cargo b --release

Format of arguments: <path_here> --optional_command1 --optional_commandN

The program expects a path to a directory or a code file, which can be provided as a command-line argument; if it is not, you will be prompted for it after running the program. The program also accepts many optional flags to customize its functionality; see the next section for more info or use the --help command.
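The sketch below is a rough illustration of this argument format and of the '--path' rules described under Details. It shows one way the path could be pulled out of the arguments. This is an assumption, not the project's code. The helpers `extract_path` and `collect_until_flag` are hypothetical. The sketch assumes that a path simply ends at the next token starting with "--", which is one way an unquoted path containing whitespace could be reassembled.

```rust
/// Hypothetical helper: joins tokens up to the next flag, so an unquoted path
/// that the shell split on whitespace (e.g. `my project`) is reassembled.
fn collect_until_flag(tokens: &[String]) -> Option<String> {
    let parts: Vec<&str> = tokens
        .iter()
        .take_while(|t| !t.starts_with("--"))
        .map(|t| t.as_str())
        .collect();
    if parts.is_empty() {
        None
    } else {
        // Drop stray surrounding quotes, if any survived the shell.
        Some(parts.join(" ").trim_matches('"').to_string())
    }
}

/// Hypothetical helper: finds the target path in the arguments
/// (the program name is assumed to be already skipped).
fn extract_path(args: &[String]) -> Option<String> {
    match args.first() {
        // Implicit form: the path is the first argument.
        Some(first) if !first.starts_with("--") => collect_until_flag(args),
        // Explicit form: '--path <path_here>' anywhere in the arguments.
        _ => {
            let i = args.iter().position(|a| a == "--path")?;
            collect_until_flag(&args[i + 1..])
        }
    }
}
```

In a real `main`, the arguments would come from `std::env::args().skip(1).collect::<Vec<String>>()`. Reading the path from a configuration file is not covered here.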

Details

The generated stats are the following:

  • Number of files
  • Lines (code + others) and percentages
  • Size (total and average)
  • Keyword occurrences
  • Percentage comparisons between languages

The program requires a "data" dir to be present at the same level as the executable (or 2 levels up in the folder hierarchy). The "data" dir must contain a "languages" dir that holds the supported languages as separate .txt files. An optional "config" dir may be present too, where the user can specify persistent settings (more on that later).
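Here is a minimal sketch of that lookup. It assumes the executable's location is obtained with `std::env::current_exe()`. It also assumes that "2 levels up" is counted from the executable's folder, so that target/release/ resolves to the project root after `cargo b --release`. The function name `find_data_dir` is hypothetical. The directory check is passed in as a closure, so the logic can be exercised without a real file system.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical lookup for the "data" dir: first next to the executable,
/// then two levels above the executable's folder.
fn find_data_dir(exe: &Path, is_dir: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    let exe_dir = exe.parent()?;
    let two_up = exe_dir.parent().and_then(Path::parent);
    [Some(exe_dir), two_up]
        .iter()
        .flatten()
        .map(|dir| dir.join("data"))
        // A usable data dir must also contain the "languages" dir.
        .find(|data| is_dir(data.as_path()) && is_dir(data.join("languages").as_path()))
}
```

In a function returning `std::io::Result`, a call could look like `find_data_dir(&std::env::current_exe()?, |p| p.is_dir())`.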

The program counts the lines of the files in the specified directory. For a file to be considered for counting, its extension must be supported. This means the "data/languages" dir must contain a .txt language file that lists this extension in its 'Extensions' field (see Supported Languages).
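The extension check could be modelled as a lookup table built from every language file's 'Extensions' field. The sketch assumes extensions are stored without the leading dot and compared case-sensitively. `build_extension_index` and `language_of` are hypothetical names, not the project's API.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical index: maps each extension (without the dot) to its language name.
fn build_extension_index(languages: &[(&str, &[&str])]) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for (language, extensions) in languages {
        for ext in extensions.iter() {
            index.insert(ext.to_string(), language.to_string());
        }
    }
    index
}

/// A file is counted only if its extension appears in the index.
fn language_of<'a>(path: &Path, index: &'a HashMap<String, String>) -> Option<&'a str> {
    let ext = path.extension()?.to_str()?;
    index.get(ext).map(String::as_str)
}
```

Files without an extension (such as a Makefile) and files with unknown extensions both yield `None`, so they would be skipped.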

The program splits the total lines into code lines and "extra" lines (all the lines that are not code). Note that lines containing only braces "{ }" are not considered code by default, but this can be changed with the --braces-as-code flag. The program can also search for user-defined keywords that are specified in the language files and count their occurrences. It identifies them correctly even in complex lines (see Accuracy and Limitations for details).
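To make the brace rule concrete, here is a simplified classifier. It assumes comments have already been stripped from the line; the real program tracks strings and comments, see Accuracy and Limitations. It treats a line as "braces only" when it contains nothing but '{', '}' and whitespace. `classify` and `LineKind` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum LineKind {
    Code,
    Extra,
}

/// Simplified sketch: `line` is assumed to have its comments removed already.
fn classify(line: &str, braces_as_code: bool) -> LineKind {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return LineKind::Extra; // blank (or comment-only) lines are "extra"
    }
    let only_braces = trimmed
        .chars()
        .all(|c| c == '{' || c == '}' || c.is_whitespace());
    if only_braces && !braces_as_code {
        LineKind::Extra
    } else {
        LineKind::Code
    }
}
```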

Below is a list of all the commands/flags that the program accepts.

--help
    Display this message on the terminal. No other arguments or commands are required.
    
--path
    The path to a directory or a single file, in this form: '--path <path_here>'
    It can either be surrounded by quotes ("path") or not, even if the path contains whitespace.

    The path can also be given implicitly (in which case this command is not needed) in 2 ways:
    1) as the first argument of the program directly
    2) if it is present in a configuration file (see '--save' and '--load' commands).

--exclude 
    1..n arguments separated with whitespace. Each can be a folder name, a file name (including extension),
    or a full path to a folder or file (unfortunately whitespace in the path is not allowed here).

    The program will ignore these dirs and files.

--languages 
    1..n arguments separated with whitespace, case-insensitive

    The given language names must exist in any of the files in the 'data/languages/' dir as the
    parameter of the field 'Language'.

    Only the languages specified here will be taken into account for the stats.

--threads
    1 argument: a number between 1 and 8. Default: 4 

    This represents the number of consumer threads that parse files;
    there is also always one producer thread that traverses the given dir.

    Increasing the number of consumers can help performance a bit in a situation where
    there are a lot of big files concentrated in a shallow directory structure.
    
--braces-as-code
    No arguments in the cmd, but if specified in a configuration file use 'true' or 'yes' to enable,
    or anything else to disable. Default: disabled

    Specifies whether lines that only contain braces should be considered code lines or not.

    The default behaviour is to not count them as code, since it is silly for code of the same content
    and substance to be counted differently according to the programmer's code style.
    This helps to keep the stats clean when using code lines as a complexity and productivity metric.

--search-in-dotted
    No arguments in the cmd, but if specified in a configuration file use 'true' or 'yes' to enable,
    or anything else to disable. Default: disabled

    Specifies whether the program should traverse directories that are prefixed with a dot,
    like .vscode or .git.

--show-faulty-files
    No arguments in the cmd, but if specified in a configuration file use 'true' or 'yes' to enable,
    or anything else to disable. Default: disabled

    Sometimes it happens that an error occurs when trying to parse a file, either while opening it,
    or while reading its contents. The default behavior when this happens is to count all of
    the faulty files and display their count.

    This flag specifies that their paths, along with information about the exact error, are displayed too.
    The most common reason for this error is a file that contains non-UTF-8 characters.

--no-visual
    No arguments in the cmd, but if specified in a configuration file use 'true' or 'yes' to enable,
    or anything else to disable. Default: disabled

    Disables the colors in the "overview" section of the results, and disables the visualization with
    the vertical lines that represent the percentages.

--save
    One argument in the form of a file name (whitespace allowed, without an extension)

    If you plan to run the program many times for a project, it can be bothersome to specify
    all the flags every time, especially if they contain a lot of exclude dirs, for example.
    That's why you can specify all the flags once and add this command to save them
    as a configuration file.

    Doing so will run the program and also create a .txt configuration file
    inside 'data/config/' with the specified name, which can later be loaded with the '--load' command.

--load
    One argument in the form of a file name (whitespace allowed, without an extension)
    
    Associated with the '--save' command, this command is used to load the flags of
    an existing configuration file from the 'data/config/' directory.

    There is already a configuration file named 'default.txt' that contains the program's defaults
    and gets automatically loaded with each program run. You can modify it to add common flags
    so you don't have to create the same configurations for different projects.

    If you provide a flag on the command line that also exists in the loaded config file,
    then the command-line value is used. The priority is cmd > custom config > default config.
    You can combine the '--load' and '--save' commands to modify a configuration file.
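The priority rule can be pictured as layering key/value maps, with later layers overriding earlier ones. This sketch assumes that each source of settings is parsed into a map from flag name to value. `merge_settings` is a hypothetical helper, and the real configuration handling may differ.

```rust
use std::collections::HashMap;

/// Hypothetical merge: later layers override earlier ones, so passing
/// [default, custom, cmd] gives cmd > custom config > default config.
fn merge_settings(layers: &[&HashMap<String, String>]) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for layer in layers {
        for (flag, value) in layer.iter() {
            merged.insert(flag.clone(), value.clone());
        }
    }
    merged
}
```

Calling `merge_settings(&[&default, &custom, &cmd])` keeps every flag that appears in any layer, and resolves conflicts in favour of the command line.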

Supported Languages

All the supported languages can be found in the folder "data/languages" as separate text files. The user can easily specify a new language by replicating the format of the language files and customizing it accordingly, either by following the rules below or by copy-pasting an existing file.

The format of the language files is as follows (and should not be modified at all):

Language
<name of the language>

Extensions
<name of file extensions like cpp hpp or py>

String symbols
<either one or two string symbols separated by a space, like: " ' >

Comment symbol
<single line comment symbol like: //>

all the following lines are optional and can be omitted

Multiline comment start symbol
<a symbol like: /*>

Multiline comment end symbol
<a symbol like: */>

Keyword
    name
    <the name of the keyword to be shown in the results, like: classes>
    aliases
    <any word that constitutes an instance of this keyword, like: class, record>
Keyword
    name
    <the name of the keyword to be shown in the results, like: classes>
    aliases
    <any word that constitutes an instance of this keyword, like: class, record>
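The format above could be read with a small line-based parser like the following. It is a sketch, not the project's parser. `LanguageDef`, `parse_language` and its helpers are hypothetical. The sketch assumes that extensions and string symbols are separated by whitespace, and that aliases may be separated by commas and/or whitespace. It also enforces the "one or two string symbols" rule from Accuracy and Limitations.

```rust
/// Hypothetical in-memory form of one file in 'data/languages'.
#[derive(Debug, Default, PartialEq)]
struct LanguageDef {
    name: String,
    extensions: Vec<String>,
    string_symbols: Vec<String>,
    comment_symbol: String,
    multiline_start: Option<String>,
    multiline_end: Option<String>,
    keywords: Vec<(String, Vec<String>)>, // (display name, aliases)
}

/// Returns the next non-empty line, or an error naming the missing value.
fn take<'a>(lines: &mut impl Iterator<Item = &'a str>, what: &str) -> Result<&'a str, String> {
    lines.next().ok_or_else(|| format!("missing value for '{}'", what))
}

/// Consumes the next line and checks that it is the expected label.
fn expect_label<'a>(lines: &mut impl Iterator<Item = &'a str>, label: &str) -> Result<(), String> {
    match lines.next() {
        Some(line) if line == label => Ok(()),
        other => Err(format!("expected '{}', found {:?}", label, other)),
    }
}

fn parse_language(text: &str) -> Result<LanguageDef, String> {
    let mut def = LanguageDef::default();
    // Lines are trimmed (so indented 'name'/'aliases' match) and blank lines skipped.
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());
    let words = |v: &str| v.split_whitespace().map(String::from).collect::<Vec<String>>();

    while let Some(field) = lines.next() {
        match field {
            "Language" => def.name = take(&mut lines, field)?.to_string(),
            "Extensions" => def.extensions = words(take(&mut lines, field)?),
            "String symbols" => def.string_symbols = words(take(&mut lines, field)?),
            "Comment symbol" => def.comment_symbol = take(&mut lines, field)?.to_string(),
            "Multiline comment start symbol" => {
                def.multiline_start = Some(take(&mut lines, field)?.to_string())
            }
            "Multiline comment end symbol" => {
                def.multiline_end = Some(take(&mut lines, field)?.to_string())
            }
            "Keyword" => {
                expect_label(&mut lines, "name")?;
                let name = take(&mut lines, "name")?.to_string();
                expect_label(&mut lines, "aliases")?;
                // Assumption: aliases may be separated by commas and/or spaces.
                let aliases: Vec<String> = take(&mut lines, "aliases")?
                    .split(|c: char| c == ',' || c.is_whitespace())
                    .filter(|a| !a.is_empty())
                    .map(String::from)
                    .collect();
                def.keywords.push((name, aliases));
            }
            other => return Err(format!("unknown field '{}'", other)),
        }
    }

    if def.name.is_empty() || def.extensions.is_empty() || def.comment_symbol.is_empty() {
        return Err("missing a required field".to_string());
    }
    // Only one or two string symbols are allowed.
    if !(1..=2).contains(&def.string_symbols.len()) {
        return Err("expected one or two string symbols".to_string());
    }
    Ok(def)
}
```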

Accuracy and Limitations

The program is able to understand and correctly parse arbitrarily complex code structures with intertwined strings and comments. This way it can identify whether a line contains something other than a comment, even if the comment is split across multiple positions. It can also identify valid keywords that are not inside strings or comments. For example, in a line like /*class"*/" class" aclass, it will not count "class" as a keyword: the first occurrence is inside a comment, the second is inside a string and the third has a prefix. Additionally:

  • It checks for escaped characters; for example, \" will not be counted as a string symbol.
  • It resolves symbols that are side by side; for example, */* could be read as both a closing and an opening comment symbol, but the program will understand the correct usage.
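The comment- and string-aware keyword search can be sketched as a small state machine that walks a line character by character. This simplified version is an assumption, not the project's implementation. It handles one line in isolation, with no multiline state carried between lines. It treats string symbols as single characters. It counts a match only when no letter, digit or '_' touches it. `Syntax` and `count_keyword` are hypothetical names.

```rust
/// Hypothetical, simplified syntax description for one language.
struct Syntax<'a> {
    strings: &'a [char],               // e.g. ['"', '\'']
    line_comment: &'a str,             // e.g. "//"
    block: Option<(&'a str, &'a str)>, // e.g. Some(("/*", "*/"))
}

#[derive(Clone, Copy)]
enum State {
    Code,
    InString(char),
    InBlockComment,
}

/// Counts whole-word occurrences of `keyword` outside strings and comments.
fn count_keyword(line: &str, syntax: &Syntax<'_>, keyword: &str) -> usize {
    if keyword.is_empty() {
        return 0;
    }
    let is_word = |c: char| c.is_alphanumeric() || c == '_';
    let (mut state, mut count, mut i) = (State::Code, 0, 0);

    while i < line.len() {
        let rest = &line[i..];
        let c = rest.chars().next().unwrap();
        match state {
            State::InString(quote) => {
                if c == '\\' {
                    // Skip the backslash and the escaped char, so \" does not close the string.
                    i += 1;
                    if let Some(next) = line[i..].chars().next() {
                        i += next.len_utf8();
                    }
                    continue;
                }
                if c == quote {
                    state = State::Code;
                }
            }
            State::InBlockComment => {
                if let Some((_, end)) = syntax.block {
                    if rest.starts_with(end) {
                        // Consume the whole end symbol, so the '/' of "*/*" is not re-read as "/*".
                        state = State::Code;
                        i += end.len();
                        continue;
                    }
                }
            }
            State::Code => {
                if rest.starts_with(syntax.line_comment) {
                    break; // the rest of the line is a comment
                }
                if let Some((start, _)) = syntax.block {
                    if rest.starts_with(start) {
                        state = State::InBlockComment;
                        i += start.len();
                        continue;
                    }
                }
                if syntax.strings.contains(&c) {
                    state = State::InString(c);
                } else if rest.starts_with(keyword) {
                    // Whole-word check: no word character directly before or after.
                    let before_ok = line[..i].chars().next_back().map_or(true, |p| !is_word(p));
                    let after_ok = rest[keyword.len()..].chars().next().map_or(true, |n| !is_word(n));
                    if before_ok && after_ok {
                        count += 1;
                        i += keyword.len();
                        continue;
                    }
                }
            }
        }
        i += c.len_utf8();
    }
    count
}
```

Take a `Syntax` with `strings: &['"', '\'']`, `line_comment: "//"` and `block: Some(("/*", "*/"))`. With it, the example line /*class"*/" class" aclass yields 0. The line /* a */* class yields 1, because the whole "*/" is consumed before looking for "/*".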

With that said, it is important to mention the following limitations:

  • The program cannot understand language-specific syntax or details; this would require a handwritten, complex parser for each language. For example, in a .php file that contains HTML or JS, the distinction will not be made. Also, the keyword counting doesn't take any measures to ensure that a valid keyword has the user-intended meaning. For example, the word "class" may also appear in a programming language's syntax with a meaning other than declaring a class. This may lead to some false positives.

  • The program assumes that if a line contains an odd number of the same string symbol, then it opens a multiline string. This works in most cases, but it may create inaccuracies. For example, if a line in Python contains """, the program will treat everything up to the next " symbol (rather than the next """) as a multiline string. If a language doesn't support multiline strings, an odd number of string symbols on a line should not occur in valid syntax anyway.

  • A language can only declare either one or two string symbols in the .txt, not more.

  • The program doesn't take .gitignore files into account; the unwanted dirs have to be added manually in a configuration file.
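The odd-count heuristic from the second limitation above can be sketched as follows. This is a simplified, hypothetical helper. It only skips backslash-escaped symbols. It does not account for comments, or for one kind of string symbol appearing inside the other (for example ' inside "..."). The Python case shows the described inaccuracy: a line with """ has an odd count, so it is reported as opening a string delimited by a single ".

```rust
/// Hypothetical helper: returns the string symbol that appears an odd number
/// of times on the line (ignoring backslash-escaped occurrences), i.e. the
/// symbol the heuristic would treat as an open multiline string.
fn open_string_symbol(line: &str, symbols: &[char]) -> Option<char> {
    symbols.iter().copied().find(|&s| {
        let mut count = 0;
        let mut escaped = false;
        for c in line.chars() {
            if escaped {
                escaped = false; // this char is escaped, so it never counts
                continue;
            }
            if c == '\\' {
                escaped = true;
            } else if c == s {
                count += 1;
            }
        }
        count % 2 == 1
    })
}
```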

Performance

On a cold run, performance is mainly limited by how fast the producer thread can traverse the directory and find relevant files, so the consumers can parse them.
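The producer/consumer split mentioned here and under '--threads' can be sketched with the standard library's `std::sync::mpsc` channel. The consumers share the receiver behind an `Arc<Mutex<_>>`. This is an assumption about the general design, not the project's actual code. `count_lines_parallel` is hypothetical, and the producer sends in-memory (path, contents) pairs instead of traversing a real directory.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sketch: one producer thread sends files, `consumers` threads
/// take them from a shared receiver and return the lines they counted.
fn count_lines_parallel(files: Vec<(String, String)>, consumers: usize) -> usize {
    let (tx, rx) = mpsc::channel::<(String, String)>();
    let rx = Arc::new(Mutex::new(rx));

    // Producer: stands in for the directory traversal.
    let producer = thread::spawn(move || {
        for file in files {
            tx.send(file).expect("receiver dropped");
        }
        // `tx` is dropped here, which lets the consumers' loops end.
    });

    let workers: Vec<thread::JoinHandle<usize>> = (0..consumers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut lines = 0;
                loop {
                    // The lock guard is a temporary, released as soon as one job is taken.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok((_path, contents)) => lines += contents.lines().count(),
                        Err(_) => break, // channel closed and drained
                    }
                }
                lines
            })
        })
        .collect();

    producer.join().unwrap();
    workers.into_iter().map(|w| w.join().unwrap()).sum()
}
```

With a single traversal thread, adding consumers only helps while the producer can hand out files faster than they are parsed. This matches the note above that cold runs are mostly bound by the traversal.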

The performance will also vary depending on how deep and wide the directory structure is, how big the code files are and how many keywords are specified to be counted.

Here are some metrics for both hot and cold executions on my laptop (i5-1035G1, 2 keywords per language):

  1. Relatively deep and wide directory with big files (6 consumers)
     4,066 files - 5,625,944 lines - average size 75 KB
     Hot:  1.13 secs (Parsing: 3649 files/s | 5,050,219 lines/s)
     Cold: 1.61 secs (Parsing: 2528 files/s | 3,498,721 lines/s)

  2. Relatively deep and wide directory with average to small files (4 consumers)
     3,824 files - 793,751 lines - average size 8.7 KB
     Hot:  0.29 secs
     Cold: 1.23 secs (Parsing: 3106 files/s | 644,801 lines/s)

  3. Very deep and wide directory, my entire drive (4 consumers)
     32,078 files - 15,101,949 lines - average size 21 KB
     Hot:  11.59 secs (Parsing: 2807 files/s | 1,317,336 lines/s)
     Cold: 36.21 secs (Parsing: 891 files/s | 418,475 lines/s)

Similar Projects

If you don't require the keyword counting functionality of this program or its unusual visualization, use the scc project written in Go, which is honestly impressive.

Other alternative projects you can check are:

  • loc
  • cloc
  • sloc
  • tokei
Versions

v1.0.0-beta - Oct 18, 2021

See Changelog

v0.9.2 - Sep 22, 2021

See the Changelog for details

v0.9.1 - Sep 20, 2021

A small patch that improves the README, improves some error messages, and removes the wait-for-input operation after program execution when arguments were provided.

v0.9 - Sep 18, 2021

Solidified feature-set of initial production release.

Information - Updated May 14, 2022

Stars: 0
Forks: 0
Issues: 0