Hacker News


Show HN: Fdir – Faster Node.js glob alternative: 31

Hey everyone,

I recently created fdir, mostly out of curiosity about how fast a program written in Node.js could be. It so happened that I (accidentally) created the fastest directory crawler in the Node.js environment: fdir can easily crawl around 1 million files, distributed across about 100k directories, in under 1 second (your mileage may vary depending on hardware).

It's also < 1 KB in size (gzipped) and supports all Node versions (> 6).

Feel free to give it a run and ask me any questions :D

Blog post: https://dev.to/thecodrr/how-i-wrote-the-fastest-directory-cr...

Take care, thecodrr - thecodrr 2 months ago


This looks wrong: https://github.com/thecodrr/fdir/blob/master/index.js#L86. It's calling a blocking method from the async code path in Node < 10.
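
For context on the Node < 10 point: `fs.Dirent` and the `withFileTypes` option of `fs.readdir` only arrived in Node 10.10, so on older versions the type of each entry needs a separate stat call. Below is a minimal sketch, not fdir's actual code, of keeping that path fully asynchronous by using callback-style `fs.lstat` instead of a synchronous stat; the function name `crawlAsync` is hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical callback-style crawler: collects every non-directory path under `root`.
// It never calls a *Sync method, so the event loop stays free while it runs.
function crawlAsync(root, callback) {
  const files = [];
  let pending = 0; // number of readdir/lstat calls still in flight
  let failed = false;

  function done(err) {
    if (failed) return; // an error was already reported
    if (err) {
      failed = true;
      return callback(err);
    }
    if (--pending === 0) callback(null, files);
  }

  function walk(dir) {
    pending++;
    fs.readdir(dir, (err, names) => {
      if (err) return done(err);
      names.forEach((name) => {
        const full = path.join(dir, name);
        pending++;
        // Async lstat instead of lstatSync: one extra call per entry, but non-blocking.
        // lstat does not follow symlinks, so links are reported as files here.
        fs.lstat(full, (statErr, stats) => {
          if (statErr) return done(statErr);
          if (stats.isDirectory()) walk(full); // walk() bumps `pending` before done() drops it
          else files.push(full);
          done();
        });
      });
      done(); // this readdir call itself is finished
    });
  }

  walk(root);
}
```

The trade-off is one extra async stat per entry, which is exactly the cost `withFileTypes` removes on newer Node versions.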

Another thing I noticed: it looks like you only handle `dirent.isDirectory()`, but not `dirent.isSymbolicLink()`, meaning the library won't find files in symlinked folders (e.g. Lerna's node_modules). - lhorie 2 months ago
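
A `Dirent` for a symlink reports `isSymbolicLink()` and never `isDirectory()`, so a crawler that only checks `isDirectory()` skips linked folders. A minimal sketch of one fix, assuming Node >= 10.10 (for `withFileTypes`) and a synchronous API; the name `crawlFollowingSymlinks` is hypothetical, not fdir's API. It follows links with `fs.statSync` and remembers each directory's `fs.realpathSync` so cyclic links can't loop forever.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical synchronous crawler that also descends into symlinked folders.
// Requires Node >= 10.10 for the `withFileTypes` option of readdirSync.
function crawlFollowingSymlinks(root) {
  const files = [];
  const visited = new Set(); // real paths of directories already crawled

  function walk(dir) {
    const real = fs.realpathSync(dir);
    if (visited.has(real)) return; // symlink cycle, or a target already crawled
    visited.add(real);

    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, dirent.name);
      if (dirent.isDirectory()) {
        walk(full);
      } else if (dirent.isSymbolicLink()) {
        // The Dirent describes the link itself; statSync follows it to the target.
        let target;
        try {
          target = fs.statSync(full);
        } catch (e) {
          continue; // dangling or unreadable link: skip it
        }
        if (target.isDirectory()) walk(full);
        else files.push(full);
      } else {
        files.push(full);
      }
    }
  }

  walk(root);
  return files;
}
```

Deduplicating by real path means a folder that is both symlinked into node_modules and present elsewhere under the root is listed only once, under whichever path is reached first; a real library might reasonably choose differently.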


This has absolutely nothing to do with globbing. The title is very misleading – I was super excited to see a new glob library. - zXuPh94rt 2 months ago
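
To make the walk/glob distinction concrete: a walker returns every path, while a glob library additionally matches each path against a pattern such as `**/*.js`. A minimal sketch of that extra filtering step, assuming "/"-separated relative paths and a deliberately tiny pattern syntax (`*`, `?`, `**/`; no braces, character classes or negation). The helpers `globToRegExp` and `filterByGlob` are hypothetical and not any real library's API.

```javascript
// Hypothetical, minimal glob-to-RegExp translator.
// Supports only: `*` (any run of chars except "/"), `?` (one char except "/"),
// and `**/` (zero or more whole directory segments). Everything else is literal.
function globToRegExp(glob) {
  let re = "^";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      re += "(?:.*/)?"; // "**/": any number of directories, including none
      i += 2;
    } else if (c === "*" && glob[i + 1] === "*") {
      re += ".*"; // "**" not followed by "/": anything, across "/"
      i += 1;
    } else if (c === "*") {
      re += "[^/]*";
    } else if (c === "?") {
      re += "[^/]";
    } else if ("\\^$+.()|{}[]".includes(c)) {
      re += "\\" + c; // escape RegExp metacharacters so they match literally
    } else {
      re += c;
    }
  }
  return new RegExp(re + "$");
}

// Globbing = walking + filtering; `paths` stands in for a walker's output.
function filterByGlob(paths, glob) {
  const re = globToRegExp(glob);
  return paths.filter((p) => re.test(p));
}

const paths = ["index.js", "src/util.js", "src/util.test.ts", "test/a/b.js"];
console.log(filterByGlob(paths, "**/*.js")); // → [ 'index.js', 'src/util.js', 'test/a/b.js' ]
console.log(filterByGlob(paths, "src/*.js")); // → [ 'src/util.js' ]
```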


I took it for a ride.

The test scripts (bash, Perl, Python, Ruby and Node) all list every file recursively and synchronously from the node_modules dir (as in the benchmark), excluding directories, and print the total count.

Here are the results, measured with hyperfine (https://github.com/sharkdp/hyperfine):

    hyperfine "bash test.sh" --warmup 5
    Benchmark #1: bash test.sh
    Time (mean ± σ):       7.5 ms ±   0.5 ms    [User: 4.5 ms, System: 4.2 ms]
    Range (min … max):     6.9 ms …  10.5 ms    332 runs

    hyperfine "perl test.pl" --warmup 5
    Benchmark #1: perl test.pl
    Time (mean ± σ):      25.6 ms ±   1.2 ms    [User: 16.8 ms, System: 8.8 ms]
    Range (min … max):    24.0 ms …  30.8 ms    97 runs

    hyperfine "python3.7 test.py" --warmup 5
    Benchmark #1: python3.7 test.py
    Time (mean ± σ):      43.4 ms ±   1.4 ms    [User: 32.6 ms, System: 10.9 ms]
    Range (min … max):    40.9 ms …  46.8 ms    66 runs
    
    hyperfine "ruby test.rb" --warmup 5
    Benchmark #1: ruby test.rb
    Time (mean ± σ):      66.5 ms ±   2.0 ms    [User: 52.1 ms, System: 14.4 ms]
    Range (min … max):    63.2 ms …  70.3 ms    42 runs

    hyperfine "node test.js" --warmup 5
    Benchmark #1: node test.js
    Time (mean ± σ):      83.7 ms ±   4.0 ms    [User: 74.7 ms, System: 15.6 ms]
    Range (min … max):    79.4 ms …  95.3 ms    36 runs

Here are the results of a hello world with each runtime, for comparison:

    hyperfine "bash test.sh" --warmup 5
    Benchmark #1: bash test.sh
    Time (mean ± σ):       1.2 ms ±   0.3 ms    [User: 1.1 ms, System: 0.3 ms]
    Range (min … max):     0.9 ms …   3.8 ms    1521 runs

    hyperfine "perl test.pl" --warmup 5
    Benchmark #1: perl test.pl
    Time (mean ± σ):       1.3 ms ±   0.3 ms    [User: 1.3 ms, System: 0.3 ms]
    Range (min … max):     1.0 ms …   5.3 ms    1103 runs

    hyperfine "python3.7 test.py" --warmup 5
    Benchmark #1: python3.7 test.py
    Time (mean ± σ):      19.5 ms ±   0.9 ms    [User: 16.2 ms, System: 3.4 ms]
    Range (min … max):    18.3 ms …  23.7 ms    144 runs

    hyperfine "ruby test.rb" --warmup 5
    Benchmark #1: ruby test.rb
    Time (mean ± σ):      55.2 ms ±   2.2 ms    [User: 47.2 ms, System: 8.1 ms]
    Range (min … max):    52.2 ms …  61.9 ms    51 runs
    
    hyperfine "node test.js" --warmup 5
    Benchmark #1: node test.js
    Time (mean ± σ):      55.4 ms ±   1.8 ms    [User: 49.5 ms, System: 7.0 ms]
    Range (min … max):    53.0 ms …  60.0 ms    53 runs

Now, my machine is not set up for a clean benchmark: the disk cache is warm, hyperthreading is on, and other software is running.

Plus, the scripts each found a slightly different number of files :) I suspect they treat symlinks and dot-directories differently, and I didn't take the time to normalize that, although I don't think it accounts for the difference.

Still, the result is a bit interesting. The non-JS tests don't use any third-party libs. Ruby and Node seem to have the same VM startup cost.
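
One way to read the two sets of runs together is to subtract each runtime's hello-world mean from its crawl mean, as a rough estimate of the crawl cost alone. A small sketch of that arithmetic using only the means quoted above; it ignores the ± spread and the differing file counts, so treat it as a ballpark.

```javascript
// Mean times (ms) copied from the hyperfine runs above.
const crawl = { bash: 7.5, perl: 25.6, python: 43.4, ruby: 66.5, node: 83.7 };
const hello = { bash: 1.2, perl: 1.3, python: 19.5, ruby: 55.2, node: 55.4 };

// Rough crawl cost with interpreter/VM startup subtracted, sorted fastest first.
const net = Object.keys(crawl)
  .map((lang) => ({ lang, ms: crawl[lang] - hello[lang] }))
  .sort((a, b) => a.ms - b.ms);

for (const { lang, ms } of net) {
  console.log(`${lang.padEnd(7)} ${ms.toFixed(1)} ms`);
}
```

With these numbers it prints, in order, bash 6.3, ruby 11.3, python 23.9, perl 24.3 and node 28.3 ms: by this rough measure Node is still last, while Ruby moves up to second.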

Frankly, I'm quite surprised that Node is last, especially with such heavily optimized code. I expected V8 to be blazing fast, since it's made by Google and written in C++. - BiteCode_dev 2 months ago


This is a replacement for walk, not glob. - mayank 2 months ago