webfp-crawler-phantomjs ~ a tool to create datasets for testing Website Fingerprinting (WF) attacks on Tor.
webfp-crawler-phantomjs is a Python crawler that visits websites over Tor and collects network traces using Wireshark. It is used to create datasets for testing Website Fingerprinting (WF) attacks on Tor.
+ Linux packages: python, tcpdump, wireshark, Xvfb, and PhantomJS
+ Python packages: selenium, requests, stem, psutil (version < 3), tld, xvfbwrapper, scapy
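A quick way to confirm the dependencies above are in place is a small check script. The following is a minimal sketch, not part of the crawler: the helper names are hypothetical, the import name of each Python package is assumed to match its package name, and the executable names (for example `phantomjs`) are assumed to be the usual ones.

```python
import importlib
import os

# Names taken from the dependency lists above. Assumption: each Python
# package is imported under the same name it is installed with.
PYTHON_PACKAGES = ["selenium", "requests", "stem", "psutil",
                   "tld", "xvfbwrapper", "scapy"]
# Assumption: these are the executable names the Linux packages install.
LINUX_COMMANDS = ["python", "tcpdump", "wireshark", "Xvfb", "phantomjs"]


def find_on_path(command, path=None):
    """Return the full path of an executable found on PATH, else None."""
    search_path = os.environ.get("PATH", "") if path is None else path
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_python_packages(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def missing_commands(names):
    """Return the subset of command names not found on PATH."""
    return [name for name in names if find_on_path(name) is None]


def psutil_version_ok(version_string):
    """True when the major version is below 3, as required above."""
    return int(version_string.split(".")[0]) < 3
```

Calling `missing_python_packages(PYTHON_PACKAGES)` and `missing_commands(LINUX_COMMANDS)` returns whatever still needs installing, and `psutil_version_ok(psutil.__version__)` checks the version constraint once psutil is importable.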
– Ubuntu 14.04/16.04 and Python 2.7
– Debian 7 to 9.0 and Python 2.7.13
– Kali Linux Rolling
* Configure the environment
– We recommend running crawls in a VM or a container (e.g. LXC) to avoid perturbations introduced by background network traffic and system-level network settings. Note that the crawler stores not only the Tor traffic but all network traffic generated during a visit to a website, so it is extremely important to disable all automatic/background network traffic such as auto-updates. See, for example, the instructions for disabling automatic connections for Ubuntu.
– You’ll need to grant packet capture capabilities to dumpcap so that your user can capture traffic: sudo setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/dumpcap
– Download the Tor Browser Bundle (TBB) and extract it to ./tbb/tor-browser-linux<arch>-<version>_<locale>/.
– You might want to change the MTU of your network interface and disable NIC offloads that might make the traffic collected by tcpdump look different from how it would have been seen on the wire.
– Change MTU to standard ethernet MTU (1500 bytes): sudo ifconfig <interface> mtu 1500
– Disable offloads: sudo ethtool -K <interface> tx off rx off tso off gso off gro off lro off
– See the Wireshark Offloading page for more info.
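The MTU and offload steps above can be scripted so they are applied the same way on every crawl machine. This is a minimal sketch, not part of the crawler: the function names are hypothetical and `eth0` is only a placeholder interface name. It builds exactly the two commands listed above and, by default, prints them instead of running them.

```python
import subprocess

# Offload features named in the ethtool command above.
OFFLOAD_FEATURES = ["tx", "rx", "tso", "gso", "gro", "lro"]


def mtu_command(interface, mtu=1500):
    """Build the ifconfig command that sets the interface MTU."""
    return ["sudo", "ifconfig", interface, "mtu", str(mtu)]


def offload_command(interface, features=OFFLOAD_FEATURES):
    """Build the ethtool command that switches each offload feature off."""
    command = ["sudo", "ethtool", "-K", interface]
    for feature in features:
        command += [feature, "off"]
    return command


def prepare_interface(interface, dry_run=True):
    """Print (dry_run=True) or execute the MTU and offload commands."""
    commands = [mtu_command(interface), offload_command(interface)]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Raises CalledProcessError if the command exits non-zero.
            subprocess.check_call(command)
    return commands
```

Run `prepare_interface("eth0")` first to review the commands, then `prepare_interface("eth0", dry_run=False)` to apply them. Some NICs do not support every offload feature, in which case ethtool may report an error for that feature.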
git clone https://github.com/pankajb64/webfp-crawler-phantomjs && cd webfp-crawler-phantomjs
python main.py -h
Run a crawl with the defaults
python main.py -u ./etc/localized-urls-100-top.csv -e wang_and_goldberg
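When several crawls need to be launched in sequence, the command above can be wrapped in a small helper. This is a sketch under assumptions: the function names are hypothetical, only the `-u` and `-e` options shown above are used, and the script is assumed to be run from the repository root with `python` resolving to a Python 2.7 interpreter.

```python
import subprocess


def crawl_command(url_file, experiment, python="python", script="main.py"):
    """Build the argv for one crawl, using only the -u and -e options."""
    return [python, script, "-u", url_file, "-e", experiment]


def run_crawl(url_file, experiment):
    """Run one crawl and return the exit status of main.py."""
    return subprocess.call(crawl_command(url_file, experiment))
```

For example, `run_crawl("./etc/localized-urls-100-top.csv", "wang_and_goldberg")` is equivalent to the default crawl command shown above.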