
phishingdetect – A phishing detection system with NLP/OCR/HTML features.
PhishingDetect is a simple machine learning model that identifies phishing pages by looking at:
+ HTML text
+ HTML structure
+ Image text (extracted from a screenshot with OCR)
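As a rough illustration of the first two feature groups, the sketch below pulls visible text and per-tag counts out of an HTML page using only the Python standard library. The names `PageFeatureParser` and `html_features` are hypothetical and are not taken from the phishingdetect code, which may compute its features quite differently. The image-text features would need an additional OCR pass through tesseract, which is left out here.

```python
from collections import Counter

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class PageFeatureParser(HTMLParser):
    """Collects visible text and per-tag counts (hypothetical helper)."""

    SKIP = ("script", "style")  # content of these tags is not visible text

    def __init__(self):
        HTMLParser.__init__(self)
        self.tag_counts = Counter()
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def html_features(html):
    """Return the visible text and a tag histogram for one HTML document."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "tags": dict(parser.tag_counts),
    }
```

The text could then feed an NLP step (for example a bag-of-words vectorizer), while the tag histogram gives a crude view of page structure, such as how many `form` and `input` elements a page contains.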
Dependencies:
+ Python 2.7.x
+ tesseract OCR
+ nltk data
+ libraries for machine learning: numpy, scikit-learn, matplotlib and scipy
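Before running the installer, it can help to see which of these dependencies are already present. The following is a minimal sketch, not part of the project: `find_on_path` and `missing_dependencies` are hypothetical helpers, the executable is assumed to be named `tesseract` and to live on a POSIX-style `PATH`, and scikit-learn is checked under its import name `sklearn`. The check does not verify that the nltk data itself has been downloaded.

```python
import importlib
import os


def find_on_path(program):
    """Return the full path of `program` if it is on PATH, else None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_dependencies(
    modules=("numpy", "sklearn", "matplotlib", "scipy", "nltk"),
    programs=("tesseract",),  # assumed executable name
):
    """List the Python modules and programs that could not be found."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    for program in programs:
        if find_on_path(program) is None:
            missing.append(program)
    return missing


if __name__ == "__main__":
    print("Missing: %s" % (", ".join(missing_dependencies()) or "nothing"))
```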
Use and Download:
git clone https://github.com/ririhedou/phishingdetect && cd phishingdetect
chmod +x install.sh
./install.sh
python predict_crawl.py -h
python predict_crawl.py --img=test/100022538-facebook.com.png --html=test/100022538-facebook.source.txt
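To call the predictor from another script rather than from the shell, one option is a thin `subprocess` wrapper. This is only a sketch: `build_predict_command` and `run_prediction` are hypothetical names, the only flags used are the `--img` and `--html` options shown in the commands above, and the format of whatever `predict_crawl.py` prints is not assumed.

```python
import subprocess
import sys


def build_predict_command(img_path, html_path, script="predict_crawl.py"):
    """Build the argument list for one predict_crawl.py run."""
    # sys.executable is the interpreter running this wrapper; it is assumed
    # to be the Python 2.7 installation the project depends on.
    return [sys.executable, script, "--img=" + img_path, "--html=" + html_path]


def run_prediction(img_path, html_path):
    """Run the predictor from the repository root and return its raw output."""
    return subprocess.check_output(build_predict_command(img_path, html_path))
```

Called from inside the cloned `phishingdetect` directory, `run_prediction("test/100022538-facebook.com.png", "test/100022538-facebook.source.txt")` would run the same example as the last command above.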
Source: https://github.com/ririhedou