The Vulnerability History Project

Add checks against spoofing attempts at top domains

Remove diacritic marks from a hostname and calculate the confusability
skeleton of the accent-free name. Look it up in the precomputed list of
skeletons of the top 10k domains.
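
A minimal Python sketch of that lookup flow, not the Chromium code (which is
C++ on top of ICU). TOY_CONFUSABLES, TOP_DOMAIN_SKELETONS and
looks_like_top_domain are hypothetical names; the confusables table holds only
a few Cyrillic entries instead of the full UTS #39 data.

```python
import unicodedata

# Hypothetical stand-in for Unicode's confusables data (UTS #39). A real
# skeleton uses the full table; these few Cyrillic entries are illustrative.
TOY_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def remove_diacritics(name: str) -> str:
    """NFD, drop nonspacing marks (category Mn), then recompose with NFC."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def skeleton(name: str) -> str:
    """Simplified skeleton: NFD, map each confusable to its prototype, NFD."""
    decomposed = unicodedata.normalize("NFD", name)
    mapped = "".join(TOY_CONFUSABLES.get(c, c) for c in decomposed)
    return unicodedata.normalize("NFD", mapped)


# Hypothetical precomputed table; the real one covers the top 10k domains.
TOP_DOMAIN_SKELETONS = {
    skeleton(d) for d in ("google.com", "apple.com", "paypal.com")
}


def looks_like_top_domain(hostname: str) -> bool:
    """True if the accent-free skeleton matches a top domain's skeleton."""
    return skeleton(remove_diacritics(hostname)) in TOP_DOMAIN_SKELETONS
```

For example, "аррle.com" spelled with Cyrillic а and р, or "göogle.com",
both match. The sketch also returns True for "google.com" itself, so a caller
would apply it only to IDNs, which can never equal an ASCII top domain.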

Removing diacritic marks from a hostname is nearly equivalent to comparing
names at primary collation strength in the root locale. Some letters (ł, ø, đ)
have no canonical decomposition, so NFD followed by mark removal leaves them
intact; to close that gap, three mappings (ł > l; ø > o; đ > d) are added on
top of the diacritic removal. Also add two more mappings ([ĸκк] > k; ŋ > n) to
supplement Unicode's confusables list.
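
A Python sketch of where the extra mappings sit, assuming they run after
accent removal and before the skeleton is computed, and assuming the k mapping
covers U+0138 ĸ, U+03BA κ and U+043A к. drop_accents and prepare_for_skeleton
are hypothetical helpers, not Chromium functions.

```python
import unicodedata

# Letters with no canonical decomposition: NFD leaves them intact, so
# stripping nonspacing marks alone would not map them to a base letter.
EXTRA_ACCENT_MAP = str.maketrans({
    "\u0142": "l",  # ł LATIN SMALL LETTER L WITH STROKE
    "\u00f8": "o",  # ø LATIN SMALL LETTER O WITH STROKE
    "\u0111": "d",  # đ LATIN SMALL LETTER D WITH STROKE
})

# Supplementary confusables on top of Unicode's list (assumed set, see above).
EXTRA_CONFUSABLE_MAP = str.maketrans({
    "\u0138": "k",  # ĸ LATIN SMALL LETTER KRA
    "\u03ba": "k",  # κ GREEK SMALL LETTER KAPPA
    "\u043a": "k",  # к CYRILLIC SMALL LETTER KA
    "\u014b": "n",  # ŋ LATIN SMALL LETTER ENG
})


def drop_accents(name: str) -> str:
    """NFD; remove nonspacing marks; NFC; then the three extra mappings."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", stripped).translate(EXTRA_ACCENT_MAP)


def prepare_for_skeleton(name: str) -> str:
    """Accent removal followed by the supplementary confusable mappings."""
    return drop_accents(name).translate(EXTRA_CONFUSABLE_MAP)
```

With these rules "łódź" becomes "lodz": ó and ź lose their marks through NFD,
while ł needs the explicit mapping.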

Binary size increase: ~59 kB for the DAFSA (deterministic acyclic finite-state
automaton) representation of the top-domain skeletons.
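
Why a DAFSA stays small: states with identical futures are shared, so common
suffixes such as ".com" are stored once. This Python sketch builds a trie and
merges equivalent states bottom-up, a standard minimization for acyclic
automata. It does not reproduce the byte encoding Chromium embeds in the
binary, and the three sample names are placeholders.

```python
from typing import Dict, Iterable, Tuple


class Node:
    """A state; `edges` maps one character to the next state."""

    def __init__(self) -> None:
        self.final = False
        self.edges: Dict[str, "Node"] = {}


def build_trie(words: Iterable[str]) -> Node:
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.edges:
                node.edges[ch] = Node()
            node = node.edges[ch]
        node.final = True
    return root


def minimize(root: Node) -> Node:
    """Merge states with identical futures, bottom-up, yielding a minimal DAFSA."""
    registry: Dict[Tuple, Node] = {}

    def canonical(node: Node) -> Node:
        for ch, child in list(node.edges.items()):
            node.edges[ch] = canonical(child)
        # Children are already canonical, so their identities define the key.
        key = (node.final,
               tuple(sorted((ch, id(c)) for ch, c in node.edges.items())))
        return registry.setdefault(key, node)

    return canonical(root)


def contains(root: Node, word: str) -> bool:
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_states(root: Node) -> int:
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)
```

For google.com, apple.com and paypal.com the trie has 30 states and the
minimized automaton 18, because the tails "le.com" and ".com" are shared.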

The IDN display policy check takes ~2 µs longer on average (3.3 µs => 5.5 µs)
on my machine, per a test run over ~1 million IDNs in the com TLD.

It adds about 1500 domains to the list of domains displayed in Punycode, out
of ~1 million IDNs in the com TLD (3018 => 4571).

In addition, disallow combining diacritic marks unless they are preceded by a
Latin, Greek, or Cyrillic character.
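
A rough Python sketch of that rule, under two assumptions: "combining
diacritic marks" means the Combining Diacritical Marks block (U+0300..U+036F),
and the script of the preceding character can be guessed from its Unicode name,
since Python's unicodedata has no Script property. has_disallowed_diacritic is
a hypothetical helper; a real check would use proper script data.

```python
import unicodedata

# Assumption: the rule targets the Combining Diacritical Marks block.
COMBINING_DIACRITICS = range(0x0300, 0x0370)

# Approximation: Python's unicodedata has no Script property, so the script
# of the preceding character is guessed from its Unicode name prefix.
ALLOWED_PREFIXES = ("LATIN ", "GREEK ", "CYRILLIC ")


def has_disallowed_diacritic(label: str) -> bool:
    """True if a combining diacritic is not directly preceded by an LGC letter."""
    prev = ""
    for ch in label:
        if ord(ch) in COMBINING_DIACRITICS:
            # A mark at the start, or after a mark/digit/other script, is rejected.
            if not prev or not unicodedata.name(prev, "").startswith(ALLOWED_PREFIXES):
                return True
        prev = ch
    return False
```

Marks outside that block, such as a Devanagari virama, are left alone, so
scripts that legitimately rely on their own combining marks are unaffected.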

BUG=703750,714628,719199,722639
TEST=components_unittests --gtest_filter=*IDNToUni*

Review-Url: https://codereview.chromium.org/2784933002
Cr-Commit-Position: refs/heads/master@{#473109}
    
commit a8add0308ba6067eb3de5a8fe82f9c2f2460ad91