

[Development Tips] How to Get the Number of Strokes in Chinese Characters?


Source: dev.to

Background

While developing a simple divination script, I ran into an interesting problem. For a handful of specific Chinese characters, we can hard-code a dictionary in the script, but what if we want the stroke count of any Chinese character?

pypinyin library

from pypinyin import pinyin, Style

def get_strokes_count(chinese_character):
    # pinyin() returns one list of readings per character, e.g. [['zhong']] for 中
    pinyin_list = pinyin(chinese_character, style=Style.NORMAL)
    # NOTE: this is the number of readings returned, not a stroke count
    strokes_count = len(pinyin_list[0])
    return strokes_count

character = input("Please enter a Chinese character: ")
strokes = get_strokes_count(character)
print("Character '{}' stroke count: {}".format(character, strokes))

I tried it and found that the result is actually the number of pinyin readings returned for that character, not its stroke count, so this approach does not work.

[Screenshot: pypinyin returns the wrong result]

Unihan Database

The Unihan database is a Chinese character database maintained by the Unicode Consortium. It seems quite reliable and also provides an online lookup tool.

In its online query tool, Unihan Database Lookup, I found that the query results contain the kTotalStrokes field, which is the stroke count data we need.
As Unicode's official database, the current version covers the basic needs of Chinese character lookups.

Nice! One step closer to success!
[Screenshot: Unihan Database Lookup result]

Getting Stroke Information from Unihan Database

I initially planned to send query requests directly to the online lookup tool, but it was too slow, since the server is hosted outside China. The database file itself is not large, so I downloaded it instead.


After extracting the archive, there are several files.

[Screenshot: files in the Unihan archive]
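
The archive can also be inspected from Python rather than by hand. The following is a minimal sketch using the standard zipfile module; the archive path and both helper names are assumptions for illustration and are not part of the original script.

```python
import zipfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the file names stored in the zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


def read_member_lines(archive_path, member_name):
    """Read one text file from the archive and return its lines."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member_name) as raw:
            return raw.read().decode("utf-8").splitlines()


if __name__ == "__main__":
    # Hypothetical location of the downloaded archive
    archive_path = Path("Stroke/Unihan.zip")
    if archive_path.exists():
        print(list_archive_members(archive_path))
```

Reading a member straight from the archive avoids unpacking files that are never used, which may be convenient if only one file is needed.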

According to the lookup results, the kTotalStrokes field we need is listed under IRG Sources, so that is the file to extract (Unihan_IRGSources.txt).
I tested a regex on regex101 to capture the Unicode code point and the stroke count, then saved the pairs separately for later lookups.
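
To make the regex step concrete, here is a small sketch run against a few sample lines. It assumes the tab-separated layout of the Unihan files (code point, field name, value); the pattern is a tightened variant of the one tested on regex101, and the sample lines and helper name are made up for illustration.

```python
import re

# Captures the code point and the stroke count from a kTotalStrokes line
PATTERN = re.compile(r"(U\+[0-9A-F]+)\skTotalStrokes\s(\d+)")

# Sample lines in the assumed tab-separated layout of Unihan_IRGSources.txt
SAMPLE_LINES = [
    "# comment lines start with a hash",
    "U+4E00\tkRSUnicode\t1.0",
    "U+4E00\tkTotalStrokes\t1",
    "U+4E8C\tkTotalStrokes\t2",
]


def parse_total_strokes(lines):
    """Map 'U+XXXX' keys to stroke counts, ignoring all other lines."""
    strokes = {}
    for line in lines:
        match = PATTERN.search(line.strip())
        if match:
            strokes[match.group(1)] = int(match.group(2))
    return strokes


print(parse_total_strokes(SAMPLE_LINES))  # {'U+4E00': 1, 'U+4E8C': 2}
```

Comment lines and other fields simply fail to match, so no separate filtering step is needed.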

Coding

  • Extracting Stroke Information
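import json
import re
from pathlib import Path

file = Path("Stroke/Unihan_IRGSources.txt")
output = Path("Stroke/unicode2stroke.json")
# Matches lines such as "U+4E00<TAB>kTotalStrokes<TAB>1"
pattern = re.compile(r"(U\+[0-9A-F]+)\skTotalStrokes.*\s(\d+)")
stroke_dict = dict()
with open(file, mode="r", encoding="utf-8") as f:
    for line in f:
        # Comment lines and every field other than kTotalStrokes do not match
        result = pattern.findall(line.strip())
        if len(result) == 0:
            continue
        # If a line lists several counts, the greedy ".*" keeps the last one
        unicode_key, unicode_stroke = result[0]
        print(f"{unicode_key}: {unicode_stroke}")
        stroke_dict[unicode_key] = unicode_stroke

# Save the mapping as JSON, e.g. {"U+4E00": "1", ...}
with open(file=output, mode="w", encoding="utf-8") as f:
    json.dump(stroke_dict, f, ensure_ascii=False, indent=4)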

The mapping is exported to JSON for easy access.

  • Writing the Acquisition Function
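# `json` and `output` come from the extraction step
with open(output, encoding="utf-8") as f:
    unicode2stroke = json.load(f)

def get_character_stroke_count(char: str) -> int:
    # Build the Unihan-style key, e.g. "一" -> "U+4E00"
    unicode_key = f"U+{ord(char):04X}"
    return int(unicode2stroke[unicode_key])

test_char = "一"  # example input chosen for illustration
print(get_character_stroke_count(char=test_char))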

Note that the lookup key is the character's Unicode code point written in hexadecimal, so the character has to be converted to that form first.

Success! The expected result is achieved!
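
The key conversion noted above can be sketched on its own, together with a lookup that tolerates missing characters. The tiny in-memory table stands in for the contents of unicode2stroke.json, and both helper names are hypothetical.

```python
# Stand-in for the loaded unicode2stroke.json (values are strings, as in the export)
SAMPLE_TABLE = {"U+4E00": "1", "U+4E8C": "2", "U+4E09": "3"}


def to_unihan_key(char):
    """Format a single character as a Unihan-style key, e.g. 'U+4E00'."""
    return f"U+{ord(char):04X}"


def total_strokes(text, table):
    """Sum the stroke counts of every character in text.

    Returns None if any character is missing from the table.
    """
    total = 0
    for char in text:
        count = table.get(to_unihan_key(char))
        if count is None:
            return None
        total += int(count)
    return total


print(to_unihan_key("一"))                   # U+4E00
print(total_strokes("一二三", SAMPLE_TABLE))  # 6
print(total_strokes("abc", SAMPLE_TABLE))    # None
```

Returning None for unknown characters is only one possible design; raising an error, as a plain dictionary lookup does, may suit a script that should fail loudly on unexpected input.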
