How can spam registrations and spam messages be prevented? The process involves two steps: identification and interception. Interception can be done by freezing or deleting accounts, and spam content can also be shielded (hidden from other users). What to shield and what to intercept depends on the spam recognition technology. To block junk registrations and spam messages, we must first understand them. Junk registration is usually done in bulk by automated programs, since spreading spam manually is impractical.
However, when large profits are at stake or automated registration is difficult, junk registration may become semi-automatic or manual. For example, if a site requires a verification code for registration and the code is hard to break automatically, fraudsters may hire groups of people in various Internet cafes to enter the codes by hand. Compared with the high returns, this labor cost is almost negligible to them.
Bulk, automated registration typically shows some of the following characteristics:
- The same client repeatedly requests the same URL
- Abnormal navigation between pages (e.g., jumping from page 1 directly to page 3, which a normal user would not do)
- A short interval between two consecutive requests from the same client
- A client whose User-Agent does not look like a browser
- A client that cannot parse JavaScript or Flash
- In most cases, the verification code is valid
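The request-level characteristics above can be turned into simple per-client rules. The following Python sketch is illustrative only: the thresholds (`MIN_INTERVAL_SECONDS`, `MAX_SAME_URL_HITS`), the three-step registration flow, the browser markers, and keying clients by an ID such as an IP address or cookie are all assumptions, not values from the text. It does not cover the JavaScript/Flash check, which needs cooperation from the page itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical thresholds; real values must be tuned against observed traffic.
MIN_INTERVAL_SECONDS = 1.0
MAX_SAME_URL_HITS = 20
BROWSER_MARKERS = ("Mozilla/", "Opera/")

# Hypothetical three-step registration flow: page -> page that must precede it.
EXPECTED_PREVIOUS = {
    "/register/step1": None,
    "/register/step2": "/register/step1",
    "/register/step3": "/register/step2",
}


@dataclass
class ClientState:
    last_time: Optional[float] = None
    last_url: Optional[str] = None
    url_hits: Dict[str, int] = field(default_factory=dict)


class RegistrationMonitor:
    def __init__(self) -> None:
        # client_id is assumed to be something like an IP address or a cookie value.
        self.clients: Dict[str, ClientState] = {}

    def observe(self, client_id: str, url: str, timestamp: float, user_agent: str) -> List[str]:
        """Record one request and return the names of the rules it triggers."""
        state = self.clients.setdefault(client_id, ClientState())
        flags: List[str] = []

        # Same client requesting the same URL over and over.
        state.url_hits[url] = state.url_hits.get(url, 0) + 1
        if state.url_hits[url] > MAX_SAME_URL_HITS:
            flags.append("repeated_url")

        # Two requests from the same client too close together.
        if state.last_time is not None and timestamp - state.last_time < MIN_INTERVAL_SECONDS:
            flags.append("short_interval")

        # Skipping a step of the flow (e.g. step1 -> step3).
        expected = EXPECTED_PREVIOUS.get(url)
        if expected is not None and state.last_url != expected:
            flags.append("abnormal_flow")

        # User-Agent that does not look like a browser.
        if not user_agent.startswith(BROWSER_MARKERS):
            flags.append("non_browser_user_agent")

        state.last_time = timestamp
        state.last_url = url
        return flags
```

For example, a client that loads `/register/step1` and then requests `/register/step3` 0.2 seconds later with a `python-requests` User-Agent triggers `short_interval`, `abnormal_flow`, and `non_browser_user_agent`. A User-Agent header is trivially forged, so such flags are better treated as signals to combine than as verdicts on their own.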
Analyzing the information submitted by junk registrations reveals further patterns:
- Usernames may be randomly generated strings rather than natural-language words.
- Different accounts may contain identical information, such as the same advertisement.
- The content may contain sensitive words, such as politically sensitive terms or commercial advertisements.
- The text may be deliberately deformed, e.g., by converting characters from half-width to full-width or by splitting words (writing "suitable" as "suit able").
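The content patterns above can be checked roughly as follows. This is a minimal sketch under stated assumptions: the keyword list is hypothetical, Unicode NFKC normalization is used to fold full-width characters to half-width, and stripping whitespace and punctuation rejoins split words. The `looks_random` vowel-ratio test is a deliberately crude stand-in for real natural language analysis, and its thresholds are arbitrary assumptions.

```python
import hashlib
import re
import unicodedata
from typing import Dict, List, Set

# Hypothetical sensitive keywords, stored in normalized form.
SENSITIVE_KEYWORDS = {"freemoney", "cheapwatches"}


def normalize(text: str) -> str:
    """Undo common deformations before matching.

    NFKC folds full-width characters (e.g. 'ｆｒｅｅ') to half-width ASCII,
    and dropping whitespace and punctuation rejoins split words ('suit able').
    """
    folded = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[\W_]+", "", folded)


def keyword_hits(text: str) -> Set[str]:
    """Return the sensitive keywords found in the normalized text."""
    norm = normalize(text)
    return {kw for kw in SENSITIVE_KEYWORDS if kw in norm}


def duplicate_groups(profiles: Dict[str, str]) -> List[List[str]]:
    """Group account IDs whose normalized profile text is identical."""
    groups: Dict[str, List[str]] = {}
    for account, text in profiles.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(account)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def looks_random(username: str, min_letters: int = 8, max_vowel_ratio: float = 0.2) -> bool:
    """Crude heuristic: long usernames with very few vowels look machine-generated."""
    letters = [c for c in username.lower() if c.isascii() and c.isalpha()]
    if len(letters) < min_letters:
        return False
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < max_vowel_ratio
```

Removing all separators also joins unrelated neighboring words, so a normalized keyword match can produce false positives; a production filter has to weigh that trade-off.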
Business-specific data can reveal additional patterns. In an instant messaging (IM) service, for example:
- If a user sends messages to many different users and the recipients rarely reply, that user may be sending spam.
- If a user who has joined many IM groups sends the same content to each of them, that user may be sending spam.
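The two IM rules can be expressed over a message log roughly as below. The `Message` record, the thresholds (20 distinct recipients, a 10% reply ratio, 5 groups), and the simplification that any private message back from a recipient counts as a reply are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Message:
    sender: str
    recipient: str        # a user ID, or a group ID when is_group is True
    content: str
    is_group: bool = False


def low_reply_senders(messages: Iterable[Message], min_recipients: int = 20,
                      max_reply_ratio: float = 0.1) -> Set[str]:
    """Rule 1: a user writes to many distinct users, but few of them ever write back."""
    contacted: Dict[str, Set[str]] = defaultdict(set)  # sender -> users it wrote to
    for m in messages:
        if not m.is_group:
            contacted[m.sender].add(m.recipient)

    flagged = set()
    for sender, users in contacted.items():
        if len(users) < min_recipients:
            continue
        # Simplification: any private message back from a contacted user counts as a reply.
        repliers = {u for u in users if sender in contacted.get(u, set())}
        if len(repliers) / len(users) <= max_reply_ratio:
            flagged.add(sender)
    return flagged


def repeated_group_content(messages: Iterable[Message], min_groups: int = 5) -> Set[str]:
    """Rule 2: one user posts identical content into many different groups."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)  # (sender, content) -> group IDs
    for m in messages:
        if m.is_group:
            groups[(m.sender, m.content.strip())].add(m.recipient)
    return {sender for (sender, _), ids in groups.items() if len(ids) >= min_groups}
```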
Based on these patterns, we can set up rules and models. Most individual rules are relatively simple, but combining them can produce more complex models. Machine learning is also widely used in spam recognition and antispam.
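As one concrete example of a machine learning approach (chosen here for illustration; the text does not name a specific algorithm), a multinomial naive Bayes classifier with add-one smoothing is a classic baseline for spam text. This sketch assumes simple word tokenization, two labels (`"spam"` and `"ham"`), and that both labels have at least one training example; real systems train on far larger corpora and richer features.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over two labels."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {label: Counter() for label in self.LABELS}
        self.doc_counts: Counter = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_posterior(self, tokens: List[str], label: str) -> float:
        # Unnormalized log P(label) + sum of log P(token | label).
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for t in tokens:
            score += math.log((self.word_counts[label][t] + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        tokens = tokenize(text)
        return self._log_posterior(tokens, "spam") > self._log_posterior(tokens, "ham")
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.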
Building a good spam recognition algorithm requires algorithm experts and business experts to work together, and it is a continuous process. There is currently no universal algorithm for fighting spam; business security systems must constantly research and develop methods to deal with emerging problems. Many large Internet companies have set up in-house business intelligence teams to handle such security issues. The implementation details of these algorithms, however, are beyond the scope of this book.
Careful analysis shows that spam characteristics fall roughly into three categories: characteristics of the content, characteristics of the behavior, and characteristics of the client itself. Antispam rules can be built from each of these three aspects:
- Content-based rules: natural language analysis and keyword matching
- Behavior-based rules: rules derived from business logic
- Client identification rules: verification codes (CAPTCHAs), or requiring the client to parse JavaScript
Used together, these three kinds of rules are effective, and they can eventually form the basis of a sound risk control system: one that monitors users and intercepts high-risk behavior, traces malicious users, supports forensic analysis, and assesses statistical losses, all of which provide a basis for further decision making.
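One simple way to combine the three kinds of rules is a weighted score with review and block thresholds, returning which rules fired so the result can feed tracing, forensic analysis, and loss statistics. The rule names, the event fields (such as `js_token_valid`, standing for a hypothetical token computed by the page's JavaScript), the integer weights, and the thresholds below are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Rule:
    name: str
    category: str                   # "content", "behavior" or "client"
    points: int
    check: Callable[[Event], bool]


# Hypothetical rules and event fields, one per category from the text.
RULES: List[Rule] = [
    Rule("ad_keyword", "content", 40, lambda e: e.get("keyword_hits", 0) > 0),
    Rule("low_reply_ratio", "behavior", 30, lambda e: e.get("reply_ratio", 1.0) < 0.1),
    Rule("no_javascript", "client", 30, lambda e: not e.get("js_token_valid", False)),
]
REVIEW_AT = 30   # hypothetical threshold for manual review
BLOCK_AT = 60    # hypothetical threshold for interception


def assess(event: Event) -> Dict[str, object]:
    """Score an event and return the action plus the evidence behind it."""
    hits = [r for r in RULES if r.check(event)]
    score = sum(r.points for r in hits)
    if score >= BLOCK_AT:
        action = "block"
    elif score >= REVIEW_AT:
        action = "review"
    else:
        action = "allow"
    # Returning the rule names lets the caller log them for tracing and forensics.
    return {
        "score": score,
        "action": action,
        "rules": [r.name for r in hits],
        "categories": sorted({r.category for r in hits}),
    }
```

Integer points are used instead of fractional weights so threshold comparisons are not affected by floating-point rounding.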
Once an unauthorized or illegal act has been identified, attention must be paid to the strategy and tactics used to block it. A rule loses its force once attackers learn it, so keeping rules confidential is very important. Whenever rules are used against malicious users, their content risks being exposed and then bypassed, so the rules must be kept out of the wrong hands.
How can rules be protected? When rules identify spam accounts, enforcement can be delayed and spread out over a period of time, so that malicious users cannot tell which rule they violated. In this way, most of the offending accounts can still be dealt with, and risk stays under control from the defender's point of view.
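The idea of spacing out enforcement can be sketched as a queue that acts on rule hits only after a random delay, so an attacker cannot easily correlate a freeze with the request that triggered it. The `DelayedEnforcer` class, its delay window, and the freeze-only action are hypothetical; the rule name is kept in a server-side audit log and never returned to the user.

```python
import heapq
import random
from typing import List, Optional, Tuple


class DelayedEnforcer:
    """Queue rule hits and freeze accounts only after a random delay."""

    def __init__(self, min_delay: float, max_delay: float,
                 rng: Optional[random.Random] = None) -> None:
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.rng = rng if rng is not None else random.Random()
        self._queue: List[Tuple[float, str, str]] = []   # heap of (due_time, account, rule)
        self.audit_log: List[Tuple[str, str]] = []       # server-side only, never shown to users

    def report(self, account_id: str, rule_name: str, now: float) -> None:
        """Record a rule hit without acting on it immediately."""
        due = now + self.rng.uniform(self.min_delay, self.max_delay)
        heapq.heappush(self._queue, (due, account_id, rule_name))

    def due_actions(self, now: float) -> List[str]:
        """Return sorted, de-duplicated accounts whose enforcement delay has elapsed."""
        accounts = set()
        while self._queue and self._queue[0][0] <= now:
            _, account_id, rule_name = heapq.heappop(self._queue)
            self.audit_log.append((account_id, rule_name))
            accounts.add(account_id)
        return sorted(accounts)
```

Because the freeze arrives at an unpredictable time, often together with other accounts, the attacker learns little about which behavior was detected. The width of the delay window trades response speed for rule secrecy.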
The confrontation over junk registration and spam will keep escalating. Security teams need to keep up with the attackers' changing tactics so that they can stay ahead of them.