Dredged up from the depths of Rage3D: formula to calculate a member's value!

outside looking in

Registered Member
All in good fun of course. I was reminded of this by a question PeterTutor recently asked me. This was a formula I created when I was extremely bored one night, and it seemed to produce some fairly insightful results. It took a while to compile the stats though, and it actually caused some members fingered to be "pure spammers" to increase their spamming efforts! :D

Anyway, here's the post I made at Rage3D explaining the whole concept back when I conceived of it. Some of the comments are obviously directed at Rage3D members, or the accompanying charts I put up of the top 100 (number of posts) members' ratings.

Equation:

(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]

PC = Member's postcount
MACh = Member's average characters per post (post length)
FACh = Forum's average characters per post (post length)
MPPD = Member's average number of posts per day
FPPD = Forum's average number of posts per day (per user)
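
For anyone who would rather not punch this into a calculator, here is a minimal Python sketch of the equation as written above. The function name, the argument names, and the sample numbers are all made up for illustration; none of them are real Rage3D stats.

```python
def member_rating(pc, mach, fach, mppd, fppd):
    """(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]"""
    total_chars = pc * mach              # PC x MACh
    length_ratio = mach / fach           # MACh / FACh
    frequency_ratio = mppd / fppd        # MPPD / FPPD
    return total_chars * ((length_ratio - 1) * frequency_ratio + 1)


# Hypothetical member: 1000 posts, 300 chars/post, 4 posts/day,
# on a forum averaging 200 chars/post and 2 posts/day per user.
rating = member_rating(pc=1000, mach=300, fach=200, mppd=4, fppd=2)
print(rating)
```

The only inputs are the five stats defined above, so anything that can supply those numbers can feed the function.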


Now, my rationale behind the equation:

(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]

(PC x MACh) This is simply the product of number of posts and average characters per post giving the total number of characters posted by a given user. I'll call this "bulk volume."


(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]

(MACh / FACh) This is the ratio of a member's average characters per post to the forum's average characters per post. The result is greater than one if a user is above average in post length, and less than one if they are below average. Subtracting 1 from this term gives the "percent above or below" the forum average. Therefore, above average members are positive, below average members are negative. This is important. I'll call this the "post rating."


(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]

(MPPD / FPPD) This is the ratio of a member's post per day average and that of the forum's post per day (per user) average. A value greater than one means they post more often than the average user, less than one means less often. Call this their "post frequency."


(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]

((MACh / FACh) - 1) x (MPPD / FPPD) This is the interesting part. Recall that the first term, the "post rating," can return a positive, zero, or negative value. Multiplying this by the member's relative "post frequency" has a most curious result, which I call the member's "spam rating."

If a member has an average post length roughly the same as the forum average, then the first term will be nearly zero. No matter how frequently or infrequently they post, the product of the two will remain close to zero.

If a member has an above average post length, then their "post rating" will be positive. If they post infrequently (relative to the forum average), then that positive number will be reduced (but still positive). If they post frequently, then that positive number will be increased accordingly.

If a member has a below average post length, then their "post rating" will be negative. If they post infrequently, it will become "less negative," and if they post frequently, it will become "more negative."

It is important to note that post frequency can't change a negative "post rating" into a positive "spam rating" or a positive "post rating" into a negative "spam rating." It can only increase or decrease the magnitude of the positive or negative value.

I think this part of the equation very nicely captures the essence of what defines "spam." Intuitively, if a member posts fairly long messages, then a high post rate indicates very useful activity in the forum, and their "spam rating" will be significantly greater than zero (good). If they're posting fairly short messages at a high rate, then it is likely a great many of them fall under the "spam" category, and their "spam rating" will be significantly less than zero (bad).

For members that have average post length, no rate of posting can affect their "spam rating," as it hovers near zero (as it should). For members that post infrequently (or have been around for ages), the effect of their average post length is reduced, and their "spam rating" is brought closer to zero.

The final term is the addition of a "1" resulting in a percentage of their "bulk volume" that is returned by the equation. For spammers, their bulk volume is reduced by their "spam rating," and for members who spend a great deal of time on posts their "bulk volume" is increased. Put simply, if you post short posts, then posting often hurts you. If you post long posts, then posting often helps you. The baseline is members with average post length, in which case post frequency doesn't matter, and their "bulk volume" is unmodified (as it should be for an average poster).
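
To see the sign and magnitude behavior described above in actual numbers, here is a rough Python sketch that runs the "spam rating" for a handful of posting styles. The forum averages and the five example members are hypothetical, picked only to make the arithmetic easy to follow.

```python
def spam_rating(mach, fach, mppd, fppd):
    """((MACh / FACh) - 1) x (MPPD / FPPD)"""
    return (mach / fach - 1) * (mppd / fppd)


FACH, FPPD = 200, 2  # hypothetical forum averages

# (label, member avg chars per post, member avg posts per day)
cases = [
    ("long posts, frequent", 400, 8),
    ("long posts, infrequent", 400, 0.5),
    ("average posts, frequent", 200, 8),
    ("short posts, infrequent", 100, 0.5),
    ("short posts, frequent", 100, 8),
]

results = {}
for label, mach, mppd in cases:
    s = spam_rating(mach, FACH, mppd, FPPD)
    results[label] = s
    # s + 1 is the multiplier applied to the member's "bulk volume"
    print(f"{label:<25} spam rating {s:+.3f}   multiplier {s + 1:+.3f}")
```

With these made-up numbers the average-length poster sits at exactly zero no matter the rate, the two long posters stay positive, and the two short posters stay negative. Frequency only stretches or shrinks the value, it never flips the sign.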


Comments:

(1) Data was taken over the last three or so days, so individual members may have changed slightly since then. All members who posted (or quoted) insanely long posts just after the avg. characters stat was introduced have numbers taken from just before they did that. This wasn't difficult to do.

(2) All "forum averages" are only averages of the top 100 posting members (postcount-wise). It would be more accurate to include the top 200, or 500, or 1000... but I was limited in patience in tabulating these values. Additionally, these are "averages of averages" after all, and probably reflect the general posting tendencies of most active members.

(3) Yes, my name is close to the top, and that could very well be because I fudged data or the equation to make it that way. I don't deny the possibility. However, you can check individual users yourself to see if I miscopied data, and I have explained every part of the equation in detail. You are free to disagree, but I think it works quite well.

(4) There could be "coefficients" added in many places to enhance or reduce the effects of individual variables to the final outcome. Some would result only in a scaling of all members ratings, and some would result in reshuffling of the order. Since these coefficients would be completely arbitrary, I decided that unity was as good as any other, and possibly better.

(5) The "average characters per post" stat unfortunately includes quotations of other members or sources. There's nothing I can do about that, and I'm not sure that it really affects things that much except for those members that tend to never quote, or like some (billyhansome :) ) quote entire posts repeatedly.

(6) "What about old members, doesn't that reduce their rating?" Not really. It only reduces their rating if they have posted infrequently. As long as they maintained a somewhat average posting rate, then their "spam rating" would not be affected. Additionally, if they have been here a long time and had reasonable posting rates, then their "bulk volume" should be quite large (assuming they were decent length posts).

(7) "But doesn't that mean then that members who stop posting have their ratings decreased?" Only if they had above average post lengths. This positive value would have increased their bulk volume, and with the passage of time their inactivity will bring it back to its original value - and I think that is appropriate. As their average post per day number drops, their member rating does as well.

(8) "But then, if someone who had a negative post rating (short posts) stopped posting, their rating would actually increase!" Yes, and I think this is appropriate as well. I for one don't miss spammers when they leave, and I think things are better as a result.

(9) "How can someone have a negative rating... is that a mistake?" Unfortunately, no, it's not a mistake. It appears that (according to this particular equation) there is a certain threshold of postcount and postlength combination that results in this curiosity. Many members have a negative "spam rating," but it is only slightly less than zero. After the addition of 1 to turn it into a percentage for modifying their "bulk volume" it is a positive (but less than 100%) value. In a few rare cases, the "spam rating" was so negative that turning it into a percentage still left a negative percentage. The effect is that, essentially, all of their "bulk volume" was removed - and then some. Once this threshold of length and frequency is reached, continued posting by the member (at those rates) serves to only decrease their rating further (since more "bulk volume" is getting multiplied by a negative number).

Remember, posting often doesn't do this in itself. Take a look at mulciber: he has an average of 13.35 posts per day - right up there with the best of them. However, his average post length is nearly identical to the forum's, so his "spam rating" stays near zero. The end result is that he has a large "bulk volume" made in a relatively short timespan, but it didn't take a billion posts to get there... so his rating reflects that. Zardon as well has a high post frequency, but his "post rating" is positive, so his "spam rating" is well into the positive, giving him a high member rating.
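
The threshold for a negative rating falls straight out of the equation: the multiplier goes below zero when ((MACh / FACh) - 1) x (MPPD / FPPD) is less than -1, which for a below-average post length works out to MPPD / FPPD being greater than FACh / (FACh - MACh). Here is a small Python sketch of that rearrangement. The function name and the sample numbers are hypothetical.

```python
def negative_threshold(mach, fach):
    """Relative post frequency (MPPD / FPPD) above which the rating turns negative.

    Returns None when the member's average post length is at or above the
    forum average, because the multiplier then never drops below 1.
    """
    if mach >= fach:
        return None
    return fach / (fach - mach)


# Hypothetical forum average of 200 characters per post.
print(negative_threshold(100, 200))  # posts half the average length
print(negative_threshold(150, 200))  # posts three quarters the average length
print(negative_threshold(200, 200))  # posts exactly the average length
```

In other words, under these assumptions someone whose posts are half the forum average length only goes negative once they post more than twice as often as the average user, and someone at or above the average length can never go negative at any posting rate.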

Best case and worst cases:

Best case for a rating would be a member that joined and started posting out his wazzoo, both in number and length. The high length and frequency would combine to give him a very positive "spam rating" (good), and as long as he continued at this rate his member rating would continue to skyrocket. I would consider these to be useful members, if they can sustain that depth of discussion in each of their posts.

Worst case is a member who joins and starts posting out his wazzoo only in post number. If the posts are below average length, then his "spam rating" is going to be negative (bad), and it will decrease his member rating. If his posts are reasonably long, or he doesn't post at an insane rate, then his rating will actually increase positively, just slowly. Cross those lines though, and it gets more and more negative.
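
A quick way to watch the best and worst cases play out is to hold a member's posting rate and post length steady and let the days pile up. This is only a sketch: the forum averages, the three example members, and the helper name rating_after are assumptions for illustration, not data from the charts.

```python
def member_rating(pc, mach, fach, mppd, fppd):
    """(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]"""
    return (pc * mach) * ((mach / fach - 1) * (mppd / fppd) + 1)


FACH, FPPD = 200, 2  # hypothetical forum averages


def rating_after(days, posts_per_day, mach):
    """Rating of a member who has posted at a steady rate and length since joining."""
    pc = posts_per_day * days
    return member_rating(pc, mach, FACH, posts_per_day, FPPD)


for days in (30, 90, 365):
    best = rating_after(days, posts_per_day=10, mach=400)   # long and frequent
    worst = rating_after(days, posts_per_day=10, mach=100)  # short and frequent
    slow = rating_after(days, posts_per_day=1, mach=100)    # short but infrequent
    print(f"day {days:>3}: best {best:>12,.0f}  worst {worst:>12,.0f}  slow {slow:>10,.0f}")
```

Because the multiplier stays fixed while postcount grows, the long-and-frequent member climbs steadily, the short-and-frequent member sinks further below zero, and the short-but-infrequent member still creeps upward.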



"Why did you spend so much time doing this?" We've been slack at work this week, and I was bored. I have to do something there to earn my pay. Besides, I like the challenge of taking data and organizing it in a way that the result is concise and meaningful. Any comments (or suggestions for the equation) are welcome... good or bad.

And no... this post wasn't figured into my rating.
 
would be nice if the board included these stats. I'm not about to sift through 2000 posts and count every character.
 
At Rage3D the admins were able to add a script that displayed the current "average characters per post" for each user.

The "rating" formula wasn't automated though; I had to enter the stats for each member (I just took the 100 most active) into a spreadsheet which calculated everything.
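
If anyone wants to skip the spreadsheet, something along these lines would do the same job in Python. It is a sketch only: the four members and their stats are invented, and the forum averages are computed as simple "averages of averages" over the sampled members, the way comment (2) in the original post describes.

```python
from statistics import mean

# Hypothetical rows: (name, postcount, avg chars per post, avg posts per day)
members = [
    ("alice", 5000, 350, 6.0),
    ("bob", 8000, 120, 12.0),
    ("carol", 3000, 250, 2.0),
    ("dave", 4000, 200, 4.0),
]

# Forum averages taken as plain means of the sampled members' own averages.
fach = mean(m[2] for m in members)
fppd = mean(m[3] for m in members)


def member_rating(pc, mach, fach, mppd, fppd):
    """(PC x MACh) x [((MACh / FACh) - 1) x (MPPD / FPPD) + 1]"""
    return (pc * mach) * ((mach / fach - 1) * (mppd / fppd) + 1)


# Rate every member, then sort from highest rating to lowest.
ranked = sorted(
    ((name, member_rating(pc, mach, fach, mppd, fppd))
     for name, pc, mach, mppd in members),
    key=lambda row: row[1],
    reverse=True,
)

for name, rating in ranked:
    print(f"{name:<8} {rating:>14,.0f}")
```

The stats still have to come from somewhere, so this only replaces the calculating step, not the data entry.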
 
I know what query to run to figure a user's average, but it'd be so intensive we'd get booted off the server in a hurry. Maybe if we were on a dedicated. :(
 
Isn't there any other way? This sounds really interesting? Somehow...some way i must discover if i have worth!:crying5:
 
I really could shit less what the numbers actually come out to be... I'm just proud of the equation! It actually has a lot of thought put into it, and it's probably my only "original" equation to date (I tend to just use others'... much easier that way :D).
 
So, now that it's 2 1/2 years later, are we on a good enough server to install the script and maybe get this member rating figured out? I'd be interested to see what it would spit out.
 