| How much we talk | |
|---|---|
| Topic Started: Dec 23 2014, 02:05 PM (740 Views) | |
| Klaus | Dec 23 2014, 02:05 PM Post #1 |
HOLY CARP!!!
|
Number of words per post for users with > 2000 posts in my database. I report the median, mean, and standard deviation.

- ***musical princess***: Median = 7, Mean = 12, σ = 23
- 1hp: Median = 15, Mean = 23, σ = 28
- AlbertaCrude: Median = 9, Mean = 18, σ = 41
- Aqua Letifer: Median = 10, Mean = 19, σ = 29
- Axtremus: Median = 8, Mean = 14, σ = 18
- Copper: Median = 7, Mean = 11, σ = 12
- CrashTest: Median = 7, Mean = 12, σ = 19
- Daniel\: Median = 7, Mean = 19, σ = 68
- Dewey: Median = 13, Mean = 45, σ = 94
- DivaDeb: Median = 9, Mean = 18, σ = 29
- Fizzygirl: Median = 7, Mean = 14, σ = 23
- Frank_W: Median = 6, Mean = 11, σ = 18
- George K: Median = 6, Mean = 25, σ = 59
- Horace: Median = 10, Mean = 17, σ = 19
- Improviso: Median = 7, Mean = 14, σ = 28
- JBryan: Median = 6, Mean = 13, σ = 33
- Jack Frost: Median = 8, Mean = 24, σ = 78
- John D'Oh: Median = 11, Mean = 15, σ = 16
- Jolly: Median = 6, Mean = 21, σ = 69
- Kincaid: Median = 9, Mean = 17, σ = 29
- Klaus: Median = 11, Mean = 18, σ = 36
- KlavierBauer: Median = 14, Mean = 24, σ = 30
- LWpianistin: Median = 6, Mean = 10, σ = 14
- Larry: Median = 8, Mean = 21, σ = 48
- Luke's Dad: Median = 8, Mean = 14, σ = 20
- Mark: Median = 4, Mean = 11, σ = 25
- Mikhailoh: Median = 6, Mean = 11, σ = 23
- Nobody's Sock: Median = 10, Mean = 18, σ = 29
- OperaTenor: Median = 7, Mean = 15, σ = 43
- Optimistic: Median = 7, Mean = 13, σ = 23
- Phlebas: Median = 8, Mean = 16, σ = 37
- Piano*Dad: Median = 8, Mean = 17, σ = 29
- QuirtEvans: Median = 15, Mean = 29, σ = 60
- Red Rice: Median = 7, Mean = 11, σ = 16
- Renauda: Median = 7, Mean = 12, σ = 19
- Rick Zimmer: Median = 29, Mean = 53, σ = 83
- Riley: Median = 6, Mean = 12, σ = 32
- RosemaryTwo: Median = 8, Mean = 14, σ = 19
- Steve Miller: Median = 8, Mean = 19, σ = 43
- The 89th Key: Median = 7, Mean = 14, σ = 31
- TomK: Median = 8, Mean = 17, σ = 39
- VPG: Median = 8, Mean = 14, σ = 22
- apple: Median = 6, Mean = 15, σ = 42
- bachophile: Median = 8, Mean = 26, σ = 69
- big al: Median = 16, Mean = 29, σ = 57
- blondie: Median = 11, Mean = 18, σ = 23
- brenda: Median = 7, Mean = 13, σ = 21
- dolmansaxlil: Median = 14, Mean = 25, σ = 36
- ivorythumper: Median = 6, Mean = 14, σ = 25
- jon-nyc: Median = 6, Mean = 11, σ = 25
- justme: Median = 5, Mean = 12, σ = 26
- kathyk: Median = 13, Mean = 31, σ = 78
- kenny: Median = 7, Mean = 16, σ = 33
- kentcouncil: Median = 8, Mean = 16, σ = 26
- musicasacra: Median = 8, Mean = 21, σ = 45
- pianojerome: Median = 7, Mean = 20, σ = 50
- plays88keys: Median = 10, Mean = 18, σ = 26
- sue: Median = 10, Mean = 14, σ = 14 |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
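For anyone who wants to reproduce the numbers in post #1, here is a minimal sketch of the per-user statistics. It is not Klaus's code (he says later in the thread that he did the processing in Haskell). The function names, the whitespace tokenization, and the choice of the population standard deviation are all assumptions; the thread does not say how words were counted or which σ was used.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def word_count(text):
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def per_user_stats(posts, min_posts=2000):
    """posts: iterable of (user, text) pairs.

    Returns {user: (median, mean, sigma)} of words per post, restricted to
    users with more than `min_posts` posts.
    """
    counts = defaultdict(list)
    for user, text in posts:
        counts[user].append(word_count(text))
    return {
        user: (median(c), mean(c), pstdev(c))
        for user, c in counts.items()
        if len(c) > min_posts
    }


# Made-up toy data; the threshold is lowered so that something is reported.
sample = [
    ("alice", "one two three"),
    ("alice", "one"),
    ("alice", "a b c d e"),
    ("bob", "hi"),
]
stats = per_user_stats(sample, min_posts=2)
```

With the toy data only "alice" passes the threshold; "bob" is dropped in the same way that users with 2000 posts or fewer are dropped from the list above.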
| Klaus | Dec 23 2014, 02:06 PM Post #2 |
HOLY CARP!!!
|
The same data, but excluding posts that contain two words or less.

- ***musical princess***: Median = 8, Mean = 13, σ = 24
- 1hp: Median = 16, Mean = 25, σ = 28
- AlbertaCrude: Median = 10, Mean = 21, σ = 44
- Aqua Letifer: Median = 13, Mean = 22, σ = 30
- Axtremus: Median = 10, Mean = 16, σ = 19
- Copper: Median = 9, Mean = 13, σ = 13
- CrashTest: Median = 9, Mean = 15, σ = 21
- Daniel\: Median = 11, Mean = 25, σ = 79
- Dewey: Median = 17, Mean = 52, σ = 100
- DivaDeb: Median = 12, Mean = 21, σ = 31
- Fizzygirl: Median = 9, Mean = 17, σ = 24
- Frank_W: Median = 9, Mean = 15, σ = 20
- George K: Median = 11, Mean = 35, σ = 68
- Horace: Median = 12, Mean = 19, σ = 19
- Improviso: Median = 9, Mean = 18, σ = 31
- JBryan: Median = 8, Mean = 15, σ = 36
- Jack Frost: Median = 9, Mean = 26, σ = 82
- John D'Oh: Median = 12, Mean = 17, σ = 16
- Jolly: Median = 11, Mean = 29, σ = 80
- Kincaid: Median = 11, Mean = 19, σ = 30
- Klaus: Median = 13, Mean = 20, σ = 38
- KlavierBauer: Median = 16, Mean = 27, σ = 30
- LWpianistin: Median = 9, Mean = 13, σ = 16
- Larry: Median = 11, Mean = 26, σ = 53
- Luke's Dad: Median = 10, Mean = 17, σ = 21
- Mark: Median = 8, Mean = 16, σ = 29
- Mikhailoh: Median = 8, Mean = 13, σ = 25
- Nobody's Sock: Median = 11, Mean = 21, σ = 30
- OperaTenor: Median = 9, Mean = 18, σ = 47
- Optimistic: Median = 10, Mean = 16, σ = 25
- Phlebas: Median = 9, Mean = 19, σ = 40
- Piano*Dad: Median = 10, Mean = 20, σ = 31
- QuirtEvans: Median = 17, Mean = 31, σ = 62
- Red Rice: Median = 9, Mean = 14, σ = 17
- Renauda: Median = 9, Mean = 14, σ = 20
- Rick Zimmer: Median = 31, Mean = 56, σ = 84
- Riley: Median = 8, Mean = 17, σ = 37
- RosemaryTwo: Median = 11, Mean = 16, σ = 20
- Steve Miller: Median = 11, Mean = 23, σ = 46
- The 89th Key: Median = 10, Mean = 17, σ = 35
- TomK: Median = 10, Mean = 20, σ = 41
- VPG: Median = 11, Mean = 16, σ = 23
- apple: Median = 8, Mean = 19, σ = 47
- bachophile: Median = 11, Mean = 31, σ = 75
- big al: Median = 17, Mean = 30, σ = 58
- blondie: Median = 12, Mean = 20, σ = 24
- brenda: Median = 8, Mean = 15, σ = 22
- dolmansaxlil: Median = 16, Mean = 28, σ = 37
- ivorythumper: Median = 9, Mean = 18, σ = 27
- jon-nyc: Median = 8, Mean = 13, σ = 28
- justme: Median = 9, Mean = 17, σ = 31
- kathyk: Median = 16, Mean = 35, σ = 82
- kenny: Median = 9, Mean = 20, σ = 36
- kentcouncil: Median = 10, Mean = 19, σ = 27
- musicasacra: Median = 11, Mean = 27, σ = 50
- pianojerome: Median = 14, Mean = 29, σ = 58
- plays88keys: Median = 12, Mean = 21, σ = 27
- sue: Median = 11, Mean = 16, σ = 14 |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
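The filter described in post #2 ("excluding posts that contain two words or less") could look roughly like the sketch below. The helper name `drop_short_posts` and the word counts are made up for illustration; the point is only to show why the medians rise once the very short posts are gone.

```python
from statistics import median


def drop_short_posts(word_counts, max_short=2):
    """Keep only posts with more than `max_short` words."""
    return [n for n in word_counts if n > max_short]


# Made-up word counts for a single user.
counts = [1, 2, 2, 3, 8, 40]
kept = drop_short_posts(counts)

median_all = median(counts)   # median over every post
median_kept = median(kept)    # median after dropping posts of <= 2 words
```

Half of the toy posts are one- or two-word replies, so removing them shifts the median far more than it would shift a median that was already well above two words.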
| Klaus | Dec 23 2014, 02:13 PM Post #3 |
HOLY CARP!!!
|
Next up: A ranking of which users curse the most. Any guesses who will be in the top spot? |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
| George K | Dec 23 2014, 02:13 PM Post #4 |
|
Finally
|
You're like a kid at a candy store, aren't you? (10 words) |
|
A guide to GKSR: Click "Now look here, you Baltic gas passer... " - Mik, 6/14/08 Nothing is as effective as homeopathy. I'd rather listen to an hour of Abba than an hour of The Beatles. - Klaus, 4/29/18 | |
![]() |
|
| Moonbat | Dec 23 2014, 02:27 PM Post #5 |
Pisa-Carp
|
You realize you're screwing up your own future statistics. Btw. I'm now |
| Entia non sunt multiplicanda praeter necessitatem | |
![]() |
|
| jon-nyc | Dec 23 2014, 02:29 PM Post #6 |
|
Cheers
|
Are you excluding quotes here? Not that it's foolproof. Quirt and Rick Zimmer often post article text directly, not in quotes, which is probably why they beat Dewey. |
| In my defense, I was left unsupervised. | |
![]() |
|
| Klaus | Dec 23 2014, 02:33 PM Post #7 |
HOLY CARP!!!
|
Very cool! Are you scraping the data anew or are you using the zip file I uploaded on Dropbox? I'd love to have better input data that includes dates and contains the thread IDs, the quotes and internal structure of posts, and the sequence of posts in a thread, but I was too lazy to incorporate that into my scraping engine (I used scrapy, by the way). If you have something better, please give me your data. With regard to screwing up future statistics, I have actually even considered adding a keyword to these posts that I can use to filter them out in future downloads of the forum. By the way, although I have used Python and scrapy to download the forum, I then switched to Haskell for the actual data processing because I became too tired of working in a dynamically typed language. |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
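The "better input data" wished for in post #7 (dates, thread IDs, quotes kept apart from the post text, position within the thread) might be captured by a record like the one sketched here, together with the keyword filter Klaus mentions. Every field name and the marker string `#tncr-stats` are hypothetical; nothing in the thread specifies a schema or an actual keyword.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical marker that statistics posts would carry so they can be skipped.
STATS_MARKER = "#tncr-stats"


@dataclass
class Post:
    thread_id: int
    position: int            # sequence number of the post within its thread
    author: str
    posted_at: datetime
    body: str                # top-level text only, quotes stored separately
    quotes: List[str] = field(default_factory=list)


def without_stats_posts(posts):
    """Drop posts that contain the marker keyword."""
    return [p for p in posts if STATS_MARKER not in p.body]
```

Keeping quotes in their own field means the word counts can be computed on `body` alone, while the quoted text stays available for other analyses.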
| Klaus | Dec 23 2014, 02:33 PM Post #8 |
HOLY CARP!!!
|
Yes, I'm excluding quotes. Only "top-level" text is included. |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
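Counting only "top-level" text, as post #8 puts it, can be sketched with the standard library's HTML parser. This assumes quoted material sits inside `<blockquote>` elements, which is a guess; the forum's real markup is not shown in the thread, and Klaus's own extraction code may work quite differently.

```python
from html.parser import HTMLParser


class TopLevelText(HTMLParser):
    """Collects text that is not nested inside any <blockquote>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current blockquote nesting depth
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def top_level_text(html):
    parser = TopLevelText()
    parser.feed(html)
    parser.close()
    # Join the fragments and normalize runs of whitespace.
    return " ".join("".join(parser.parts).split())


example = "<blockquote>Quoted <blockquote>twice</blockquote> here</blockquote>Two words. <b>Really</b>."
text = top_level_text(example)
```

As jon-nyc points out in post #6, this kind of filter cannot catch article text that was pasted directly into a post without quote tags.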
| Nobody's Sock | Dec 23 2014, 02:36 PM Post #9 |
Fulla-Carp
|
Jesus wept. |
| "Somewhere, something incredible is waiting to be known." | |
![]() |
|
| Moonbat | Dec 23 2014, 02:43 PM Post #10 |
Pisa-Carp
|
At the moment I'm just using your Dropbox data (which looks incomplete to me - I can't believe I only said the word "Ivory" 32 times). I've written scrapers before, but I think I always did it from scratch using urllib2/BeautifulSoup/lxml. I might get round to writing my own scraper, but for the moment I'm interested in the analysis; I'm going to use tf-idf for the scoring. I'm doing it on my laptop, and Python is much slower than Haskell: computing the tf-idfs for the entire corpus is taking ages. Damnit, I have at least 3 projects to do over the holidays and a paper to write; this is not what I'm supposed to be doing! |
| Entia non sunt multiplicanda praeter necessitatem | |
![]() |
|
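Since post #10 mentions tf-idf scoring, here is a small sketch of one common variant: term frequency as a fraction of the document's tokens, multiplied by log(N / document frequency). Treating each user's collected posts as one "document" is an assumption, as are the function name and the toy data; Moonbat does not say which weighting or document unit he uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: {name: list of tokens}. Returns {name: {term: score}}."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return scores


# Made-up two-user corpus.
docs = {
    "user_a": "the piano the carp".split(),
    "user_b": "the scraper the haskell".split(),
}
scores = tf_idf(docs)
```

A term that every user writes, such as "the" here, gets a score of zero, while terms specific to one user score highest. Computing the document frequencies once up front, rather than rescanning the corpus for every term, is usually what keeps this tolerable in pure Python.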
| jon-nyc | Dec 23 2014, 02:45 PM Post #11 |
|
Cheers
|
Klaus - why is Moonbat missing? He has more than 2k posts. |
| In my defense, I was left unsupervised. | |
![]() |
|
| Klaus | Dec 23 2014, 02:46 PM Post #12 |
HOLY CARP!!!
|
I just wondered why Moonbat is not in that list since he has more posts. Then I checked again and found out that the number of posts in my database is only 600727, hence I'm still missing around 50% of the data. I don't think this has a huge influence on the results I reported so far, but it is still annoying. Moonbat, I guess your Python wizardry is needed to fix that problem! The decisive code in my TNCR adaptation of scrapy is here. Maybe you can see the bug? |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
| Klaus | Dec 23 2014, 02:49 PM Post #13 |
HOLY CARP!!!
|
Hey, we can turn this into an open source project on github
|
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
| George K | Dec 23 2014, 02:54 PM Post #14 |
|
Finally
|
Oh FFS, you're right. I'd like to participate, but I left my pocket protector in my locker - in college. |
|
A guide to GKSR: Click "Now look here, you Baltic gas passer... " - Mik, 6/14/08 Nothing is as effective as homeopathy. I'd rather listen to an hour of Abba than an hour of The Beatles. - Klaus, 4/29/18 | |
![]() |
|
| Klaus | Dec 23 2014, 02:59 PM Post #15 |
HOLY CARP!!!
|
After investigating a little, I have the impression that I only scraped the first page of each thread. Is it possible that half of the posts to TNCR are on page 2... of threads? |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
| jon-nyc | Dec 23 2014, 03:00 PM Post #16 |
|
Cheers
|
Oh yes. Easily. |
| In my defense, I was left unsupervised. | |
![]() |
|
| Moonbat | Dec 23 2014, 03:01 PM Post #17 |
Pisa-Carp
|
That might explain why I am particularly affected.

Edit: I guess it's because your scraper is only following links with 'forum' (even though it's also extracting from 'topic'), so I guess you need to add 'topic' to the following rule.

    rules = (
        # Follow links matching 'forum' or 'topic'
        # (no callback means follow=True by default).
        Rule(LinkExtractor(allow=('forum', 'topic'))),
        # Parse links matching 'topic' with the spider's method parse_topic.
        Rule(LinkExtractor(allow=('topic',)), callback='parse_topic'),
    )

Edited by Moonbat, Dec 23 2014, 03:06 PM.
|
| Entia non sunt multiplicanda praeter necessitatem | |
![]() |
|
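To see why only page 1 of each thread was scraped, it may help to model the crawl rules in a few lines of plain Python. This is a toy simulation, not scrapy itself. It assumes two behaviors described in the scrapy documentation for `CrawlSpider`: when several rules match the same link, the first one is used, and a rule with a callback does not follow links unless `follow=True` is set. The URL paths and the `Rule` tuple below are hypothetical stand-ins.

```python
import re
from collections import deque, namedtuple

# Toy stand-in for a crawl rule: regex to match, whether a callback parses
# the page, and whether links found on that page are followed.
Rule = namedtuple("Rule", "pattern parse follow")

# Hypothetical link graph; the real forum URLs are not shown in the thread.
SITE = {
    "/forum/1/": ["/topic/10/1/", "/topic/11/1/"],
    "/topic/10/1/": ["/topic/10/2/"],
    "/topic/10/2/": [],
    "/topic/11/1/": [],
}


def crawl(start, rules):
    """Return the pages that would be handed to the parsing callback."""
    parsed, seen = [], {start}
    queue = deque([(start, False, True)])  # (url, parse it?, follow its links?)
    while queue:
        url, parse, follow = queue.popleft()
        if parse:
            parsed.append(url)
        if not follow:
            continue
        for link in SITE[url]:
            if link in seen:
                continue
            for rule in rules:  # first matching rule wins
                if re.search(rule.pattern, link):
                    seen.add(link)
                    queue.append((link, rule.parse, rule.follow))
                    break
    return parsed


original = [Rule("forum", False, True), Rule("topic", True, False)]
as_posted = [Rule("forum|topic", False, True), Rule("topic", True, False)]
with_follow = [Rule("forum", False, True), Rule("topic", True, True)]

pages_original = crawl("/forum/1/", original)
pages_as_posted = crawl("/forum/1/", as_posted)
pages_with_follow = crawl("/forum/1/", with_follow)
```

In this model the original rules parse only the first page of each topic. The rule set as posted follows topic links but, because the first rule has no callback and wins the match, never hands a topic page to the parser, so it is worth testing against the real spider. A variant that seems safer under the same assumptions is to leave the first rule alone and write the second as `Rule(LinkExtractor(allow=('topic',)), callback='parse_topic', follow=True)`.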
| jon-nyc | Dec 23 2014, 03:02 PM Post #18 |
|
Cheers
|
Especially for some posters. Most of Moonbat's posts are in epic 20 pp threads with him and IT contributing almost all of the posts. |
| In my defense, I was left unsupervised. | |
![]() |
|
| Mikhailoh | Dec 23 2014, 03:08 PM Post #19 |
|
If you want trouble, find yourself a redhead
|
He's not the only one. |
|
Once in his life, every man is entitled to fall madly in love with a gorgeous redhead - Lucille Ball | |
![]() |
|
| Klaus | Dec 23 2014, 03:13 PM Post #20 |
HOLY CARP!!!
|
Ah of course. I assumed that links to page 2... also contain the keyword "forum", but obviously they do not. Thanks! |
| Trifonov Fleisher Klaus Sokolov Zimmerman | |
![]() |
|
| jon-nyc | Dec 23 2014, 03:36 PM Post #21 |
|
Cheers
|
And he's a fucking computer science professor. |
| In my defense, I was left unsupervised. | |
![]() |
|
| Aqua Letifer | Dec 23 2014, 03:42 PM Post #22 |
|
ZOOOOOM!
|
Don't go trying to up your average now, jon. Everybody knows I have the next one in the bag. |
| I cite irreconcilable differences. | |
![]() |
|
| Copper | Dec 23 2014, 04:26 PM Post #23 |
|
Shortstop
|
Maybe you could get at least a partial solution by increasing the number of posts per page. |
|
The Confederate soldier was peculiar in that he was ever ready to fight, but never ready to submit to the routine duty and discipline of the camp or the march. The soldiers were determined to be soldiers after their own notions, and do their duty, for the love of it, as they thought best. Carlton McCarthy | |
![]() |
|
| Axtremus | Dec 23 2014, 04:45 PM Post #24 |
|
HOLY CARP!!!
|
A Professor is not necessarily a practitioner. |
![]() |
|
| John D'Oh | Dec 23 2014, 04:56 PM Post #25 |
|
MAMIL
|
Are you talking about the computer science or the f*cking? |
| What do you mean "we", have you got a mouse in your pocket? | |
![]() |
|