
Re: Basic Stylometry Beta (early access)

Posted: Sat Jun 06, 2015 3:15 am
by Peter Kirby
Very good feedback. Thanks. Glad just to know anyone's actually interested in using it, other than me. Those all sound like good features. And 'automated word discovery' could itself lead to increased accuracy and/or decreased subjectivity. Thanks again for this valuable feedback.

Also, yes: if I loaded the Greek on the back end, you could (a) skip the wait on the upload and (b) use a 'get' request with all the data referenced in the URL, meaning you could share results by URL.
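As a rough illustration of the "share results by URL" idea, here is a minimal Python sketch (the program itself is in Perl). The base URL and the parameter names `samples` and `words` are hypothetical, made up for this example; the post does not say how the real program would name them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_share_url(base_url, sample_ids, word_list_id):
    """Build a GET URL that references server-side texts by name.

    The parameter names 'samples' and 'words' are assumptions for
    illustration only, not the program's real interface.
    """
    query = urlencode({"samples": ",".join(sample_ids), "words": word_list_id})
    return f"{base_url}?{query}"

# Hypothetical endpoint and identifiers.
url = build_share_url("https://example.org/stylometry", ["justin", "mark"], "greek-common")

# Anyone opening the URL can recover the same selection from the query string.
params = parse_qs(urlsplit(url).query)
```

Because the texts would already live on the server, the URL only needs short identifiers rather than the full Greek, which is what makes it practical to paste into a forum post.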

Re: Basic Stylometry Beta (early access)

Posted: Sat Jun 06, 2015 3:22 am
by Peter Kirby
The program right now is very raw. It has about 600 lines of Perl, written over 3 to 4 days or so. Point is, I am by no means opposed to putting more work into it.

Of course the biggest problems are more 'theoretical' or scientific than technical.... I.e., finding what techniques offer increased accuracy, better detection of unreliable results, and/or allow results using less data.

Re: Basic Stylometry Beta (early access)

Posted: Sun Jun 07, 2015 10:01 pm
by Aleph One
Hey this is a tip that I'd venture other users could find useful:

Selected (highlighted) text in a program like Note/Wordpad can be unproblematically dragged right into any of the boxes in Peter's program.

To quickly select a single, entire line of text in (e.g.) Wordpad, triple-click anywhere within it. (A single click in the empty margin to the left of the line's start will work too.) This is especially helpful when adding the 'word formulas' you want into the program. Download and open Peter's "greek.txt" from this thread, then simply triple-click and drag each common word formula you want to use right into the appropriate boxes in the program. This method works especially well because each word formula (i.e. all the grammatical permutations of a single base word) is on its own (non-word-wrapped) line, so even though a huge word formula may appear to occupy many lines of text, triple-clicking will still select the entire word formula without a problem.

And, as all should have learned in kindergarten: [Ctrl]+[A] is the keyboard shortcut for Select All! So adding a document into the program (when correctly formatted as Peter specified in this thread's original post, like "justin.txt" from the collection provided earlier in this thread, for example) is as easy as opening the text file, pressing [Ctrl]+[A], and dragging the highlighted body into your box-of-choice in the program.

If you narrow your web browser window with the program to just the left or right half of your monitor, and open your text files in a Note/Wordpad window occupying the other half, simply clicking and dragging selections right into the program's boxes makes the whole process much faster and easier than you might expect!

This was just the way that worked best in my own case, of course, so YMMV!

Jeff

P.S.: So, what's this "Mac" word I keep hearing mean, again? :D

Re: Basic Stylometry Beta (early access)

Posted: Sun Jun 07, 2015 10:06 pm
by Peter Kirby
An advanced text editor like EditPlus is recommended for the time being. It has useful features such as "join"/"split" lines and the ability to 'pre-process' text in order to remove quotations, for example, with regular expressions. (I can share the regular expressions I'm using to remove quotes.)

https://www.editplus.com/

(IMO the best feature is really that EditPlus has never stalled, hung, or stuttered no matter how many MB's of text I throw into it...)

I agree with the general idea that Jeff is talking about. What I ended up doing is putting each author (or each sample) on a single line in my text files. When 'word wrap' is turned off in EditPlus, it becomes very easy to scroll through them and copy them.
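For anyone without EditPlus, the "one sample per line" layout can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the program; the function names are made up for the example.

```python
def to_single_line(text):
    """Join a multi-line sample into one line, collapsing runs of whitespace."""
    return " ".join(text.split())

def split_samples(file_text):
    """Treat each non-empty line of a file's contents as one sample."""
    return [line.strip() for line in file_text.splitlines() if line.strip()]

# Two samples, each originally spread over several lines.
first = to_single_line("alpha beta\ngamma   delta\n")
second = to_single_line("epsilon\nzeta\n")

# Stored one to a line, the samples are easy to pull back out individually.
combined = first + "\n" + second + "\n"
samples = split_samples(combined)
```

With word wrap turned off, each entry in `samples` corresponds to exactly one line in the editor, so copying a whole sample is a single line selection.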

And I haven't tried them personally but there are various extensions with form "Memory":

https://chrome.google.com/webstore/deta ... d?hl=en-US
https://chrome.google.com/webstore/deta ... fgno?hl=en

Thanks for sharing what you've learned here, Jeff.

Please let me know if you think of anything else.

(If the program is popular, I will want to spin up a new server--my blog becomes inaccessible if the program is busy churning through text!)

Re: Basic Stylometry Beta (early access)

Posted: Sun Jun 07, 2015 10:19 pm
by Aleph One
Peter Kirby wrote:I agree with the general idea that Jeff is talking about. What I ended up doing is putting each author (or each sample) on a single line in my text files.
Hell yea heh! :idea: That's a great idea! Put each document as one long line in your text file (along with the common word formulas and all, if you wish), and shuffling them in and out of the program to compare possibilities and different combinations becomes quite easy.

Another potentially helpful tip I've found is that (at least in Chrome), you can right-click on the tab-header and use "Duplicate tab" in order to make an additional copy of whatever state you have the program in right then (in other words, any text within the boxes IS duly duplicated, along with the program tab). And I don't think this causes any problems for the program's functioning (as far as I can tell).

Re: Basic Stylometry Beta (early access)

Posted: Sun Jun 07, 2015 10:25 pm
by Peter Kirby
Peter Kirby wrote:(I can share the regular expressions I'm using to remove quotes.)
Put everything on a single line first!

This removes anything in fancy curly double quotes.

Code:

“[^”]*”
This removes anything in fancy curly single quotes.

Code:

‘[^’]*’
This removes anything in Greek-style / German-style quotes, the most common type in the TLG. Apply the two patterns in this order.

Code:

«[^»]*»

Code:

»[^»]*»
This removes anything in <diamond-shaped brackets>.

Code:

<[^>]*>
This removes anything in [square brackets]. Maybe I could have done something simpler that works in EditPlus but IDK...

Code:

\[[ A-Za-z0-9\\\/\(\)\*\|\.\:\;\,\–\=\+\'\{\}\"\<\>\“\”\«\»\?\!\&\†\#]*\]
(If you understand regular expressions and are wondering why I don't just use "lazy" regular expressions, EditPlus doesn't seem to have that feature.)

This removes any stray non-Greek text that is occasionally found in the TLG (with lower case letters). Enable 'case sensitive' first!

Code:

[A-Z]*[a-z][A-Za-z]*
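The patterns above can also be applied outside EditPlus. The following is a minimal Python sketch, assuming each match is simply replaced with nothing and that the text is joined onto a single line first, as the post advises. The patterns are copied as posted and applied in the posted order; the function name and the final whitespace clean-up are additions for this example.

```python
import re

# Patterns copied from the post, in the order given there.
QUOTE_PATTERNS = [
    r"“[^”]*”",    # curly double quotes
    r"‘[^’]*’",    # curly single quotes
    r"«[^»]*»",    # Greek-style / German-style quotes, first pass
    r"»[^»]*»",    # second pass, as posted
    r"<[^>]*>",    # diamond-shaped brackets
    # Square brackets whose contents are limited to the listed (non-Greek) characters.
    r"""\[[ A-Za-z0-9\\\/\(\)\*\|\.\:\;\,\–\=\+\'\{\}\"\<\>\“\”\«\»\?\!\&\†\#]*\]""",
    r"[A-Z]*[a-z][A-Za-z]*",  # stray Latin words containing a lower-case letter
]

def strip_quotations(text):
    """Join the text onto one line, then delete every match of each pattern."""
    text = " ".join(text.split())        # "Put everything on a single line first!"
    for pattern in QUOTE_PATTERNS:
        text = re.sub(pattern, "", text)  # re.sub is case sensitive by default
    return " ".join(text.split())        # tidy leftover double spaces (an addition)

sample = "λογος “εν αρχη” και «θεος» ην [cf. 1.2] <note> abc τελος"
cleaned = strip_quotations(sample)
```

Python's `re` does support lazy quantifiers such as `.*?`, so a shorter square-bracket pattern is possible there. Note, though, that the posted pattern lists no Greek letters in its character class, so it leaves bracketed Greek in place; a blanket `\[[^\]]*\]` would remove that too.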
If it's the TLG/Diogenes that you don't have, the link is in the OP....
Aleph One wrote:Another potentially helpful tip I've found is that (at least in Chrome), you can right-click on the tab-header and use "Duplicate tab" in order to make an additional copy of whatever state you have the program in right then (in other words, any text within the boxes IS duly duplicated, along with the program tab). And I don't think this causes any problems for the program's functioning (as far as I can tell).
Good one. I didn't know that.

Re: Basic Stylometry Beta (early access)

Posted: Mon Jun 08, 2015 6:34 pm
by Ben C. Smith
Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?

Re: Basic Stylometry Beta (early access)

Posted: Mon Jun 08, 2015 6:45 pm
by Peter Kirby
Ben C. Smith wrote:Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
Yes. I will prepare a nice set of them, in a single text file formatted one to a line.

Re: Basic Stylometry Beta (early access)

Posted: Mon Jun 08, 2015 6:46 pm
by Ben C. Smith
Peter Kirby wrote:Yes. I will prepare a nice set of them, in a single text file formatted one to a line.
That would be most helpful! Thanks.

Ben.

Re: Basic Stylometry Beta (early access)

Posted: Tue Jun 09, 2015 10:20 am
by Peter Kirby
Ben C. Smith wrote:That would be most helpful! Thanks.
Okay, here it is. :thumbup: