Conversation with #inferno at Fri Mar 19 14:42:18 2010 on powerman-asdf@irc.freenode.net (irc) (14:57:52) visof [~visof@41.238.235.23] entered the room. (15:03:48) bvalek2 left the room (quit: Ping timeout: 252 seconds). (15:04:41) mkn left the room (quit: Disconnected by services). (15:19:27) vsrinivas [~me@c-68-50-218-93.hsd1.md.comcast.net] entered the room. (15:35:21) bvalek2 [~c11a2f4b@gateway/web/freenode/x-gerunlqltewrnypa] entered the room. (15:38:01) visof left the room (quit: Remote host closed the connection). (15:59:51) bvalek2 left the room (quit: Quit: Page closed). (16:03:12) mkn [~7c7cdbfa@gateway/web/freenode/x-yqsfhwtuqfzorpjs] entered the room. (16:11:35) robot12 left the room (quit: Quit: I'm leaving you (xchat 2.4.5 or older)). (16:50:19) mennis left the room (quit: Quit: mennis). (16:51:31) mennis [~mennis@adsl-068-016-104-079.sip.asm.bellsouth.net] entered the room. (16:55:28) less1 [~pravin@32.97.110.63] entered the room. (17:41:31) mennis left the room (quit: Quit: mennis). (17:43:29) mennis [~mennis@adsl-068-016-104-079.sip.asm.bellsouth.net] entered the room. (17:43:45) wrtp left the room (quit: Quit: wrtp). (17:45:17) wrtp [~rog@89.241.218.64] entered the room. (18:02:37) mennis left the room (quit: Quit: mennis). (18:22:59) mkn left the room. (18:55:20) mennis [~mennis@adsl-068-016-104-079.sip.asm.bellsouth.net] entered the room. (20:46:53) mennis left the room (quit: Remote host closed the connection). (20:47:00) mennis [~mennis@adsl-068-016-104-079.sip.asm.bellsouth.net] entered the room. (22:11:49) jas left the room (quit: *.net *.split). (22:11:50) mjl- left the room (quit: *.net *.split). (22:14:30) jas [~jas@adsl-69-215-39-41.dsl.chcgil.ameritech.net] entered the room. (22:15:10) mjl- [~mjl@82-171-55-43.ip.telfort.nl] entered the room. (22:50:56) adelfino [~username@201-212-160-19.net.prima.net.ar] entered the room. (22:51:22) adelfino left the room. (23:10:22) mennis left the room (quit: Quit: mennis). 
(23:34:49) adelfino [~username@201-212-160-19.net.prima.net.ar] entered the room. (23:57:29) anth_x left the room (quit: Quit: Leaving.). (00:10:45) anth_x [~a@adsl-99-25-148-5.dsl.bcvloh.sbcglobal.net] entered the room. (01:05:06) less1 left the room (quit: Ping timeout: 248 seconds). (01:15:45) adelfino left the room (quit: Quit: Leaving). (01:16:28) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (01:25:14) less1 left the room (quit: Quit: Leaving.). (01:52:28) wrtp left the room (quit: Quit: wrtp). (02:13:04) digi9 [none@csplan9.rit.edu] entered the room. (02:58:23) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (08:45:39) rapidfx [~host666@vl-cen-ce1.avtlg.ru] entered the room. (08:53:44) mkn [~7aaebec6@gateway/web/freenode/x-xezwkywykhniingi] entered the room. (09:00:44) mkn_ [~7aaebec6@gateway/web/freenode/x-svzfpgkfqaloaobe] entered the room. (09:02:32) mkn left the room (quit: Ping timeout: 252 seconds). (09:07:18) mkn_ left the room (quit: Ping timeout: 252 seconds). (09:07:31) less1 left the room (quit: Quit: Leaving.). (11:44:12) wrtp [~rog@89.241.218.64] entered the room. (12:36:47) mjl-: üüüüüüüüüüüüüüDDDD››› (12:36:52) mjl-: yay cat (12:38:32) mjl-: ºÐ (13:12:18) mkn [~7aaebec6@gateway/web/freenode/x-bfhlwdbyhwzrxkpx] entered the room. (13:12:47) powerman: mjl-: did you tried 'exceptions notifyleader'? is it used in any production code? (13:13:51) powerman: in last days I've seen at least two bugs somewhere near it, in both cases emu crashed: one way is because of segfault, second way is because of 'panic: lost pid' (13:15:57) powerman: in both cases there race condition :( which makes harder to develop short code example (13:45:14) mkn left the room (quit: Ping timeout: 252 seconds). (13:54:33) wrtp left the room (quit: Quit: wrtp). (14:26:11) wrtp [~rog@89.241.218.64] entered the room. (14:27:57) wrtp left the room (quit: Client Quit). (14:52:30) visof [~visof@41.238.235.23] entered the room. 
(15:52:09) visof left the room (quit: Remote host closed the connection). (17:00:08) wrtp [~rog@89.241.218.64] entered the room. (17:22:28) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (18:11:14) powerman: inferno's regex is awful (18:11:40) ***powerman just had to port few perl's regex (18:12:19) anth_x: ?!? (18:12:25) anth_x: what did you need? (18:12:49) powerman: /^.*?\b(?:Evs?|Evens?)\b.*?$/ (18:12:54) powerman: simple thing (18:13:01) powerman: turned into (18:13:12) powerman: "^(.*[^a-zA-Z0-9_])?([eE][vV][sS]?|[eE][vV][eE][nN][sS]?)([^a-zA-Z0-9_].*)?$" (18:13:19) powerman: that's just sucks (18:17:02) powerman: also, lack of search&replace in regex(2) is annoying... but this probably can be implemented in some way (18:17:58) anth_x: are you sure those strings are equivalent? the paren's don't seem to bind the same way. (18:18:51) powerman: parens ain't important - in perl's regex you can see non-capturing parens (?:...), only needed thing is match/not match (18:19:49) powerman: incoming data contain this crap so there is no other choice than just prepare regexp to match all possible data (18:47:19) wrtp left the room (quit: Quit: wrtp). (19:44:36) rapidfx left the room (quit: Quit: Leaving.). (19:50:17) uriel: powerman-asdf: PCRE are an abominable spawn from hell that have totally destroyed the regexp implementations in almost every known language (19:51:48) uriel: and actually, i find the inferno (ie., standard sane regexps) version of the regexp you pasted much more clear and simple to understand (19:52:31) uriel: (perhaps a way to use case-insensitive regexps would help, but it is not a big deal) (19:59:39) less1 left the room (quit: Ping timeout: 240 seconds). (20:14:25) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (23:35:49) adelfino [~username@201-212-160-19.net.prima.net.ar] entered the room. (23:44:46) adelfino left the room (quit: Quit: Leaving). 
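A small check of the translation discussed above, written in Python purely for illustration (it is not Limbo, and Python's `re` is only a stand-in for both engines). It compares the Perl-style pattern with the hand-expanded one on a few ASCII samples. The `re.IGNORECASE` flag is an assumption: the pasted Perl pattern shows no flags, but the expanded version accepts both cases. The sample strings are made up.

```python
import re

# Perl-style pattern from the log. IGNORECASE is an assumption (see above).
perl_style = re.compile(r"^.*?\b(?:Evs?|Evens?)\b.*?$", re.IGNORECASE)

# Hand-expanded pattern from the log: word boundaries and case folding are
# spelled out because the target regexp syntax has neither \b nor flags.
expanded = re.compile(
    r"^(.*[^a-zA-Z0-9_])?([eE][vV][sS]?|[eE][vV][eE][nN][sS]?)([^a-zA-Z0-9_].*)?$"
)

# Hypothetical inputs; only match / no match matters here.
samples = ["Evens", "odds: EVS", "2/1 or evens today", "Seven", "events", "ev", "Ev."]

def agree(text):
    """True when both patterns give the same match / no-match answer."""
    return bool(perl_style.search(text)) == bool(expanded.search(text))

for s in samples:
    print(repr(s), bool(expanded.search(s)), agree(s))
```

On these samples the two patterns agree, which supports the point that only match / no match matters and the differing parentheses do not.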
(00:33:32) izaki [~izaki@89.211.106.214] entered the room. (00:52:49) izaki left the room (quit: Quit: izaki). (01:09:41) mjl-_ [~mjl@82-171-55-43.ip.telfort.nl] entered the room. (01:13:58) mjl- left the room (quit: *.net *.split). (02:10:53) izaki [~izaki@89.211.106.214] entered the room. (02:44:03) izaki left the room. (03:19:59) powerman: uriel: i know about pcre. actually, I'd like to see rsc's re2 library linked with inferno kernel as driver with limbo module interface just like $Sys (03:22:18) anth_x: i'm not sure anyone's ever gotten that to work with a c++ lib. (03:22:32) anth_x: (mind you, i'm not sure anyone's ever *tried*) (03:24:14) uriel: powerman-asdf: my problem with PCRE goes way beyond the implementation, the syntax is hideous and nauseating (03:24:53) uriel: re2 only fixes the implementation issues, which while important, to me are less important than clear and simple syntax (03:26:37) anth_x: re2 also provides the more "traditional" interface. (03:27:03) powerman: uriel: i've tried two ways to work with text in limbo, just to see how it's going - with and without regexp. and you can easily compare syntax and other complexity using these examples. first one: limbo - http://pastebin.com/2t8Z3DKD and perl - http://pastebin.com/qpgcZxNZ (03:28:16) uriel: powerman-asdf: you could just call sed ;) (03:29:45) powerman: second one, limbo - http://pastebin.com/M3b1UugC and perl - http://pastebin.com/0ZRfbiit (03:30:34) powerman: i think no one can honestly say limbo version is more simple and readable than perl one, even with all pcre disadvantages (03:30:53) uriel: I'd use awk for that (03:31:29) powerman: and I'll be overwhelmingly grateful if someone show me how to solve these tasks in limbo in simpler way - using sed, or anything (03:32:19) uriel: btw, aren't there functions to trim spaces and so on? (03:32:38) powerman: with support for &nbsp;? don't think so... 
(03:32:56) uriel: you can do the &nbsp; elsewhere (03:33:23) uriel: the sed regexps are almost identical to the perl ones for your first problem, and would fit nicely in a single line (03:33:45) uriel: (you would have to make the \s explicit, but that is just two extra chars) (03:34:01) uriel: as for the second problem, seems obvious and trivial enough in awk (03:34:06) powerman: in same ugly way, I suppose... for now, only distress with inferno for me is text processing (03:34:58) uriel: limbo is not a text processing language, it is more friendly to text processing than say C (03:35:02) uriel: but that is about it (03:35:31) powerman: sadly, but I didn't know sed/awk deep enough. maybe you can call me "infected by perl" :) (03:35:33) uriel: and I don't see what is ugly about sed (03:35:40) uriel: man! (03:35:42) uriel: :( (03:36:13) powerman: i used both many years ago, before I start using perl (03:36:57) anth_x: powerman-asdf: i've had similar issues in the past. we could do with some good libs for encapsulating the creation of the regexp, passing the string, and handling the result. (03:37:09) uriel: sed 's/&nbsp;/ /g; s/^ *(.*) *$/\1/; s/ +/ /g' (03:37:10) uriel: done (03:37:33) anth_x: on the format of the actual regexps, i'm less convinced. perl's have more options, but i think most inhibit, rather than assist, readability. (03:37:38) uriel: (I combined the two middle regexps, just for fun, you can do it separately too) (03:38:00) anth_x: folding case is probably something we could do better that'd increase readability. (03:38:15) anth_x: but i find word boundary stuff to be better written explicitly. (03:38:42) powerman: uriel: cool. now I need this applied to limbo variable. I suppose there should be way to load sed/awk as limbo modules and call them without doing ugly things like pipe to stdin/stdout just to provide my data and get result back..? 
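A sketch of what the sed one-liner quoted above is meant to do, in Python for illustration only (not Limbo, not sed). It replaces non-breaking spaces, trims both ends, and collapses runs of spaces. Whether the input carries the `&nbsp;` entity or the literal U+00A0 character is an assumption, so the sketch handles both. The function names are made up here.

One detail worth checking in the one-liner: in `s/^ *(.*) *$/\1/` the greedy `(.*)` also swallows trailing spaces, so on its own that step trims only the left side. The `sed_style_trim` helper below reproduces that behaviour.

```python
import re

def normalize(text):
    """Replace non-breaking spaces, collapse space runs, trim both ends."""
    # Both spellings are handled; which one the real data uses is an assumption.
    text = text.replace("&nbsp;", " ").replace("\u00a0", " ")
    text = re.sub(r" +", " ", text)   # corresponds to s/ +/ /g
    return text.strip(" ")            # trims left and right explicitly

def sed_style_trim(text):
    """Literal transcription of s/^ *(.*) *$/\\1/ to show the greedy-match effect."""
    return re.sub(r"^ *(.*) *$", r"\1", text)

print(repr(normalize("  a&nbsp;&nbsp;b   c  ")))
print(repr(sed_style_trim("  a  ")))
```

Collapsing runs first and stripping last avoids depending on how greedy matching splits the trailing spaces.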
(03:39:04) uriel: powerman-asdf: I doubt it (03:39:04) anth_x: my biggest question, of course, is what the heck are "Italian odds"? ;-) (03:39:25) uriel: but still, I think limbo just seems like the wrong tool for what you are doing, just write it in sh (03:40:01) anth_x: powerman-asdf: you can call sh stuff (and thus sed) in a not-too-bad way. (03:40:24) powerman: anth_x: Italian odds just sucks :) when we'll have to parse site with odds in this format, we'll have additional issues with autodetecting odds format just because they too much like american odds (03:41:00) anth_x: we don't have awk (unless i missed that, which would be awesome), and i don't know how you feel about going to the host OS. (03:41:16) anth_x: are these formats described anywhere? i'm curious, now. (03:41:29) powerman: uriel: this data mangling is part of large task, which involve enough file2chan and adt's to make it too hard to write in sh... and sh is toooo slow (03:41:32) uriel: sh is probably just as good as awk for this kind of stuff (03:42:28) anth_x: i would spend a little bit looking at calling the shell module from your program and using sed. (03:42:30) powerman: anth_x: i feed good about going to the host OS - but not for each time I need to do something with string more complex than just concatenation (03:42:57) anth_x: fair enough. (03:53:25) anth_x: now i'm reading on odds format. american seems pretty stupid to me. i can't find a description of italian. (03:53:34) anth_x: clearly i've not done nearly enough international gambling. (03:57:42) powerman: anth_x: we're providing service for calculating arbitrages. that mean parse bets from ~100+ bookmakers every minute, do a lot of combinatorics and calculations, and provide arbs to members. 
there cluster of servers and a lot of services (now - mix of perl and inferno ones, hope in future it will be more inferno and less perl) (04:01:34) powerman: i think parsers itself can't be implemented in limbo at all - too much complex regexps, but i'd like to replace all other parts of system with limbo applications and switch from plain tcp+json to 9p (04:03:08) powerman: (sadly, but emu crashes make moving a lot of production services to inferno questionable, but I wanna try, starting from less critical services first) (04:05:55) powerman: in perl we have to use a lot of non-blocking i/o, epoll and event-based programming. event-based applications are much more complex than equivalent apps in limbo with threads and channels - that one of main reasons to go. another one is i'm just tired to see how perl devour tens megabytes of memory for simple tasks. (04:08:32) uriel: you could switch to Go ;) (04:08:36) powerman: inferno sh is just too slow for our tasks. so while something can be simpler to implement using sh, we usually have to use limbo instead. (04:09:02) anth_x: i agree, sh's performance is pretty disapointing. (04:09:13) anth_x: especially when i'm used to being able to write so much so quickly in rc. (04:09:45) uriel: rc is a work of pure sublime beauty (04:10:02) powerman: Go... I doesn't checked it too deeply to make such decision. At a glance, it looks like 'limited limbo for poor mans who unable to use inferno'. Maybe this first impression is just plain wrong. (04:10:32) uriel: in my limited experience I found Go much more polished and better documented than Inferno (04:12:04) uriel: as a language, it has some neat improvements over Limbo (although the loss of tuples can be a bit annoying at first) (04:12:10) anth_x: they're different things. go's a great language, but you don't get to make the same sorts of assumptions about the environment you do with inferno. 
(04:12:20) uriel: and the libraries are *much* more finished and in much better shape (04:12:46) uriel: but yea, you obviously don't get the whole Inferno environment (04:12:47) powerman: inferno and dis provide some environment, which is absent in Go, afaik. i mean namespaces, mount, bind, rstyx, remote debug, etc. or i'm too ignorant about this and should go learn Go :) instead of saying crap in irc? (04:12:59) uriel: (which can be a curse or a blesing depending on the task) (04:13:16) uriel: powerman-asdf: no, you are completely correct (04:13:53) uriel: inferno is great, but there seems to always be some friction when interfacing with the outside world... (04:17:38) powerman: limbo, compared to perl, give me more low-level control over what's going on (like about allocating memory) and ease to use threads - using nearly same amount of lines of code as in perl. strict type checking doesn't annoy me much (but I noticed it can make crazy other perl developers I work with), I even found it a little funny. :) (04:20:31) powerman: inferno, compared to linux, give me *much* simpler environment with nearly same features (minus symlinks, unix sockets, some other less used things). to write reliable apps in linux I had to know Stevens APUE book and huge amount of other things. all available docs for inferno can be read in about 4-5 days, including most of man pages. that's just amazing! (04:21:06) uriel: heh, that is a good point (04:21:26) anth_x: yup, i agree. having the consistent (internally and portably) environment is a big win. (04:21:29) powerman: so, inferno+limbo compared to linux+perl give me ability to develop much simpler apps in much simpler environment with equivalent functionality as linux/perl scripts (04:24:09) powerman: keeping in mind our current system are fairly large (I work on it for about 8 years, with some other developers helping me), and I need to keep it simple at all costs or it will not be reliable and secure anymore... 
I think switching to inferno/limbo is reasonable enough. And I'm not sure Go is equivalent to inferno/limbo in this case. (04:27:19) powerman: actually, even with continue development in perl, just learning inferno gives me huge advantage - I start developing in perl in completely different way, with much simpler and cleaner code, plus I've changes overall project architecture to use more simple services (we already was on this way, but after learning inferno amount and simplicity of services was greatly increased) (04:30:01) powerman: back to topic, for now only annoying thing in limbo is lack of flexible and simple enough text processing tools. using sh/sed doesn't looks like a real choice. probably implementing some lib on top of existing regex(2) will helps. but something like powerful regexp like re2 or even pcre will helps much more. :( (04:30:08) anth_x: absolutely. the concurrency/CSP stuff has helped me think about problems even in domains where i can't apply it directly. (04:31:28) anth_x: do you really think it's just about the regexp support itself? setting aside case folding for the moment, it looks more like limbo's problem in those examples is that perl's got good primitives for the sorts of substitutions you're doing. (04:32:39) anth_x: limbo doesn't have anything which encapsulates the whole operation as well. (04:32:56) anth_x: also: are italian odds just american odds but always in the + format, without the sign? (04:33:10) powerman: such primitives can be implemented as usual limbo module. for example, search&replace with support for $1,$2,... in replace part (i.e. not backtracking) should not be too complex to implement using wrapper function for regex(2) (04:34:19) anth_x: i agree. i'm just saying i don't think your *primary* issue here is regexp(2) vs pcre/re2. (04:34:23) powerman: anth_x: american positive bets sometimes lack + sign, thus making them looks too much like italian. anyway, I'll bother about this issue later. 
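The search-and-replace wrapper powerman describes above can be sketched as follows. This is Python for illustration, not Limbo. It assumes a match primitive that returns (start, end) offsets for the whole match and for each group, which is how regex(2)'s execute is understood to behave. The names `execute`, `expand` and `replace_all` are hypothetical, and Python's `re` stands in for the real engine.

```python
import re

def execute(pattern, text, start=0):
    """Stand-in for an offsets-returning match primitive.

    Returns [(beg, end), ...] for the whole match and each group, or None.
    An unmatched group is reported as (-1, -1).
    """
    m = pattern.search(text, start)
    if m is None:
        return None
    return [m.span(i) for i in range(pattern.groups + 1)]

def expand(template, text, spans):
    """Expand $0..$9 in template using offsets only."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "$" and i + 1 < len(template) and template[i + 1] in "0123456789":
            n = int(template[i + 1])
            if n < len(spans) and spans[n][0] >= 0:
                beg, end = spans[n]
                out.append(text[beg:end])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def replace_all(pattern, template, text):
    """Replace every match of pattern in text with the expanded template."""
    out = []
    pos = 0
    while pos <= len(text):
        spans = execute(pattern, text, pos)
        if spans is None:
            break
        beg, end = spans[0]
        out.append(text[pos:beg])
        out.append(expand(template, text, spans))
        if end == beg:
            # Empty match: copy one character and step forward to avoid looping.
            out.append(text[beg:beg + 1])
            pos = beg + 1
        else:
            pos = end
    out.append(text[pos:])
    return "".join(out)

print(replace_all(re.compile(r"(\d+)/(\d+)"), "$2:$1", "odds 5/2 and 11/4"))
```

Nothing here needs more from the engine than offsets, which is consistent with anth_x's point that the missing piece is the substitution primitive rather than the regexp dialect.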
(04:35:36) anth_x: s'okay with me. i learned something new! (04:36:00) powerman: case-insensitive regex also can be implemented as wrapper - just parse regexp string and replace every a-zA-Z outside of character class "[...]" by [aA] - I think this should work for most, if not all cases. and this is syntax sugar anyway (04:40:08) powerman: i wonder is there another (simpler) way for these tasks, maybe something using existing functions like sys->tokenize or functions from string(2), but I don't see it (04:42:48) powerman: it's too late here, and I have to sleep a little. bye! (08:17:43) vsrinivas left the room (quit: Ping timeout: 260 seconds). (08:19:38) vsrinivas [~me@c-68-50-218-93.hsd1.md.comcast.net] entered the room. (08:43:14) less1 left the room (quit: Quit: Leaving.). (09:59:25) acmeuser [~acmeuser@z202.124-44-37.ppp.wakwak.ne.jp] entered the room. (09:59:36) acmeuser left the room (quit: Read error: Connection reset by peer). (10:49:31) eno left the room (quit: Ping timeout: 276 seconds). (10:50:30) eno [~eno@nslu2-linux/eno] entered the room. (12:19:13) powerman: forget that case-insensitive regex crap I said yesterday. just use str->tolower() before regex (12:30:50) wrtp [~rog@89.241.218.64] entered the room. (12:34:02) powerman: wrtp: roger, I think you may know answer to this question :) is it possible to link rsc's re2 library (in c++) with inferno, to access it from limbo? (12:51:19) C-Keen: powerman-asdf: you can always make a C wrapper around the C++ interface and call that from limbo (12:52:34) powerman: C-Keen: thanks. i suppose so, but I've no C++ experience and very limited C, so I'd like to know task is solvable at all before trying (13:07:12) wrtp: poweer (13:07:15) wrtp: oops (13:07:36) wrtp: powerman-asdf: i think it might not be so easy... (13:07:59) wrtp: powerman-asdf: C-Keen's approach is probably the right one (13:11:08) powerman: wrtp: why it shouldn't be easy? 
i've some experience implementing perl XS modules to wrap access to system calls and C libs - it was trivial enough when you have any working example (13:12:13) powerman: i suppose it's not a problem to find examples how to wrap C++ lib in C, even without real C++ knowledge, and then write usual inferno driver in C using this wrapper (13:12:55) powerman: probably driver itself may be such a wrapper for C++ code (13:16:09) powerman: I see re2 install libre2.a - is static linking with c++ lib will be a problem? I have to implement .so wrapper? (13:17:49) C-Keen: powerman-asdf: http://developers.sun.com/solaris/articles/mixing.html (13:18:11) powerman: C-Keen: thanks (13:18:37) C-Keen: what you do is: write a small C++ library that exports the original lib api via C calling conventions (13:18:45) C-Keen: build a static lib of that (13:18:55) C-Keen: use that lib in your program (13:20:05) C-Keen: you just need to take care of the different memory allocations (memory created with new should be deleted with delete and not mixed with free() for example) (13:20:29) C-Keen: also you will need the libstdc++ for this, which might be a show stopper (13:21:10) C-Keen: (was that the right name? I was talking about the C++ runtime lib) (15:44:12) adelfino [~username@201-212-160-19.net.prima.net.ar] entered the room. (16:03:43) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (16:16:14) wrtp left the room (quit: Quit: wrtp). (16:17:53) EthanG left the room (quit: Quit: *foompf*). 
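Going back to the case-insensitivity idea from earlier in the log: a sketch of the "rewrite every letter outside `[...]` as `[aA]`" wrapper, next to the simpler lower-casing approach powerman settled on. It is Python for illustration only, and the function names are hypothetical. One known limitation of the rewrite: letters inside character classes are left untouched, so `[a-c]` stays case-sensitive.

```python
import re

def fold_case_pattern(pattern):
    """Rewrite ASCII letters outside [...] classes as two-letter classes."""
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])   # keep escapes such as \. untouched
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch.isascii() and ch.isalpha():
            out.append("[" + ch.lower() + ch.upper() + "]")
        else:
            out.append(ch)
        i += 1
    return "".join(out)

def matches_lowercased(lowercase_pattern, text):
    """The simpler route: lower-case the input, match a lower-case pattern."""
    return re.search(lowercase_pattern, text.lower()) is not None

print(fold_case_pattern("Evs?"))
print(matches_lowercased(r"evs?", "odds: EVS"))
```

Lower-casing the input is shorter, but it only works when the matched text itself does not have to keep its original case.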
(16:23:30) powerman: C-Keen: it works :) (16:24:13) powerman: emu and emu-g linked with libstdc++ and libpthreads, with wrapper lib to re2 (16:25:18) powerman: limbo module implementation is truly minimalistic, but it call real function from re2 using my wrapper :) (16:25:18) powerman: *f->ret = PartialMatchN("axbyc", "a.*b.*c"); (16:26:26) powerman: now we've to do not so funny part - implement wrappers for most of re2 functions and find a way how to safely convert data between limbo-c-c++ types (16:27:30) powerman: thanks caerwyn for lab 92! (16:47:27) adelfino left the room (quit: Quit: Leaving). (16:58:37) less1 left the room (quit: Quit: Leaving.). (17:41:43) acmeuser [~acmeuser@95.79.193.25] entered the room. (17:41:58) acmeuser left the room. (18:11:03) anth_x: powerman-asdf: that's awesome! (18:11:23) anth_x: if you haven't, you should mail inferno-list (and maybe russ). (18:12:27) mennis [~mennis@adsl-065-012-170-146.sip.asm.bellsouth.net] entered the room. (18:13:50) mennis left the room (quit: Remote host closed the connection). (18:13:58) mennis [~mennis@adsl-065-012-170-146.sip.asm.bellsouth.net] entered the room. (18:30:26) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (18:47:58) mkmks [~nf@90-230-91-71-no148.tbcn.telia.com] entered the room. (18:51:39) less1 left the room (quit: Quit: Leaving.). (19:02:47) EthanG [~EthanG@sourcemage/guru/eekee] entered the room. (20:19:05) wrtp [~rog@89.241.218.64] entered the room. (20:31:16) vsrinivas is now known as me__|old (20:32:30) powerman: anth_x, C-Keen, wrtp: ok, here it is: http://code.google.com/p/inferno-re2/ (20:34:19) powerman: i'm afraid it's full of newbie bugs and memory leaks, but I think it's better to do something than do nothing (20:34:37) powerman: so you now have one more thing to criticize/improve :) (20:34:53) me__|old: powerman-asdf: neat (20:40:34) eno left the room (quit: Ping timeout: 264 seconds). 
(20:41:43) powerman: i'll now update Gentoo ebuild for inferno to include this driver if built with USE=re2 (20:42:19) eno [~eno@nslu2-linux/eno] entered the room. (20:50:50) less1 [~pravin@cpe-66-68-151-36.austin.res.rr.com] entered the room. (20:56:05) less1 left the room (quit: Quit: Leaving.). (21:01:51) uriel: powerman-asdf: very cool! make sure to post about it to 9times (21:03:43) powerman: uriel: me? you joking. not with my English. if you like - post yourself. I'll better post in Russian somewhere. :) (21:04:52) powerman: and I prefer to wait for feedback from at least one person who know C&C++ about quality of my code before making any posts :) (21:05:24) uriel: powerman-asdf: your english is as good as mine ;P (21:05:34) uriel: just write up a couple of sentences about it (21:11:48) less1 [~pravin@32.97.110.64] entered the room. (21:23:05) less1 left the room (quit: Quit: Leaving.). (21:59:09) powerman: gentoo ebuilds for dev-libs/libre2 and updated to hg tip ebuild with optional re2 support for dev-inferno/inferno are ready, available as usually from standard overlay "powerman" (22:00:04) powerman: now, I can return to yesterday's work and try to clear some ugly code using re2 library. huh. (22:45:30) olegfink: hm, isn't re2 written in c++? (22:48:20) anth_x: yup. he did the wrapper. (22:48:43) anth_x: that's why i thought it was so noteworthy. while it's always been theoretically possible, i'm not aware of anyone having actually done so before. 
(22:51:36) powerman: as far as I see, emu linked with libstdc++ and pthreads works just fine (22:51:53) powerman: probably only noticeable change is increased memory usage (22:53:19) powerman: about +150KB both main and heap (22:55:10) me__|old is now known as me___ (22:55:29) wrtp: powerman-asdf: the only thing i'd note from a totally cursory look at the interface is that it's definitely worth separating re compilation from matching - re2 is much much faster at matching than compiling (22:55:52) powerman: wrtp: of course (22:56:41) powerman: it's just PoC - one function, but with ability to send params from limbo to re2 and recv reply, with all type conversions (23:29:52) powerman: btw, is it possible in kernel module to detect some variable (previously returned from this module) was garbage-collected? (23:31:00) powerman: after compiling regexp I'll have RE2 object in C++ wrapper lib, but I doubt it's possible to deliver it to limbo and then back to re2 (23:31:39) powerman: chances are only way is cache it inside C++ wrapper lib, and return some, say, integer id for cached compiled regex (23:32:44) powerman: but it's better to avoid requirement for limbo code to not forget to call free_compiled_regex(this_int_id) after stop using this regex (23:33:39) powerman: so, if there a way for kernel module to detect free of this_int_id variable, this can helps a lot (00:08:15) wrtp left the room (quit: Quit: wrtp). (01:00:31) powerman: ok, i managed to hook on gc, question not actual anymore :) (01:09:55) powerman: one more stupid question: I always think acquire() should lock some resource, and release() should unlock it. and critical code part in usual threaded app should looks this way: "acquire(); critical_code(); release();". 
but in emu/port/inferno.c everything looks inverse - first release() then few lines of code later acquire() (01:11:03) powerman: this may have sense if all these functions in emu/port/inferno.c called after locking resource, and they need to release lock temporarily inside functions (01:11:30) powerman: or my understanding of acquire/release is completely broken :( (01:42:01) mkmks left the room ("ERC Version 5.3 (IRC client for Emacs)"). (02:30:38) powerman: ok, i see, functions really called with lock set. that has sense inside os kernel (02:50:51) adelfino [~username@201-212-160-19.net.prima.net.ar] entered the room. (03:09:45) adelfino left the room (quit: Remote host closed the connection). (03:40:55) me___ left the room (quit: Ping timeout: 260 seconds). (04:12:49) vs|birthday [~me@c-68-50-218-93.hsd1.md.comcast.net] entered the room. (04:23:01) anth_x: powerman-asdf: how did you manage the gc hook? (04:23:47) powerman: anth_x: just like $Sys do it for Sys->FD (04:25:31) vs|birthday: cute (04:27:49) powerman: probably i'm just too sleepy now, but I don't understand why free() in my code works :) (04:28:05) powerman: memory allocated in re2wrap c++ library (04:28:12) powerman: using system's malloc (04:28:18) vs|birthday: are you sure? (04:28:27) vs|birthday: the system malloc is a weak symbol. (04:28:28) powerman: and then free'd in my inferno driver re2 (04:28:57) powerman: free in inferno driver, I suppose, isn't system free(), but instead free() from emu/port/alloc.c (04:28:58) vs|birthday: so when you have malloc fn in your code, you will call yours instead of the system one. (04:29:04) vs|birthday: powerman-asdf: correct. (04:29:17) vs|birthday: but if your library is linked in, it uses emu/port/alloc.c:malloc too. (04:29:31) powerman: c++ lib wrapper built using g++... oh, I see what you mean (04:29:50) vs|birthday: weak symbols are horrifying but useful (04:30:08) powerman: yeah, now it has sense. ok, it's good to know (again) how my code works. 
:) (04:30:19) vs|birthday: :) (04:30:31) vs|birthday: :): (04:32:11) powerman: so, I can test for any sort of memory leaks using just /dev/memory. that's nice (04:36:51) vs|birthday: the inferno malloc has pretty good tracing facilities (04:37:08) vs|birthday: they're not exported to limbo atm, but it'd be possible (05:59:36) jas left the room (quit: Remote host closed the connection). (06:54:41) mkn [~7c7cdbfa@gateway/web/freenode/x-ugmjqytdlzfhoejw] entered the room. (07:24:29) rapidfx [~host666@vl-cen-ce1.avtlg.ru] entered the room. (07:26:13) anth_x: anyone know if caerwyn describes how he generated those "acme art" images anywhere? (07:29:15) anth_x: heh. googling for "acme art", i found this page: (07:29:15) anth_x: http://www.acme.org.uk/ (07:29:36) vs|birthday is now known as vs (07:29:51) anth_x: amusingly, the structure of the windows, levels of stone, and boarders did, in fact, remind me of acme. (07:34:48) jas [~jas@adsl-69-215-39-41.dsl.chcgil.ameritech.net] entered the room. (07:36:42) robot12 [~robot12@szhilkin.broker.freenet6.net] entered the room. (07:47:57) bvalek2 [~bela@unaffiliated/bvalek2] entered the room. (08:31:50) bvalek2 left the room (quit: Quit: I've seen things you people wouldn't believe). (09:45:37) wrtp [~rog@89.241.218.64] entered the room. (10:17:20) C-Keen: powerman-asdf: to get rid of the memory bugs: write a small c program that uses your lib and run it through valgrind, this will also show you places where you mixed new/malloc and friends (10:17:27) C-Keen: powerman-asdf: congratulations :) (10:25:47) bvalek2 [~c11a2f4d@gateway/web/freenode/x-grbtwandhchvtfng] entered the room. (10:39:43) wrtp: powerman-asdf: it should be possible to deliver it to limbo and then back to re2 (10:40:03) wrtp: powerman-asdf: for instance, the bignum stuff does that. (10:40:38) wrtp: powerman-asdf: you can tag hidden fields onto the end of an adt. (10:41:07) wrtp: powerman-asdf: the main thing you need to get right is the memory management. 
(12:01:24) powerman: wrtp: what is bignum stuff? (12:02:44) powerman: i don't see anything related to bignum - is it included with inferno distribution? (12:02:44) C-Keen: the number handling code in limbo (12:04:07) C-Keen: ahum (12:05:01) powerman: actually, i'm really not sure it's a good choice. RE2 object isn't just a simple thing with few fields, I suppose it contain complex syntax tree, a lot of options, etc. and probably not in "public" class interface (12:05:46) powerman: converting all this data to/from plain C struct promises to be real pain in the ass (12:08:43) C-Keen: all you need to provide as a C data type are the data going out. I don't know the RE2 interface so I cannot say how much work it is (12:09:00) C-Keen: (or going in of course) (12:09:29) C-Keen: the object's internal representation stays untouched (12:10:40) powerman: C-Keen: i don't get it. this is already done. but when we start talking about adding ability to compile regexp, we have to keep compiled regexp somewhere. compiled regexp returned as main RE2 objects, as far as I see. (12:11:05) C-Keen: Ah I see (12:11:06) mjl-_: libmp is the bignum stuff i suppose (12:11:47) mjl-_: powerman-asdf: isn't just saving a pointer (void* would even do) to the re2 main object enough? (12:12:29) mjl-_: so have a c struct on the limbo side that helps with the gc, and put a pointer to the whole re2 stuff in it as wrtp said (12:12:41) C-Keen: powerman-asdf: well I assume that this is just a handle to the stuff so, I would hand back pointer to a static object you keep in your wrapper and then for example a method match(string input) would come down to match( HRE2_OBJ obj, char*) (12:13:02) powerman: mjl-_: yeah, I think so. 
that's why I spend time yesterday trying to hook on gc - to be able to find out user stop using this compiled regexp and so my module should delete() it (12:14:04) mjl-_: ok (12:14:15) mjl-_: i found libinterp/keyring.c to be a good example (12:14:23) mjl-_: struct XBFstate (12:14:23) mjl-_: { Keyring_BFstate x; BFstate state; (12:14:24) mjl-_: }; (12:21:10) powerman: mjl-_: do you know why Keyring_blowfishsetup() begin with destroy()ing *f->ret? (12:24:16) mjl-_: i think it's because f->ret can contain a reference to an object (12:24:31) powerman: yeah, I see, sys->aprint doing same (12:24:37) mjl-_: yes (12:25:02) mjl-_: "ret" is like a register that holds a value. when you put something in a vm register you increase the refcount (12:25:27) powerman: but who will put something in ret before calling these functions? (12:25:36) mjl-_: if you wouldn't do that, the gc could free an object behind your back, when the only reference to it is in the vm registers (12:25:57) mjl-_: i don't know who puts something in ret (12:27:21) wrtp: powerman-asdf: sorry, when i said the bignum stuff, i meant IPint (12:27:58) wrtp: there should really be a writeup on how to do C stuff called from limbo (12:28:48) wrtp: powerman-asdf: you don't need to convert any of the re2 structures to a limbo adt - just treat it as opaque (12:29:28) wrtp: powerman-asdf: for bonus points, duplicate the regex module interface (12:30:10) wrtp: (except that the Re type becomes an opaque pointer to the underlying re2 data structure (12:30:11) wrtp: ) (12:35:30) powerman: wrtp: re2 doesn't send me start/end offsets, it return substrings instead. probably I can convert substring to offsets by searching for these substrings starting from same point in the string as regexp started... 
but looks like emulating regex(2) using re2(2) is task for separate pure-limbo module (12:36:46) powerman: sort of alternative implementation for same Regex interface (12:51:37) powerman: I think about how to avoid duplicating module interface (one bundle of functions which work with regex as string, and another one which work with regex as RE - compiled regex). (12:52:22) powerman: having match(s, "re") and cmatch(s, compiled_re) isn't nice (12:53:17) powerman: maybe it's possible to do this: match(s, RE("re")) and match(s, compiled_re)? (12:55:44) powerman: in adt RE instead of dummy "x: int;" and hidden state we'll have "re: string;" and hidden state. RE.re normally will be nil, and that will mean it's compiled regex and we should use hidden state (void* ref to RE2 object). if RE.re isn't nil, then it's mean user manually create instance of RE adt, and there no hidden state and we should use string instead of compiled regex (12:57:22) powerman: but I bother about freeing memory. limbo doesn't aware about hidden state, so if user manually create instance of adt with hidden state I suppose it will not contain hidden state at all. (13:00:48) mjl-_: true (13:00:53) mjl-_: i think you can check for that (13:01:03) powerman: the good news it will not be in same heap map (or whatever it called - I mean thing initialized by dtype()) and so my destructor (which free RE2 object) won't be called. but all other functions like match() should be aware about this - they will get adt RE as param, but this can be either adt created by limbo, which size is less than for adt create by my compile() (13:02:11) powerman: mjl-_: only way to distinguish I think is just if(f->re==H) (13:02:59) powerman: but if user do match(s, RE(nil)) this check will fail and I'll try to use memory outside of adt to access pointer to compiled regex, which doesn't exists (13:06:33) powerman: maybe it's better to avoid this by requiring all regex to be compiled first.
if user bother about performance - he will compile anyway. if user doesn't - additional functional call to compile regex before using it doesn't matter (13:07:47) powerman: name function to compile regex shorter, something like "re" instead of "compile", import it, and it will works this way: match(s, re("regex")) (13:34:11) powerman: what about encoding in string2c and c2string? afaik limbo internally use UTF-16, and I expect string2c will convert it to UTF-8 (re2 need UTF-8). is that true? or string2c just convert it to char* but leave in UTF-16? (13:35:23) wrtp: powerman-asdf: i don't think you should try making it *exactly* the same module interface (13:35:54) wrtp: just that you should be able to recompile an existing regex-using module with re2 and not change the source (13:36:17) wrtp: so no need for an arena type (13:36:48) wrtp: (because it's private anyway (not mentioned in the regex man page) (13:36:51) wrtp: ) (13:38:31) wrtp: powerman-asdf: about hidden state: that's a problem that all C-implemented modules with hidden state face (Draw->Context, Sys->FD, Tk->Toplevel, etc etc) (13:38:48) wrtp: the way they deal with it is by checking that the type is the same (13:39:04) wrtp: that way a user can't forge a legitimate hidden-state adt (13:40:33) wrtp: for instance, look at checkdisplay(), checkscreen, etc in libinterp/draw.c (13:40:33) powerman: wrtp: you speak about something like this? (13:40:33) powerman: #define fdchk(x) ((x) == (Sys_FD*)H ? -1 : (x)->fd) (13:41:59) wrtp: hmm, one moment. it's a long time since i looked at this (13:42:32) powerman: oh, I see, you mean checks like that one: D2H(ds)->t != TScreen (13:42:37) wrtp: it's lookupdisplay you're after (13:42:46) wrtp: and the crucial line is: if(di == H || D2H(di)->t != TImage) (13:42:48) powerman: looks like Sys doesn't do it for Sys->FD (13:43:06) wrtp: i.e. it checks that the type descriptor is identical (13:43:31) wrtp: sys only needs to do it for Sys->FD if it uses the pgrp field. 
fd is inside the limbo-declared adt. (13:45:38) powerman: ok, thanks. but in this case I think it doesn't matter much is user will create adt manually with match(s, RE("...")) or user will use imported function to compile regex with match(s, re("...")) (13:47:12) wrtp: it does if the compiled representation of the re is not a string (13:47:25) powerman: it's same for user code readability and it's ease for module implementation to avoid checking adt types... hmm. i see, i anyway have to check adt type to protect against faking adt by user (13:47:47) wrtp: yup (13:48:06) wrtp: if you don't check, the system can crash (13:48:42) powerman: ok, what about string2c()? (13:48:47) wrtp: BTW, sys doesn't have to check because the only place the private field is used is inside freeFD which is only called on the sys-created types. (13:48:58) wrtp: powerman-asdf: what about string2c? (13:49:06) powerman: is it convert to utf-8? (13:49:10) wrtp: yes (13:49:16) powerman: good (13:49:22) wrtp: that's the whole point of it (13:49:38) powerman: I expect that, but... just to be sure :) (13:49:48) wrtp: and AFAIR the c string get hung off the limbo string, so you can forget about freeing it. (13:49:59) wrtp: but i'll just check that (13:50:13) powerman: also, I think it's good idea to release() before calling real functions from libre2wrap. (13:50:21) powerman: re2 lib is thread-safe (13:50:54) powerman: my wrapper probably too, it use only local variable for now (13:51:37) wrtp: i wouldn't release (13:51:44) wrtp: releasing is relatively costly (13:51:49) wrtp: and the functions won't block (13:52:24) wrtp: apart from for huge strings, you'll probably lose more time by releasing than you spend in the actual call (13:52:28) powerman: executing regex costly too. without release() dis will execute only one regex at a time... no? 
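[note] The type-descriptor check wrtp quotes at 13:42:46 (if(di == H || D2H(di)->t != TImage)) can be tried outside emu with a small mock. This is only a sketch of the idea, not libinterp code: Type, Heap, NILREF and HDR below are simplified stand-ins for libinterp's Type, Heap, H and D2H, and TRe, TForged, Re, XRe, mock_alloc and checkre are hypothetical names invented for the illustration. The hidden RE2 pointer is read only after the object's descriptor has been confirmed to be the one the module itself registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins (assumptions): the real Type and Heap are in libinterp. */
typedef struct Type { const char *name; } Type;   /* mock type descriptor */
typedef struct Heap { Type *t; } Heap;            /* mock header placed before each object */

#define NILREF ((void*)-1)                              /* plays the role of H */
#define HDR(p) ((Heap*)((char*)(p) - sizeof(Heap)))     /* plays the role of D2H */

static Type TRe = { "Re" };           /* descriptor the module would register */
static Type TForged = { "forged" };   /* descriptor of an adt built by user code */

typedef struct Re { int x; } Re;                /* part declared on the Limbo side */
typedef struct XRe { Re x; void *re2; } XRe;    /* same adt plus hidden state */

/* Allocate an object of the given size preceded by a header naming its type. */
void*
mock_alloc(Type *t, size_t size)
{
	Heap *h;

	assert(t != NULL);
	h = calloc(1, sizeof(Heap) + size);
	if(h == NULL)
		return NILREF;
	h->t = t;
	return h + 1;
}

/*
 * Return the hidden pointer, or NULL when the reference is nil or the
 * object was not created by this module. The descriptor is compared
 * before any memory past the Limbo-visible part is touched.
 */
void*
checkre(void *v)
{
	if(v == NILREF || HDR(v)->t != &TRe)
		return NULL;
	return ((XRe*)v)->re2;
}
```

With this shape, a forged value such as RE(nil) carries a different descriptor and is rejected before the wrapper reads past the end of the smaller adt.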
(13:52:30) wrtp: (that's my hunch anyway) (13:53:16) wrtp: it's fast enough that i very much doubt you'd get any speedup on two processors by releasing (13:53:53) wrtp: you can try it both ways - it's easy to time it, put in some release/acquire calls and time it again (13:54:40) wrtp: currently the interpreter is single threaded anyway, and it's gonna be faster than now (13:56:04) wrtp: the main thing to worry about is the maximum time that might be spent compiling or matching a regex, because everything else is shut out for that time. (13:56:41) wrtp: but you've got some pretty heavyweight calls going on in tk and other kernel mods too, so it's not like it's without precedent. (13:57:41) powerman: wrtp: i don't free(c) after c=string2c(s), and free(c) after s=c2string(c). no leaks detected for now (13:59:03) wrtp: that seems right (13:59:39) wrtp: russ's writeup is talking about 14us for a compile. i think that's an acceptable pause. (14:00:16) wrtp: and for the match, you could maybe look at the length of the string and do a release if it's bigger than, say 2MB (14:00:37) powerman: sounds like premature optimization :) (14:00:42) wrtp: sure (14:00:56) wrtp: it's easy to put in afterwards (14:01:34) wrtp: i was just pointing out that even if you don't release for smaller calls, you can do so for calls that might be a problem. (14:02:17) wrtp: note that for pure-ascii strings, string2c does almost no work at all (14:04:28) powerman: yeah. also, looks like distributing opt-ional kernel module is really ugly. 
I don't like to make one more hype for "/opt for drivers", but patching several files is ugly and likely will conflict with another optional drivers (14:05:45) powerman: and I don't think Charles will like to include this driver in official tree - linking emu with libstdc++ and pthreads isn't something he dream about :) (14:09:25) wrtp: powerman-asdf: yeah, i'd agree with that (14:09:45) wrtp: powerman-asdf: dynamically loadable C-implemented limbo modules would be good (14:18:12) powerman: wrtp: what about *malloc family? I've checked their implementation, and at a glance looks like smalloc() is best for most tasks in kernel modules like this one. (14:18:39) powerman: i wonder is it possible to use it instead of malloc() in libre2wrap (14:56:09) wrtp: powerman-asdf: that's the question you've got to ask. it would be good if re2's allocations came from main. (14:56:24) wrtp: powerman-asdf: but it think it *might* not matter if they don't (14:56:41) wrtp: powerman-asdf: as i seem to remember that allocated chunks come from malloc anyway (14:58:13) powerman: I have checked, and vs also confirm, malloc() call in libre2wrap actually will call malloc function from emu/port/alloc.c (14:58:29) powerman: no matter libre2wrap build with g++ (14:59:35) powerman: so, I suppose, if I add smalloc() prototype to libre2wrap code, it will call smalloc() from emu/port/alloc.c too (15:00:10) powerman: (but if some will try to build libre2wrap to use in other C application, non-inferno, he'll need to provide smalloc implementation, I think)
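[note] powerman's last point (15:00:10), that a non-Inferno build of libre2wrap would have to supply its own smalloc(), could look roughly like the sketch below. It is an assumption-laden illustration, not code from emu: the fallback simply aborts when memory runs out, whereas emu's smalloc, as far as I recall, waits and retries; calloc is used on the assumption that callers expect zeroed memory. re2wrap_strdup is a hypothetical caller added only to show that code using smalloc needs no NULL check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical fallback for builds outside emu: never returns NULL.
 * Inside emu this definition would be left out, so that the call
 * resolves to the smalloc in emu/port/alloc.c instead.
 */
void*
smalloc(size_t size)
{
	void *v;

	if(size == 0)
		size = 1;	/* calloc(1, 0) is allowed to return NULL */
	v = calloc(1, size);
	if(v == NULL){
		fprintf(stderr, "smalloc: out of memory (%lu bytes)\n", (unsigned long)size);
		abort();
	}
	return v;
}

/* Hypothetical caller: copies a C string into memory obtained from smalloc. */
char*
re2wrap_strdup(const char *s)
{
	size_t n;
	char *p;

	assert(s != NULL);
	n = strlen(s) + 1;
	p = smalloc(n);
	memcpy(p, s, n);
	return p;
}
```

Memory obtained this way is released with free() in a non-Inferno build; inside emu the matching free from emu/port/alloc.c would be the one linked in.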