[Novalug] Looking for sample system and event logs..

Scott Musman musman at aug-sys.com
Wed Apr 2 20:20:05 EDT 2008


Jay,

Thanks for your response. I think what you're saying makes sense, but I
don't think it's addressing what I need. Unless I'm misinterpreting it
of course :) 

For what I'm trying to do, obtaining the log events isn't the hard part.
As you say, I can tap into syslog for most of the Unix system files. It
doesn't solve the problem of other non-syslog sources of course, but I
already have ways of accessing those entries also. 

The hard part, I think, is coming up with a clustering algorithm that
models the stream of events as they occur, so that the anomaly detection
program can figure out what is "normal" for the system it's monitoring
and then report only the unusual events.
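
To make the "learn what is normal, report only the unusual" idea
concrete, here is a rough sketch. It is not the clustering algorithm
being developed, only a much simpler stand-in: variable fields are
masked so that similar lines collapse into one template, and any
template seen fewer than a threshold number of times in the baseline is
reported as unusual. The masking rules, the names template and
RarityDetector, and the min_count threshold are all assumptions made
for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable fields with placeholders so
# that lines differing only in those fields map to the same template.
# Order matters: IP addresses and hex values are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def template(line):
    """Reduce a log line to a coarse template by masking variable fields."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


class RarityDetector:
    """Flags lines whose template was rare or absent in the baseline logs."""

    def __init__(self, min_count=3):
        # Templates seen fewer than min_count times count as unusual.
        self.min_count = min_count
        self.counts = Counter()

    def train(self, lines):
        """Count how often each template occurs in the baseline."""
        for line in lines:
            self.counts[template(line)] += 1

    def is_unusual(self, line):
        """Return True if the line's template is below the threshold."""
        return self.counts[template(line)] < self.min_count
```

Typical use would be to call train() on a sample of "representative"
log lines and then is_unusual() on each new line as it arrives. A fixed
threshold like this is exactly the kind of knob a real tool would want
to hide from the user.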

I already collect all of the logs we produce here, but as I said, I'm
looking for more diverse environments (a larger variety of log entries
and activity) to work with as well. I'm hoping that someone who runs
some servers, and/or has a more active environment than we do, would be
willing to offer some of their logfiles so that I can test the
clustering algorithm against them.

Does this make sense, and explain why "syslog" is not the answer to my
request?

		-- Scott

On Wed, 2008-04-02 at 20:09 -0400, Jay Hart wrote:
> Scott,
> 
> Will syslog not work for you?
> 
> I once wrote a script that would parse the logfiles of the different levels of
> logging (crit, kernel, emerg, etc) that you can separate.
> 
> For example, say I get an entry in the emerg logfile, well, that same entry
> also gets "logged" in all the lower level files, so you can diff out the files
> and see what is going on.
> 
> Does this make any sense to you?  I could dig up the old talk I once gave
> about this, if you need it.
> 
> Jay
> 
> >
> > Hi,
> >
> > Sorry if this is an odd request, and I hope it's not inappropriate, but I'm
> > looking at developing a real-time on-line logfile anomaly detection
> > engine, and am hoping that some list members might be kind enough to
> > provide me with some samples of their logs. I can't develop the
> > algorithms without having typical logs to work with, and our own
> > environment just isn't that complex, and so I'm hoping for a wider
> > variety of activity characteristics to develop and evaluate against.
> >
> > Any system, web, or application logs you are willing to provide would
> > work fine (I'll even take Windoze if it's offered..), so long as you can
> > provide at least a few 100k records that would be "representative" of
> > normal on your system. More ideally, disjoint samples of the same log
> > from different timeframes (i.e. a week or month apart) would be perfect.
> > I'm willing to sign an NDA if you're worried about disclosing private
> > information, or we can talk offline about how you could make your logs
> > anonymous before providing them.
> >
> > Even if you don't have logs to offer, if you're interested in trying the
> > thing out when it's done let me know. The real trick on my end is going
> > to be to account for the differences in the way log entries cluster
> > without forcing the user to be a machine learning expert to operate the
> > tool.
> >
> > Thanks for listening, and I hope someone can help out
> >
> > 	-- Scott
> >
> > _______________________________________________
> > Novalug mailing list
> > Novalug at calypso.tux.org
> > http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug
> >
> 
> 


