[Novalug] Looking for sample system and event logs..
musman at aug-sys.com
Wed Apr 2 21:51:54 EDT 2008
I'm aware of splunk and while it's pretty kewl, I (perhaps incorrectly)
categorize it mainly as an interactive search tool. If you know what
you're looking for, then it's a great way to find it in all of the
mounds of log data you have. The key item, however, is that it only finds
things that match what you've asked it to look for.
It also has a fairly primitive "alerting" feature (I describe it as
primitive since, last time I checked, it was not real-time, and
couldn't recognize compound events that span multiple entries or
correlate across log sources). So it can do more than just allow you to
search through data, but I don't think it does what I'm trying to do..
If these comments are "off the mark", I'd welcome the inputs of others
to educate me properly. I certainly do not pretend to be even remotely
a splunk expert, and I may be doing them a disservice, so please let me
know if splunk does more than I think it does.
Soo... What I think I'm doing that is different from splunk is that
my anomaly detection piece will find for you those unusual, unexpected
events that occur. Unlike splunk, you won't have to specifically look for
them; it will immediately make you aware that something odd (a rare,
unusual event) is happening, and point out to you what it is.
An example might be something like the sudden appearance of "memory
errors" in your syslog. What makes them unusual is that you've never
seen them before! Often when memory fails, these will start to show up
intermittently, and then become more frequent just before the memory
Ordinarily you might not think to search your logs for memory errors,
and you might miss seeing what is only a few of them in the 100's of log
messages that occur during any day. But in this case, because my
(envisaged) little tool has profiled your machine, it knows that it's
never seen these messages before and reports them to you.
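To make the "never seen before" idea concrete, here is a minimal Python sketch. It is an illustration only, not the envisaged tool: the masking rules, the NoveltyDetector class, and the log lines are all invented for the example. It reduces each message to a rough shape, remembers the shapes seen while profiling, and flags a message the first time a new shape shows up.

```python
import re

# Hypothetical normalisation rules: mask the parts of a message that vary
# between otherwise identical events (hex addresses first, then numbers).
_HEX = re.compile(r"0x[0-9a-fA-F]+")
_NUM = re.compile(r"\d+")


def template(message):
    """Reduce a log message to a rough 'shape' by masking variable fields."""
    return _NUM.sub("<n>", _HEX.sub("<hex>", message))


class NoveltyDetector:
    """Remembers message shapes seen so far and flags unseen ones."""

    def __init__(self):
        self.seen = set()

    def profile(self, messages):
        """Learn the shapes that count as normal for this machine."""
        for msg in messages:
            self.seen.add(template(msg))

    def check(self, message):
        """Return True the first time a new shape appears, then remember it."""
        shape = template(message)
        if shape in self.seen:
            return False
        self.seen.add(shape)
        return True


# Invented sample messages, for illustration only.
baseline = [
    "sshd[2211]: accepted login for alice from 10.0.0.5 port 50022",
    "cron[871]: started hourly job 12",
]
detector = NoveltyDetector()
detector.profile(baseline)

live = [
    "sshd[9034]: accepted login for alice from 10.0.0.7 port 41876",
    "kernel: memory error at address 0x3f2a1",
    "kernel: memory error at address 0x3f2a4",
]
unusual = [line for line in live if detector.check(line)]
print(unusual)
```

Only the first memory error is reported here, because after that its shape is known. A real tool would presumably also want to notice that a previously unseen message is becoming more frequent, which this sketch does not attempt.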
If you wanted to.. I guess you could then use something like splunk to
research the unusual event more, or look for other possibly related
events around the time where the unusual events occur. So it's not
supposed to replace splunk, it's supposed to do something complementary
that splunk doesn't do.
Does this make any sense? There is an example of a similar type of tool
at http://www.estpak.ee/~risto/slct/ , (the same guy who developed sec,
which was the precursor to OSSEC) but it tells you what's normal, rather
than what's "not normal", so I think I can do something more useful.
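The difference between reporting what's normal and what's not can be sketched with a simplified two-pass frequency count, loosely in the spirit of frequent-word log clustering. This is not slct's actual code or exact algorithm; the function names, the support threshold, and the sample log are assumptions made for illustration. Lines whose pattern is common come out as "normal", and the leftover lines are the rare ones.

```python
from collections import Counter


def frequent_words(lines, support):
    """Pass 1: count (position, word) pairs; keep those seen >= support times."""
    counts = Counter()
    for line in lines:
        for pos, word in enumerate(line.split()):
            counts[(pos, word)] += 1
    return {key for key, n in counts.items() if n >= support}


def candidate(line, frequent):
    """Keep a line's frequent words and replace everything else with '*'."""
    return tuple(
        word if (pos, word) in frequent else "*"
        for pos, word in enumerate(line.split())
    )


def split_normal_and_rare(lines, support):
    """Pass 2: patterns matched by >= support lines are normal; the rest are rare."""
    frequent = frequent_words(lines, support)
    patterns = Counter(candidate(line, frequent) for line in lines)
    normal = {p: n for p, n in patterns.items() if n >= support}
    rare = [line for line in lines if candidate(line, frequent) not in normal]
    return normal, rare


# Invented sample log, for illustration only.
log = [
    "login ok user alice",
    "login ok user bob",
    "login ok user carol",
    "disk check passed",
    "disk check passed",
    "memory error at bank 3",
]
normal, rare = split_normal_and_rare(log, support=2)
print(normal)
print(rare)
```

A tool that reports `normal` tells you what your logs usually look like; reporting `rare` instead is closer to the "not normal" view described above.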
Unlike Risto, unfortunately, I don't have access to all of the log files
he has, to be able to test the algorithms out, so I'm asking for your
Does this help explain what I want to do, and why I don't think it's
something splunk already does? If not.. keep at me. I hate re-inventing
wheels, so I'd rather someone point me to an existing solution
than waste my time just because I didn't know. Thanks,
On Wed, 2008-04-02 at 20:46 -0400, greg pryzby wrote:
> Not sure there is anything cooler than this app for processing ANY
> logs, not as fast.
> Of course if you have something that does, I will get you funding for
> a small piece of the action :P
> On Wed, Apr 2, 2008 at 8:20 PM, Scott Musman <musman at aug-sys.com>
> Thanks for your response. I think what you're saying makes sense,
> but I don't think it's addressing what I need. Unless I'm
> misinterpreting it of course :)
> For what I'm trying to do, obtaining the log events isn't the hard
> part. As you say, I can tap into syslog for most of the unix system
> files. It doesn't solve the problem of other non-syslog sources of
> course, but I already have ways of accessing those entries also.
> What I think is the hard part of what I'm doing is coming up with a
> clustering algorithm that models the stream of events that occur, so
> that the anomaly detection program can figure out what is "normal"
> for the system it's monitoring, and then report only the unusual
> Where I'm at is that I already get all of the logs we're producing
> here, but like I said, I'm looking for more diverse environments (a
> variety of log entries, and activity) to also work with. I'm hoping
> that someone who runs some servers, and/or has a more active
> environment than we do would be willing to offer some of their
> logfiles so that I can test the clustering algorithm against it.
> Does this make sense, and explain why "syslog" is not the answer to
> my
> -- Scott
> On Wed, 2008-04-02 at 20:09 -0400, Jay Hart wrote:
> > Scott,
> > Will syslog not work for you?
> > I once wrote a script that would parse the logfiles of the different
> > levels of logging (crit, kernel, emerg, etc) that you can separate.
> > For example, say I get an entry in the emerg logfile, well, that same
> > entry also gets "logged" in all the lower level files, so you can
> > diff out the files and see what is going on.
> > Does this make any sense to you? I could dig up the old talk I once
> > gave about this, if you need it.
> > Jay
> > >
> > > Hi,
> > >
> > > Sorry if this is an odd request, and I hope it's not inappropriate,
> > > but I'm looking at developing a real-time on-line logfile anomaly
> > > engine, and am hoping that some list members might be kind enough
> > > to provide me with some samples of their logs. I can't develop the
> > > algorithms without having typical logs to work with, and our own
> > > environment just isn't that complex, and so I'm hoping for a wider
> > > variety of activity characteristics to develop and evaluate
> > > against.
> > >
> > > Any system, web, or application logs you are willing to provide
> > > would work fine (I'll even take Windoze if it's offered..), so long
> > > as you can provide at least a few 100k records that would be
> > > "representative" of normal on your system. More ideally, disjoint
> > > samples of the same log from different timeframes (i.e. a week or
> > > month apart) would be perfect. I'm willing to sign an NDA if you're
> > > worried about disclosing private information, or we can talk
> > > offline about how you could make your logs anonymous before
> > > providing them.
> > >
> > > Even if you don't have logs to offer, if you're interested in
> > > trying the thing out when it's done let me know. The real trick on
> > > my end is going to be to account for the differences in the way
> > > log entries cluster without forcing the user to be a machine
> > > learning expert to operate the tool.
> > >
> > > Thanks for listening, and I hope someone can help out
> > >
> > > -- Scott
> > >
> > > _______________________________________________
> > > Novalug mailing list
> > > Novalug at calypso.tux.org
> > > http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug
> > >