Super-Kamiokande I Upmu Data Analysis

Seeking Astrophysical Neutrinos in Super-Kamiokande

Preliminary Report, 7 August 2003

Somewhat updated, added section 8, 29 October 2003

John G. Learned

University of Hawaii


Abstract

This document presents a first report on my activities aiming towards a thorough analysis of the SuperK-I upgoing (through and stopping) muon data. This data set, which consists of 2561 recovered muon events with energy more than about 1.6 GeV in the detector, is searched for extraterrestrial point sources of neutrinos, clusters, and anisotropy of the data set. No convincing evidence for other than atmospheric neutrinos as the source of the events is found, but the world's best limits will be set for extraterrestrial neutrino fluxes in the energy range from a few GeV to 100 TeV. However, the main purpose of this report is to communicate with collaborators on various methods developed for analysis, stimulate discussion, and point out areas for further work (see summary at end).


Outline

  1. Introduction
  2. Data Set
  3. Monte Carlo Data Set
  4. Point Source Searches
  5. Galactic Center Region
  6. Diffuse Sources
  7. Summary
  8. Addendum: Ultimate Point Source Search
  9. References

1.0 Introduction

Since the very first underground experiments seeking natural neutrinos (\cite{CWI,KGF}) there have been attempts to find point and diffuse extraterrestrial sources of neutrinos, and initiate observational neutrino astronomy. The detection method has been mostly by looking at the source direction of muons traversing underground detectors, which muons travel in much the same direction as their parent neutrinos (to a few degrees for the data discussed herein). These muons may be produced some distance away by charged current interactions of muon neutrinos (and anti-neutrinos). The effective volume for such instruments is then roughly equal to the muon detection area times the range of the muon and density of the surrounding rock. One might detect cosmic neutrinos in events collected from within such instruments, but the effective volume is much less than for muons, and the angular resolution is not as good. Since even in the deepest mine, there are muons penetrating from the surface at rates greater than muons from neutrino interactions in the surrounding rock, we employ only those muons arriving from below the horizon, which constitute a source of pure neutrino origin.

The atmospheric neutrinos constitute an irremovable background of cosmic ray generated events. They are produced by primary cosmic ray interactions and subsequent secondary (and tertiary) particle decays in the atmosphere, producing a substantial neutrino flux everywhere on earth. Indeed the study of these neutrinos has been most fruitful, leading to the claim of muon neutrino oscillations (\cite{SK98}). Herein we focus upon the study of neutrino generated muons of energies greater than about 1.6 GeV at the detector, and we neglect the solar neutrino events (5-13 MeV) (\cite{SKsnu}) and those neutrinos interacting within the detector (mostly 100 MeV to 1 GeV) collected in SuperK (\cite{SKfc}). We also do not discuss neutrinos from supernovae, including the one example of detected neutrinos from beyond the solar system from SN1987A (\cite{KAM,IMB}), which searches continue in several instruments around the world. We also will concentrate upon the search for point sources of neutrinos, as opposed to diffuse sources.

One may ask if the sensitivity of this experiment is sufficient to see potential cosmic high energy neutrino sources. Many studies have been carried out over the last 25 years, examining the prospects for initiation of high energy neutrino astronomy (\cite{jl-km}). The general conclusion has been that it will require of the order of $10^6~m^2$ muon collection area to be very likely to see such sources. Given that cosmic rays exist, there is no question that substantial neutrino fluxes exist as well, but with orders of magnitude uncertainty in fluence. At the sensitivity of SuperK (about 1000 $m^2$ muon collecting area) while the chance to find a source seems not great, it is not disallowed either. For example, the radiated power from various cosmic objects is sufficient to produce detectable fluxes. This is particularly true of bursting, jet-like objects such as gamma ray bursts and blazars, for which the main mechanism is unknown and neutrino flux predictions highly speculative.

Experimental limits on cosmic neutrinos have increased slowly over the years. The sensitivity for muons goes with the area, and not the detector volume. Larger instruments began to be built about 25 years ago in the search for nucleon decay (\cite{IMBnus,KAMnus}). SuperK, the largest underground instrument yet constructed, has a further advantage over earlier instruments due to its thickness as well as large crossectional area, which permits setting a higher muon energy threshold. Extra-terrestrial signals, with expected relatively flat spectra compared to the atmospheric neutrinos, are expected to gain in signal-to-noise with greater than proportionality to the energy threshold cut. We further exploit that capability herein by employing a data set selecting high energy upcoming muons.

In the following we report on the analysis of the first five years of SuperK data and our as yet unsuccessful search for high energy neutrino sources with this data. We do report the best limits on neutrino fluxes thus far.


2.0 Data Set

The data being analyzed herein consists of the upcoming muon sample, both through-going (UM herein) and those that stopped within the fiducial volume (SM herein). Since the muon event rate at SuperK grows very fast with angular distance above the horizon, we utilize only events arriving with zenith angle (direction of origin) beyond 90 degrees. A cut is made on minimal pathlength, 7 m, 1.64 GeV for minimum ionizing muons, in order to have good reconstruction accuracy. There is some pollution close to the horizon, from downgoing muons scattered to slightly upcoming directions. These have been carefully studied for oscillations analysis (NN events thru and NN events stopping). However, since these events are a tiny fraction of the total, and (presumably) do not point to any spot on the sky, we need not be concerned with them for present purposes. [Indeed we could without much penalty consider adding events above the horizon, as the AMANDA group and other experiments have done in the past such as the Frejus experiment. For the moment this is impractical, since the upmu reduction is not efficient or well characterized much above the horizon.]

The data set was acquired during the period of 1 April 1996 through 17 July 2001. The total livetime was 1679.59 days, about 87% lifetime over the whole period. Detector conditions were sufficiently stable throughout this period that we may assume them constant for this analysis. The exposure in sidereal time was sufficiently isotropic for us to take it to be uniform for this period. This is evidenced by the lack of evident modulation in the plot of sidereal time of the event arrivals (see plots below).

Further details on data filtering and reconstruction may be found in the PhD theses of Andy Stachyra (2002), Choji Saji (2002) (available on the web at http://www-sk.icrr.u-tokyo.ac.jp/doc/sk/pub ), and soon that of Shantanu Desai (2003?).

Cuts: events with zenith angle of greater than 90.0 degrees accepted. Energy required to be greater than 1.7 GeV and less than about 200 GeV (175,000 photoelectrons to be precise), and pathlength greater than 7 m. Previous selection criteria have selected good fits, good data runs, etc. The efficiency is XX%. The livetime calculation includes good run selection. Starting with the official ntuple (of July 2003) we have:

The local coordinate projections of the data are shown in the following figure. The event rate falls steeply below the horizon (not just due to the lack of solid angle in this upper right hand panel), because we are mostly sensitive to atmospheric neutrinos of typically a hundred GeV in energy. The high energy horizontal flux is increased over vertical by the "secant theta effect", just the increased probability of pion decay before interaction for near horizontal cosmic rays. Note that, as we discuss later, the modulation for very much higher energies (viz. ~10 TeV) is even stronger, a fact we can put to good use in searching for sources.

The lower left hand panel shows the local azimuthal distribution of events, and one sees that it is satisfactorily flat, within our limited statistics, though there is a hint of some overall modulation. This latter is probably due to the feed-down events near the horizon, and will not hurt us for the present sky searches in any event.

The lower right hand panel shows the distribution of events with energy deposited in the detector. The typical minimum ionizing muon deposits 1.6 to 5.0 GeV in traversing the tank. The exponentially falling tail to the right is due to events which make bremmstrahlung, pair production or nuclear interactions in the water. We see a few events up to about 200 GeV. There is a cut at 175,000 photoelectrons (beyond which reconstruction is difficult since the phototubes are mostly in saturation). As we shall discuss later, these showering event may be useful in making a cut to achieve a sample of much greater mean muon energy.

The local detector coordinates are based upon a rectangular system, centered in the cylindrical detector tank. Z is up, X is in the direction of the access tunnel, and it is a RHS. There has been a little uncertainty (a few years ago) about the compass orientation of the X-axis. Herein I use a value from Andy Stachyra (4/97), based upon downgoing muons compared to topo map (see Stachyra thesis, 2002). The error is about +/- 1 degree according to Andy. The precise coordinate definitions used are:
sk_az = atan2d(-dy,-dx) = local azimuth angle, sk coords
az = sk_az - 130.2 = rotation, so E is zero, az going N
dx, dy and dz are the fitted direction cosines (of the unit velocity vector) and hence the negation (we are looking for where the neutrino came from not where it is going). For the record, I take the Kamiokanda location (from Oyama thesis, 1989):
detlat = 36.42 deg N
detlon = 222.69 deg W (or 137.31 deg e, if you prefer)

Coordinate transformations herein employ the timesub.f (also available in c) package, which I have used for many years (since IMB days) and which has been repeatedly tested. (It is based upon algorithms from the Duffet-Smith book "Practical Astronomy with Your Calculator", third edition, Cambridge 1987). The basic path is to take local time (JST) to Greenwich Mean Time to Greenwich Sidereal Time to local Sidereal Time. Then the angles are transformed from local zenith and azimuth to right ascension and declination. From there we make further transforms to Galactic Coordinates, Ecliptic, Solar, Lunar, etc.

The next figure, below, displays the distribution of muon path lengths in the detector versus the zenith angle. One sees a spike at the diameter of the detector (34 m), and a few events out to the diagonal (50 m). In the right hand panels we show the calculated effective are for SuperK versus cosine of the zenith angle, and below that the calculated muon flux (with a suppressed factor of 10^14 cm^2 sr sec). Through and stopping fluxes are overlayed.

In the following panel we show projections of the entry and exit points of the muons, as taken from the ntuple, in cylindrical and vertical projections. The exit point is calculated from the entry point with the pathlength added in the direction of the muon (fitted values). There seems to be some discrepancy in a few events which have reconstructed end points outside the detector. This is ignored for the moment, until the events are examined.

Finally in our examination of the given data set, we see below a very busy 12 plot panel, illustrating distributions of many of the quantities passed from the ntuple, and constituting a check. The distributions, left to right, top to bottom, are in the Julian Date, local time, time between events; local hour, minute and second; Greenwich Mean Time, sidereal time, day; month, and year. (The last panel is not relevant for this data sample). All show satisfactory uniform response as appropriate. In particular the sidereal time distribution is nicely flat (chisq of 75.27/95), which means that we do not need to compensate for any variation of exposure with right ascension.


3.0 Monte Carlo Data Set

A Monte Carlo simulation was employed to generate a 40 year equivalent data sample (14610.0 live days). This program employs the Bartol '96 atmospheric neutrino flux calculation, and the NUANCE Monte Carlo to generate event vectors, which are propagated through the detector with the ApDetsim program used by the SuperK group. This work was largely carried out by Alec Habig and students, and is described in ....... Another Monte Carlo sample is being prepared in Japan, which we will employ when it becomes available. Note, for newcomers, in the past we have employed several heritage analytic calculations for the upmu analysis. While the latter is viable for some purposes, the availability of the Monte Carlo sample is invaluable for the present work, as one will see below.

The following are the statistics on the present 40 year MC sample (as of 7/03). There are a number of events which are not classified as either through-going muon or stopping muon in the ntuple, but these are not employed in the present work (same cuts as for data sample, of course). One demonstration that the MC does well at representing the data sample is in the stop/thru ratio. The MC gives 0.355 +/- 0.004 (stat) for this with no oscillations, but after oscillating the MC sample with dm^2 = 0.002 eV^2 and sin^2(2 theta) = 1.0 (the present official fit value mostly from contained data, 7/03), the stop/thru becomes 0.247 +/- 0.003, very close to the data value given above.

3.1 Validation of MC Model

We have examined the MC data sample as with the data discussed earlier and find it satisfactory in all details. Note that the MC events are generated at irrelevant times, and herein we use the real event times to generate fake data samples. Perhaps the most important comparison is between the zenith angle distribution and the declination distribution, as illustrated below in the next figure. Clearly the shapes match well, as demonstrated by the difference test in PAW, which gives numerically comforting values. The zenith distribution does show some deficit in MC relative to the data in the first 2 degree bin, by about 30 events (consistent with the in-scattering estimates). As mentioned, herein we ignore this small pollution by downgoing muons. The declination distributions match with P=0.4.

The Monte Carlo graphs presented represent an oversampling, since we have made 100 background maps (in present trials), drawing upon the ensemble of 19856 muons in batches of 2561 events, and assigning them to the real (local) times of the 2561 data events. Thus on average each event gets re-used about ten times in this way. (And MC error bars are three times too large in the figures shown below, which I was too lazy to fix right now, but which is irrelevant for calculations to be presented below).

In the following plots we use the muon path lengths in the detector to measure the mean entering muon momentum. The idea, written up in a memo I circulated several years ago within SuperK (but at the moment cannot find), is simply that the muon trajectories traversing the detector randomly sample muon track segments. Such muon tracks should start uniformly within the earth and stop uniformly. One may thus regard each entering muon as a trial at whether the muon will come to rest or not. If, for example 100 muons enter the detector each making 10 m track in the water, except 20 which stop, then an estimate of the average track length (in water) is 1000m/20 = 50 m, hence the average energy would be of order 10 GeV. If one does this more carefully, one finds that a good estimate of the muon energy is

E_mu = {Sum(PL_thru) + 2*Sum(PL_stop)*{dE/dx}/N_stop + E_cut

The factor of 2 is from accounting for both ends of the track, starting and stopping, and estimating that in a given volume, the sum of "stops"(muon track endpoints) is equal to the sum of "starts" (ie. interactions). I also assume all tracks to be minimum ionizing, which is a good approximation for atmospheric neutrino generated events. In the top right panel we see the summed pathlengths versus cosine of zenith angle, while the lower left show the number of stoppers, and finally on the lower right the calculated entering muon energy. Overplotted are two analytic estimates of this quantity from Todor Stanev: lower without oscillations and upper with. Also overplotted is the mean energy gotten from the MC.

3.2 Characteristics of Point Sources

The MC affords us the opportunity to study the response of our detector in many nice ways. In particular we can calculate the angle between the incoming neutrino and the muon direction, which I will call Beta hereafter. In the following I have called the distribution of numbers of events versus angular distance from the "true" direction, the "Point Spread Function", or PSF for short. It includes the solid angle of that bin: it is integrated in azimuth around the point, and is dN/dBeta, not dN/dOmega, where dOmega is differential solid angle. (Astronomers would take the PSF to be a slice through the 2D distribution, in intensity and hence their PSF will be more like dN/dOmega. For our purposes the dN/dBeta is more useful.) This function includes all effects of physics vertex (dominating up to a few GeV) and reconstruction resolution. It is specific to a given detector and data sample. We can make cuts on the MC sample to examine their impact on resolution, for example.

The MC data has been generated for a particular calculated atmospheric neutrino spectrum (Bartol '96 in this case). This spectrum is, at high energies (hundreds of GeV), simply one power in the energy steeper than the input primary cosmic ray spectrum. This is due to the competition between pion absorption and decay, absorption winning ever more with energy. The primary spectrum is approximately a power law of the form E^-gamma, where gamma is about 2.76 in our range of interest. The differential neutrino spectrum thus goes as E^-3.76. Our fitting of the spectral index as a free parameter in the oscillations analysis drops this a little to closer to 3.7. (This needs to be checked, but a little one way or the other is not crucial to the following as it turns out).

There are many models for potential sources of high energy neutrinos. Most often the product spectrum is roughly E^-2, a spectrum that arises in a number of astrophysical circumstances, ranging from beam dump conditions to shock acceleration. Of course the power integral of such a spectrum diverges, so such a form will not go on to infinite energies. However, there are lots of examples of such relatively flat spectra seen in astrophysics, in particular in high energy gamma ray sources, certainly closely related phenomena to potential neutrino sources. On the other hand we know the cosmic ray spectrum at earth is a bit flatter, as stated above. One expects that this spectrum is steepened somewhat from the source, due to faster leakage of higher energy particles from the galaxy. So, for present purposes, we can take as a good bet that any astrophysical point source we find will have a spectral shape somewhere between the cosmic ray spectrum and 1/E^2.

We can use the MC to generate other spectra than those used initially by weighting the events. We can simply calculate a weight for each event which is:
w = (E/E_0)^dgamma

The dgamma can be 1.0 for a cosmic ray like spectrum, or 1.7 for simulating a 1/E^2 spectrum. The reference energy is also interesting, though arbitrary. If we adjust this reference energy, E_0, to be such that the weighted total number of events is preserved, then this energy is that at which the input atmospheric neutrino spectrum (averaged over the lower hemisphere) crosses the putative point source spectrum. These energies are 553.6 GeV for the CR spectrum, and 1431 GeV for the 1/E^2 spectrum.

Figure 3.2.1 above shows PSFs for the stopping and through upmus, plus (weighted) PSFs for the cosmic ray spectrum (E flatter than the atm neutrinos) and 1/E^2 distribution (atm nus times E^1.7). The results are in the attached .gif plots and in sum:

sample spectrum mean angle (deg)
------------- ---------------
stop 7.35
thru 3.03
CR spect 1.75
1/E^2 1.57
        psf for neutrinos with 1/E^2 spectrum

         angle        psf             raw histo

           0  0.439750242934067        9867.738    
           1  0.350739364805604        7870.386    
           2  6.952910178711852E-002   1560.192    
           3  6.702367388549178E-002   1503.972    
           4  3.470762304305584E-002   358.4710    
           5  1.703593081666297E-002   343.9483    
           6  1.007263799472843E-002   269.1197    
           7  5.880807397020261E-003   38.39635    
           8  3.167898678758865E-003   113.8176    
           9  1.624564224535101E-003   5.314149    
          10  4.681544329577152E-004   10.50511    

In the plots, the CR spect and E^-2 spectrum plots are not normalized and smoothed as are the ones for the through and stopping. (Smoothing is only at larger angles to dampen statistical jumping about.) Just to be clear these plots show the difference between the true neutrino direction in the MC truth information, and the reconstructed SK direction, as a function of the angular difference in degrees. This includes the solid angle (is dN/dangle, not dN/domega). It might be that the PSFs for the steeper spectra are a little underestimated (should be "pointier") because the statistics in the MC for very high energy neutrinos are poor. However, the results will not change much, since the angular distribution is presumably dominated by the fitter for those high energies.

The bottom line really is that for our searches for plausably expected cosmic neutrino sources, we need only search out to about two degrees. Sources with steeper spectra are of course possible, as would be expected near the cutoff in any astrophysical source. However, given what we know about gamma ray point sources (eg the blazars MRK 501 and 427), flatter spectra in our energy range seem more likely. And hence we can focus our efforts in searching our data set for clusters within 2 degree half angle cones.

3.3 Showering Events

It is interesting to see if we can find some cuts on the upmuon data set to reduce the atmospheric neutrino content while not losing too much of putative astrophysical high energy neutrinos. Shantanu Desai, at BU, has explored making a sample enhanced in showering events, which he has done from the original data sample. Here we take a little look at what might be done using the ntuple alone.

In the following panel we see MC data in three plots. The first shows the angular deviation (y, in degrees) between neutrino and fitted track as a function of energy loss rate, -dE/dx, of atmospheric neutrino events. What one sees is that not much is to be gained in angular resolution by a dE/dx cut. The second plot shows the dE/dx on the vertical axis versus the log10 of neutrino energy in GeV. Here we see a definite correlation... a cut a little above 2 MeV/cm will indeed shift the mean source energy to significantly higher values. However, it appears that an even better cut may be what I call "showering energy" which is just visible energy less track length in the detector times . This third plot indicates that a cut at about 5 GeV (vertical axis is log10(E_shower)), will shift the mean neutrino energy from around 100 GeV to around 1 TeV. I have not yet studied the "efficiency" of this sort of cut: we need to see if we gain more in "signal-to-noise" ratio than we lose in potential "signal events" (ie, great cut which results in excellent S/N but no signal remaining, is not useful!).

4.0 Point Source Searches

As discussed above, we want to search the data set for possible astrophysical point sources of neutrinos. Our best bet is for a rather hard spectrum source, with typically a power law spectrum somewhere between the primary cosmic ray spectrum and a canonical 1/E^2 spectrum. As we have seen, for both cases given a detector with the resolution of SuperK, we should expect most events from within a 2 degree half-angle cone about the true source direction. Of course, this may not be the case as there could be sources with steeper spectra (not expected, but not ruled out either), and indeed the flat spectrum sources must get steeper too as they hear their cutoff energies.

A thorny question we must answer in any of these searches is the question of "trials factor": if we make 10^4 trials we should not be surprised in finding clusterings which are individually unlikely at the level of 10^-4, or equivalently at the 3.9 sigma level. This is most annoying, since it means that our sensitivity is thus reduced by at least this amount. There are several things we may do however.

4.1 Flat Spectrum Sources

We have learned that a typical expected astrophysical neutrino source should appear for SuperK as a ~2 degree cluster of events. As mentioned, one may perhaps simply do a cone search: use the MC to estimate how many events are expected inside such a cone, as a function of declination, and then look at the Poisson probabilities for each cone. A source would be revealed by a non-exponential tail in the distribution of probabilities. One can oversample with abandon, as long as we can trust the MC data set to generate representative fake (random) maps.

I have chosen to try and implement what may be a slightly more powerful method, in employing the PSF as we have gotten from the MC. First, we assume that a true source will be uniformly distributed about a point. (Indeed, we might add a test of uniformity in azimuthal distribution about a point to such a point source search... not done yet, but I suspect there is not enough numbers of events to make it powerful). If this is so, then we only need examine the radial distribution about our test direction. The background (atmospheric neutrinos) will generally increase with the angle squared, while a source will decrease. We can thus set up a Max Liklihood test of this radial distribution, minimizing the liklihood for a variable number of point source events relative to a predicted background number and distribution.

This has been implemented in one degree bins. It could be done without binning, but the computation will become enormous. Hence while generating fake maps (currently 100), I go through each map making a matrix of the number distribution from a given point, for each one degree in declination. The fake maps are generated by using the true event times and randomly drawing event directions from the (much larger) MC sample. This accumulation of the matrix essentially integrates over right ascension, assuming that our sampling in sidereal time is smooth enough, and it would appear to be.

Having the predicted radial distribution of atmospheric neutrino induced muons, and the PSF for a test source, one can then do the Max Liklihood minimization, detecting the number of source events, and by backing off in log liklihood, set a limit on number of potential point source events. There is only one free parameter in this test, and that is how far out in angle to carry out the test. This simply depends upon the PSF.... one needs go out to an angle where the PSF is essentially zero. Various numbers of potential sources will not change the net liklihood out beyond that angle; so as long as the cutoff angle is great enough it will not effect the results. The following employs the PSF for a 1/E^2 distribution, and runs the test out to ten degrees. I take the probability at each radial one degree bin to be Poisson with the expected number being that from background plus hypothetical signal. The liklihood is the product of these probabilities, but actually (as usual) we work with the -log(liklihood).

The panel above shows the directions of the upmu data set in the lower panel, and a (2 degree) smoothed version in the upper panel to make a color plot of the event distribution on the sky. The galactic plane is superposed on this Mercator projection of the equatorial distribution. One see that most of our data is in the Southern sky and we cutoff in the North around +50 degrees in declination.

This next panel, above, presents the numbers of events found by the Max Lik method for each point on a one degree grid in the upper plot. Note that the map is fairly uniform, indicating that we have properly accounted for the significantly varying declination distribution of background. The highest point is an excess of seven events, and occurs around alpha = 188 degrees, delta = -11 degrees.

We call W_0 = -log(Lik) in the case of no signal, and W_min that for the min W with non-zero signal. The lower plot shows those points where the W_0 - W_min > 2. The values plotted are the W_0 -W_min + 1. The most interesting point is as above. The values in this map are shown in a histogram below, left side.

Here we see the distribution of W_0 - W_min values for the sky from declination -10 to +50. At this moment I do not understand the slope of this distribution. In any case, there are points towards the right out near W_0 - W_min = 9.5, which appear to be rather unlikely by simple extrapolation.

In order to make some limits which encompass the sky I have first calculated the exposure as a function of declination, as shown in the top graph of the panel above. Note that it is in units of 10^14 cm^2 sec. The second figure show the declination profile of the largest number of events in each declination band, degree by degree of declination. I am not sure this is the proper number to use, and hope for some discussion on this point. In any event, it is fairly flat at about 10 events at a ~2 sigma limit. Then, utilizing that number and the exposure, we can make the bottom plot, exhibiting the upper limit flux as a function of declination, and sees it is around 10^-14 muons/cm^2/sec. To check the scale of this, the underground muon flux overall is around 2 x 10^-12 muons/cm^2/sec/sr. In a 2 degree cone one expects a flux of about 0.12 in these units, so this limit corresponds to about 8 times the atmospheric neutrino induced background.

Not yet done is to translate this into a neutrino flux limit. This may be done by noting the crossing point of the fluxes as indicated earlier, or from calculation, as has been done by others (eg. Shige Matsuno). Better yet will be having a point source MC sample to make the translation more precise.

Follows a list of close pairs of events on the sky, within two degrees:


events closer than       2.00 degrees

     event   ra(hrs)  dec(deg) close pairs


        45      3.33     -56.4         4
        89     12.55     -11.0         6
       111     12.29       0.8         4
       122     12.49     -12.3         4
       250     20.83      33.3         4
       396     12.50     -12.3         4
       422      1.34     -38.5         4
       435     12.22       1.3         4
       463      1.28     -36.6         4
       656     18.79     -51.7         4
       830      1.24     -37.3         4
       836     14.22     -64.7         4
       892      3.13     -55.8         5
       934     11.81     -65.1         5
      1135      2.55     -19.3         4
      1389     20.83      33.2         4
      1462     18.73     -52.6         4
      1464     12.31      -0.2         4
      1567      3.07     -57.4         5
      1609     16.18     -61.3         4
      1610      1.42     -36.8         4
      1643      1.33     -38.4         5
      1737     17.24     -40.1         5
      1756     12.53     -11.4         5
      1825      2.56     -19.0         4
      1920     16.28     -51.2         4
      1939     17.33     -39.7         4
      1978     12.18       0.3         5
      2002      6.28     -62.9         4
      2292     17.22     -41.0         5
      2369     12.58     -12.2         4

4.2 Unusual Spectrum Sources

Tests such as using a fixed cone, or even the PSF require some assumption about incoming neutrino spectrum. This is OK for what it is... testing our presumptions about expected sources. And such permits us to extract a quantitative estimate of what we see (or do not see). However, given that we are fishing for new science, it would be nice to have a test which makes minimal assumptions about cosmic sources. I will not deal here with the question of what we do about seeking any peculiarities in the overall sky distribution (as we may study with N-point distributions, Fourier transforms, wavelet decompositions, etc., see below). What I would like is a point source test which makes no assumptions, or very few, about the source and which simply tells us if there is something unexpected in our data.

One can look at the distribution of events as a function of distance from given directions. In fact one way simply take the event directions themselves as test directions. We make only the reasonable assumptions of a source being pointlike on any scale relevant to our resolution, and that such sources will be symmetric about that direction. So, for each point we may look at the one dimensional distribution of numbers of events in angular distance. This suggests employing the minimal assumption Kolmogorov-Smirnov shape test. One needs a null hypothesis, or background to test against. That might be just phase space, solid angle that is, if we assumed that the events are distributed uniformly upon the sky. But this is not so, since our acceptance is strongly dependent upon declination. However we can just use the MC generated background, or we can use a bootstrapped data background, as discussed earlier. If one makes a 2D histogram of cumulative (number up to that angle) background events versus distance from a given trial event, versus declination, one has all that is needed to play this game.

Then at each event in the data sample one does the KS2 test out to some chosen angle. One gets a probability for finding the observed angular distribution then at each data point. This test ignores the total number of events, since each distribution is normalized (between 0 and 1). So I also do a Poisson test on the probability of getting the total number of events seen. These two probabilities are combined using the Wallace formula (see Ralph Becker-Szendy's thesis for a discussion and extension to this):

	      p2 = prob_pois*prob_ks
	      prob_joint = p2*(1.0-dlog(p2)) ! Wallace's formula
	      w_joint = -dlog10(prob_joint)

Plotting a histogram of the ensemble of data points one can then scan for unusual directions and clusterings. These probabilities are illustrated in the right hand figure three panels above. The solid curve is the joint distribution, dashed, the KS2, and dotted, the Poisson values. The test went out to 10 degrees.

This test will find unusual event groupings around data points, even if, say there should be some "haloing" or a peculiarly fat distribution (if one goes out far enough with the KS test). This test will find both hard and soft spectrum sources without prejudice. On the other hand, this test does not produce an estimate of the flux involved, just tags unusual points. Also, one must choose the maximum angle over which to search, which is not a priori obvious.

This test, which I do not know a name for, as I do not think astronomers use something similar, is somehow related to the twopoint correlation function and the multi-point functions. However, the twopoint correlation function tests the whole data set, looking for grand scale structure. Here we are seeking unusual individual points. I think thus that this test may well be included in our paper, along with the twopoint function, as they give different information.

4.3 Multiplicity versus Angle

We can make a general examination of the multiplicity of events within a given radial spread around test points. We already have all the information for this in the course of the calculations described above. In the first plot below we show the numbers of events along the y axis, versus the cone angle along the x axis. The box area is proportional to the -log of the probability of getting that number of events from a Poisson distribution out to that cone angle.

The next three figures in the panel above, show the multiplicity of numbers of pairs within a given angle cut, 2 degrees in this case, for data and the MC, and the ratio of the two. There is no hint therefore of an excess of weak signals clustering just around the level of fluctuations.

I am not really sure where to go with this test.

4.4 Weak Sources: Twopoint Correlation

The two point correlation function is shown below for data relative to MC fake maps for background estimates. One sees, as mentioned earlier, a let anti-clustering in our data relative to the MC. I do not understand this as yet.

5.0 Galactic Center Region

The following panel shows four plots, the left hand pair being the zenith angle dependence of events from a 10 degree square region around the galactic center, as a function of hour angle (angular distance from the meridian, the N-S line going straight overhead at any point). A given point on the sky rises to maximum zenith angle as shown, and in fact becomes invisible in the middle region as it is above the local horizon. The lower left panel with MC data shows how this correlation (purely geometrical) varies with time. Since the atmospheric neutrinos peak near the horizon, the background for a given sky region changes with time, peaking when the test region comes nearest to the local horizon. The right hand plots show the number of events versus hour angle. One sees that for this sample, there are 4 events in the box, and all come from close to the horizon.

This is a work in progress, and obviously the next thing to do is a statistical test setting a limit on a flat background being added to the top-right panel. (Actually it is not quite flat, because the effective area for the detector changes with zenith angle somewhat, so this needs to be divided out).

6.0 Diffuse Sources

At this time I have not much to report about diffuse source limits. A simpleminded way is to add a constant rate to the zenith angle distribution of flux (not rate). A better way is perhaps to use Max Liklihood on the distribution of rate versus hour-angle. Indeed this can be done for a region of the sky (such as the galactic plane, as above), or even point-by-point.

7.0 Summary

Work in progress and outstanding questions:

If you read through this whole note you deserve a medal. I apologize that it is not been refined, and that there are lots of loose ends. I am releasing it to the inner circle however, to generate discussion in the hope that we may proceed to get a real publication out in a few months.

8.0 Addendum: Ultimate Point Source Search

A Better Point Source Search for UPMUs
jgl 6 Sept '03, updated 29 Oct.

The idea, mentioned earlier, is to try to employ all information we have which may be used to discriminate between atmospheric neutrino induced muons, and muons from extraterrestrial point sources. The handles we explore are angular distance from a point, energy deposited in the detector, azimuthal distribution about the test point, and the zenith angle of the events. In each case I make up a proobability function which gives the (relative) probability of each event given a hypothetical signal-to-noise ratio at the test location. Varying the S/N I minimimze the -log(liklihood) to find any signal which may be present.

We discuss each component of the liklihood functioni in turn below.

a) Cluster Angular Distribution: In earlier work we have seen that any relatively hard spectrum source, flatter than the primary cosmic ray spectrum as generally expected for potential astrophysical neutrino sources, will result in a point spread function for UPMUs (even after reconstruction) which has something like 90% of the events inside a 2 degree half-angle cone. This suggests that in a simple first cut data scan we can just look across the sky with a simple "cone search", seeking unusual clusters.

As already implemented and discussed above, we can use the predicted point spread function at each point on the sky along with the background (nearly) uniform distribution (which depends only upon declination) to do a Max Lik analysis at each point on the sky, yielding excesses and limits. Actually I think this does not in itself improve much on the cone search, but does produce nicely quantifiable results and nice maps, as already shown.

For the probability distribution, I take the expected number as that given by the tables made from the Monte Carlo for number of events as a function of distance from a point at a given declination. Largely this will be just increasing as the angular distance squared (1-cosine(angle)), but using the MC takes into account the zenith angle and aperture induced variation with declination. For each test signal fraction I then calculate the probability of seeing the observed number of events in one degree bins in angular distance, and I assume that the Poisson distribution obtains for each trial (as it should). b) Energy Distribution:

A new thought (to me anyway) was to employ also the "shower energy" at each point in the Max Lik function. In the attached figure below I have plotted, from the MC obviously, number distributions for all atm events and for those from a 1/E^2 source, as a function of the log10(E_shwr/GeV). The third, lowest, panel shows the actual data. Note the rather nice separation... if one made a cut at around >6 GeV shower energy one would get rid of 92% of the atmospheric events and keeps 69% of the putative point source events. (Actually it will be a bit better than this since our MC did not go to high enough neutrino source energies.) For the liklihood function I use a fitted exponential as indicated in the panels. The log slope for the atmospheric neutrino data from the MC is -1.7, with the parameter being the log_10(shower ernergy in GeV). I am not sure the exponential is the best function to use (I have no idea what it should be analytically).

c) Azimuthal Distribution:

Previously I had dismissed the idea to utilize the azimuthal distribution of events around the trial point-source direction. A real point source should be uniform, and the atmospheric background should be nearly uniform too, so what's to gain? However, since we are walking the trial direction around the sky, if there is any cluster we will (and do) pick up multiple "hits", real or fluctuation, as the trial direction crosses the cluster. Perhaps a good way to think about this is as a peak finder... think of a peak finding algorith in time for a PMT pulse, for example. Adding to the liklihood a function which peaks when centered on the cluster in 2D angle space accomplishes this goal. What I remain confused about is whether this ought to be something like small center-of-mass offset for the events in the cluster, or just a small Rayleigh power (directions only). Right now I am using Rayleigh power, since we know this should produce a nice exponential PDF for random directions. Also, I have used radius already in the PSF, so using it twice may not be wise. The present implementation includes then the Rayleigh power of all the events in the test cone (now out to 5 degrees), but atmospheric background and possible signal. Thus this does not contribute to the minimization of the liklihood, but will help in looking at the liklihood surface to pin down possible source directions.

A small, and perhaps troublesome, problem for this game is the gradient of the background in declination, which of course will result in a slight preferential direction for the background events. Perhaps I can use the MC to calculate the net expected Rayleigh vector, which will be N-S, versus declination, and subtract this from the local calculation. This effect is at present ignored.

d) Zenith Angle Dependence:

As already discussed above, for the case of the galactic center region, each point on the sky follows a predictable trajectory in zenith angle at SuperK. And, since the atmospheric background rate depends strongly on zenith angle we have another way to discriminate against atmospheric events as opposed to astrophysical sources. For example, in the case of the GC we saw that the only 4 events nearby were indeed from very close to time when the GC crossed the SK horizon, and hence likely to be background. So, I cooked up a "penalty function", once again to stick into the Max Lik distribution which tests each point on the sky. It seems that the best way to accomplish this may be by making tables from the MC of hour-angle dependence of the background (rather than zenith angle), since the exposure for point sources will be (nearly, except for detector projected area versus zenith angle) flat in hour angle. We will need such tables for each declination.

This is illustrated in the following figure, where the top panel shows the real data and the bottom the MC data, box size indicating number of events as a function of declination and hour angle. Naturally, one sees lots of events near the left hand boundary, which is near the horizon. So, the probability distibution for a source at a given middling declination is largest near its smallest hour angle, and falls as the test source moves South (and down) relative to the detector. This lever arm will hae little effect, as one can see in the figure, for declinations in the far South (beyond about -60 degrees declination) and in the North (above around +30 degrees declination)

Calculating the Max Liklihood Function:

Hence, as stated earlier, the procedure is to march around the sky in one degree bins, not worrying about overlaps in trials at all at this stage. Yes, the trials get very close together near the South pole, but no matter. I make tables of the events as a function of distance from each test point, out to some prescribed maximum anlge, taken as 5 degrees at present (there are essentially no point source neutrinos expected beyond this angle for a hard source, and most are within two degrees). The I calculate the W = -log(lik) for each of the four tests (angle, energy, aximuth and hour angle), and vary the trial signal. I find the signal for minimum W, and the signal for having gone some value beyond min W (which I take as 2 now, but think this is not right, since the W distribution does not appear to be a nice chisquare). The latter then will give us some estimate of the upper limit signal at, say, 90% CL.

In the following figure one may see the four W = -log(Lik) for each of the angle test (Poisson in 5 angle bins), shower energy (event-by-event), azimuth (Rayleigh power of all events inside 5 degrees around test point), and the hour angle (zenith angle) test. This is for no trial signal, so these are the raw probability distributions.

However, the zenith angle distribution was initially much wider than the others, so I arbitrarily divided the Wz values by 6 to bring this distribution into rough agreement with the others. (Do not worry about the offsets between distributions, since we are working with the minima in the sum of the four, varying the test number of events). Perhaps I have a problem with the coding of the hour agnle test, but so far cannot find such. In any case I do not understand why the W distribution is wider for it than the other three.

Note also that the Wa, azimuthal distribution, which we would expect to be a nice exponential for a Rayleigh distribution, in not. They all look somewhat Gaussian rather than exponential. Maybe this is the oversampling, particularly near the pole, but I do not yet understand it.

In the following figure we see the distributions of differences between the initial (no signal) W values and minimum W (most likely signal) for all tests (W0-Wmin), angle (Poisson, Wpm-Wp0), shower energy (Wem-We0), and hour angle (zenith, Wzm-Wz0). There is no difference for the azimuthal distribution, which does not change with putative signal. So, we see that the dominant contribution comes from the point spread function (Wpm-Wp0, Pois). There is some contribution from shower energy, but none from the hour angle test. In fact the hour angle test might be negative in contribution if anything. This makes me think there is something wrong with the coding, but as stated I have not been able, despite looking fairly hard, to find a bug.

All told however, the final result in the top left panel, is not so bad. Aside from an expected peak at zero (where negative signals were not allowed), the distribution falls with a decent exponential, and slope of -0.535. Yes, there does seem still to be oone region giving some farily unlikely trials... the region near RA=188, DEC=-11.

The big question is how unlikely is it overall. At the moment I despair of being able to get the W distribution in absolute terms so we can read the probability right from the figure. But, one my hazzard a guess by extrapolation from the straight line ewxponential region and see that the point may be unlikely at the level of seeing one where 0.1 were expected. I think probably that we will ahve to resort to the blunt instrument approach of making many fake sky maps and testing for the probability of seeing such a point(s) in this manner.

In the following we see a plot of those test points with W0-Wmin greater than 10. The galactic plane is superposed as well. One sees the most interesting point nearly in the center and some other high points nearby. I cannot help but note the intriguing cluster of points near the famous Cygnus region (300, +40)... where we have had many hints over the years but nothing which is definitive. Without the overlay one would certianly not pick out the galactic plane from the unlikely points alone.

Summary of Max Liklihood Point Source Search:

I have described four tests applied to each trial direction on the sky, seeking point-source excesses of neutrino events. I think the assumptions are fairly robust and defensible, mainly being that any point source is likely to have a spectrum rather more flat than the primary cosmic rays. These four tests (scatter in direction from the peak and around it, energy, and zenith angle) seem to be fairly straight forward in principle, but need careful realization. At present I would only claim a first attempt, not a finished result by any means. By applying these tests with the Max Lik at each point and solving for the amount of test signal which minimizes the -logLik, and then backing away from this by a fixed (chisquare equivalent) increment, we get best values and errors or upper limits on point source fluxes. As described, these are directly convertable into neutrino fluxes (via the crossing point of the flat spectrum source with atmospheric flux). And we also can get an assessment of overall how unlikely are our highest peaks from making many fake maps and testing those, as yet to be done.

9.0 References

To be added.....